Hulabear Downloader

Download articles from telnet://hulabear.twbbs.org

Getting Started

Python 2.7

python run.py -a [hula account] -p [hula password] -b [board] -s [starting article number] -e [ending article number]

For example, to download articles No.2 through No.5 from the board Cs11 (note that the board name is case-sensitive):

python run.py -a account -p password -b Cs11 -s 2 -e 5

Or if you only want to download article No.30:

python run.py -a account -p password -b Cs11 -s 30 -e 30
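
The flags above could be parsed roughly as follows. This is only a sketch, not the actual code in run.py: the names build_parser and article_numbers are hypothetical, and the inclusive range is an assumption drawn from the examples (-s 30 -e 30 downloads exactly one article).

```python
import argparse


def build_parser():
    # Hypothetical parser mirroring the flags shown in the usage line.
    parser = argparse.ArgumentParser(description="Download articles from a board")
    parser.add_argument("-a", dest="account", required=True, help="hula account")
    parser.add_argument("-p", dest="password", required=True, help="hula password")
    parser.add_argument("-b", dest="board", required=True, help="board name (case-sensitive)")
    parser.add_argument("-s", dest="start", type=int, required=True, help="starting article number")
    parser.add_argument("-e", dest="end", type=int, required=True, help="ending article number")
    return parser


def article_numbers(args):
    # Assumption: both ends are inclusive, so -s 30 -e 30 yields one article.
    if args.end < args.start:
        raise ValueError("ending article number must not be smaller than the starting one")
    return list(range(args.start, args.end + 1))


args = build_parser().parse_args(
    ["-a", "account", "-p", "password", "-b", "Cs11", "-s", "2", "-e", "5"]
)
print(article_numbers(args))  # [2, 3, 4, 5]
```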

A folder named download_[board] will be created automatically under the root folder.

For example, if you download articles from Cs11, the articles will be in:

hulabear_downloader\download_Cs11
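
As a rough illustration of the naming rule, the output folder could be derived like this. The helper ensure_download_dir is hypothetical and a temporary directory stands in for the project root; the real script may build the path differently.

```python
import os
import tempfile


def ensure_download_dir(root, board):
    # Folder name follows the download_[board] pattern described above.
    path = os.path.join(root, "download_" + board)
    if not os.path.isdir(path):
        os.makedirs(path)
    return path


root = tempfile.mkdtemp()  # stand-in for the project root folder
path = ensure_download_dir(root, "Cs11")
print(os.path.basename(path))  # download_Cs11
```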

In config.ini, there are some settings you can adjust:

[host]
timeout = 6

[data]
page_splitter =

[encode]
file_name = big5 | utf8

If your connection to hulabear is too slow, you can increase the timeout limit.
If you want to see where each page begins and ends within an article, you can set page_splitter to --
The file name can be encoded as either big5 or utf8.
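
For reference, a file with this layout can be read using the standard library's config parser. The sketch below is not the project's own loading code: load_settings and SAMPLE_CONFIG are hypothetical, and the sample values (page_splitter set to --, file_name set to utf8) are just one possible choice.

```python
import io

try:
    from configparser import ConfigParser  # Python 3
except ImportError:
    from ConfigParser import ConfigParser  # Python 2.7

# Sample contents mirroring the config.ini layout described above.
SAMPLE_CONFIG = u"""[host]
timeout = 6

[data]
page_splitter = --

[encode]
file_name = utf8
"""


def load_settings(text):
    parser = ConfigParser()
    # read_file exists on Python 3; readfp is the Python 2.7 equivalent.
    read = getattr(parser, "read_file", None) or parser.readfp
    read(io.StringIO(text))
    return {
        "timeout": parser.getint("host", "timeout"),
        "page_splitter": parser.get("data", "page_splitter"),
        "file_name": parser.get("encode", "file_name"),
    }


settings = load_settings(SAMPLE_CONFIG)
print(settings["timeout"], settings["page_splitter"], settings["file_name"])
```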

Pull requests to fix the following issues are welcome :)

Most BBS control codes are not supported.
Lines that hit the end of a page will appear twice (because hulabear copies the last line of a page to the next page as its first line).
Unexpected spaces may appear in articles (due to BBS control codes).
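
One possible starting point for the duplicated-line issue is to drop the first line of a page when it repeats the last line already collected. This is a sketch under the assumption that each page is available as a list of lines; join_pages is a hypothetical helper, not part of the current code. Note that it would also remove a line the author genuinely repeated across a page boundary.

```python
def join_pages(pages):
    """Merge pages (lists of lines), skipping a line repeated at a page boundary."""
    merged = []
    for page in pages:
        lines = list(page)
        # Skip the first line if it duplicates the last line collected so far.
        if merged and lines and lines[0] == merged[-1]:
            lines = lines[1:]
        merged.extend(lines)
    return merged


pages = [["a", "b", "c"], ["c", "d"], ["d", "e"]]
print(join_pages(pages))  # ['a', 'b', 'c', 'd', 'e']
```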

The reformatting part is based on the crawler by geniusturtle.