motoom / gutenberg-ebook-scraping

Download, convert and organize Gutenberg books for eBook Readers

Home Page: http://www.michielovertoom.com/python/gutenberg-ebook-scraping

This is a set of Python scripts that download all
Dutch ebooks from Project Gutenberg, rename them to
human-readable filenames, format them so they display well
on my ebook reader, and toss them into subdirectories for
easier navigation.
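
The README does not say how the books are grouped into subdirectories. As an illustration only, the sketch below assumes one bucket per first character of the filename; the names `target_dir`, `toss` and the `books` root are hypothetical and not taken from toss.py.

```python
import shutil
from pathlib import Path

def target_dir(filename, root="books"):
    """Pick a subdirectory from the first alphanumeric character of the name.

    Grouping by first character is an assumption, not necessarily what
    toss.py does.
    """
    first = next((c for c in filename if c.isalnum()), "_")
    return Path(root) / first.upper()

def toss(src_dir, root="books"):
    """Move every .txt file in src_dir into its bucket under root."""
    for path in sorted(Path(src_dir).glob("*.txt")):
        dest = target_dir(path.name, root)
        dest.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(dest / path.name))
```

With this scheme, a file named "author - title.txt" would end up in books/A/.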

Written by Michiel Overtoom, motoom@xs4all.nl

How to use:

- Run bulkdownload.py to download the raw texts from a mirror of Project Gutenberg's eBook archive.
- Run gutenberg.py to reformat and rename the raw texts.
- Run toss.py to distribute them over subdirectories.

After that, upload them to your eBook reader, and enjoy!
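
The three steps above could also be chained from a small driver script. This is a minimal sketch: the script names come from the list above, but running them without arguments from the current directory is an assumption, and `build_commands` and `run_pipeline` are hypothetical helpers.

```python
import subprocess
import sys

# Script names are taken from the "How to use" steps; invoking them
# without arguments is an assumption.
STEPS = ["bulkdownload.py", "gutenberg.py", "toss.py"]

def build_commands(steps=STEPS, python=sys.executable):
    """Return one command line per step, in the order the steps must run."""
    return [[python, step] for step in steps]

def run_pipeline(commands, runner=subprocess.run):
    """Run each command in turn; check=True stops at the first failure."""
    for cmd in commands:
        runner(cmd, check=True)
```

Calling `run_pipeline(build_commands())` from the directory that holds the scripts runs the download, reformat and toss steps in order, and raises `subprocess.CalledProcessError` if one of them exits with an error.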

In March 2016 I reworked this program because scraping Gutenberg's main
web site is no longer allowed. This newer version:

- downloads from a mirror instead of scraping Gutenberg's main web site
- lets you specify the language
- detects the input encoding more reliably
- outputs UTF-8 encoded text files
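
The README does not describe how the input encoding is detected. One simple approach, shown here only as a sketch, is to try a short list of candidate encodings in order and re-encode the result as UTF-8. The candidate list and the names `decode_guess` and `to_utf8` are assumptions, not the method used in gutenberg.py.

```python
def decode_guess(raw, candidates=("utf-8", "cp1252", "latin-1")):
    """Decode bytes with the first candidate encoding that succeeds.

    Returns a (text, encoding_name) pair. The candidate order is an
    assumption: strict UTF-8 first, then common Western European codecs.
    """
    for enc in candidates:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # Only reached with a custom candidate list, since latin-1 accepts any byte.
    return raw.decode("latin-1", errors="replace"), "latin-1"

def to_utf8(raw):
    """Return the input bytes re-encoded as UTF-8."""
    text, _ = decode_guess(raw)
    return text.encode("utf-8")
```

Trial decoding cannot tell single-byte encodings apart, so a misdetected file will decode without error but show wrong accented characters; a statistical detector would be needed to do better.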


Languages

Language:Python 100.0%