davidedelpapa / pdfnanny

pdf2mp3 using gTTS

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

PDFNanny

pdf2mp3 using gTTS

Versions

The first commit was unsuccessful. The package pdftotext was unable to reliably extract text from the pdf page in test/. Got to find some other package.

From the second commit, we changed the tet extraction library, using PyPDF2: things are starting to work.

TODO

This for now is a proof of concept. This is what I wish to have in the next releases:

  • TOC extraction, so to divide the audio in multiple mp3's. A failback system when TOC extractions fails. Considering also not to fill up allotted shares for gTTS
  • autodetecting text language, prior to passing it to gTTS

About

pdf2mp3 using gTTS

License:MIT License


Languages

Language:Python 55.4%Language:Shell 44.6%