doc2audiobook.py
Extract text from a document (textract) and convert it into natural-sounding synthesised speech (Cloud Text-to-Speech), which can leverage DeepMind's WaveNet models.
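The two stages described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: the helper names `split_text` and `synthesize_document`, the 5000-byte per-request budget, and the MP3 output format are all assumptions made for the example. It relies on `textract.process` and the `google-cloud-texttospeech` client, and expects credentials to be discoverable by the client (for example via `GOOGLE_APPLICATION_CREDENTIALS`).

```python
from typing import List

# Assumed per-request input budget; check the current Cloud Text-to-Speech quotas.
MAX_BYTES = 5000


def split_text(text: str, max_bytes: int = MAX_BYTES) -> List[str]:
    """Greedily pack whitespace-separated words into chunks of at most
    max_bytes UTF-8 bytes. A single word longer than max_bytes is kept
    whole, so such a chunk can exceed the budget."""
    chunks: List[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate.encode("utf-8")) <= max_bytes:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


def synthesize_document(
    path: str,
    out_path: str,
    language_code: str = "en-GB",
    voice_name: str = "en-GB-Standard-C",
) -> None:
    """Hypothetical end-to-end helper: extract text, synthesise it chunk by
    chunk, and append the MP3 bytes of each chunk to out_path."""
    # Imported lazily so split_text works without the third-party packages.
    import textract
    from google.cloud import texttospeech

    text = textract.process(path).decode("utf-8")
    client = texttospeech.TextToSpeechClient()
    voice = texttospeech.VoiceSelectionParams(
        language_code=language_code, name=voice_name
    )
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )
    with open(out_path, "wb") as out:
        for chunk in split_text(text):
            response = client.synthesize_speech(
                input=texttospeech.SynthesisInput(text=chunk),
                voice=voice,
                audio_config=audio_config,
            )
            out.write(response.audio_content)
```

Splitting on whitespace keeps the sketch short; splitting on sentence boundaries would likely give more natural pauses between chunks.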
Example
Available source formats (from textract)
- .csv
- .doc
- .docx
- .eml
- .epub
- .gif
- .jpg and .jpeg
- .json
- .html and .htm
- .mp3
- .msg
- .odt
- .ogg
- .png
- .pptx
- .ps
- .rtf
- .tiff
- .txt
- .wav
- .xlsx
- .xls
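As a small illustration of how the list above might be used, the following sketch filters a set of paths down to those with a supported extension. The names `SUPPORTED_EXTENSIONS` and `supported_files` are hypothetical and not taken from the project; the extension set is copied from the list above.

```python
from pathlib import Path
from typing import Iterable, List

# Extensions copied from the list of available source formats.
SUPPORTED_EXTENSIONS = {
    ".csv", ".doc", ".docx", ".eml", ".epub", ".gif", ".jpg", ".jpeg",
    ".json", ".html", ".htm", ".mp3", ".msg", ".odt", ".ogg", ".png",
    ".pptx", ".ps", ".rtf", ".tiff", ".txt", ".wav", ".xlsx", ".xls",
}


def supported_files(paths: Iterable[Path]) -> List[Path]:
    """Return the paths whose extension (compared case-insensitively)
    appears in SUPPORTED_EXTENSIONS, preserving input order."""
    return [p for p in paths if p.suffix.lower() in SUPPORTED_EXTENSIONS]
```

For example, `supported_files(Path("/doc2audiobook/data/input").iterdir())` would list the candidate documents in the input directory.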
Prerequisites
GCP
- Select or create a Cloud Platform project.
- Enable billing for your project.
- Enable the Cloud Text-to-Speech API.
- Set up authentication using a Service Account.
Host Machine
- Docker
- /doc2audiobook/data/input: directory to hold all input files.
- /doc2audiobook/data/output: directory to store all output files.
- /doc2audiobook/.secrets/client_secret.json: GCP authentication token.
Build
$ git clone git@github.com:danthelion/doc2audiobook.git
$ cd doc2audiobook
$ docker build -t doc2audiobook .
Run
Make sure to put your documents in the folder that is mapped to /data before running!
List available voices
$ docker run \
    -v /doc2audiobook/data:/data:rw \
    -v /doc2audiobook/.secrets/client_secret.json:/.secrets/client_secret.json:ro \
    doc2audiobook -list-voices
Convert a document to an audiobook using the en-GB-Standard-C voice.
$ docker run \
    -v /doc2audiobook/data:/data:rw \
    -v /doc2audiobook/.secrets/client_secret.json:/.secrets/client_secret.json:ro \
    doc2audiobook --voice en-GB-Standard-C
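The Cloud Text-to-Speech API selects a voice by both a language code and a voice name. Voice names such as en-GB-Standard-C appear to start with that language code, so one could derive it as sketched below. The function `language_code_from_voice` is hypothetical, the naming pattern is an assumption, and whether doc2audiobook does anything similar internally is not stated here.

```python
def language_code_from_voice(voice_name: str) -> str:
    """Derive a language code such as "en-GB" from a voice name such as
    "en-GB-Standard-C", assuming the first two dash-separated parts of
    the name form the language code."""
    parts = voice_name.split("-")
    if len(parts) < 3:
        raise ValueError(f"unexpected voice name format: {voice_name!r}")
    return "-".join(parts[:2])
```

The names printed by the voice listing command above remain the authoritative source for what can be passed to --voice.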