Paper2Speech

Motivation

As a student in applied mathematics and machine learning, I read a lot of scientific books, lecture notes and papers. I usually prefer listening to a lecture and following the professor's explanations on the blackboard, because I take in much of the information by ear and don't have to do the "heavy lifting" through reading alone. So far, this has not been available for books and papers.
So I thought: why not let software read the text out for you? What if you just had to click a button in the Finder, and the book or paper was converted to speech automatically?
This script uses Meta's Nougat package to extract formatted text from a PDF and then converts it to audio using the Google Cloud Text-to-Speech API.

Capabilities

pause before and after headings
skip references like [1], [1, 2], [1-3]
spell out abbreviations like e.g., i.e., w.r.t., Fig., Eq.
read out inline math
do not read out block math, instead pause
do not read out table contents
read out figure and table captions
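
A minimal sketch of how a few of these rules could be implemented, not the script's actual code. It assumes the extracted text arrives line by line with Markdown-style headings starting with "#", and that pauses are expressed with SSML break tags. The abbreviation table and the names clean_line and to_ssml are hypothetical.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical abbreviation table; the script's real list may differ.
ABBREVIATIONS = {
    "e.g.": "for example",
    "i.e.": "that is",
    "w.r.t.": "with respect to",
    "Fig.": "Figure",
    "Eq.": "Equation",
}

# Numeric citations such as [1], [1, 2] and [1-3], plus any space before them.
CITATION = re.compile(r"\s*\[\d+(?:\s*[-,]\s*\d+)*\]")


def clean_line(line: str) -> str:
    """Drop numeric citations and spell out abbreviations in one line."""
    line = CITATION.sub("", line)
    for short, spoken in ABBREVIATIONS.items():
        line = line.replace(short, spoken)
    return line


def to_ssml(lines, pause="1s"):
    """Wrap cleaned lines in SSML, pausing before and after each heading."""
    parts = []
    for line in lines:
        if line.startswith("#"):
            # Assumption: headings are marked with leading '#' characters.
            title = escape(clean_line(line.lstrip("#").strip()))
            parts.append(f'<break time="{pause}"/>{title}<break time="{pause}"/>')
        elif line.strip():
            parts.append(escape(clean_line(line)))
    return "<speak>" + " ".join(parts) + "</speak>"
```

For example, clean_line("As shown in Fig. 2 [1, 2], the loss decreases, e.g. for SGD [1-3].") gives "As shown in Figure 2, the loss decreases, for example for SGD.". Plain string replacement is the simplest choice here; it would also rewrite an abbreviation that appears inside a longer token, so a real implementation may want word-boundary checks.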

Usage

pip3 install -r requirements.txt

python3 main.py <input_file> -o <output_path>
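
The command line above could be parsed roughly as follows. This is a sketch under assumptions, not the contents of main.py: the long option name --output, the function name parse_args, and the default of writing an .mp3 file next to the input are all hypothetical.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse `<input_file> -o <output_path>` style arguments."""
    parser = argparse.ArgumentParser(description="Convert a paper to speech.")
    parser.add_argument("input_file", type=Path, help="PDF file to read out")
    # Only -o appears in the usage line; --output is an assumed long form.
    parser.add_argument("-o", "--output", type=Path, default=None,
                        help="where to write the audio file")
    args = parser.parse_args(argv)
    if args.output is None:
        # Assumed default: same location and name as the input, with .mp3.
        args.output = args.input_file.with_suffix(".mp3")
    return args
```

Calling parse_args(["paper.pdf", "-o", "out/paper.mp3"]) returns a namespace whose input_file and output attributes are Path objects.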

The Google Cloud authentication JSON file should be in the same directory as main.py. It can be downloaded from the Google Cloud Console.
TL;DR: On https://cloud.google.com, create a new project. In your project, click the 3 dots in the upper right corner > project settings > service accounts > choose or create a service account > create key > json > create. The resulting JSON file should download automatically. Google TTS is free for the first 1 million characters per month, then costs $4 per 1 million characters.
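
To get a feel for the cost, the pricing quoted above can be turned into a small calculation. This is a sketch that takes the figures from this README at face value (1 million free characters per month, then $4 per million); actual prices may differ by voice type and change over time. The function name estimate_cost_usd is hypothetical.

```python
# Figures as quoted in this README; check the current price list before relying on them.
FREE_CHARS_PER_MONTH = 1_000_000
USD_PER_MILLION_CHARS = 4.0


def estimate_cost_usd(chars_this_month: int) -> float:
    """Estimate the monthly bill for a given number of synthesized characters."""
    billable = max(0, chars_this_month - FREE_CHARS_PER_MONTH)
    return billable / 1_000_000 * USD_PER_MILLION_CHARS
```

Under these assumptions, 800,000 characters in a month cost nothing, and 2,500,000 characters cost $6.00, since only the 1.5 million characters above the free tier are billed.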

On macOS, you can create a shortcut in the Finder with the following steps:

In Automator, create a new Quick Action.
At the top, set the input to "PDF files" in "Finder".
Add a "Run Shell Script" action. Set the shell to /bin/zsh and pass input as arguments.
Add the following code:

source ~/opt/miniconda3/etc/profile.d/conda.sh
conda activate paper2audio
# Quote the argument so paths containing spaces are passed as one file
python3 ~/path/to/paper2speech/main.py "$1"

Save the action and give it a name, e.g. "Paper2Speech".

Limitations

captions of tables and figures are always read at the end
only works for English

Future Work

read out each figure caption right after the figure is referenced in the text

aascode / paper2speech