benthecoder / docutranslate

translate scanned docs

Repository from Github https://github.com/benthecoder/docutranslate

docutranslate

does the following:

scanned pdf file -> images -> text -> gpt-4o -> translated word doc

see test.ipynb for details
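The pipeline above can be sketched as a chain of pluggable stages. This is a minimal illustration, not the repository's actual code: the class name `TranslationPipeline` and the stage names `render_pages`, `extract_text`, and `translate` are hypothetical, and the stages are stubbed so the sketch does not assume which PDF renderer, OCR tool, or API client `main.py` uses.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class TranslationPipeline:
    """Chains the stages: PDF -> page images -> text -> translated text.

    Every stage is a caller-supplied function (hypothetical names), so this
    sketch makes no claim about the libraries the repository really uses.
    """

    render_pages: Callable[[str], List[Any]]  # PDF path -> one image per page
    extract_text: Callable[[Any], str]        # page image -> raw text
    translate: Callable[[str, str], str]      # (text, target language) -> translation

    def run(self, pdf_path: str, language: str,
            page_number: Optional[int] = None) -> List[str]:
        pages = self.render_pages(pdf_path)
        if page_number is not None:
            # Assumption: page numbers are 1-based, as "--page-number 1" suggests.
            if not 1 <= page_number <= len(pages):
                raise ValueError(
                    f"page {page_number} out of range 1..{len(pages)}"
                )
            pages = [pages[page_number - 1]]
        # Translate page by page so one page can be processed on its own.
        return [self.translate(self.extract_text(page), language) for page in pages]


# Stub stages standing in for real rendering, text extraction, and the gpt-4o call.
pipeline = TranslationPipeline(
    render_pages=lambda path: ["page-1-image", "page-2-image"],
    extract_text=lambda image: f"text of {image}",
    translate=lambda text, lang: f"[{lang}] {text}",
)

translated = pipeline.run("attention.pdf", "Chinese (Traditional)")
single = pipeline.run("attention.pdf", "Chinese (Traditional)", page_number=1)
```

Writing the translated pages to a Word document would be one more stage after `run`. It is left out here because the source does not say how the document is produced.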

example usage

install requirements

pip install -r requirements.txt

process the entire PDF:

python main.py attention.pdf --language "Chinese (Traditional)"

process a single page:

python main.py attention.pdf --language "Chinese (Traditional)" --single-page --page-number 1
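The two commands above imply a command-line interface with one positional argument and three options. Here is a rough sketch of how such a parser could be declared with `argparse`. The flag names come from the commands above, but the function name `build_parser`, the help strings, the required `--language`, and the default page number are assumptions, and the real `main.py` may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser matching the flags shown in the example commands."""
    parser = argparse.ArgumentParser(description="translate scanned docs")
    parser.add_argument("pdf", help="path to the scanned PDF file")
    # Assumption: --language is required; the examples always pass it.
    parser.add_argument("--language", required=True, help="target language")
    parser.add_argument(
        "--single-page",
        action="store_true",
        help="process only the page given by --page-number",
    )
    # Assumption: defaults to the first page when --single-page is set.
    parser.add_argument("--page-number", type=int, default=1, help="page to process")
    return parser


# Parse the single-page example command from an explicit argument list.
args = build_parser().parse_args(
    [
        "attention.pdf",
        "--language", "Chinese (Traditional)",
        "--single-page",
        "--page-number", "1",
    ]
)
print(args.pdf, args.language, args.single_page, args.page_number)
```

`argparse` converts the dashes in `--single-page` and `--page-number` to underscores, so the values are read as `args.single_page` and `args.page_number`.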

Languages

Jupyter Notebook 97.7%, Python 2.3%