How to translate ppt files?

Read my Medium article to discover how the library was built!

Purpose

Free online translators for PowerPoint files have two main issues:

The translation APIs are often robust neither to short half-sentences (very common in PowerPoint slides) nor to long texts.
The structure of PowerPoint presentations is very complex (lots of unordered shapes), and after modification a nicely laid-out presentation often ends up with misplaced shapes.

This project aims to solve both problems and to automate the translation of *.pptx files, keeping the same rendering as the original and producing well-translated sentences and expressions.

This repo contains materials to:

Translate texts using Selenium on the DeepL translation website.
Extract and modify PowerPoint texts from different objects with the powerful python-pptx library.

4 scripts are available in the `src` folder:

default_selenium.py: the defaultSelenium class contains the basics to connect to the Selenium API and launch a website.
deepL_selenium.py: seleniumDeepL inherits from the previous class and contains all the interactions specific to the DeepL website.
ppt_interaction.py: contains functions to inspect a presentation, from the presentation to its slides, to their shapes, to their text_frame properties.
ppt_translation.py: uses both the functions from ppt_interaction.py and seleniumDeepL to accomplish the final task: translating files.
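As a rough illustration of the walk described for ppt_interaction.py (presentation, then slides, then shapes, then text frames), here is a minimal sketch. The helper names `load_presentation` and `iter_text_runs` are hypothetical and are not taken from the repo; only the python-pptx attributes used (`slides`, `shapes`, `has_text_frame`, `text_frame.paragraphs`, `runs`) are real. Grouped shapes and tables are not visited in this simplified version.

```python
def load_presentation(path):
    """Open a .pptx file. python-pptx is imported lazily (pip install python-pptx)."""
    from pptx import Presentation
    return Presentation(path)


def iter_text_runs(presentation):
    """Yield (slide_index, shape_name, run) for every text run in the deck."""
    for slide_index, slide in enumerate(presentation.slides):
        for shape in slide.shapes:
            # Pictures, connectors, etc. have no text frame and are skipped.
            if not shape.has_text_frame:
                continue
            for paragraph in shape.text_frame.paragraphs:
                for run in paragraph.runs:
                    yield slide_index, shape.name, run


# Hypothetical usage: print every piece of text with its location.
# for index, name, run in iter_text_runs(load_presentation("deck.pptx")):
#     print(index, name, repr(run.text))
```

Working at the run level matters because character formatting in python-pptx is attached to runs, not to the shape as a whole.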

Running the translator

The translation object uses a corpus concept. Text must be given as a list of strings (each string is one sentence; the maximum number of characters in a single sentence is 4900 due to the limits of DeepL's web page). A translation example is provided.
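To make the corpus constraints concrete, here is a small sketch that normalizes a corpus to a list of strings and rejects entries over the 4900-character limit. The function `check_corpus` is a hypothetical helper written for this README and is not part of the repo.

```python
MAX_CHARS = 4900  # per-sentence limit quoted above for DeepL's web page


def check_corpus(corpus, max_chars=MAX_CHARS):
    """Return the corpus as a list of strings, raising if an entry is too long."""
    if isinstance(corpus, str):
        corpus = [corpus]  # accept a single sentence for convenience
    too_long = [i for i, sentence in enumerate(corpus) if len(sentence) > max_chars]
    if too_long:
        raise ValueError(f"Entries exceed {max_chars} characters at indexes {too_long}")
    return list(corpus)


corpus = check_corpus(["Bonjour tout le monde", "Merci beaucoup"])
```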

There are 5 steps to run the translation on a corpus.

Clone the repo

git clone https://github.com/ThibaudLamothe/translate-pptx.git

Download the Selenium chromedriver to the project's root. Google Chrome also needs to be installed.
Go to the src folder

cd src/

Install necessary libraries

pip install -r requirements.txt

Run the deepL_selenium.py file

python deepL_selenium.py

The output is the following:

Translator's features

Initiating the translator launches the Selenium driver, which needs a driver executable to run correctly. Its location has to be specified with the driver_path argument. The loglevel can also be set (error/warning/information/debug) depending on the level of information to track. See the previous picture.

deepL = seleniumDeepL(driver_path='../chromedriver', loglevel='debug')

Running that command opens an empty browser window. We can now start the translation process.

Functions available

The seleniumDeepL class contains multiple methods, but only 4 are useful for the translation process. The others are internal processing steps.

deepL.run_translation( see next part for parameters )

This is the main function. It takes the corpus, transforms it to better suit DeepL's website, performs the translation and stores the results in a dictionary.

deepL.get_translated_corpus()

It returns the dictionary of translated sentences. Keys are the original sentences or groups of words; values are their translations.

deepL.save_translations(json_path as str)

The translations can be stored as a JSON file using this function. It takes a single argument: the path to the JSON file, as a string.

deepL.load_translations(json_path as str)

During the translation process, a sentence that has already been translated is not translated a second time. Translations from a previous run can be reloaded with this function. It takes the path to a JSON file as a string.
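The idea of reusing earlier translations can be sketched with a plain JSON file. This is an illustration of the concept only, not the code of seleniumDeepL: the helpers `load_cache`, `save_cache` and `pending_sentences` are hypothetical names, and the JSON layout (original sentence as key, translation as value) is assumed from the dictionary described above.

```python
import json
from pathlib import Path


def load_cache(json_path):
    """Return previously stored translations, or an empty dict if the file is missing."""
    path = Path(json_path)
    if not path.exists():
        return {}
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


def save_cache(translations, json_path):
    """Write the {original: translation} dictionary as readable UTF-8 JSON."""
    with open(json_path, "w", encoding="utf-8") as handle:
        json.dump(translations, handle, ensure_ascii=False, indent=2)


def pending_sentences(corpus, translations):
    """Sentences not yet translated, without duplicates, in their original order."""
    seen = set(translations)
    pending = []
    for sentence in corpus:
        if sentence not in seen:
            seen.add(sentence)
            pending.append(sentence)
    return pending
```

Only the sentences returned by `pending_sentences` would need to go through the browser, which is where most of the running time is spent.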

Running the translation

So far we've seen the 4 useful functions of seleniumDeepL. deepL.run_translation() is the most important one. We'll now see how to use it and set its parameters correctly.
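Before going through the parameters one by one, here is a sketch of how the four methods could be chained. The wrapper `translate_and_save` is a hypothetical helper, not part of the repo. It only relies on the method names and on the `corpus` and `destination_language` parameters listed in this README. Skipping `load_translations` when the file does not exist yet is a precaution of this sketch, not documented behaviour.

```python
import os


def translate_and_save(translator, corpus, destination_language, json_path):
    """Reload earlier results if any, translate, store, and return the dictionary."""
    if os.path.exists(json_path):
        translator.load_translations(json_path)  # reuse a previous run
    translator.run_translation(corpus=corpus, destination_language=destination_language)
    translator.save_translations(json_path)
    return translator.get_translated_corpus()


# Usage with the repo's class (assumes the working directory is src/):
# from deepL_selenium import seleniumDeepL
# deepL = seleniumDeepL(driver_path='../chromedriver', loglevel='debug')
# translations = translate_and_save(deepL, ['Bonjour', 'Merci'], 'en', 'translations.json')
```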

corpus (as str or list, default : 'Hello, World!')

The corpus is the text to be translated. It can be a string or a list of strings. As translating a single sentence does not necessarily need automation, the list option is the more interesting one.

destination_language (as str, default : 'en')

self.available_languages = ['fr', 'en', 'de', 'es', 'pt', 'it', 'nl', 'pl', 'ru', 'ja', 'zh']

joiner (as str, default : '\n____\n')
quit_web (as boolean, default : True)
time_to_translate (as integer, default : 10)
time_batch_rest (as integer, default : 2)
raise_error (as boolean, default : False)
load_at (as string, default : None)
store_at (as string, default : None)
load_and_store_at (as string, default : None)

PPT Insertion

Replacing text without modifying its look
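One common way to change text while keeping its look in python-pptx is to write into existing runs instead of assigning to `text_frame.text`, since character formatting lives on the runs. The sketch below is an assumption about how this can be done, not the repo's implementation, and `replace_paragraph_text` is a hypothetical helper. It keeps the formatting of the first run only, so a paragraph mixing several styles ends up uniformly styled.

```python
def replace_paragraph_text(paragraph, new_text):
    """Put new_text in the first run and blank the others, keeping the first run's font."""
    runs = paragraph.runs
    if not runs:
        return False  # no run to reuse; the caller decides how to handle it
    runs[0].text = new_text
    for run in runs[1:]:
        run.text = ""  # emptied rather than deleted, to leave the XML structure alone
    return True


# Hypothetical usage on a python-pptx shape that has a text frame:
# for paragraph in shape.text_frame.paragraphs:
#     original = "".join(run.text for run in paragraph.runs)
#     replace_paragraph_text(paragraph, translations.get(original, original))
```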

Good to know

NB: the project was developed on macOS, with Selenium driving Google Chrome.

Resources

Changing the text but keeping the Font in python-pptx
Module-wide variables in Python (1/2)
Module-wide variables in Python (2/2)
Selenium French Documentation
Chromedriver
CSS Selectors (recommended into the Selenium documentation)

TODO

Deal with bigger texts. Idea: split long sentences on \n characters and reconcile them after translation. Do it at the reception and delivery of the corpus, so that no modification is needed in the batch_corpus creation?
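The idea above could look like the following sketch, which is not implemented in the repo. `split_on_newlines` and `rejoin` are hypothetical helpers, and the 4900 default is the per-sentence limit mentioned earlier. A single line longer than the limit is left as one oversized piece, so that case would still need separate handling.

```python
def split_on_newlines(text, max_chars=4900):
    """Split text into pieces of at most max_chars, cutting only at newline characters."""
    pieces, current, current_len = [], [], 0
    for line in text.split("\n"):
        added = len(line) + (1 if current else 0)  # +1 for the newline that rejoins lines
        if current and current_len + added > max_chars:
            pieces.append("\n".join(current))
            current, current_len = [], 0
            added = len(line)
        current.append(line)
        current_len += added
    pieces.append("\n".join(current))
    return pieces


def rejoin(pieces):
    """Inverse of split_on_newlines: restore the newline removed at each cut."""
    return "\n".join(pieces)
```

Each piece would be translated as its own corpus entry, then `rejoin` applied to the translated pieces in the same order.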
