hatton / storybuilder

automatic translation of simple picture e-books

storybuilder

Produces .mp4 videos of storybooks, similar to Bloom

overview and usage

The purpose of this project is to allow quick production, and more importantly, translation of Bible stories for theoretically any language that has a working Bible audio transcription. Storybuilder is meant to use Scripture App Builder (SAB) projects as input.

To run storybuilder, simply run python3 run.py params.json

The parameter file specifies the input and output locations and where temporary files should be set up.
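As a rough sketch, run.py presumably begins by reading this file, along these lines (the key names below are hypothetical stand-ins, not the actual params.json schema):

```python
# Minimal sketch of loading the parameter file; the keys are hypothetical
# stand-ins, not the real params.json schema.
import json
import sys

with open(sys.argv[1]) as f:         # e.g. "params.json"
    params = json.load(f)

input_dir = params["input"]          # where the SAB files live (hypothetical key)
output_dir = params["output"]        # where finished videos go (hypothetical key)
temp_dir = params["temp"]            # scratch space for fragments (hypothetical key)
```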

Text, audio, and timing files need to be specified per biblical book of the input. Because SAB format has one audio file per chapter of each book, we also need to specify a numbering scheme for the files. For example, the collection of files

"04-JHN-01-timing.txt", "04-JHN-02-timing.txt" ... "04-JHN-21-timing.txt

can be represented by the string

"04-JHN-[nn]-timing.txt"

The number of n's is important: it indicates how many digits the chapter number is zero-padded to.
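A sketch of how such a pattern can be expanded (the helper name is ours, not necessarily how storybuilder does it):

```python
# Sketch: expand a pattern like "04-JHN-[nn]-timing.txt" into per-chapter
# filenames; the width of the [nn] placeholder sets the zero padding.
import re

def expand_pattern(pattern, num_chapters):
    width = len(re.search(r"\[(n+)\]", pattern).group(1))   # "nn" -> pad to 2 digits
    return [re.sub(r"\[n+\]", str(ch).zfill(width), pattern)
            for ch in range(1, num_chapters + 1)]

print(expand_pattern("04-JHN-[nn]-timing.txt", 21))
# ['04-JHN-01-timing.txt', '04-JHN-02-timing.txt', ..., '04-JHN-21-timing.txt']
```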

storybuilder is composed of three stages:

  1. Generation of audio snippets
  2. Generation of video (mp4)
  3. Generation of subtitles (srt) and hardcoded subbed videos

Each stage is contained in its own Python script. The stages are modular: they share no code dependencies with one another.

audio.py

In the audio stage we use PyDub to process audio files. We extract verse start and end positions from the SAB timing files, then generate one audio snippet per page of each story in our story JSON.
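For illustration, cutting a single snippet with pydub looks roughly like this (filenames and times are made up):

```python
# Sketch: pydub slices AudioSegment objects by millisecond.
from pydub import AudioSegment

chapter = AudioSegment.from_file("04-JHN-01.mp3")    # hypothetical chapter audio
snippet = chapter[12500:31200]                       # verse range from the timing file, in ms
snippet.export("temp/audio/story1-page3.mp3", format="mp3")
```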

video.py

In this stage we generate one video file per story using ffmpeg. For each story, we first generate a fragment per page and then concatenate the fragments.

Each story page references an image file along with a panzoom (Ken Burns effect) specified by start and end 4-tuples. We generate the panzoom video by applying the appropriate filter to the input image; the transformation format stored in the JSON first needs to be translated into expressions ffmpeg understands. We also encode the audio file of each page from the audio stage. In this fashion, we create one video fragment per page in the temporary video folder.
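As a rough illustration of the kind of command involved (the real filter expressions storybuilder derives from the 4-tuples will differ), a simple linear zoom over a still image might look like:

```python
# Sketch: a fixed 1.0 -> 1.5 zoom via ffmpeg's zoompan filter, muxed with
# the page's audio snippet. Filenames and durations are hypothetical.
import subprocess

frames = 125  # 5 seconds at 25 fps
filt = (f"zoompan=z='1+0.5*on/{frames - 1}'"   # 'on' is the output frame number
        f":d={frames}:s=1280x720:fps=25")
subprocess.run([
    "ffmpeg", "-y", "-i", "page.jpg", "-i", "page.mp3",
    "-filter_complex", f"[0:v]{filt}[v]",
    "-map", "[v]", "-map", "1:a", "-shortest",
    "temp/video/fragment-001.mp4",
], check=True)
```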

After generating all the fragments for a single story, we concatenate them with another call to ffmpeg. Concatenation requires a text file listing the names of all the fragments, so we generate that file as part of the process (it is placed in the same folder as the fragment videos).
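That step corresponds to ffmpeg's concat demuxer, roughly as follows (fragment names are hypothetical):

```python
# Sketch: write the fragment list, then stream-copy the pieces together.
# The concat demuxer resolves names relative to the list file's folder.
import subprocess

fragments = ["fragment-001.mp4", "fragment-002.mp4"]
with open("temp/video/fragments.txt", "w") as f:
    for name in fragments:
        f.write(f"file '{name}'\n")

subprocess.run([
    "ffmpeg", "-y", "-f", "concat", "-safe", "0",
    "-i", "temp/video/fragments.txt", "-c", "copy", "story1.mp4",
], check=True)
```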

In this stage we also generate page_timings files, which store the length of each page fragment. These are used by the interpolation method of subtitle generation.
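One way to measure a fragment's length (whether this is exactly what video.py does is an assumption on our part) is with ffprobe:

```python
# Sketch: read a video's duration, in seconds, via ffprobe.
import subprocess

def duration_seconds(path):
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", path],
        capture_output=True, text=True, check=True)
    return float(out.stdout)

print(duration_seconds("temp/video/fragment-001.mp4"))
```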

subs.py

In this stage, subtitles (srt) and hardcoded (burned-in) subs are created. This stage is optional and can be turned off via the subtitles value in the parameter file.

We first extract the verses of each book referenced in params.json by parsing the corresponding SAB SFM file. We then use the verse references of each page of each story to generate the appropriate subtitles.
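A minimal sketch of that parsing step, assuming only the usual \c (chapter) and \v (verse) markers with the verse text on the same line (real SFM has many more markers that would need stripping):

```python
# Sketch: collect the verse texts of one chapter from an SFM file.
def verses_for_chapter(sfm_text, chapter):
    verses, current_chapter = [], None
    for line in sfm_text.splitlines():
        if line.startswith("\\c "):                  # chapter marker, e.g. "\c 3"
            current_chapter = int(line.split()[1])
        elif line.startswith("\\v ") and current_chapter == chapter:
            _, _, text = line.split(" ", 2)          # "\v 16 For God so loved..."
            verses.append(text.strip())
    return verses
```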

Because the subtitles for a page may be too long to display at once, we break them into shorter chunks, and each text chunk then needs to be aligned in time. We have two methods of doing so.

The first we call interpolation: each text chunk gets a duration proportional to its character count. We need the page_timings from the previous stage to do this. ffmpeg exhibited some puzzling behavior here, making it difficult to predict how long each video file would be; for this reason we took the length of each page from the page fragments instead of from the audio fragments. Moreover, the concatenated video file was actually longer than the sum of the fragment lengths. We attempted to account for this by calculating the "padding" ffmpeg adds between fragments, but the end result is that subtitles generated by this method may be slightly misaligned.
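The core of the interpolation idea, in sketch form (times in seconds; the function name is ours):

```python
# Sketch: split a page's duration among its text chunks by character count.
def chunk_timings(chunks, page_start, page_length):
    total_chars = sum(len(c) for c in chunks)
    cues, t = [], page_start
    for c in chunks:
        dur = page_length * len(c) / total_chars
        cues.append((t, t + dur, c))
        t += dur
    return cues

print(chunk_timings(["In the beginning was the Word,", "and the Word was with God"], 0.0, 6.0))
```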

The second method uses the aeneas forced-alignment library. We simply run aeneas on the text chunks and the audio of the story's combined mp4. This seems to work quite well, but YMMV depending on the language of the text; aeneas is hardcoded to use Esperanto as the language. If aeneas generates no-speech leaves in its sync map, subtitle generation may break, since we combine the aeneas fragment map and our chunks in a naive way.
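For reference, an invocation through the aeneas Python API looks roughly like this (whether storybuilder calls it this way is an assumption; "epo" is the Esperanto language code mentioned above):

```python
# Sketch: align one story's text chunks against its audio with aeneas.
from aeneas.executetask import ExecuteTask
from aeneas.task import Task

config = u"task_language=epo|is_text_type=plain|os_task_file_format=srt"
task = Task(config_string=config)
task.audio_file_path_absolute = "/tmp/story1.mp4"   # audio of the combined video
task.text_file_path_absolute = "/tmp/chunks.txt"    # one text chunk per line
task.sync_map_file_path_absolute = "/tmp/story1.srt"

ExecuteTask(task).execute()
task.output_sync_map_file()
```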

Finally, we use ffmpeg again to hardcode the subs. For this, ffmpeg needs to be compiled with the --enable-libass option. There were some problems here, since installing aeneas with the SIL dmg also installs ffmpeg, but the wrong version.
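The burn-in step itself corresponds to ffmpeg's subtitles filter (the filter that requires libass); filenames here are hypothetical:

```python
# Sketch: burn an .srt file into the video; fails on ffmpeg builds
# compiled without --enable-libass.
import subprocess

subprocess.run([
    "ffmpeg", "-y", "-i", "story1.mp4",
    "-vf", "subtitles=story1.srt",
    "story1-subbed.mp4",
], check=True)
```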

limitations

Right now, ref_begin and ref_end need to be in the same chapter. Verse references are resolved by counting: verse 38 is simply the element at index 37 of an array. This means that for chapters where verses are missing, e.g. Mark 9:45, storybuilder will fetch the wrong verses or crash with an index-out-of-range error.
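The counting assumption in miniature (names are ours):

```python
# Verse n is fetched as element n - 1 of the chapter's verse list, so any
# skipped verse number shifts everything after the gap.
def get_verse(verses, n):
    return verses[n - 1]   # wrong verse, or IndexError, if a verse is missing
```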

dependencies

Requires the PyDub, aeneas, and sh libraries for Python 3, as well as ffmpeg.
