Possible memory leak after a high number of transcriptions

donand opened this issue 2 years ago · comments

Describe the bug
When I generate a large number of single-sentence transcriptions with the espeak-ng backend inside Flask, the time taken per transcription grows from 150 ms to 2000 ms, and after around 2100 requests every call fails with an error.

The error is one of the following two:

OSError: /tmp/tmpd_vs36iz/libespeak-ng.so.1.1.51: failed to map segment from shared object

OSError: /tmp/tmp4ntbh189/libespeak-ng.so.1.1.51: cannot change memory protections

Removing the phonemizer package and calling espeak-ng directly with subprocess solves the issue. Calling phonemizer through subprocess also works, but it is much slower than calling it from Python (500 ms vs 150 ms).
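
A minimal sketch of the subprocess workaround described above. The exact command is not given in the report, so the flags here are an assumption (`-q` for no audio, `--ipa` for IPA output, `-v` for the voice), and the helper names are hypothetical:

```python
import subprocess


def build_espeak_command(text, voice="en-us"):
    """Build the espeak-ng command line (assumed flags, see the lead-in)."""
    # -q: do not play audio, --ipa: print IPA phonemes, -v: select the voice
    return ["espeak-ng", "-q", "--ipa", "-v", voice, text]


def phonemize_with_subprocess(text, voice="en-us", timeout=10.0):
    """Run espeak-ng in a fresh process and return its phoneme output."""
    result = subprocess.run(
        build_espeak_command(text, voice),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode the output to str
        check=True,           # raise CalledProcessError on a non-zero exit code
        timeout=timeout,
    )
    return result.stdout.strip()
```

Each call starts a new process, so nothing accumulates in the web server's own address space, at the cost of process start-up time on every request.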

Phonemizer version

phonemizer-3.0
available backends: espeak-ng-1.51, segments-2.2.0
uninstalled backends: espeak-mbrola, festival

System
Distributor ID: Ubuntu
Description: Ubuntu 20.04.2 LTS
Release: 20.04
Codename: focal

To reproduce
Have a Flask web app with a Waitress server that calls the phonemizer.phonemize() function around 2100 times.

Expected behavior
The transcription time should stay the same and not increase, and it should not give that error.

Additional context
I'm using the phonemizer as a python package, imported from a Flask web application with the Waitress server.

Mathieu Bernard · Answer 1 · Fri Mar 25 2022 01:16:51 GMT+0800 (China Standard Time)

Hi, I think this is probably caused by the number of EspeakBackend instances you create (one per call to phonemize). See https://github.com/bootphon/phonemizer#advice-for-best-performances.

You should instantiate the backend once, then call phonemize on that instance:

from phonemizer.backend import EspeakBackend
backend = EspeakBackend('en-us', ...)
phonemized = [backend.phonemize(line, ...) for line in flask_requests]

The bug is probably caused by the way the espeak library is bound to Python here
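
For a multi-threaded server such as Waitress, the instantiate-once advice could look roughly like the sketch below. `SharedBackend` is a hypothetical helper, not part of phonemizer. It assumes that `backend.phonemize` takes a list of strings and returns a list of the same length, and it serializes calls with a lock because thread safety of the backend is not established in this thread:

```python
import threading


class SharedBackend:
    """Create the backend lazily, once, and reuse it for every request."""

    def __init__(self, language="en-us", factory=None, **options):
        self._language = language
        self._options = options
        self._factory = factory or self._default_factory
        self._backend = None
        self._lock = threading.Lock()

    @staticmethod
    def _default_factory(language, **options):
        # Imported lazily so this module loads even where phonemizer is absent.
        from phonemizer.backend import EspeakBackend
        return EspeakBackend(language, **options)

    def phonemize(self, sentence, **kwargs):
        # The lock guards both the one-time creation and the call itself
        # (a conservative assumption, see the lead-in).
        with self._lock:
            if self._backend is None:
                self._backend = self._factory(self._language, **self._options)
            # Assumption: list of strings in, list of strings out.
            return self._backend.phonemize([sentence], **kwargs)[0]
```

A module-level `shared = SharedBackend('en-us')` can then be called from each Flask view with `shared.phonemize(sentence)`, so the process only ever holds one backend.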

Mathieu Bernard · Answer 2 · Fri Mar 25 2022 01:18:19 GMT+0800 (China Standard Time)

But... well... This is definitely a bug. We should at least catch the OSError and re-raise a more explicit error message.
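
As a rough illustration of catching the OSError and re-raising something more explicit (not the project's actual fix), a wrapper could look like this. `BackendLoadError` and `create_backend` are hypothetical names:

```python
class BackendLoadError(RuntimeError):
    """Hypothetical error type, not part of phonemizer."""


def create_backend(factory, *args, **kwargs):
    """Call factory(*args, **kwargs) and turn a raw OSError into a clearer error."""
    try:
        return factory(*args, **kwargs)
    except OSError as error:
        # Keep the original exception as the cause so the low-level
        # message (e.g. "failed to map segment") stays visible.
        raise BackendLoadError(
            "failed to load the espeak library ({}); this can happen after "
            "many backends have been instantiated in one process, consider "
            "creating a single backend and reusing it".format(error)
        ) from error
```

With phonemizer installed, this would be used as `create_backend(EspeakBackend, 'en-us')`.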

Andrea Donati · Answer 3 · Fri Mar 25 2022 05:51:21 GMT+0800 (China Standard Time)

Thank you for the info! I definitely missed that part of the separate instantiation of the Espeak backend. I will try to replace the calls to the phonemize function with the calls to the backend and run a test.

hadware · Answer 4 · Tue Mar 29 2022 06:03:29 GMT+0800 (China Standard Time)

Note: I'll add a documentation section (somewhere) to document this usage of the lib (I had to do the same thing about 4 months ago).

CaraDuf · Answer 5 · Thu Dec 29 2022 13:08:54 GMT+0800 (China Standard Time)

Got hit by that too! The documentation section about how to reduce the memory footprint of the phonemizer when used, e.g., in a loop is here: https://bootphon.github.io/phonemizer/common_issues.html#phonemization-is-slow.

Maybe it would be worth mentioning this on the usage page: for use in a loop, avoid calling phonemize(text, ...) directly and use backend.phonemize(line, ...) on a properly initialized backend (see https://bootphon.github.io/phonemizer/common_issues.html#phonemization-is-slow for more details).

Vishal Tambrahalli · Answer 6 · Thu Feb 02 2023 19:19:33 GMT+0800 (China Standard Time)

The backend phonemize method does not have the preserve_punctuation option. Apart from handling the punctuation separately outside, is there something built in that I'm missing?

Mathieu Bernard · Answer 7 · Thu Feb 02 2023 19:27:18 GMT+0800 (China Standard Time)

It does, in the class constructor:

from phonemizer.backend import EspeakBackend
backend = EspeakBackend('en-us', preserve_punctuation=True, other options...)
phonemized = [backend.phonemize(line, ...) for line in text]