bootphon / phonemizer

Simple text to phones converter for multiple languages

Home Page: https://bootphon.github.io/phonemizer/

Request: an option to keep empty lines

jncasey opened this issue · comments

Is your feature request related to a problem? Please describe.
I'd like to use this library on a project related to poetry and song lyrics, where empty lines as separators are an important part of the data.

Describe the solution you'd like
It'd be great to add a flag to the phonemize method called something like keep_empty_lines, that would default to False to preserve current behavior, but could be enabled to get my desired behavior. I'm not sure if it's as simple as just adding a conditional around this line, or if passing empty lines to any of the backends could lead to unexpected/bad behavior.

Additional context
I'm using the festival backend, if that makes a difference (to take advantage of its syllable separators)

Hi, indeed it is possible (and easy) to implement that option. It would be a preserve_empty_lines=False argument in phonemize() and --preserve-empty-lines from the command line.

I do not have time to do this currently, but if you want to submit a pull request with your modifications, please do :).

Sure, I can give it a shot in the next week or so.

I'm not familiar with the backends. Will they return an empty line if passed an empty line, or will it be necessary to strip out the empty lines from the input and reinsert them post-phonemizing?

Ok great!

All the backends keep empty lines as empty (see https://github.com/bootphon/phonemizer/blob/master/CHANGELOG.md#phonemizer-30; if not, this is a bug). So maybe your work will just be to add an if somewhere (and to code the option, and possibly a few tests to make sure it works for all the backends...)

Quick update: I thought I had a nice, simple solution for this, but I was working on my laptop, which doesn't have access to the festival backend. It turns out that festival does not like empty lines. That led me to make a couple of tweaks to the festival backend code, but there were still problems when preserving punctuation.

I think the less disruptive solution is going to be extracting and reinserting the blank lines in the top level _phonemize method, so I'm going to scrap my current code and switch to that strategy instead.
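
For readers following along, the extract-and-reinsert strategy described above could look roughly like the sketch below. This is only an illustration, not the code from the pull request: the names phonemize_keeping_empty_lines and fake_backend are hypothetical, and the stand-in backend just upper-cases its input so the example runs without festival or espeak installed.

```python
from typing import Callable, List


def phonemize_keeping_empty_lines(
    lines: List[str],
    phonemize_fn: Callable[[List[str]], List[str]],
) -> List[str]:
    """Phonemize `lines`, putting blank lines back where they were.

    `phonemize_fn` is a hypothetical stand-in for a backend call: it
    takes the non-empty lines and returns one output line per input.
    """
    # Remember the index of every line that is empty or whitespace-only.
    empty_positions = {i for i, line in enumerate(lines) if not line.strip()}

    # Only non-empty lines ever reach the backend.
    non_empty = [line for i, line in enumerate(lines) if i not in empty_positions]
    phonemized = phonemize_fn(non_empty)
    if len(phonemized) != len(non_empty):
        raise ValueError("backend must return one output line per input line")

    # Rebuild the output in the original order, reinserting the blanks.
    outputs = iter(phonemized)
    return ["" if i in empty_positions else next(outputs) for i in range(len(lines))]


def fake_backend(batch: List[str]) -> List[str]:
    # Placeholder for a real backend (assumption: it never sees blank lines).
    assert all(line.strip() for line in batch)
    return [line.upper() for line in batch]


lyrics = ["first verse", "second line", "", "chorus", "", ""]
result = phonemize_keeping_empty_lines(lyrics, fake_backend)
print(result)
```

Because the blank lines are removed before the backend runs and restored afterwards, none of the backends has to handle them itself, which is why this sidesteps the festival problem.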

Closed by #103