feature request: write cleaned subtitles into srt file
spfeifer222 opened this issue
Dear Federico,
I did not manage to get the cleaned text into the subtitle object.
First I tried:
sub_gen = parse('./Keep the Home Fires Burning.srt')
for subtitle in sub_gen:
    subtitle.text = subtitle.clean_up()
I got an AttributeError:
AttributeError: can't set attribute
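A likely explanation, assuming Subtitle.text is exposed as a read-only property (a getter without a setter; this is an assumption, not checked against the library source): Python refuses assignment to such an attribute. The stand-in class below is hypothetical and only reproduces the error.

```python
class Subtitle:
    """Hypothetical stand-in: text is exposed through a getter-only property."""

    def __init__(self, text: str):
        self._text = text

    @property
    def text(self) -> str:
        # No @text.setter is defined, so "sub.text = ..." raises AttributeError.
        return self._text


sub = Subtitle("Hello")
try:
    sub.text = "hello"
except AttributeError as exc:
    # Older Pythons say "can't set attribute"; the wording changed in newer versions.
    print(f"AttributeError: {exc}")
```

The fix is either to add a setter in the class or, as in the second idea below, to build a new object instead of mutating the old one.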
My second idea was to write the data into a new subtitle object with a function:
def get_clean_subtitle(subtitle):
    clean_text = subtitle.clean_up()
    return Subtitle(subtitle.index, start=subtitle.start, end=subtitle.end, text_lines=clean_text)
But here I could not access the Subtitle class: importing it via from pysubparser.classes import Subtitle raises an ImportError:
ImportError: cannot import name 'Subtitle' from 'pysubparser.classes' (/home/pfeifer/dev/pysub-parser/pysubparser/classes/__init__.py)
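The traceback points at pysubparser/classes/__init__.py, while the class itself appears to live in pysubparser/classes/classes.py (the file linked in the reply below). Assuming that package's __init__.py does not re-export Subtitle, importing from the module, i.e. from pysubparser.classes.classes import Subtitle, should work. The sketch below reproduces that layout with a throwaway package; the name demo_pkg is hypothetical.

```python
import importlib
import pathlib
import sys
import tempfile

# Build a throwaway package mimicking the layout from the traceback:
# the class lives in demo_pkg/classes/classes.py, and
# demo_pkg/classes/__init__.py does not re-export it.
root = pathlib.Path(tempfile.mkdtemp())
pkg = root / "demo_pkg" / "classes"
pkg.mkdir(parents=True)
(root / "demo_pkg" / "__init__.py").write_text("")
(pkg / "__init__.py").write_text("")  # empty: nothing re-exported
(pkg / "classes.py").write_text("class Subtitle:\n    pass\n")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

try:
    # Fails: Subtitle is not a name defined in demo_pkg/classes/__init__.py.
    from demo_pkg.classes import Subtitle
except ImportError as exc:
    print(f"ImportError: {exc}")

# Works: import directly from the module that defines the class.
from demo_pkg.classes.classes import Subtitle
print(Subtitle)
```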
However, even if I managed to solve these issues, my actual goal, writing the cleaned subtitles into an SRT file, would still not be solved. I am quite new to Python, so could you explain why my efforts were not successful, and add a method to write corrected subtitle files?
Best regards,
Sebastian
Hi again @spfeifer222!
You should be able to get the cleaned text of a subtitle with subtitle.clean, as you can see here: https://github.com/federicocalendino/pysub-parser/blob/master/pysubparser/classes/classes.py#L28
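Combining subtitle.clean with the "new object" idea from the original post could look roughly like this. The frozen Subtitle dataclass and its clean property are hypothetical stand-ins, not the library's real class; dataclasses.replace makes a copy that differs only in the given fields.

```python
import re
from dataclasses import dataclass, replace
from datetime import time


@dataclass(frozen=True)
class Subtitle:
    """Hypothetical stand-in for pysubparser's Subtitle (not its real definition)."""

    index: int
    start: time
    end: time
    text: str

    @property
    def clean(self) -> str:
        # Illustrative cleaning only: strip HTML-like tags such as <i> or <font ...>.
        return re.sub(r"<[^>]+>", "", self.text).strip()


def get_clean_subtitle(subtitle: Subtitle) -> Subtitle:
    # Frozen objects cannot be changed in place, so build a copy
    # that differs only in its text field.
    return replace(subtitle, text=subtitle.clean)


original = Subtitle(1, time(0, 0, 1), time(0, 0, 3), "<i>Hello</i> there")
cleaned = get_clean_subtitle(original)
print(cleaned.text)
print(original.text)
```

The original object is left untouched, which avoids the "can't set attribute" problem entirely.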
I could send you a file with advertising, but I found no way to attach it. So here are some example lines:
<font color="#c0c0c0">www.addic7ed.com</font>
- Synced and corrected by <font color="#009BCB"><b>chamallow</b></font> -
- www.addic7ed.com -
contact www.OpenSubtitles.org today
to remove all ads from www.OpenSubtitles.org
- <font color="#D81D1D">Synced and corrected by VitoSilans</font> -
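A simple advertising filter for lines like these could strip the tags and then drop any line that matches a known pattern. The pattern list and function names below are hypothetical, derived only from the example lines above; this is a sketch, not an existing pysub-parser cleaner.

```python
import re

# Hypothetical patterns based on the example lines; extend as needed.
AD_PATTERNS = [
    re.compile(r"addic7ed\.com", re.IGNORECASE),
    re.compile(r"opensubtitles\.org", re.IGNORECASE),
    re.compile(r"synced and corrected by", re.IGNORECASE),
]
TAG_RE = re.compile(r"<[^>]+>")


def strip_tags(line: str) -> str:
    """Remove HTML-like tags such as <font color="..."> and <b>."""
    return TAG_RE.sub("", line)


def is_advertising(line: str) -> bool:
    """True if the tag-free line matches any advertising pattern."""
    text = strip_tags(line)
    return any(pattern.search(text) for pattern in AD_PATTERNS)


def remove_advertising(lines: list[str]) -> list[str]:
    """Keep only the lines that do not look like advertising."""
    return [line for line in lines if not is_advertising(line)]
```

Matching whole lines is blunt: a real cleaner would probably drop entire subtitle entries and keep the pattern list configurable.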
However, I made some changes on the feat-save_srt branch of my fork (https://github.com/spfeifer222/pysub-parser). If you like, we could merge it all together.
Here is what I did (as far as I remember):
- I focused on SRT handling; no changes to the other types.
- I introduced a Subtitles class to store all subtitles of a movie.
  - It takes as arguments/attributes:
    - subtitles: a list of Subtitle objects
    - source: the path of the file from which the subtitles are imported
    - subtype: the file extension, taken from source
    - encoding: the encoding used
  - I added the following methods:
    - shift: shift all Subtitle objects by a given timedelta
    - write: back up the original *.srt file and write the (modified) subtitles into a file with the original name. To do so, I changed the parse function for SRT files to return an initialized Subtitles object instead of a generator. Additionally, I created a writers folder.
- New in the parse function is a boolean parameter named clean. It defaults to True, so subtitles are cleaned during import.
- I changed some default values for the cleaning function. Did I miss a way to change them? (I put that in the to-dos; see below.)
- Minor changes in def __repr__(self): I added start_string and end_string attributes to the Subtitle class, which __repr__(self) uses.
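A container along the lines described above, with an SRT writer that backs up the source file first, might be sketched as follows. Everything here is hypothetical (class layout, the ".bak" backup name, renumbering on write) and is not the code from the fork; it only shows the SRT block format "index, HH:MM:SS,mmm --> HH:MM:SS,mmm, text, blank line".

```python
import shutil
from dataclasses import dataclass
from datetime import timedelta
from pathlib import Path


def format_timestamp(t: timedelta) -> str:
    """Format a timedelta as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = t // timedelta(milliseconds=1)  # exact integer milliseconds
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{ms:03d}"


@dataclass
class Subtitle:
    # Hypothetical minimal subtitle entry.
    index: int
    start: timedelta
    end: timedelta
    text: str


@dataclass
class Subtitles:
    # Hypothetical container mirroring the attributes listed above.
    subtitles: list[Subtitle]
    source: Path
    encoding: str = "utf-8"

    @property
    def subtype(self) -> str:
        # File extension taken from source, e.g. "srt".
        return self.source.suffix.lstrip(".")

    def to_srt(self) -> str:
        # Renumber from 1 so the output stays valid after entries were removed.
        blocks = [
            f"{i}\n{format_timestamp(s.start)} --> {format_timestamp(s.end)}\n{s.text}\n"
            for i, s in enumerate(self.subtitles, start=1)
        ]
        return "\n".join(blocks)

    def write(self) -> Path:
        """Back up the source file, then overwrite it with the current subtitles."""
        backup = self.source.with_name(self.source.name + ".bak")
        if self.source.exists():
            shutil.copy2(self.source, backup)
        self.source.write_text(self.to_srt(), encoding=self.encoding)
        return backup
```

Converting with integer division by one millisecond avoids the float rounding that total_seconds() * 1000 can introduce.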
To-dos (as I see them):
- A way to change the default values of the clean_up function.
- shift function in classes.py: remove subtitles that would start before the new start time (maybe the same for subtitles after the new end time).
- Create an error class for the write function, and remove subtitles without text.
- I do not know the other subtitle file types; maybe the Subtitles class needs to be extended to handle them correctly.
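One reading of the shift and empty-text to-dos is: after shifting, drop every subtitle that would start before zero, drop subtitles with no text, and renumber the rest. The function and the frozen stand-in class are hypothetical, not code from the fork.

```python
from dataclasses import dataclass, replace
from datetime import timedelta


@dataclass(frozen=True)
class Subtitle:
    # Hypothetical minimal subtitle entry (not pysubparser's class).
    index: int
    start: timedelta
    end: timedelta
    text: str


def shift(subtitles: list[Subtitle], delta: timedelta) -> list[Subtitle]:
    """Shift every subtitle by delta, drop entries that would start before 0
    or have empty text, then renumber the remaining entries from 1."""
    zero = timedelta(0)
    kept = [
        replace(s, start=s.start + delta, end=s.end + delta)
        for s in subtitles
        if s.text.strip() and s.start + delta >= zero
    ]
    return [replace(s, index=i) for i, s in enumerate(kept, start=1)]
```

Dropping is the simplest choice; an alternative is to clamp the start to zero for subtitles that are still visible after the shift.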
However, I am quite new to Python. Please tell me what you think and how I could improve my code!
Take care!
Sebastian
Hi @spfeifer222 !
I've just pushed version 1.2 of this library to PyPI. I added a cleaner way to add subtitle cleaners; feel free to send a PR if you still want to add the cleaner for advertising.
I also added a writer for SRT files.
You can check the tests that I added for help on how to use this new stuff.
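As a general illustration of pluggable cleaners (not the API that version 1.2 actually ships; the tests mentioned above show that), a common pattern is a sequence of text-to-text functions applied in order, with a default set that callers can replace. This also covers the earlier to-do about changing cleaning defaults. All names are hypothetical.

```python
import re
from typing import Callable, Iterable

Cleaner = Callable[[str], str]


def strip_tags(text: str) -> str:
    """Remove HTML-like tags such as <i> or <font color="...">."""
    return re.sub(r"<[^>]+>", "", text)


def collapse_whitespace(text: str) -> str:
    """Turn runs of whitespace (including newlines) into single spaces."""
    return " ".join(text.split())


def lowercase(text: str) -> str:
    return text.lower()


# Hypothetical default pipeline; callers may pass their own instead.
DEFAULT_CLEANERS: tuple[Cleaner, ...] = (strip_tags, collapse_whitespace)


def clean(text: str, cleaners: Iterable[Cleaner] = DEFAULT_CLEANERS) -> str:
    """Apply each cleaner in order, feeding the output of one into the next."""
    for cleaner in cleaners:
        text = cleaner(text)
    return text
```

Usage: clean(text, [*DEFAULT_CLEANERS, lowercase]) extends the defaults, while passing an empty list disables cleaning. An advertising filter would slot in the same way.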