fedecalendino / pysub-parser

Library for extracting text and timestamps from multiple subtitle files (.ass, .ssa, .srt, .sub, .txt).

Home Page: https://pypi.org/project/pysub-parser/

feature request: write cleaned subtitles into srt file

spfeifer222 opened this issue

Dear Federico,

I did not manage to get the cleaned text into the subtitle object.

First I tried:

from pysubparser.parser import parse

sub_gen = parse('./Keep the Home Fires Burning.srt')

for subtitle in sub_gen:
    subtitle.text = subtitle.clean_up()

I got an AttributeError:

AttributeError: can't set attribute
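Python raises this error when code assigns to a read-only property, that is, a @property defined without a setter (newer Python versions word the message differently). Assuming Subtitle.text is such a computed property, the stand-in class below, which is hypothetical and not pysub-parser's actual code, reproduces the error and shows the simplest workaround: keep the cleaned string in your own variable instead of writing it back onto the object.

```python
class FakeSubtitle:
    """Hypothetical stand-in for a subtitle; NOT pysub-parser's real class."""

    def __init__(self, text_lines):
        self.text_lines = list(text_lines)

    @property
    def text(self):
        # Computed on every access; no setter is defined, so assignment fails.
        return " ".join(self.text_lines)

    @property
    def clean(self):
        # Stand-in cleaner: collapse runs of whitespace (the real cleanup differs).
        return " ".join(self.text.split())


sub = FakeSubtitle(["Hello   there,", "  General Kenobi"])

raised = False
try:
    sub.text = sub.clean  # same pattern as in the issue
except AttributeError:
    raised = True  # read-only property: there is no setter for 'text'

# Workaround: store the cleaned text separately instead of assigning it back.
cleaned = sub.clean
print(cleaned)
```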

My second idea was to copy the data into a new Subtitle object with a helper function:

def get_clean_subtitle(subtitle):
    clean_text = subtitle.clean_up()
    return Subtitle(subtitle.index, start=subtitle.start, end=subtitle.end, text_lines=clean_text)

But I could not access the Subtitle class either: importing it via from pysubparser.classes import Subtitle raises an ImportError:

ImportError: cannot import name 'Subtitle' from 'pysubparser.classes' (/home/pfeifer/dev/pysub-parser/pysubparser/classes/__init__.py)
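An ImportError of this form usually means the package's __init__.py does not re-export the name. The link later in the thread points to pysubparser/classes/classes.py, so importing from that module directly (from pysubparser.classes.classes import Subtitle) is likely to work, though that is an inference and not confirmed in the thread. The self-contained sketch below builds a throwaway package with the same layout (the package name demo_pkg is hypothetical) to show both behaviours.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package mirroring the layout demo_pkg/classes/{__init__,classes}.py
# ("demo_pkg" is a hypothetical name used only for this demonstration).
root = Path(tempfile.mkdtemp())
pkg = root / "demo_pkg" / "classes"
pkg.mkdir(parents=True)
(root / "demo_pkg" / "__init__.py").write_text("")
(pkg / "__init__.py").write_text("")  # empty: re-exports nothing
(pkg / "classes.py").write_text("class Subtitle:\n    pass\n")
sys.path.insert(0, str(root))
importlib.invalidate_caches()  # make sure the finders see the new files

# 1) Importing from the package fails: its __init__.py does not expose Subtitle.
try:
    from demo_pkg.classes import Subtitle
    package_import_ok = True
except ImportError:
    package_import_ok = False

# 2) Importing from the module that actually defines the class works.
from demo_pkg.classes.classes import Subtitle

print(package_import_ok, Subtitle.__name__)
```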

Even if I managed to solve these issues, my actual goal, writing the cleaned subtitles to an .srt file, would still not be solved. I am quite new to Python, so could you explain why my efforts were not successful, and add a method to write corrected subtitle files?
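For reference, an SRT file is a sequence of blocks: a running number, a "start --> end" line with HH:MM:SS,mmm timestamps, the text lines, and a blank line. A minimal writer sketch, assuming each subtitle exposes start and end as datetime.time values plus a list of text lines; the Cue class and the function names are hypothetical stand-ins, not pysub-parser's API.

```python
from dataclasses import dataclass, field
from datetime import time
from pathlib import Path
from typing import Iterable, List


@dataclass
class Cue:
    """Hypothetical stand-in for a parsed subtitle."""
    start: time
    end: time
    text_lines: List[str] = field(default_factory=list)


def format_timestamp(t: time) -> str:
    # SRT uses HH:MM:SS,mmm with a comma before the milliseconds.
    return f"{t.hour:02d}:{t.minute:02d}:{t.second:02d},{t.microsecond // 1000:03d}"


def to_srt(cues: Iterable[Cue]) -> str:
    blocks = []
    # Renumber from 1 so the output stays valid even if some cues were dropped.
    for number, cue in enumerate(cues, start=1):
        blocks.append(
            f"{number}\n"
            f"{format_timestamp(cue.start)} --> {format_timestamp(cue.end)}\n"
            + "\n".join(cue.text_lines)
            + "\n"
        )
    # Consecutive blocks are separated by one blank line.
    return "\n".join(blocks)


def write_srt(cues: Iterable[Cue], path: Path, encoding: str = "utf-8") -> None:
    path.write_text(to_srt(cues), encoding=encoding)


cues = [
    Cue(time(0, 0, 1, 500_000), time(0, 0, 3), ["Hello there."]),
    Cue(time(0, 1, 2), time(0, 1, 4, 250_000), ["Two", "lines"]),
]
print(to_srt(cues))
```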

Best regards,
Sebastian

commented

Hi again @spfeifer222!

You should be able to get the cleaned text of a subtitle by doing subtitle.clean as you can see here: https://github.com/federicocalendino/pysub-parser/blob/master/pysubparser/classes/classes.py#L28

commented

I could send you a file containing advertising, but I found no way to attach it here. So here are some example lines:

  • <font color="#c0c0c0">www.addic7ed.com</font>
  • - Synced and corrected by <font color="#009BCB"><b>chamallow</b></font> -
  • - www.addic7ed.com -
  • contact www.OpenSubtitles.org today
  • to remove all ads from www.OpenSubtitles.org
  • - <font color="#D81D1D">Synced and corrected by VitoSilans</font> -
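One way to clean lines like these is to strip the HTML-style tags first and then drop any line that matches a known advertising pattern. A minimal sketch; the pattern list is an assumption built only from the example lines above and would need extending for other sources, and the function names are hypothetical.

```python
import re
from typing import List

TAG_RE = re.compile(r"<[^>]+>")  # matches <font ...>, </font>, <b>, ...

# Assumed patterns, derived only from the example lines above.
AD_PATTERNS = [
    re.compile(r"addic7ed\.com", re.IGNORECASE),
    re.compile(r"opensubtitles\.org", re.IGNORECASE),
    re.compile(r"synced and corrected by", re.IGNORECASE),
]


def strip_tags(line: str) -> str:
    return TAG_RE.sub("", line).strip()


def is_ad(line: str) -> bool:
    return any(pattern.search(line) for pattern in AD_PATTERNS)


def remove_ads(lines: List[str]) -> List[str]:
    # Strip tags first, then keep only non-empty lines that are not ads.
    cleaned = (strip_tags(line) for line in lines)
    return [line for line in cleaned if line and not is_ad(line)]


sample = [
    '<font color="#c0c0c0">www.addic7ed.com</font>',
    '- Synced and corrected by <font color="#009BCB"><b>chamallow</b></font> -',
    "Keep the home fires burning,",
    "contact www.OpenSubtitles.org today",
]
print(remove_ads(sample))
```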

Also, I made some changes in my fork https://github.com/spfeifer222/pysub-parser on the feat-save_srt branch. If you like, we could merge it all together.

Here is what I did (as far as I remember):

  • I focused on SRT handling; there are no changes for the other types.
  • I introduced a Subtitles class to hold all subtitles of a movie.
    • It takes the following arguments/attributes:
      • subtitles: a list of Subtitle objects
      • source: the source path of the file from which the subtitles are imported
      • subtype: file extension taken from source
      • encoding: the encoding used
    • I added the following methods:
      • shift: shift all Subtitle objects by a given timedelta
      • write: back up the original *.srt file and write the (modified) subtitles into a file with the original name. To do so,
  • I changed the parse function for srt files to return an initialized Subtitles object instead of a generator. Additionally, I created a writers folder.
  • The parse function has a new boolean parameter named clean. It defaults to True so that subtitles are cleaned during import.
  • I changed some default values of the cleaning function. Did I miss a way to change them? (I put that on the to-do list below.)
  • Minor changes in __repr__(self): I added start_string and end_string attributes to the Subtitle class, which __repr__(self) uses.
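The write step described above (back up the original .srt, then write the modified subtitles under the original name) might look roughly like this. It is a sketch, not the code from the feat-save_srt branch; the function name write_with_backup and the .bak naming are assumptions.

```python
import shutil
import tempfile
from pathlib import Path


def write_with_backup(path: Path, content: str, encoding: str = "utf-8") -> Path:
    """Hypothetical helper: copy `path` to `<name>.bak`, then overwrite `path`."""
    backup = path.with_name(path.name + ".bak")
    if path.exists():
        # copy2 also preserves the original file's timestamps and metadata.
        shutil.copy2(path, backup)
    path.write_text(content, encoding=encoding)
    return backup


# Usage on a throwaway file.
tmp_dir = Path(tempfile.mkdtemp())
srt = tmp_dir / "movie.srt"
srt.write_text("1\n00:00:01,000 --> 00:00:02,000\nold\n", encoding="utf-8")
backup_path = write_with_backup(srt, "1\n00:00:01,000 --> 00:00:02,000\nnew\n")
print(backup_path.read_text(encoding="utf-8"))
```

Calling it twice overwrites the first backup with the already-modified file, so a fuller implementation may want to refuse in that case or number its backups.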

To-do's (as I see them):

  • A way to change the default values of the clean_up function
  • shift function in classes.py: remove subtitles before the new start time (and maybe also those after the new end time)
  • Create an error class for the write function, and remove subtitles without text.
  • I do not know the other subtitle file types; maybe the Subtitles class needs to be extended to handle them correctly.
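The shift and empty-subtitle to-do items could be handled together. One interpretation: after shifting by a (possibly negative) timedelta, drop subtitles that would end at or before 0:00:00, clamp a start that would become negative, and skip subtitles without text. A sketch assuming start/end are stored as timedelta offsets from the beginning of the video; the Cue class and shift function here are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List

ZERO = timedelta(0)


@dataclass
class Cue:
    """Hypothetical subtitle: start/end are offsets from the start of the video."""
    start: timedelta
    end: timedelta
    text: str


def shift(cues: List[Cue], delta: timedelta) -> List[Cue]:
    shifted = []
    for cue in cues:
        if not cue.text.strip():
            continue  # drop subtitles without text
        start, end = cue.start + delta, cue.end + delta
        if end <= ZERO:
            continue  # shifted entirely before the start of the video
        # Clamp a partially negative start to 0:00:00.
        shifted.append(Cue(max(start, ZERO), end, cue.text))
    return shifted


cues = [
    Cue(timedelta(seconds=1), timedelta(seconds=2), "gone"),
    Cue(timedelta(seconds=2), timedelta(seconds=5), "clamped"),
    Cue(timedelta(seconds=6), timedelta(seconds=8), "   "),
    Cue(timedelta(seconds=10), timedelta(seconds=12), "kept"),
]
result = shift(cues, timedelta(seconds=-3))
print([(c.start.total_seconds(), c.end.total_seconds(), c.text) for c in result])
```

Returning a new list rather than mutating the input keeps the original timings available, for example for writing a backup.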

As I said, I am quite new to Python. Please tell me what you think of it and how I could improve my code!

Take care!
Sebastian

commented

Hi @spfeifer222 !

I've just pushed version 1.2 of this library to PyPI. It adds a cleaner way of adding subtitle cleaners; feel free to send a PR if you still want to add the advertising cleaner.
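Independently of how version 1.2 implements it (this sketch does not reproduce the library's API), a common way to make cleaners pluggable is to treat each one as a function from str to str and apply them in order. The names collapse_whitespace, apply_cleaners and DEFAULT_CLEANERS are hypothetical.

```python
from typing import Callable, Iterable

Cleaner = Callable[[str], str]  # hypothetical: a cleaner maps text to text


def collapse_whitespace(text: str) -> str:
    return " ".join(text.split())


def apply_cleaners(text: str, cleaners: Iterable[Cleaner]) -> str:
    # Each cleaner receives the output of the previous one.
    for clean in cleaners:
        text = clean(text)
    return text


# str.lower already has the right shape, so it can be used directly.
DEFAULT_CLEANERS = [collapse_whitespace, str.lower]

print(apply_cleaners("  Keep  the HOME   fires ", DEFAULT_CLEANERS))
```

Because callers can pass their own list, the same mechanism also covers the earlier to-do about changing the default cleaning behaviour, and an advertising filter would simply be one more function in the list.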

I also added a writer for SRT files.

You can check the tests that I added for help on how to use this new stuff.