sachac / subed

subed is a subtitle editor for Emacs

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Find overlapping timestamps when sanitizing

rndusr opened this issue · comments

subed should now properly prevent overlapping subtitles while editing, but it
will still load a file with illegal timestamps.

Sanitizing should report:

  • Subtitles that have a larger start time than their stop time.

  • Subtitles that have a larger stop time than the next subtitle's start time, if
    there is a next subtitle.

Should this be implemented generically for all subtitle formats?

I'm not familiar with any subtitle formats besides SubRip. If all widely used
formats have the same concept of start/stop timestamps per subtitle, these
checks should probably be implemented generically.

this would be a really helpful feature. i just had to do this manually on a 30 min video. has anyone started on it or even scripted a little workaround perhaps?

i cooked up a basic hack to implement trimming overlapping subtitle times for srt subs

(defcustom subed-trim-overlapping-subtitle-times nil
  "Whether overlapping subtitles should be adjusted on saving. A string of either 'start', to adjust the start time of the following subtitle or 'end', to adjust the end of the current subtitle. Defaults to nil."
  :type 'string
  :group 'subed)

(setq subed-trim-overlapping-subtitle-times "start")

(defun subed-srt--trim-overlap-end-times ()
  "Check if end time of current subtitle is after start point of next.

If so, trim the end time of current subtitle to 1 milisecond less than next."
  (interactive)
  (let ((next-sub-start-time (save-excursion
                               (subed-srt--forward-subtitle-time-start)
                               (subed-srt--subtitle-msecs-start))))
    (if (>= (subed-srt--subtitle-msecs-stop) next-sub-start-time)
        (subed-srt--set-subtitle-time-stop
         (1- next-sub-start-time)))))

(defun subed-srt--trim-overlap-start-times ()
  "Check if end time of current subtitle is after start point of next.

If so, trim the start time of next subtitle to 1 milisecond more than prev."
  (interactive)
  (let ((this-sub-stop-time (subed-srt--subtitle-msecs-stop))
        (next-sub-start-time (save-excursion
                               (subed-srt--forward-subtitle-time-start)
                               (subed-srt--subtitle-msecs-start))))
    (if (>= this-sub-stop-time next-sub-start-time)
        (save-excursion
          (subed-srt--forward-subtitle-time-start)
          (subed-srt--set-subtitle-time-start
           (1+ this-sub-stop-time))))))

(defun subed-srt--sanitize-overlaps ()
  "Adjust all overlapping times in current file.

Uses either `subed-srt--trim-overlap-start-times' or `subed-srt--trim-overlapping-end-times', the latter being the default. See `subed-trim-overlap-start-or-end' to customize this option."
  (interactive)
  (goto-char (point-min))
  (save-excursion
    (while (subed-srt--forward-subtitle-time-start)
      (if (equal subed-trim-overlapping-subtitle-times "start")
          (subed-srt--trim-overlap-start-times)
        (if (equal subed-trim-overlap-start-or-end "end")
            (subed-srt--trim-overlap-end-times))))))
  
;; add to existing subed-srt--sort:
(defun subed-srt--sort ()
  "Sanitize, then sort subtitles by start time and re-number them."
  (interactive)
  (atomic-change-group
    (subed-srt--sanitize)
    (when subed-trim-overlapping-subtitle-times
      (subed-srt--sanitize-overlaps))
    (subed-srt--validate)
    (subed-save-excursion
     (goto-char (point-min))
     (sort-subr nil
                ;; nextrecfun (move to next record/subtitle or to end-of-buffer
                ;; if there are no more records)
                (lambda () (unless (subed-srt--forward-subtitle-id)
                             (goto-char (point-max))))
                ;; endrecfun (move to end of current record/subtitle)
                #'subed-srt--jump-to-subtitle-end
                ;; startkeyfun (return sort value of current record/subtitle)
                #'subed-srt--subtitle-msecs-start))
    (subed-srt--regenerate-ids)))

a more elegant option might be to also allow custom respecting of subed-milliseconds-adjust, rather than just using a single millisecond difference. but that may also make adjustments that are larger than what is wanted. another poss would be to split the overlap in half and add to each timecode.

this could also easily be adapted to just reporting/validating rather than actually changing, or left out of subed-srt--sort to only be run interactively.

i don't know the subed code well at all, so i'm not sure if this is a very good way to do this.

hi @rndusr,

ah so i just edited the above to make it only run if the custom setting is non-nil, and set that as default.

santize-overlaps is already interactive, so can just be run by the user.
and you are right that it could be good to make it optional to run on loading a file, so you do it before making any manual adjustments. or ask on load.

happy to make it run on both formats, that's what i didn't understand v well with yr code.

i can have another play soon and if you like i can (try to!) submit a PR.

my personal case re 1ms is that while i use subed-subtitle-spacing generally, the gap is not imperative for me, so it makes a kinda sense to left them w no gap unless adjusted. but cd make it an option somehow.

If you could use only functions from subed-common.el to implement this, it should mean all formats are trimmed.

i took a look and i'm a little confused. did you maybe mean using the functions included in subed--generic-function-suffixes? from what i see those functions are converted from generic to specific format funs when subed--init is run, while subed-common.el doesn't seem to contain the funs this functionality would need.

because that variable contains the funs this code needs, it looks like simply reformatting everything using generic functions would work. is that correct?

i also added a custom option to use subed-subtitle-spacing rather than +/- 1 millisecond. universal-arg prefix is easy done, will look into multi prefix options also.