Ajatt-Tools / impd

🍵 AJATT-style passive listening and condensed audio without bloat.

Home Page: https://tatsumoto-ren.github.io/blog/passive-listening

impd should automatically choose the right internal subs

asakura42 opened this issue

Long story short. I have a video file with a bunch of internal subs:

impd probe output:

Index  Language  Title                                        Type
0      unknown   Www.SeiresHD.Com                             video
1      spa       unknown                                      audio
2      eng       unknown                                      audio
3      spa       Spanish - (Caption/Normal Size Char)         subtitle
4      eng       English - (Closed Caption/Normal Size Char)  subtitle
5      unknown   unknown                                      subtitle

When I add a video to my collection, impd condenses it using subtitle track 5, which is the track for songs and other sounds. I think that impd should choose internal subs based on:

  1. Target language
  2. Size of subtitles

So the largest subtitle track in the target language should be chosen for condensing. What do you think?

Edit your config file and add the following lines:

langs=spa
prefer_internal_subs=yes

My config:

langs=spanish,spa,esp,lat,cas
prefer_internal_subs=yes
video_dir=/dev/null
bitrate=32k
recent_threshold=10
padding=0.5
line_skip_pattern="^♪〜$|^〜♪$"
filename_skip_pattern="NCOP|NCED"
extract_audio_add_args=()

Try it yourself with any file from this folder: https://mega.nz/folder/oW8ihKCZ#sHuu63kset-BAn-XqFa7Nw

Condensing doesn't work, though. I guess that's because of bitmap fonts, but that's not critical.

I guess you need to manually set what tracks you want because the tracks are incorrectly named.

language: spa

Where are they incorrectly named?

For example, here it chooses the Forzados subtitle track when it should choose track 4:

Index  Language  Title     Type
0      unknown   unknown   video
1      spa       unknown   audio
2      eng       unknown   audio
3      spa       Forzados  subtitle
4      spa       unknown   subtitle
5      eng       unknown   subtitle

Can you add something to detect the largest target-language sub track?

impd chooses the first track that is:

  • not a song, caption, commentary, etc.
  • matches the preferred language

impd/impd

Line 111 in 48535fb

guess_track_priority() {
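
The selection rule described above can be illustrated with a minimal sketch. This is not impd's actual guess_track_priority(); the function name pick_first_track, the "index,language,title" input format, and the keyword list (including "forced/forzado") are assumptions made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; NOT impd's guess_track_priority().
# Input (stdin): one "index,language,title" line per subtitle track.
# Output: index of the first track whose language matches and whose title
# does not look like a songs/signs/forced/commentary track.
pick_first_track() {
    local want_lang=$1 idx lang title
    while IFS=, read -r idx lang title; do
        [ "$lang" = "$want_lang" ] || continue
        # Assumed keyword list for "garbage" tracks; matched case-insensitively.
        if printf '%s\n' "$title" | grep -qiE 'song|sign|comment|forced|forzado'; then
            continue
        fi
        printf '%s\n' "$idx"
        return 0
    done
    return 1   # no suitable track found
}

# Subtitle tracks from the second probe output in this thread.
printf '%s\n' '3,spa,Forzados' '4,spa,unknown' '5,eng,unknown' | pick_first_track spa
```

With "forzado" in the keyword list, the Forzados track is skipped and track 4 is printed. Titles that say nothing useful (such as "unknown") cannot be filtered this way, which is why a size-based check is being discussed.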

Can you add something to detect the largest target-language sub track?

Based on the number of symbols used? If so, that is a good idea but I'm not sure if it's easy to do.

@tatsumoto-ren
You can use something like:

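function subs() {
    # Extract every subtitle track to /tmp/impd_subs, then print the index
    # of the Spanish ("spa") track that has the most lines.
    mkdir -p /tmp/impd_subs
    movie="${1}"
    # Strip the directory and the extension so the output path stays valid.
    filename="$(basename -- "${1%.*}")"
    mappings=$(ffprobe -loglevel error -select_streams s -show_entries stream=index:stream_tags=language -of csv=p=0 "${movie}")
    while IFS=, read -r idx lang
    do
        # Progress messages go to stderr so stdout holds only the result.
        echo "Extracting ${lang} subtitle #${idx} from ${movie}" >&2
        ffmpeg -nostdin -hide_banner -loglevel quiet -y -i "${movie}" -map 0:"$idx" /tmp/impd_subs/"${filename}_${lang}_${idx}.srt"
    done <<< "${mappings}"
    # Sort numerically by line count; the track index is the last "_"-separated field.
    wc --total=never -l /tmp/impd_subs/"${filename}"_*.srt | grep "_spa_" | sort -nr | awk -F_ '{print $NF}' | awk -F. '{print $1}' | head -n1
}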

This outputs the number of the largest track.

(Main snippet found here: https://gist.github.com/kowalcj0/ae0bdc43018e2718fb75290079b8839a)

Or much simpler:

while IFS=',' read -r idx lang; do printf '%s ' "$idx" && ffmpeg -nostdin -hide_banner -loglevel quiet -i "la_directora_S01E02.mkv" -map 0:"$idx" -f srt - | wc -l; done < <(ffprobe -loglevel error -select_streams s -show_entries stream=index:stream_tags=language -of csv=p=0 "la_directora_S01E02.mkv" | grep ",spa") | sort -nrk2,2 | head -n1 | awk '{print $1}'

This outputs the number of the largest track.

How fast does it work for a typical episode?

For a 379 MB mkv file, the output of time for this snippet on my old laptop is 0.32s user 0.34s system 108% cpu 0.607 total

Alright, if it's not too slow (need to test on anime specifically), you can submit the PR. But you also need to think about the following:

  • only apply this method to subtitle tracks; audio tracks can be autoselected using the current method only.
  • filter out (or give lower priority to) commentary tracks, since they contain garbage yet can be longer than normal subtitle tracks
  • filter out all other garbage tracks (songs, signs, comments) though it's likely that they will be shorter than the normal subtitle tracks.
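
A rough sketch of how these requirements could fit together for subtitle tracks, assuming the per-track line counts have already been measured (for example with the ffmpeg | wc -l approach above). The function name pick_largest_track, the "index,language,title,line_count" input format, and the keyword list are hypothetical and not part of impd; titles containing commas are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of impd.
# Input (stdin): one "index,language,title,line_count" line per subtitle track.
# Output: index of the largest track in the wanted language, ignoring
# tracks whose title looks like songs, signs, forced subs, or commentary.
pick_largest_track() {
    local want_lang=$1
    awk -F, -v lang="$want_lang" '
        $2 != lang { next }                                       # wrong language
        tolower($3) ~ /song|sign|comment|forced|forzado/ { next } # assumed garbage keywords
        $4 + 0 > best_count { best_count = $4 + 0; best_idx = $1 }
        END { if (best_idx != "") print best_idx; else exit 1 }
    '
}

# Invented sample data: the line counts and the commentary track are made up.
printf '%s\n' \
    '3,spa,Forzados,40' \
    '4,spa,unknown,1200' \
    '5,spa,Commentary,2500' \
    '6,eng,unknown,1300' | pick_largest_track spa
```

Here the commentary track is dropped outright even though it is the longest; giving it a lower priority instead would need a second pass that falls back to filtered tracks when nothing else matches.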