impd should automatically choose the right internal subs
asakura42 opened this issue
Long story short: I have a video file with several internal subtitle tracks:
impd probe
output:
Index Language Title Type
0 unknown Www.SeiresHD.Com video
1 spa unknown audio
2 eng unknown audio
3 spa Spanish - (Caption/Normal Size Char) subtitle
4 eng English - (Closed Caption/Normal Size Char) subtitle
5 unknown unknown subtitle
When I add a video to my collection, impd condenses it using subtitle track 5, which is the track for songs and other sounds. I think impd should choose internal subs based on:
- Target language
- Size of subtitles
So the largest subtitle track in the target language should be chosen for condensing. What do you think?
Edit your config file and add the following lines:
langs=spa
prefer_internal_subs=yes
My config:
langs=spanish,spa,esp,lat,cas
prefer_internal_subs=yes
video_dir=/dev/null
bitrate=32k
recent_threshold=10
padding=0.5
line_skip_pattern="^♪〜$|^〜♪$"
filename_skip_pattern="NCOP|NCED"
extract_audio_add_args=()
Try it yourself with any file from this folder: https://mega.nz/folder/oW8ihKCZ#sHuu63kset-BAn-XqFa7Nw
Condensing doesn't work, though, but I guess that's because of the bitmap (bmp) fonts. That's not critical anyway.
I guess you need to manually set which tracks you want, because the tracks are incorrectly named.
language: spa
Where are they incorrectly named?
For example, here it chooses the Forzados subtitle track (3) when it should choose track 4:
Index Language Title Type
0 unknown unknown video
1 spa unknown audio
2 eng unknown audio
3 spa Forzados subtitle
4 spa unknown subtitle
5 eng unknown subtitle
Can you add something to detect the largest subtitle track in the target language?
Based on the number of characters? If so, that's a good idea, but I'm not sure it's easy to do.
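If "size" is measured in characters rather than lines, cue numbers and timestamps should not count, otherwise a track with many tiny cues (signs, sound effects) can look bigger than it is. Below is a minimal sketch, assuming the track has already been extracted as SRT text; the helper name srt_text_chars is hypothetical and not part of impd. Depending on the awk implementation and locale, length() counts characters (gawk in a UTF-8 locale) or bytes (e.g. mawk), and a dialogue line consisting only of digits would be skipped as if it were a cue number.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of impd): count the dialogue text in an SRT
# stream read from stdin, ignoring cue numbers, timestamp lines and blank lines.
srt_text_chars() {
    awk '
        { sub(/\r$/, "") }                        # tolerate CRLF line endings
        /^[0-9]+$/   { next }                     # cue number
        /-->/        { next }                     # timestamp line
        /^[ \t]*$/   { next }                     # blank separator
        { gsub(/<[^>]*>/, ""); n += length($0) }  # strip simple tags, count the rest
        END { print n + 0 }
    '
}
```

Usage, assuming track 4 is a text-based subtitle: ffmpeg -nostdin -loglevel quiet -i video.mkv -map 0:4 -f srt - | srt_text_chars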
@tatsumoto-ren
You could use something like this:
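function subs() {
    local movie="${1}"
    # Target language tag; defaults to "spa", which the original hard-coded
    local target_lang="${2:-spa}"
    local filename tmpdir mappings
    # Strip directories and extension so the output path stays inside tmpdir
    filename="$(basename -- "${movie%.*}")"
    # Fresh directory per run, so stale .srt files from earlier runs can't win
    tmpdir="$(mktemp -d)"
    # List subtitle streams as "index,language" pairs
    mappings=$(ffprobe -loglevel error -select_streams s -show_entries stream=index:stream_tags=language -of csv=p=0 "${movie}")
    while IFS=, read -r idx lang; do
        # Progress goes to stderr so stdout carries only the result
        echo "Extracting ${lang} subtitle #${idx} from ${movie}" >&2
        ffmpeg -nostdin -hide_banner -loglevel quiet -y -i "${movie}" -map 0:"${idx}" "${tmpdir}/${filename}_${lang}_${idx}.srt"
    done <<< "${mappings}"
    # Count lines per track, sort numerically (not lexically), print the largest track's index
    wc -l "${tmpdir}"/*_"${target_lang}"_*.srt | grep -v ' total$' | sort -nr | head -n1 | awk -F_ '{print $NF}' | awk -F. '{print $1}'
    rm -rf -- "${tmpdir}"
}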
This prints the stream index of the largest subtitle track in the target language.
(Main snippet found here: https://gist.github.com/kowalcj0/ae0bdc43018e2718fb75290079b8839a)
Or much simpler:
while IFS=',' read -r idx lang; do
    printf '%s ' "$idx"
    ffmpeg -nostdin -hide_banner -loglevel quiet -i "la_directora_S01E02.mkv" -map 0:"$idx" -f srt - | wc -l
done < <(ffprobe -loglevel error -select_streams s -show_entries stream=index:stream_tags=language -of csv=p=0 "la_directora_S01E02.mkv" | grep ',spa$') | sort -nrk2,2 | head -n1 | awk '{print $1}'
This outputs the number of the largest track.
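Both snippets rely on language tags alone, and language tags are exactly what failed to tell the Forzados track apart from the full one. As a hedged sketch, ffprobe can also report each track's title: with -of compact=p=0 it prints one line per stream such as index=3|tag:language=spa|tag:title=Forzados. The helper parse_sub_streams below is hypothetical (not part of impd); it turns those lines into "index<TAB>language<TAB>title" and fills in "unknown" for missing tags, the way impd probe displays them. A title that itself contains "|" would be split incorrectly.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of impd): convert ffprobe compact output,
# one stream per line like "index=3|tag:language=spa|tag:title=Forzados",
# into "index<TAB>language<TAB>title" lines. Missing tags become "unknown".
parse_sub_streams() {
    awk -F '|' '
        {
            idx = ""; lang = "unknown"; title = "unknown"
            for (i = 1; i <= NF; i++) {
                eq = index($i, "=")                # split on the first "=" only
                key = substr($i, 1, eq - 1)
                val = substr($i, eq + 1)
                if (key == "index") idx = val
                else if (key == "tag:language") lang = val
                else if (key == "tag:title") title = val
            }
            printf "%s\t%s\t%s\n", idx, lang, title
        }
    '
}
# Usage (assumes ffprobe is installed):
#   ffprobe -loglevel error -select_streams s \
#     -show_entries stream=index:stream_tags=language,title \
#     -of compact=p=0 video.mkv | parse_sub_streams
```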
How fast does it work for a typical episode?
For a 379 MB MKV file, time for this snippet on my old laptop reports: 0.32s user 0.34s system 108% cpu 0.607 total
Alright, if it's not too slow (it still needs testing on anime specifically), you can submit a PR. But you also need to consider the following:
- only apply this method to subtitle tracks; audio tracks can be autoselected using the current method only.
- filter out (or give lower priority to) commentary tracks, since they contain garbage yet can be longer than normal subtitle tracks
- filter out all other garbage tracks (songs, signs, comments), though they are likely to be shorter than normal subtitle tracks.
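The selection rules above can be sketched as a small ranking step, assuming each candidate track has already been described as one "index<TAB>language<TAB>title<TAB>size" line, where size is whatever measure is chosen (line count, character count). The helper name pick_sub_track and the default title pattern are assumptions for illustration, not impd behavior; "forced"/"forzados" is included only because of the Forzados example in this thread. Tracks whose title matches the pattern get lower priority rather than being dropped outright: they are picked only when no other track in the target language exists.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of impd): pick a subtitle track for condensing.
# stdin: one track per line as "index<TAB>language<TAB>title<TAB>size".
# $1: target language tag; $2: optional case-insensitive regex of "garbage" titles.
pick_sub_track() {
    local target_lang="$1"
    # Assumed default pattern; adjust to the titles actually seen in the wild
    local skip_pattern="${2:-commentary|comments|signs|songs|forced|forzados}"
    awk -F '\t' -v lang="$target_lang" -v skip="$skip_pattern" '
        $2 != lang { next }                          # only the target language
        {
            size = $4 + 0
            garbage = (tolower($3) ~ tolower(skip))  # title looks like commentary etc.
            if (!garbage && (clean == "" || size > clean_size)) {
                clean = $1; clean_size = size        # largest non-garbage track
            }
            if (any == "" || size > any_size) {
                any = $1; any_size = size            # largest track overall (fallback)
            }
        }
        END { print (clean != "" ? clean : any) }
    '
}
```

For the Forzados example, a short forced track 3 and a long untitled track 4 both tagged spa resolve to track 4; only if track 4 were missing would the forced track be chosen.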