Video frame + transcript extraction
emcf opened this issue
Emmett McFarlane commented
Looking to support extraction of mp4, mov, webm, and avi files, as well as YouTube, for a vision-language model (not a video model).
Video and audio input is not standard in commercial multimodal models today. Because of this, I am looking to transcribe the audio.
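A rough sketch of how the local-file part could be split up, assuming ffmpeg is installed and on the PATH. The function names, the 1 frame-per-second default, and the 16 kHz mono WAV output are illustrative assumptions, not anything decided in this issue. YouTube download and the transcription step itself are left out.

```python
from pathlib import Path

# File types named in this issue.
SUPPORTED_EXTENSIONS = {".mp4", ".mov", ".webm", ".avi"}


def is_supported_video(path: str) -> bool:
    """Return True if the file extension is one of the listed video types."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def frame_command(video: str, out_dir: str, fps: float = 1.0) -> list[str]:
    """Build an ffmpeg command that samples still frames at `fps` per second.

    The frames can then be passed to a vision-language model as images.
    """
    pattern = str(Path(out_dir) / "frame_%05d.png")
    return ["ffmpeg", "-i", video, "-vf", f"fps={fps}", pattern]


def audio_command(video: str, wav_path: str) -> list[str]:
    """Build an ffmpeg command that drops the video stream (-vn) and writes
    mono 16 kHz audio, a common input format for speech-to-text tools
    (assumed here, not required by any specific transcriber)."""
    return ["ffmpeg", "-i", video, "-vn", "-ac", "1", "-ar", "16000", wav_path]
```

Each command list can be run with `subprocess.run(cmd, check=True)`; the output directory for frames has to exist beforehand. Keeping command construction separate from execution makes the pieces easy to test without ffmpeg present.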