emcf / thepipe

Extract clean data from anywhere, powered by vision-language models ⚡

Home Page: https://thepi.pe



Video frame + transcript extraction

emcf opened this issue

Looking to support extraction from MP4, MOV, WebM, and AVI files, as well as YouTube videos, for use with a vision-language model (not a video model).
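As a rough illustration of routing these inputs, the sketch below classifies a source string as a local video file (by the extensions listed above) or a YouTube link. It is not thepipe's API: `classify_video_source`, `VIDEO_EXTENSIONS`, and `YOUTUBE_HOSTS` are hypothetical names, and the host list is an assumption rather than an exhaustive set of YouTube domains. Non-YouTube URLs are deliberately left unsupported here.

```python
from pathlib import Path
from typing import Optional
from urllib.parse import urlparse

# Container formats named in the issue; matching is case-insensitive.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".webm", ".avi"}
# Hypothetical, non-exhaustive set of hostnames treated as YouTube.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def classify_video_source(source: str) -> Optional[str]:
    """Return 'youtube', 'video_file', or None if the input is unsupported."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        # urlparse lowercases .hostname; only YouTube links are handled.
        return "youtube" if parsed.hostname in YOUTUBE_HOSTS else None
    # Otherwise treat the input as a local path and check its extension.
    if Path(source).suffix.lower() in VIDEO_EXTENSIONS:
        return "video_file"
    return None
```

For example, `classify_video_source("talk.MOV")` returns `"video_file"`, while a PDF path returns `None` and would be handed to a different extractor.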

Video and audio input are not yet standard in commercial multimodal models. To work around this, I am looking to transcribe the audio track to text so it can be passed to the model alongside the extracted frames.
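One way to pair frames with a transcript is to sample frames at a fixed interval and attach each timestamped transcript segment to the frame window in which it starts. The sketch below assumes the speech-to-text step already yields segments with start/end times in seconds (as many ASR tools do) and that frame decoding happens elsewhere; `Segment`, `Chunk`, `sample_timestamps`, and `align_transcript` are hypothetical names, not part of thepipe.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start: float  # seconds from the beginning of the video
    end: float
    text: str


@dataclass
class Chunk:
    timestamp: float  # time of the sampled frame, in seconds
    text: str  # transcript segments that start in this frame's window


def sample_timestamps(duration: float, interval: float) -> List[float]:
    """Uniformly spaced frame times in [0, duration)."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    times, n = [], 0
    while n * interval < duration:
        times.append(n * interval)
        n += 1
    return times


def align_transcript(segments: List[Segment], duration: float,
                     interval: float) -> List[Chunk]:
    """Group transcript segments under the frame window where they start."""
    times = sample_timestamps(duration, interval)
    if not times:
        return []
    buckets: List[List[str]] = [[] for _ in times]
    for seg in sorted(segments, key=lambda s: s.start):
        # Clamp so segments at or past the end land in the last window.
        idx = min(max(int(seg.start // interval), 0), len(times) - 1)
        buckets[idx].append(seg.text.strip())
    return [Chunk(t, " ".join(parts)) for t, parts in zip(times, buckets)]
```

Assigning each segment by its start time keeps every sentence in exactly one chunk; an overlap-based rule would instead repeat segments that straddle a window boundary. Each `Chunk` can then be sent to the vision-language model as one image plus its text.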