hemanthkumar17 / VIDEO_CAPTIONING_PIPELINE



VIDEO_CAPTIONING_PIPELINE

We tackle the challenging task of generating recipes from videos using only pre-trained models. We divided recipe generation into modules: event generation, frame extraction, frame featurization, frame-redundancy removal, frame enhancement, frame captioning, and summarization with a large language model (LLM). Each stage uses a pre-trained model suited to its task. The pipeline exploits the temporal structure of videos, the representational power of image embeddings, and the generative ability of LLMs to extract meaningful content and generate recipes efficiently. We evaluated the quality of the generated recipes with several metrics, and the results highlight the impact of our work.
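The README does not say how frame redundancy is removed, so the following is only a minimal sketch of one common approach, assuming each extracted frame has already been turned into an embedding vector by some pre-trained image model (the model is not specified here). Because frames arrive in temporal order, a frame is dropped when its embedding is nearly identical to the last frame that was kept. The function names `cosine_similarity` and `remove_redundant_frames` and the default `threshold=0.95` are hypothetical choices for illustration, not the repository's code.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def remove_redundant_frames(
    embeddings: List[Sequence[float]], threshold: float = 0.95
) -> List[int]:
    """Return the indices of frames to keep, in temporal order.

    Hypothetical sketch: a frame is skipped when its cosine similarity to the
    most recently kept frame is >= threshold, which collapses runs of
    near-duplicate consecutive frames into a single representative.
    """
    kept: List[int] = []
    for i, emb in enumerate(embeddings):
        # Always keep the first frame; afterwards compare only to the last kept one.
        if not kept or cosine_similarity(emb, embeddings[kept[-1]]) < threshold:
            kept.append(i)
    return kept


if __name__ == "__main__":
    # Toy 2-D "embeddings": frames 1 and 3 are near-duplicates of their predecessors.
    frames = [[1.0, 0.0], [0.99, 0.01], [0.0, 1.0], [0.01, 1.0], [1.0, 0.0]]
    print(remove_redundant_frames(frames))  # prints [0, 2, 4]
```

Comparing against the last kept frame, rather than against every kept frame, keeps the pass linear in the number of frames and still allows a scene that reappears later in the video (frame 4 above) to be retained.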

Experiment 1

[Screenshot: Experiment 1 results]

Experiment 2

[Screenshot: Experiment 2 results]

Experiment 3

[Screenshot: Experiment 3 results]

Experiment 4

[Screenshot: Experiment 4 results]

For a detailed explanation, refer to the report and to the video presentation, which contains demos: https://docs.google.com/presentation/d/1R0FjAj_QXoLjxR3NsZRVTFgYu-KnKj2BpN4EcOvyuGI/edit#slide=id.g21eccad0113_0_38
