ashdtu / quick-recipe

Extracting procedural instructions from cooking videos

Dataset

Please download the data, model checkpoint files and logs from drive. Extract them into the data/, model/ and logs/ directories respectively. Once the data is extracted into data/, our processed YouCook2 dataset will be available in Pickle format at data/full_master_updated.pkl.
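As a quick sanity check after extraction, the pickle can be loaded and inspected from Python. This is a minimal sketch, not part of the repository: `load_dataset` and `describe` are hypothetical helper names, and the structure of the stored object is not documented here, so the code only reports its type and length. If the object was saved with a third-party library (for example pandas), that library must be installed for unpickling to succeed.

```python
import pickle
from pathlib import Path

# Path taken from the README; adjust if the archive was extracted elsewhere.
DATASET_PATH = Path("data/full_master_updated.pkl")


def load_dataset(path=DATASET_PATH):
    """Load the pickled dataset and return the deserialised object.

    Hypothetical helper: the README does not define a loader.
    """
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"{path} not found; download and extract the data first")
    # Only unpickle files from a trusted source: pickle can execute arbitrary code.
    with path.open("rb") as fh:
        return pickle.load(fh)


def describe(obj):
    """Return a short summary without assuming a particular structure."""
    size = len(obj) if hasattr(obj, "__len__") else "n/a"
    return f"type={type(obj).__name__}, len={size}"
```

Calling `print(describe(load_dataset()))` from the repository root prints the container type and its size, which is usually enough to confirm that the download and extraction worked.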

Acknowledgement

This codebase has been initialised from the repo for the paper "A Benchmark for Structured Procedural Knowledge Extraction from Cooking Videos".

We have also adapted this repo for our Attention-LSTM Visual and Multimodal model, and this repo for our Multihead Transformer model.

Project Organization

├── data               <- Any datasets that need to be saved
├── logs               <- Logs from model training runs
├── models             <- Save any model weights here
├── notebooks          <- Notebooks for running experiments
├── scripts            <- Youcook scripts
├── attn-lstm          <- Attention-LSTM training and evaluation scripts
├── src                <- Model training and evaluation code
    ├── models         <- Model classes
    ├── processing     <- Data processing scripts
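The layout above can be verified with a short script before running any notebook. This is a sketch under the assumption that it is run from the repository root; `EXPECTED_DIRS` simply mirrors the tree shown above, and `missing_dirs` is a hypothetical helper, not something the repository provides.

```python
from pathlib import Path

# Directory names copied from the project tree above.
EXPECTED_DIRS = [
    "data",
    "logs",
    "models",
    "notebooks",
    "scripts",
    "attn-lstm",
    "src/models",
    "src/processing",
]


def missing_dirs(root=".", expected=EXPECTED_DIRS):
    """Return the expected directories that do not exist under root."""
    root = Path(root)
    return [name for name in expected if not (root / name).is_dir()]
```

An empty list from `missing_dirs()` means every directory in the tree is present; anything else names the folders that still need to be created or extracted.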

Getting Started

pip install -r requirements.txt

Experiment-wise Notebooks

Feature Re-alignment

  • Feature Re-alignment notebook: notebooks/yc2_feature_alignment_v2.ipynb
  • Video pipeline notebook: notebooks/video_pieline.ipynb

KeyClip Selection

  • DistilBERT Classifier (with and without Context): notebooks/DistilBERT-keyclip.ipynb
  • Self-Attention (Text and Multi-modal experiments): notebooks/self-attention-keyclip.ipynb
  • Creating Sentence Embeddings (MiniLM-L6, DistilBERT): notebooks/create-sentence-embeddings.ipynb
  • Independent Visual Features: notebooks/yc2_cnn_visual_only.ipynb
  • Unified Visual Features: notebooks/yc2_visual_unified_feat.ipynb
  • Self-attention on visual features: notebooks/yc2_self_attn.ipynb
  • Neural Selection Baseline: notebooks/neural_selection_baseline.ipynb

Knowledge Extraction

  • T5 Small: notebooks/2_1_Knowledge_Extraction_T5small.ipynb
  • T5 Base: notebooks/2_2_Knowledge_Extraction_T5base.ipynb
  • BART: notebooks/2_3_Knowledge_Extraction_BART.ipynb
  • BART (with Coreference Resolution): notebooks/2_4_Knowledge_Extraction_BART_Coref.ipynb
  • Attention LSTM (Visual-only): notebooks/2_5_Knowledge_Extraction_Attention_LSTM_Visual.ipynb
  • Attention LSTM (Multimodal): notebooks/2_6_Knowledge_Extraction_Attention_LSTM_Multimodal.ipynb
  • Multihead Transformer (Multimodal): notebooks/2_7_Knowledge_Extraction_Multihead_Transformer.ipynb

Scripts

Attention LSTM

Train

cd attn-lstm
# For training Attention LSTM Visual model
python train_visual.py

# For training Attention LSTM Multimodal model
python train_multimodal.py

Evaluate

cd attn-lstm
# For evaluating Attention LSTM Visual model
python evaluate_visual.py

# For evaluating Attention LSTM Multimodal model
python evaluate_multimodal.py

Multihead Transformer

Train

cd multihead-transformer
python prepare_dataset.py
python main_multimodal.py

Evaluate

cd multihead-transformer
python evaluate_multimodal.py
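The train and evaluate commands above can also be chained from a single Python driver. This is a minimal sketch, not a script shipped with the repository: `PIPELINES`, `build_commands` and `run_pipeline` are hypothetical names, the script names and directories are copied from the commands above, and the driver assumes it is started from the repository root and that each script runs without extra arguments.

```python
import subprocess
import sys

# Script order per model directory, copied from the commands above.
PIPELINES = {
    "attn-lstm": [
        "train_visual.py",
        "train_multimodal.py",
        "evaluate_visual.py",
        "evaluate_multimodal.py",
    ],
    "multihead-transformer": [
        "prepare_dataset.py",
        "main_multimodal.py",
        "evaluate_multimodal.py",
    ],
}


def build_commands(model_dir):
    """Return one command per script, using the current Python interpreter."""
    return [[sys.executable, script] for script in PIPELINES[model_dir]]


def run_pipeline(model_dir, dry_run=False):
    """Run every script for model_dir in order, stopping at the first failure."""
    for cmd in build_commands(model_dir):
        print("running:", " ".join(cmd), "in", model_dir)
        if not dry_run:
            # cwd mirrors the `cd <dir>` step; check=True raises on a non-zero exit.
            subprocess.run(cmd, cwd=model_dir, check=True)
```

For example, `run_pipeline("multihead-transformer", dry_run=True)` prints the three commands without executing them, which is a cheap way to confirm the order before starting a long training run.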
