Yuchong Sun 孙宇冲's repositories
activitynet-qa
A VideoQA dataset based on the videos from ActivityNet
awesome-embodied-vision
Reading list for research topics in embodied vision
awesome-multimodal-ml
Reading list for research topics in multimodal machine learning
Awesome-Multimodal-Research
A curated list of multimodal-related research.
BriVL
Bridging Vision and Language Model
BriVL-BUA-applications
Bling's object detection tool
CMHSE
The code repository for "Cross-Modal and Hierarchical Modeling of Video and Text" in PyTorch
collaborative-experts
Video embeddings for retrieval with natural language queries
DeepSpeedExamples
Example models using DeepSpeed
detr
End-to-End Object Detection with Transformers
FrozenBiLM
[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
MachineLearningNotebooks
Python notebooks with ML and deep learning examples using the Azure Machine Learning Python SDK (Microsoft)
merlot
MERLOT: Multimodal Neural Script Knowledge Models
trl
Train transformer language models with reinforcement learning.
VideoLanguageFuturePred
[EMNLP 2020] What is More Likely to Happen Next? Video-and-Language Future Event Prediction
vokenization
PyTorch code for EMNLP 2020 Paper "Vokenization: Improving Language Understanding with Visual Supervision"