wanboyang's repositories
Anomaly_AR_Net_ICME_2020
This repository is for Weakly Supervised Video Anomaly Detection via Center-Guided Discriminative Learning (ICME 2020). The original paper is available at https://ieeexplore.ieee.org/document/9102722 or https://arxiv.org/abs/2104.07268
Awesome-Multimodal-Large-Language-Models
Latest Papers and Datasets on Multimodal Large Language Models
Protein-Localization-Transformer
Code for CELL-E: Biological Zero-Shot Text-to-Image Synthesis for Protein Localization Prediction
UCF_2018_CVPR
Reproduction code for Real-world Anomaly Detection in Surveillance Videos
awesome-industrial-anomaly-detection
Paper list and datasets for industrial image anomaly/defect detection (continuously updated).
Chinese-STD-GB-T-7714-related-csl
CSL styles related to GB/T 7714, plus Zotero tips and tutorials.
grit
GRIT: Faster and Better Image-captioning Transformer (ECCV 2022)
GroundingDINO
Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
kaldi
kaldi-asr/kaldi is the official location of the Kaldi project.
LaTeX-OCR
pix2tex: Using a ViT to convert images of equations into LaTeX code.
LLaMA-Adapter
Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters
LLMsPracticalGuide
A curated list of practical guide resources for LLMs (LLMs Tree, Examples, Papers)
LLMVA-GEBC
Winning solution for the Generic Event Boundary Captioning task in the LOVEU Challenge (CVPR 2023 workshop)
Neighborhood-Attention-Transformer
[Preprint] Neighborhood Attention Transformer
pykaldi
A Python wrapper for Kaldi
stable-diffusion
A latent text-to-image diffusion model
Textual-Visual-Semantic-Dataset
Visual Semantic Relatedness Dataset for Image Captioning. https://arxiv.org/abs/2301.08784
unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Video-LLaMA
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
wanboyang.github.io
AcadHomepage: A Modern and Responsive Academic Personal Homepage
Xmodal-Ctx
Official PyTorch implementation of our CVPR 2022 paper: Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning