Yicong's starred repositories
ml-visuals
🎨 ML Visuals contains figures and templates which you can reuse and customize to improve your scientific writing.
awesome-3D-gaussian-splatting
Curated list of papers and resources focused on 3D Gaussian Splatting, intended to keep pace with the anticipated surge of research in the coming months.
Awesome-AIGC-3D
A curated list of awesome AIGC 3D papers
awesome-3d-diffusion
A collection of papers on diffusion models for 3D generation.
Awesome-Multimodal-Large-Language-Models
✨✨ Latest papers and datasets on Multimodal Large Language Models, and their evaluation.
awesome-vlm-architectures
Famous Vision Language Models and Their Architectures
awesome-3D-generation
A curated list of awesome 3D generation papers
Awesome-LLMs-for-Video-Understanding
🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.
Awesome-LLM
Awesome-LLM: a curated list of Large Language Model resources
Awesome-LLM-3D
Awesome-LLM-3D: a curated list of resources on Multi-modal Large Language Models in the 3D world
PointMetaBase
This is a PyTorch implementation of PointMetaBase, proposed in our paper "Meta Architecture for Point Cloud Analysis".
Point-Transformers
Point Transformers
Track-Anything
Track-Anything is a flexible and interactive tool for video object tracking and segmentation, based on Segment Anything, XMem, and E2FGVI.
stanford_alpaca
Code and documentation to train Stanford's Alpaca models, and generate the data.
MultiModal_BigModels_Survey
[MIR-2023-Survey] A continuously updated paper list for multi-modal pre-trained big models
CVPR2024-Papers-with-Code
A collection of CVPR 2024 papers and open-source projects
ChatReviewer
ChatReviewer: uses ChatGPT to analyze a paper's strengths and weaknesses and suggest improvements
uvadlc_notebooks
Repository of Jupyter notebook tutorials for teaching the Deep Learning Course at the University of Amsterdam (MSc AI), Fall 2023
google-research
Google Research
InternVideo
Video Foundation Models & Data for Multimodal Understanding
In-the-wild-QA
In-the-wild Question Answering
pytorch-image-models
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT), MobileNet-V3/V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more
paper-reading
Paragraph-by-paragraph close readings of classic and recent deep learning papers