Jiazhi Yang's repositories
BEVFormer
[ECCV 2022] This is the official implementation of BEVFormer, a camera-only framework for autonomous driving perception, e.g., 3D object detection and semantic map segmentation.
centerformer
Implementation for CenterFormer: Center-based Transformer for 3D Object Detection (ECCV 2022)
DiffusionDet
PyTorch implementation of DiffusionDet (https://arxiv.org/abs/2211.09788)
dino
PyTorch code for training Vision Transformers with the self-supervised learning method DINO
DiT
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
GroundingDINO
The official implementation of "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
HAL
Code release for "HAL: Improved Text-Image Matching by Mitigating Visual Semantic Hubs" at AAAI 2020.
LLaMA-Adapter
Fine-tuning LLaMA to follow instructions within 1 hour and with only 1.2M parameters
lora
Using Low-Rank Adaptation (LoRA) to quickly fine-tune diffusion models.
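The entry above describes fine-tuning diffusion models with low-rank adaptation. A minimal sketch of the core LoRA idea, a frozen base weight plus a trainable low-rank update, is shown below; the class name `LoRALinear` and all hyperparameters are illustrative, not taken from the repository:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Sketch of Low-Rank Adaptation (LoRA): the frozen pretrained weight W
    is augmented with a trainable low-rank update B @ A, so only
    rank * (d_in + d_out) parameters are trained instead of d_in * d_out."""

    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # freeze the pretrained layer
        d_out, d_in = base.weight.shape
        # A starts as small noise, B as zeros, so the update begins at zero
        # and the adapted model initially matches the pretrained one.
        self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, rank))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(nn.Linear(16, 8), rank=4)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
# only the rank-4 factors train: 4*16 + 8*4 = 96 parameters,
# versus 16*8 + 8 = 136 frozen parameters in the base layer
```

Because `B` is initialized to zero, the wrapped layer reproduces the base layer exactly until training moves the adapters, which is what makes LoRA safe to bolt onto a pretrained model.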
MAE-pytorch
Unofficial PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners
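The entry above implements masked autoencoder (MAE) pre-training, whose key step is masking a large random fraction of image patches before the encoder runs. A minimal sketch of that random-masking step follows; the function name, shapes, and mask ratio are illustrative, not taken from the repository:

```python
import torch

def random_masking(patches: torch.Tensor, mask_ratio: float = 0.75):
    """Sketch of MAE-style random masking: keep a random subset of patch
    tokens and return the bookkeeping needed to restore patch order later.

    patches: (B, N, D) batch of N patch embeddings of dimension D.
    Returns (visible, mask, restore):
      visible: (B, keep, D) patches the encoder actually sees
      mask:    (B, N) with 0 = kept, 1 = masked (for the reconstruction loss)
      restore: (B, N) indices that undo the shuffle when decoding
    """
    B, N, D = patches.shape
    keep = int(N * (1 - mask_ratio))
    noise = torch.rand(B, N)              # one random score per patch
    shuffle = noise.argsort(dim=1)        # random permutation of patches
    restore = shuffle.argsort(dim=1)      # inverse permutation
    kept_idx = shuffle[:, :keep]
    visible = torch.gather(
        patches, 1, kept_idx.unsqueeze(-1).expand(-1, -1, D)
    )
    mask = torch.ones(B, N)
    mask.scatter_(1, kept_idx, 0)         # mark the kept positions with 0
    return visible, mask, restore
```

With the default 75% mask ratio, the encoder processes only a quarter of the tokens, which is the main source of MAE's pre-training speedup.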
Mask2Former
Code release for "Masked-attention Mask Transformer for Universal Image Segmentation"
mmdetection
OpenMMLab Detection Toolbox and Benchmark
nerfvis
NeRF visualization library under construction
MiniGPT-4
MiniGPT-4: Enhancing Vision-language Understanding with Advanced Large Language Models
OpenSelfSup
Self-Supervised Learning Toolbox and Benchmark
PolyLoss
Source code for Universal Weighting Metric Learning for Cross-Modal Matching, accepted at CVPR 2020.
Position-Focused-Attention-Network
Position Focused Attention Network for Image-Text Matching
Proxy-Anchor-CVPR2020
Official PyTorch Implementation of Proxy Anchor Loss for Deep Metric Learning, CVPR 2020
pyllama
LLaMA: Open and Efficient Foundation Language Models
ResNeSt
ResNeSt: Split-Attention Networks
SCAN
PyTorch source code for "Stacked Cross Attention for Image-Text Matching" (ECCV 2018)
setup
Set up a new machine without sudo!
ST-P3
[ECCV 2022] ST-P3, an end-to-end vision-based autonomous driving framework built on spatial-temporal feature learning.
Stable-Pix2Seq
A full-fledged version of Pix2Seq
stanford_alpaca
Code and documentation to train Stanford's Alpaca models, and generate the data.
TCP
[NeurIPS 2022] Trajectory-guided Control Prediction for End-to-end Autonomous Driving: A Simple yet Strong Baseline.
transfuser
[PAMI'22] TransFuser: Imitation with Transformer-Based Sensor Fusion for Autonomous Driving, [CVPR'21] Multi-Modal Fusion Transformer for End-to-End Autonomous Driving
UniAD
Goal-oriented Autonomous Driving
YOLOX
YOLOX is a high-performance anchor-free YOLO that exceeds YOLOv3-v5, with MegEngine, ONNX, TensorRT, ncnn, and OpenVINO support. Documentation: https://yolox.readthedocs.io/