Xiaobing Han's repositories
apollo
An open autonomous driving platform
autoware
Autoware - the world's leading open-source software project for autonomous driving
BEV-Perception
Bird's Eye View Perception
BEVFormer
[ECCV 2022] This is the official implementation of BEVFormer, a camera-only framework for autonomous driving perception, e.g., 3D object detection and semantic map segmentation.
CityGaussian
Repository for CityGaussian: Real-time High-quality Large-Scale Scene Rendering with Gaussians
corenet
CoreNet: A library for training deep neural networks
D-iGPT
[ICML 2024] This repository includes the official implementation of our paper "Rejuvenating image-GPT as Strong Visual Representation Learners"
Groma
Grounded Multimodal Large Language Model with Localized Visual Tokenization
Infusion
Official implementations for paper: InFusion: Inpainting 3D Gaussians via Learning Depth Completion from Diffusion Prior
interactive3d
[CVPR'24] Interactive3D: Create What You Want by Interactive 3D Generation
mamba360
State Space Models
Mask_RCNN
Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow
MCTF
Official implementation of CVPR 2024 paper "Multi-criteria Token Fusion with One-step-ahead Attention for Efficient Vision Transformers".
Metric3D
The repo for "Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image" and "Metric3Dv2: A Versatile Monocular Geometric Foundation Model..."
MicroDreamer
Official implementation of "MicroDreamer: Zero-shot 3D Generation in ~20 Seconds by Score-based Iterative Reconstruction".
MiniGPT-3D
MiniGPT-3D: Efficiently Aligning 3D Point Clouds with Large Language Models using 2D Priors (Under Review)
mllm
Fast Multimodal LLM on Mobile Devices
MLLM-Bench
MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria
mvs_objaverse
A little repo to render Objaverse objects with Blender
PaddleMIX
Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high performance and flexibility.
RSCaMa
RSCaMa: Remote Sensing Image Change Captioning with State Space Model
VMamba
VMamba: Visual State Space Models, code is based on Mamba
xtuner
An efficient, flexible and full-featured toolkit for fine-tuning large models (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)