Xuanlin (Simon) Li's repositories
CS285_Fa19_Deep_Reinforcement_Learning
My solutions to UC Berkeley CS285 (originally CS294-112, deeprlcourse) Fall 2019 assignments
large_vlm_distillation_ood
Distilling Large Vision-Language Model with Out-of-Distribution Generalizability (ICCV 2023)
corl_22_frame_mining
[CoRL22] Frame Mining - a Free Lunch for Learning Robotic Manipulation from 3D Point Clouds
iclr2021_rlreg
Regularization Matters in Policy Optimization
autoregressive_inference
Code for "Discovering Non-monotonic Autoregressive Orderings with Variational Inference" (paper and code updated from ICLR 2021)
efficientvit
EfficientViT is a new family of vision models for efficient high-resolution vision.
MinkowskiEngine
Minkowski Engine is an auto-diff neural network library for high-dimensional sparse tensors
python-pcl
Python bindings to the Point Cloud Library (PCL)
graspnetAPI
Toolbox for our GraspNet-1Billion dataset.
instant-nsr-pl
Neural surface reconstruction based on Instant-NGP. Efficient and customizable boilerplate for your research projects. Train NeuS in 10 minutes!
LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
ml-veclip
The official repo for the paper "VeCLIP: Improving CLIP Training via Visual-enriched Captions"
MobileSAM
This is the official code for the MobileSAM project, which makes SAM lightweight for mobile applications and beyond!
MobileVLM
Strong and Open Vision Language Assistant for Mobile Devices
rlds_dataset_builder
ManiSkill2 RLDS dataset builder for X-embodiment dataset conversion.
sam2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
tapnet
Tracking Any Point (TAP)
tensor2robot
Distributed machine learning infrastructure for large-scale robotics research
TensoRF
[ECCV 2022] Tensorial Radiance Fields, a novel approach to model and reconstruct radiance fields
VILA
VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
XMem_fork
[ECCV 2022] XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model
YOLO-World
[CVPR 2024] Real-Time Open-Vocabulary Object Detection