James Chang's repositories
DocMTAgent
Code and data releases for the paper -- DelTA: An Online Document-Level Translation Agent Based on Multi-Level Memory
CMM
✨✨The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio
facechain
FaceChain is a deep-learning toolchain for generating your Digital-Twin.
FakeShield
The official implementation of 'FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models'
FasterCache
FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality
FCGS
🚀 [ARXIV 2024] PyTorch implementation of 'Fast Feedforward 3D Gaussian Splatting Compression'
FlatQuant
Official PyTorch implementation of FlatQuant: Flatness Matters for LLM Quantization
Janus
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation
L-CITEEVAL
L-CITEEVAL: DO LONG-CONTEXT MODELS TRULY LEVERAGE CONTEXT FOR RESPONDING?
Mamba-in-Computer-Vision
Mamba in Vision: A Comprehensive Survey of Techniques and Applications
MomentumSMoE
Implementation for MomentumSMoE
monst3r
Official implementation of the paper "MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion"
MVGS
MVGS: Multi-View Regulated Gaussian Splatting for Novel View Synthesis
Ossmodels
The best OSS video generation models
PDF-Wukong
A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling
PhyGenBench
Code and data for the paper "Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation"
Pyramid-Flow
Code of Pyramidal Flow Matching for Efficient Video Generative Modeling
ragbuilder
A toolkit to create an optimal production-ready RAG setup for your data
REPA
Official PyTorch implementation of Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
RoboticsDiffusionTransformer
RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation
SageAttention
Quantized attention that achieves speedups of 2.1x and 2.7x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
ScaleQuest
We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLMs.
Spatial-Mamba
[ICLR2025] Spatial-Mamba: Effective Visual State Space Models via Structure-Aware State Fusion
TextHarmony
The official code for NeurIPS 2024 paper: Harmonizing Visual Text Comprehension and Generation
Video-XL
🔥🔥 First-ever hour-scale video understanding models