strategist922

User data from GitHub: https://github.com/strategist922

Microsoft

Taipei, Taiwan

Organizations

THUKElab

James Chang's repositories

DocMTAgent

Code and data releases for the paper -- DelTA: An Online Document-Level Translation Agent Based on Multi-Level Memory

Language: Roff | License: Apache-2.0 | 100

MiniPLM

Language: Python | License: MIT | 100

CMM

✨✨The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio

Language: Python | 000

DICE

000

EmbodiedCity

Language: Python | 000

facechain

FaceChain is a deep-learning toolchain for generating your Digital-Twin.

Language: Jupyter Notebook | License: Apache-2.0 | 000

FakeShield

The official implementation of 'FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models'

000

FasterCache

FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality

Language: Python | 000

FCGS

🚀 [ARXIV 2024] PyTorch implementation of 'Fast Feedforward 3D Gaussian Splatting Compression'

Language: Python | License: NOASSERTION | 000

FlatQuant

Official PyTorch implementation of FlatQuant: Flatness Matters for LLM Quantization

License: MIT | 000

Hallucination_MDS

000

Janus

Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation

Language: Python | License: MIT | 000

L-CITEEVAL

L-CITEEVAL: DO LONG-CONTEXT MODELS TRULY LEVERAGE CONTEXT FOR RESPONDING?

000

LongReward

License: Apache-2.0 | 000

Mamba-in-Computer-Vision

Mamba in Vision: A Comprehensive Survey of Techniques and Applications

000

MomentumSMoE

Implementation for MomentumSMoE

Language: Python | 000

monst3r

Official Implementation of paper "MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion"

Language: Python | 000

MVGS

MVGS: Multi-View Regulated Gaussian Splatting for Novel View Synthesis

Language: Python | 000

Ossmodels

The best OSS video generation models

Language: Python | License: Apache-2.0 | 000

PDF-Wukong

A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling

000

PhyGenBench

The code and data of Paper: Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation

000

Pyramid-Flow

Code of Pyramidal Flow Matching for Efficient Video Generative Modeling

License: MIT | 000

ragbuilder

A toolkit to create an optimal production-ready RAG setup for your data

License: Apache-2.0 | 000

REPA

Official PyTorch Implementation of Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think

License: MIT | 000

RoboticsDiffusionTransformer

RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation

Language: Python | License: MIT | 000

SageAttention

Quantized Attention that achieves speedups of 2.1x and 2.7x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.

Language: Python | License: BSD-3-Clause | 000

ScaleQuest

We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLMs.

Language: Python | License: Apache-2.0 | 000

Spatial-Mamba

[ICLR2025] Spatial-Mamba: Effective Visual State Space Models via Structure-Aware State Fusion

License: Apache-2.0 | 000

TextHarmony

The official code for NeurIPS 2024 paper: Harmonizing Visual Text Comprehension and Generation

License: Apache-2.0 | 000

Video-XL

🔥🔥 First-ever hour-scale video understanding models

Language: Python | License: Apache-2.0 | 000