longcw

Long Chen's starred repositories

Real3DPortrait

Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis; ICLR 2024 Spotlight; Official code

Language:PythonMIT71600

ChatTTS

ChatTTS is a generative speech model for daily dialogue.

Language:Jupyter NotebookNOASSERTION1632200

unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

Language:HTMLApache-2.0703100

FlagEmbedding

Retrieval and Retrieval-augmented LLMs

Language:PythonMIT544800

GFPGAN-onnxruntime-demo

This is the onnxruntime inference code for GFP-GAN: Towards Real-World Blind Face Restoration with Generative Facial Prior (CVPR 2021). Official code: https://github.com/TencentARC/GFPGAN

Language:Python10500

Face-Restoration-TensorRT

A simple face restoration TensorRT deployment solution.

Language:C++6900

anything-llm

The all-in-one Desktop & Docker AI application with full RAG and AI Agent capabilities.

Language:JavaScriptMIT1537300

video-retalking

[SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild

Language:PythonApache-2.0585600

Test your prompts, models, and RAGs. Catch regressions and improve prompt quality. LLM evals for OpenAI, Azure, Anthropic, Gemini, Mistral, Llama, Bedrock, Ollama, and other local & private models with CI/CD integration.

Language:TypeScriptMIT315600

dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.

Language:TypeScriptNOASSERTION3218400

Wav2Lip

This repository contains the codes of "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For HD commercial model, please try out Sync Labs

Language:Python951700

MiniCPM

MiniCPM-2B: An end-side LLM outperforming Llama2-13B.

Language:Jupyter NotebookApache-2.0410200

MiniCPM-V

MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone

Language:PythonApache-2.0446900

search_with_lepton

Building a quick conversation-based search demo with Lepton AI.

Language:TypeScriptApache-2.0718700

TaskMatrix

Language:PythonNOASSERTION3453100

TokenFlow

Official Pytorch Implementation for "TokenFlow: Consistent Diffusion Features for Consistent Video Editing" presenting "TokenFlow" (ICLR 2024)

Language:PythonMIT148500

AnyDoor

Official implementations for paper: Anydoor: zero-shot object-level image customization

Language:PythonMIT376600

Osprey

[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"

Language:PythonApache-2.070100

gaussian-grouping

Gaussian Grouping for open-world Anything reconstruction, segmentation and editing.

Language:Jupyter NotebookApache-2.047000

GroundingDINO

Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

Language:PythonApache-2.0526700

LLaVA-Plus-Codebase

LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills

Language:PythonApache-2.064300

open_clip

An open source implementation of CLIP.

Language:Jupyter NotebookNOASSERTION879000

MVDream

Multi-view Diffusion for 3D Generation

Language:PythonMIT68100

MVDream-threestudio

3D generation code for MVDream

Language:PythonApache-2.044900

IP-Adapter

The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with image prompt.

Language:Jupyter NotebookApache-2.0421900

T2I-Adapter

Language:Python323500

img2dataset

Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.

Language:PythonMIT335200

EditAnything

Edit anything in images powered by segment-anything, ControlNet, StableDiffusion, etc. (ACM MM)

Language:PythonApache-2.0318300

multimodal-garment-designer

This is the official repository for the paper "Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing". ICCV 2023

Language:PythonNOASSERTION37700

CVinW_Readings

A collection of papers on the topic of ``Computer Vision in the Wild (CVinW)''

104300