songcheng's starred repositories

OpenDevin

🐚 OpenDevin: Code Less, Make More

Language: Python | License: MIT | Stargazers: 28353 | Issues: 281 | Issues: 1100

ragflow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.

Language: Python | License: Apache-2.0 | Stargazers: 10930 | Issues: 66 | Issues: 678

gallery-dl

Command-line program to download image galleries and collections from several image hosting sites

Language: Python | License: GPL-2.0 | Stargazers: 10731 | Issues: 140 | Issues: 4685

pdfminer.six

Community maintained fork of pdfminer - we fathom PDF

Language: Python | License: MIT | Stargazers: 5626 | Issues: 120 | Issues: 646

AniPortrait

AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation

Language: Python | License: Apache-2.0 | Stargazers: 4245 | Issues: 60 | Issues: 167

VAR

[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!

Language: Python | License: MIT | Stargazers: 3788 | Issues: 110 | Issues: 69

MuseV

MuseV: Infinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising

Language: Python | License: NOASSERTION | Stargazers: 2082 | Issues: 34 | Issues: 95

T-Rex

[ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy

Language: Python | License: NOASSERTION | Stargazers: 1980 | Issues: 36 | Issues: 72

MuseTalk

MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting

Language: Python | License: NOASSERTION | Stargazers: 1836 | Issues: 36 | Issues: 123

OpenSeeFace

Robust realtime face and facial landmark tracking on CPU with Unity integration

Language: Python | License: BSD-2-Clause | Stargazers: 1365 | Issues: 22 | Issues: 53

ELLA

ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment

Language: Python | License: Apache-2.0 | Stargazers: 979 | Issues: 42 | Issues: 38

GLEE

[CVPR2024 Highlight] GLEE: General Object Foundation Model for Images and Videos at Scale

Language: Python | License: MIT | Stargazers: 965 | Issues: 45 | Issues: 33

pyreft

ReFT: Representation Finetuning for Language Models

Language: Python | License: Apache-2.0 | Stargazers: 942 | Issues: 15 | Issues: 74

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 940 | Issues: 27 | Issues: 91

search2ai

Help your LLMs online

Language: JavaScript | License: MIT | Stargazers: 938 | Issues: 11 | Issues: 25

thepipe

Extract markdown and images from URLs, PDFs, docs, slides, and more, ready for multimodal LLMs. ⚡

Language: Python | License: MIT | Stargazers: 814 | Issues: 8 | Issues: 17

Arc2Face

Arc2Face: A Foundation Model of Human Faces

Language: Python | License: MIT | Stargazers: 485 | Issues: 15 | Issues: 19

IDE-3D

[SIGGRAPH Asia 2022] IDE-3D: Interactive Disentangled Editing For High-Resolution 3D-aware Portrait Synthesis

Language: Jupyter Notebook | Stargazers: 472 | Issues: 19 | Issues: 22

clifs

Contrastive Language-Image Forensic Search allows free text searching through videos using OpenAI's machine learning model CLIP

Language: JavaScript | License: Apache-2.0 | Stargazers: 432 | Issues: 4 | Issues: 11

Long-CLIP

[ECCV 2024] Official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP"

GRiT

GRiT: A Generative Region-to-text Transformer for Object Understanding (https://arxiv.org/abs/2212.00280)

Language: Python | License: MIT | Stargazers: 284 | Issues: 2 | Issues: 18

ST-LLM

[ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners"

Language: Python | License: Apache-2.0 | Stargazers: 76 | Issues: 7 | Issues: 16

LipFD

This repository contains the code for "Lips Are Lying: Spotting the Temporal Inconsistency between Audio and Visual in Lip-syncing DeepFakes".

FreeTalker

Freetalker: Controllable Speech and Text-Driven Gesture Generation Based on Diffusion Models for Enhanced Speaker Naturalness (ICASSP 2024)

CharacterGen

[SIGGRAPH'24] CharacterGen: Efficient 3D Character Generation from Single Images with Multi-View Pose Canonicalization

Language: JavaScript | License: AGPL-3.0 | Stargazers: 33 | Issues: 4 | Issues: 4

ubisoft-laforge-FFHQ-UV-Intrinsics

FFHQ-UV-Intrinsics: A dataset containing intrinsic face decomposition for 10k subjects of FFHQ-UV

Language: Python | License: Apache-2.0 | Stargazers: 17 | Issues: 0 | Issues: 0

lip-synthesis

Audio-Visual Lip Synthesis via Intermediate Landmark Representation

Language: Python | Stargazers: 12 | Issues: 2 | Issues: 0

Video2ARKitBlendshapes

Video to ARKit BlendShapes

Language: Python | Stargazers: 5 | Issues: 0 | Issues: 0

perm

Official implementation of "Perm: A Parametric Hair Model for Multi-Style 3D Hair Generation"

Stargazers: 2 | Issues: 0 | Issues: 0