duyuankai1992

duyuankai1992's repositories

Paints-UNDO

Understand Human Behavior to Align True Needs

Apache-2.0

V-Express

V-Express aims to generate a talking head video under the control of a reference image, an audio, and a sequence of V-Kps images.

DiT-Visualization

Visualization of DiT self attention features

Live

A collection of high-definition live stream sources gathered from the internet.

Latte

Latte: Latent Diffusion Transformer for Video Generation.

Apache-2.0

ImageAnalysisService

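An image analysis web service built on lightweight models, including skew-corrected OCR, official seal (stamp) detection and recognition, and license plate recognition. The API uses FastAPI + Gunicorn, with a Gradio demo.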

RS_Scene_ZSL

PyTorch code for Deep Semantic-Visual Alignment for zero-shot remote sensing image scene classification

MIT

DesignEdit

Code for DesignEdit

ASL-Recognizer

Action recognition application using models trained on WLASL dataset to translate ASL to English.

MiniGemini

Official implementation for Mini-Gemini

Apache-2.0

clip_text_span

Official implementation of "Interpreting CLIP's Image Representation via Text-Based Decomposition"

NOASSERTION

grok-1

Grok open release

Apache-2.0

OCR_MLLM_TOY

A multimodal large language model for OCR. OCR_MLLM

Language: Python

Paper-Piano

Piano like no other, Piano on Paper

MIT

DeepSeek-VL

DeepSeek-VL: Towards Real-World Vision-Language Understanding

MIT

MagicDance

MagicDance: Realistic Human Dance Video Generation with Motions & Facial Expressions Transfer

LWM

Apache-2.0

OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

MPL-2.0

stable-diffusion-webui

Stable Diffusion web UI

AGPL-3.0

SFDA-FSM

[MIA'22] Source-free domain adaptation for medical image segmentation with Fourier style mining

Apache-2.0

manga-image-translator

Translate manga/image: one-click translation of text in all kinds of images https://cotrans.touhou.ai/

GPL-3.0

free-programming-books-zh_CN

:books: Free Chinese-language books on computer programming; contributions welcome

GPL-3.0

OOTDiffusion

Official implementation of OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on

NOASSERTION

sdxl-lightning-demo-app

A demo application using fal.realtime and the lightning fast SDXL API provided by fal

AGPL-3.0

hello-algo

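Hello Algo: a data structures and algorithms tutorial with animated illustrations and one-click runnable code, supporting Python, C++, Java, C#, Go, Swift, JS, TS, Dart, Rust, C, Zig, and other languages. English edition ongoing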

NOASSERTION

BXC_VideoAnalyzer_v4

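A video behavior analysis system (v4) developed in C++. Without having to handle audio/video development, codec development, or UI development, you only need to train your own model and develop your own algorithm plugin to implement whatever video behavior detection you want, such as perimeter intrusion, smoke and fire detection, fighting, brawling, falls, crowd gathering, electric bikes, trash bins, smoking, climbing, absence from or sleeping at a post, safety helmets, charging piles, work uniforms, fatigue detection, traffic congestion, and more.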

MIT

LaTeX-OCR

pix2tex: Using a ViT to convert images of equations into LaTeX code.

MIT

my-tv

My TV: a live TV streaming app, ready to use once installed

InstantID

InstantID : Zero-shot Identity-Preserving Generation in Seconds 🔥

Apache-2.0

surya

Accurate line-level text detection and recognition (OCR) in any language

GPL-3.0