Kaiden's repositories
AdvancedLiterateMachinery
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Alibaba DAMO Academy.
CatVTON
CatVTON is a simple and efficient virtual try-on diffusion model with 1) a lightweight network (899.06M parameters in total), 2) parameter-efficient training (49.57M trainable parameters), and 3) simplified inference (< 8 GB VRAM at 1024×768 resolution).
chatbox
Chatbox is a desktop app for GPT/LLMs that supports Windows, Mac, and Linux, and is also available online on the web.
Cloth2Tex
Cloth2Tex: A Customized Cloth Texture Generation Pipeline for 3D Virtual Try-On
CnSTD
CnSTD: A Python 3 package based on PyTorch/MXNet for Chinese/English Scene Text Detection, Mathematical Formula Detection (MFD), and Layout Analysis
decord
An efficient video loader for deep learning with smart shuffling that's super easy to digest
Deep-Live-Cam
Real-time face swap and one-click video deepfake with only a single image (uncensored)
DemoFusion
Let us democratise high-resolution generation! (arXiv 2023)
DifFace
DifFace: Blind Face Restoration with Diffused Error Contraction (PyTorch)
doctr
docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
DragGAN
Official Code for DragGAN (SIGGRAPH 2023)
face-sdk
3DiVi Face SDK is a set of software components (code libraries) for building face recognition solutions
FaceRecognizer
Face recognition application
FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and FastChat-T5.
FETNet
FETNet: Feature Erasing and Transferring Network for Scene Text Removal
lit-gpt
Implementation of Falcon, StableLM, Pythia, INCITE language models based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.
LivePortrait
Make one portrait alive!
lossless-cut
The Swiss Army knife of lossless video/audio editing
nanosam
A distilled Segment Anything (SAM) model capable of running in real time with NVIDIA TensorRT
night-enhancement
[ECCV2022] "Unsupervised Night Image Enhancement: When Layer Decomposition Meets Light-Effects Suppression", https://arxiv.org/abs/2207.10564
norfair
Lightweight Python library for adding real-time multi-object tracking to any detector.
OOTDiffusion
Official implementation of OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on
PDF-Extract-Kit
A Comprehensive Toolkit for High-Quality PDF Content Extraction
pipeless
An open-source computer vision framework to build and deploy apps in minutes without worrying about multimedia pipelines
surya
OCR, layout analysis, and line detection in 90+ languages
VideoMamba
VideoMamba: State Space Model for Efficient Video Understanding
VoiceCraft
Zero-Shot Speech Editing and Text-to-Speech in the Wild
watsor
Object detection for video surveillance