AlphaNext

AlphaNext's starred repositories

annotated_deep_learning_paper_implementations

🧑‍🏫 60+ implementations/tutorials of deep learning papers with side-by-side notes 📝, including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), gans (cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠

Language: Python | License: MIT | 54794 452 132

PaddleOCR

Awesome multilingual OCR toolkit based on PaddlePaddle: a practical, ultra-lightweight OCR system that recognizes 80+ languages, provides data annotation and synthesis tools, and supports training and deployment on server, mobile, embedded, and IoT devices.

Language: Python | License: Apache-2.0 | 43202 443 9276

vimrc

The ultimate Vim configuration (vimrc)

Language: Vim Script | License: MIT | 30639 777 511

EasyOCR

Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.

Language: Python | License: Apache-2.0 | 24049 314 987

Open-Sora

Open-Sora: Democratizing Efficient Video Production for All

Language: Python | License: Apache-2.0 | 21844 185 490

IOPaint

Image inpainting tool powered by SOTA AI models. Remove any unwanted objects, defects, or people from your pictures, or erase and replace (powered by Stable Diffusion) anything in your pictures.

Language: Python | License: Apache-2.0 | 19134 144 442

flux

Official inference repo for FLUX.1 models

Language: Python | License: Apache-2.0 | 14809 129 138

MiniCPM-V

MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone

Language: Python | License: Apache-2.0 | 12226 99 549

sam2

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Language: Jupyter Notebook | License: Apache-2.0 | 11514 66 287

Open-Sora-Plan

This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.

Language: Python | License: MIT | 11319 160 305

CogVideo

Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

Language: Python | License: Apache-2.0 | 7996 120 313

Track-Anything

Track-Anything is a flexible and interactive tool for video object tracking and segmentation, based on Segment Anything, XMem, and E2FGVI.

Language: Python | License: MIT | 6436 62 138

DiffSynth-Studio

Enjoy the magic of Diffusion models!

Language: Python | License: Apache-2.0 | 6428 55 148

video-subtitle-remover

AI-based tool for removing hard-coded subtitles and text-like watermarks from images and videos, producing subtitle-free, watermark-free files at the original resolution. Runs locally; no third-party API is required.

Language: Python | License: Apache-2.0 | 4135 33 84

Segment-and-Track-Anything

An open-source project dedicated to tracking and segmenting any objects in videos, either automatically or interactively. The primary algorithms utilized include the Segment Anything Model (SAM) for key-frame segmentation and Associating Objects with Transformers (AOT) for efficient tracking and propagation purposes.

Language: Jupyter Notebook | License: AGPL-3.0 | 2811 52 154

Qwen2-VL

Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Language: Python | License: Apache-2.0 | 2572 25 301

RecommenderSystem

2319 27 6

VideoSys

VideoSys: An easy and efficient system for video generation

Language: Python | License: Apache-2.0 | 1701 27 79

fastsdcpu

Fast stable diffusion on CPU

Language: Python | License: MIT | 1454 22 161

Pyramid-Flow

Code for Pyramidal Flow Matching for Efficient Video Generative Modeling

Language: Python | License: MIT | 99800

SwissArmyTransformer

SwissArmyTransformer is a flexible and powerful library to develop your own Transformer variants.

Language: Python | License: Apache-2.0 | 975 31 79

Show-o

Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.

Language: Python | License: Apache-2.0 | 910 12 27

VideoLLaMA2

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Language: Python | License: Apache-2.0 | 783 8 84

VEnhancer

Official code for VEnhancer: Generative Space-Time Enhancement for Video Generation

Language: Python | 426 19 23

svd_keyframe_interpolation

Language: Python | License: Apache-2.0 | 20100

cogvideox-factory

Memory-optimized fine-tuning scripts for CogVideoX using TorchAO and DeepSpeed

Language: Python | License: Apache-2.0 | 173 5 8

LVCD

The official code for the paper "LVCD: Reference-based Lineart Video Colorization with Diffusion Models"

Language: Python | 124 3 2

Motion-I2V

[SIGGRAPH 2024] Motion I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling

Language: Python | 92 9 7

cogvideox-controlnet

Simple ControlNet module for the CogVideoX model.

Language: Jupyter Notebook | License: Apache-2.0 | 2500

Surgical-SAM-2

Language: Jupyter Notebook | 1900