AlphaNext

Location: Beijing

AlphaNext's starred repositories

annotated_deep_learning_paper_implementations

🧑‍🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), gans(cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠

Language: Python | License: MIT | Stargazers: 54794 | Issues: 452 | Issues: 132

PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (a practical, ultra-lightweight OCR system that recognizes 80+ languages, provides data annotation and synthesis tools, and supports training and deployment across server, mobile, embedded, and IoT devices)

Language: Python | License: Apache-2.0 | Stargazers: 43202 | Issues: 443 | Issues: 9276

vimrc

The ultimate Vim configuration (vimrc)

Language: Vim Script | License: MIT | Stargazers: 30639 | Issues: 777 | Issues: 511

EasyOCR

Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.

Language: Python | License: Apache-2.0 | Stargazers: 24049 | Issues: 314 | Issues: 987

Open-Sora

Open-Sora: Democratizing Efficient Video Production for All

Language: Python | License: Apache-2.0 | Stargazers: 21844 | Issues: 185 | Issues: 490

IOPaint

Image inpainting tool powered by SOTA AI models. Remove any unwanted object, defect, or person from your pictures, or erase and replace (powered by Stable Diffusion) anything in your pictures.

Language: Python | License: Apache-2.0 | Stargazers: 19134 | Issues: 144 | Issues: 442

flux

Official inference repo for FLUX.1 models

Language: Python | License: Apache-2.0 | Stargazers: 14809 | Issues: 129 | Issues: 138

MiniCPM-V

MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone

Language: Python | License: Apache-2.0 | Stargazers: 12226 | Issues: 99 | Issues: 549

sam2

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 11514 | Issues: 66 | Issues: 287

Open-Sora-Plan

This project aims to reproduce Sora (OpenAI's T2V model). We hope the open-source community will contribute to this project.

Language: Python | License: MIT | Stargazers: 11319 | Issues: 160 | Issues: 305

CogVideo

Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

Language: Python | License: Apache-2.0 | Stargazers: 7996 | Issues: 120 | Issues: 313

Track-Anything

Track-Anything is a flexible and interactive tool for video object tracking and segmentation, based on Segment Anything, XMem, and E2FGVI.

Language: Python | License: MIT | Stargazers: 6436 | Issues: 62 | Issues: 138

DiffSynth-Studio

Enjoy the magic of Diffusion models!

Language: Python | License: Apache-2.0 | Stargazers: 6428 | Issues: 55 | Issues: 148

video-subtitle-remover

AI-based tool for removing hard-coded subtitles and text-like watermarks from images and videos. It outputs the cleaned image or video file at the original resolution and runs locally, with no third-party API required.

Language: Python | License: Apache-2.0 | Stargazers: 4135 | Issues: 33 | Issues: 84

Segment-and-Track-Anything

An open-source project dedicated to tracking and segmenting any objects in videos, either automatically or interactively. The primary algorithms utilized include the Segment Anything Model (SAM) for key-frame segmentation and Associating Objects with Transformers (AOT) for efficient tracking and propagation purposes.

Language: Jupyter Notebook | License: AGPL-3.0 | Stargazers: 2811 | Issues: 52 | Issues: 154

Qwen2-VL

Qwen2-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.

Language: Python | License: Apache-2.0 | Stargazers: 2572 | Issues: 25 | Issues: 301

VideoSys

VideoSys: An easy and efficient system for video generation

Language: Python | License: Apache-2.0 | Stargazers: 1701 | Issues: 27 | Issues: 79

fastsdcpu

Fast stable diffusion on CPU

Language: Python | License: MIT | Stargazers: 1454 | Issues: 22 | Issues: 161

Pyramid-Flow

Code for Pyramidal Flow Matching for Efficient Video Generative Modeling

Language: Python | License: MIT | Stargazers: 998 | Issues: 0 | Issues: 0

SwissArmyTransformer

SwissArmyTransformer is a flexible and powerful library to develop your own Transformer variants.

Language: Python | License: Apache-2.0 | Stargazers: 975 | Issues: 31 | Issues: 79

Show-o

Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.

Language: Python | License: Apache-2.0 | Stargazers: 910 | Issues: 12 | Issues: 27

VideoLLaMA2

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Language: Python | License: Apache-2.0 | Stargazers: 783 | Issues: 8 | Issues: 84

VEnhancer

Official code for VEnhancer: Generative Space-Time Enhancement for Video Generation

Language: Python | License: Apache-2.0 | Stargazers: 201 | Issues: 0 | Issues: 0

cogvideox-factory

Memory-optimized finetuning scripts for CogVideoX using TorchAO and DeepSpeed

Language: Python | License: Apache-2.0 | Stargazers: 173 | Issues: 5 | Issues: 8

LVCD

The official code for the paper "LVCD: Reference-based Lineart Video Colorization with Diffusion Models"

Motion-I2V

[SIGGRAPH 2024] Motion I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling

cogvideox-controlnet

Simple ControlNet module for the CogVideoX model.

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 25 | Issues: 0 | Issues: 0
Language: Jupyter Notebook | Stargazers: 19 | Issues: 0 | Issues: 0