fujingling's starred repositories

MuLan

MuLan: Adapting Multilingual Diffusion Models for 110+ Languages (adds multilingual support to any diffusion model without additional training)

Language: Python | Stargazers: 111 | Issues: 0

MultimodalOCR

On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench)

Language: Python | License: MIT | Stargazers: 378 | Issues: 0

Qwen-VL

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Language: Python | License: NOASSERTION | Stargazers: 4452 | Issues: 0

Awesome-Scientific-Language-Models

A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery

License: MIT | Stargazers: 397 | Issues: 0

donut

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022

Language: Python | License: MIT | Stargazers: 5581 | Issues: 0

OCRDatasets

A collection of OCR-related datasets

Stargazers: 84 | Issues: 0

Text-Recognition-Material

Papers, datasets, algorithms, and SOTA results for scene text recognition (STR). Maintained long-term.

License: Apache-2.0 | Stargazers: 93 | Issues: 0

OCR_DataSet

Collects and organizes OCR-related datasets and unifies their annotation format for use in experiments.

Language: Python | Stargazers: 854 | Issues: 0

Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Language: Python | License: NOASSERTION | Stargazers: 1769 | Issues: 0

HunyuanDiT

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

Language: Python | License: NOASSERTION | Stargazers: 2947 | Issues: 0

MimicMotion

High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance

Language: Python | License: NOASSERTION | Stargazers: 1219 | Issues: 0

MusePose

MusePose: a Pose-Driven Image-to-Video Framework for Virtual Human Generation

Language: Python | License: NOASSERTION | Stargazers: 1974 | Issues: 0

text-generation-inference

Large Language Model Text Generation Inference

Language: Python | License: Apache-2.0 | Stargazers: 8491 | Issues: 0

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python | License: Apache-2.0 | Stargazers: 23861 | Issues: 0

tokenizers

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

Language: Rust | License: Apache-2.0 | Stargazers: 8752 | Issues: 0

DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Language: Python | License: Apache-2.0 | Stargazers: 34066 | Issues: 0

diffusion-models-class

Materials for the Hugging Face Diffusion Models Course

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 3431 | Issues: 0

Diffusion-Tryon-Trainer

Diffusion-Tryon-Trainer

Language: Python | License: NOASSERTION | Stargazers: 108 | Issues: 0

diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.

Language: Python | License: Apache-2.0 | Stargazers: 24311 | Issues: 0

swift

ms-swift: Use PEFT or Full-parameter to finetune 300+ LLMs or 50+ MLLMs. (Qwen2, GLM4v, Internlm2.5, Yi, Llama3.1, Llava-Video, Internvl2, MiniCPM-V, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)

Language: Python | License: Apache-2.0 | Stargazers: 2612 | Issues: 0

CogVLM2

GPT4V-level open-source multi-modal model based on Llama3-8B

Language: Python | License: Apache-2.0 | Stargazers: 1673 | Issues: 0

GLM-4

GLM-4 series: Open Multilingual Multimodal Chat LMs

Language: Python | License: Apache-2.0 | Stargazers: 3971 | Issues: 0

unitable

UniTable: Towards a Unified Table Foundation Model

Language: Jupyter Notebook | License: MIT | Stargazers: 299 | Issues: 0

COMBO-AVS

[CVPR 2024 Highlight] Official implementation of the paper: Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation

Language: Python | License: Apache-2.0 | Stargazers: 27 | Issues: 0

detr

End-to-End Object Detection with Transformers

Language: Python | License: Apache-2.0 | Stargazers: 13189 | Issues: 0

VLMEvalKit

Open-source evaluation toolkit for large vision-language models (LVLMs), supporting ~100 VLMs and 30+ benchmarks

Language: Python | License: Apache-2.0 | Stargazers: 783 | Issues: 0

InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. A commercially usable open-source multimodal chat model approaching GPT-4o performance.

Language: Python | License: MIT | Stargazers: 4465 | Issues: 0

Linly-Talker

Digital Avatar Conversational System - Linly-Talker. 😄✨ Linly-Talker is an intelligent AI system that combines large language models (LLMs) with visual models to create a novel human-AI interaction method. 🤝🤖 It integrates various technologies like Whisper, Linly, Microsoft Speech Services, and SadTalker talking head generation system. 🌟🔬

Language: Python | License: MIT | Stargazers: 1500 | Issues: 0

CLIP4STR

An implementation of "CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model".

Language: Python | License: Apache-2.0 | Stargazers: 93 | Issues: 0

MiniCPM

MiniCPM-2B: An end-side LLM outperforming Llama2-13B.

Language: Python | License: Apache-2.0 | Stargazers: 4476 | Issues: 0