fujingling

followers

following

stars

fujingling's starred repositories

MuLan

MuLan: Adapting Multilingual Diffusion Models for 110+ Languages (无需额外训练为任意扩散模型支持多语言能力)

Language:Python11100

MultimodalOCR

On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench)

Language:PythonMIT37800

Qwen-VL

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Language:PythonNOASSERTION445200

Awesome-Scientific-Language-Models

A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery

MIT39700

donut

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022

Language:PythonMIT558100

OCRDatasets

A collection of OCR-related datasets

8400

Text-Recognition-Material

Papers, Datasets, Algorithms, SOTA for STR. Long-time Maintaining

Apache-2.09300

OCR_DataSet

收集并整理有关OCR的数据集并统一标注格式，以便实验需要

Language:Python85400

Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Language:PythonNOASSERTION176900

HunyuanDiT

Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

Language:PythonNOASSERTION294700

MimicMotion

High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance

Language:PythonNOASSERTION121900

MusePose

MusePose: a Pose-Driven Image-to-Video Framework for Virtual Human Generation

Language:PythonNOASSERTION197400

text-generation-inference

Large Language Model Text Generation Inference

Language:PythonApache-2.0849100

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language:PythonApache-2.02386100

tokenizers

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

Language:RustApache-2.0875200

DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Language:PythonApache-2.03406600

diffusion-models-class

Materials for the Hugging Face Diffusion Models Course

Language:Jupyter NotebookApache-2.0343100

Diffusion-Tryon-Trainer

Diffusion-Tryon-Trainer

Language:PythonNOASSERTION10800

diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.

Language:PythonApache-2.02431100

swift

ms-swift: Use PEFT or Full-parameter to finetune 300+ LLMs or 50+ MLLMs. (Qwen2, GLM4v, Internlm2.5, Yi, Llama3.1, Llava-Video, Internvl2, MiniCPM-V, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)

Language:PythonApache-2.0261200

CogVLM2

GPT4V-level open-source multi-modal model based on Llama3-8B

Language:PythonApache-2.0167300

GLM-4

GLM-4 series: Open Multilingual Multimodal Chat LMs | 开源多语言多模态对话模型

Language:PythonApache-2.0397100

unitable

UniTable: Towards a Unified Table Foundation Model

Language:Jupyter NotebookMIT29900

COMBO-AVS

[CVPR 2024 Highlight] Official implementation of the paper: Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation

Language:PythonApache-2.02700

detr

End-to-End Object Detection with Transformers

Language:PythonApache-2.01318900

VLMEvalKit

Open-source evaluation toolkit of large vision-language models (LVLMs), support ~100 VLMs, 30+ benchmarks

Language:PythonApache-2.078300

InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. 接近GPT-4o表现的可商用开源多模态对话模型

Language:PythonMIT446500

Linly-Talker

Digital Avatar Conversational System - Linly-Talker. 😄✨ Linly-Talker is an intelligent AI system that combines large language models (LLMs) with visual models to create a novel human-AI interaction method. 🤝🤖 It integrates various technologies like Whisper, Linly, Microsoft Speech Services, and SadTalker talking head generation system. 🌟🔬

Language:PythonMIT150000

CLIP4STR

An implementation of "CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model".

Language:PythonApache-2.09300

MiniCPM

MiniCPM-2B: An end-side LLM outperforming Llama2-13B.

Language:PythonApache-2.0447600