fujingling's starred repositories
MultimodalOCR
On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench)
Awesome-Scientific-Language-Models
A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery
OCRDatasets
A collection of OCR-related datasets
Text-Recognition-Material
Papers, datasets, algorithms, and SOTA results for scene text recognition (STR). Maintained long-term.
OCR_DataSet
Collects and organizes OCR-related datasets and unifies their annotation format for use in experiments
Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
HunyuanDiT
Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
MimicMotion
High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance
text-generation-inference
Large Language Model Text Generation Inference
tokenizers
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
diffusion-models-class
Materials for the Hugging Face Diffusion Models Course
Diffusion-Tryon-Trainer
Diffusion-Tryon-Trainer
VLMEvalKit
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting ~100 VLMs and 30+ benchmarks
Linly-Talker
Digital Avatar Conversational System - Linly-Talker. 😄✨ Linly-Talker is an intelligent AI system that combines large language models (LLMs) with visual models to create a novel human-AI interaction method. 🤝🤖 It integrates various technologies like Whisper, Linly, Microsoft Speech Services, and SadTalker talking head generation system. 🌟🔬