wofmanaf

zhql's starred repositories

Volcano

[NAACL 2024] A vision-language model that reduces hallucinations through self-feedback-guided revision. It visualizes attention over image features for interpretability.

Language: Python · 3600

Dataset-Pruning

Dataset pruning for ImageNet and LAION-2B.

Language: Python · License: MIT · 5300

An open-source Python 3 tool that recognizes layouts, tables, math formulas (LaTeX), and text in images and converts them to Markdown. A free alternative to Mathpix for converting visual content into text. Supports 80+ languages.

Language: Jupyter Notebook · License: MIT · 149300

jepa

PyTorch code and models for V-JEPA self-supervised learning from video.

Language: Python · License: NOASSERTION · 248900

Video-ChatGPT

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.

Language: Python · License: CC-BY-4.0 · 102200

OLMo

Modeling, training, eval, and inference code for OLMo

Language: Python · License: Apache-2.0 · 413000

Kerlinn.github.io

Language: HTML · 600

VILA

VILA: a multi-image visual language model with training, inference, and evaluation recipes, deployable from cloud to edge (Jetson Orin and laptops).

Language: Python · License: Apache-2.0 · 81700

TimeChat

[CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding

Language: Python · License: BSD-3-Clause · 22100

GaLore

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Language: Python · License: Apache-2.0 · 120300
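As the GaLore title says, the idea is to project the gradient into a low-rank subspace so the optimizer state lives at reduced size. The sketch below is minimal pure Python under several simplifying assumptions. It uses a rank-1 projector found by power iteration instead of a full SVD. It uses plain momentum instead of Adam. It recomputes the projector on every step, whereas GaLore refreshes it only periodically. The helper names `top_left_singular_vector` and `galore_rank1_step` are hypothetical and are not the GaLore repository's API.

```python
import math
import random


def top_left_singular_vector(G, iters=50, seed=0):
    """Approximate the leading left singular vector of G (m x n) by power iteration on G G^T."""
    m, n = len(G), len(G[0])
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(m)]
    for _ in range(iters):
        w = [sum(G[i][j] * v[i] for i in range(m)) for j in range(n)]  # w = G^T v
        v = [sum(G[i][j] * w[j] for j in range(n)) for i in range(m)]  # v = G w
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v


def galore_rank1_step(W, G, momentum, lr=0.01, beta=0.9):
    """One rank-1 GaLore-style update (hypothetical helper, simplified).

    W, G: m x n weight and gradient matrices (lists of lists).
    momentum: length-n buffer kept in the projected space, so the optimizer
    state is 1 x n instead of m x n.
    """
    m, n = len(G), len(G[0])
    p = top_left_singular_vector(G)                                # projector P (m x 1)
    r = [sum(p[i] * G[i][j] for i in range(m)) for j in range(n)]  # R = P^T G (1 x n)
    momentum = [beta * mo + (1.0 - beta) * rj for mo, rj in zip(momentum, r)]
    # Project the low-rank update back to full size (P @ momentum) and apply it.
    new_W = [[W[i][j] - lr * p[i] * momentum[j] for j in range(n)] for i in range(m)]
    return new_W, momentum


G = [[3.0, 0.0, 4.0], [6.0, 0.0, 8.0]]  # a rank-1 gradient, so the projection is exact
W = [[0.0] * 3 for _ in range(2)]
W, mom = galore_rank1_step(W, G, [0.0] * 3, lr=1.0, beta=0.0)
print(W)
```

With a rank-1 gradient, `lr=1.0`, and `beta=0.0`, the projected-then-restored update equals the full gradient, so `W` becomes `-G`. The memory saving shows in `mom`, which has length 3 rather than 2 × 3. For real weight matrices the rank would be larger than 1.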

bigcode-dataset

Language: Jupyter Notebook · License: Apache-2.0 · 33000

SPIN

The official implementation of Self-Play Fine-Tuning (SPIN)

Language: Python · License: Apache-2.0 · 86100

EfficientSAM

EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything

Language: Jupyter Notebook · License: Apache-2.0 · 192000

MobileAgent

Mobile-Agent: The Powerful Mobile Device Operation Assistant Family

Language: Python · License: MIT · 211900

UltraEval

An open-source framework for evaluating foundation models.

Language: Python · License: Apache-2.0 · 17500

MiniCPM

MiniCPM-2B: An on-device (end-side) LLM that outperforms Llama2-13B.

Language: Jupyter Notebook · License: Apache-2.0 · 426500

Monkey

【CVPR 2024 Highlight】Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models

Language: Python · License: MIT · 148400

Data-for-LaTeX_OCR

Data repository for LaTeX OCR

8600

TinyLlama

The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.

Language: Python · License: Apache-2.0 · 711800
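As a back-of-the-envelope check on the TinyLlama description, the common rule of thumb C ≈ 6·N·D (training FLOPs ≈ 6 × parameters × tokens) estimates the compute budget and the tokens-per-parameter ratio. This is an approximation that ignores attention and other overheads. The helper name `training_flops` is hypothetical.

```python
def training_flops(params: float, tokens: float) -> float:
    """Approximate training compute with the C ≈ 6·N·D rule of thumb."""
    return 6.0 * params * tokens


N = 1.1e9  # parameters, from the TinyLlama description
D = 3e12   # training tokens, from the TinyLlama description
flops = training_flops(N, D)
tokens_per_param = D / N
print(f"{flops:.2e} FLOPs, {tokens_per_param:.0f} tokens per parameter")
```

The estimate comes to roughly 2 × 10^22 FLOPs and about 2,700 tokens per parameter. That ratio is far above the roughly 20 tokens per parameter the Chinchilla scaling study found compute-optimal, which reflects the project's choice to overtrain a small model.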

EmbodiedScan

[CVPR 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI

Language: Python · License: Apache-2.0 · 34500

donut

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022

Language: Python · License: MIT · 545100

alignment-handbook

Robust recipes to align language models with human and AI preferences

Language: Python · License: Apache-2.0 · 409200

direct-preference-optimization

Reference implementation for DPO (Direct Preference Optimization)

Language: Python · License: Apache-2.0 · 176400
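For reference, the DPO objective for one preference pair (chosen y_w, rejected y_l) is −log σ(β[(log π_θ(y_w|x) − log π_ref(y_w|x)) − (log π_θ(y_l|x) − log π_ref(y_l|x))]). The minimal scalar sketch below assumes the summed per-sequence log-probabilities are already computed. It is not the repository's code, the function names are hypothetical, and `beta=0.1` is only an illustrative default.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for a single preference pair from sequence log-probabilities."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp        # log pi_theta/pi_ref on y_w
    rejected_ratio = policy_rejected_logp - ref_rejected_logp  # log pi_theta/pi_ref on y_l
    return -log_sigmoid(beta * (chosen_ratio - rejected_ratio))


# The policy raised the chosen response's likelihood relative to the reference
# and lowered the rejected one's, so the loss falls below log(2).
print(dpo_loss(-10.0, -12.0, -11.0, -11.0, beta=0.1))
```

When the policy equals the reference, the margin is zero and the loss is exactly log 2. Training pushes the implicit reward margin up, with β controlling how far the policy may drift from the reference.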

datasketch

MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW

Language: Python · License: MIT · 238700
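MinHash, the first technique listed for datasketch, estimates the Jaccard similarity of two sets from short signatures. The minimal standard-library sketch below uses the "one independent hash function per slot" variant, built here on keyed blake2b. It is not datasketch's API (the library provides its own `MinHash` class), and the helper names are hypothetical.

```python
import hashlib


def _hash(token: str, seed: int) -> int:
    """64-bit hash of `token` under hash function number `seed` (blake2b keyed by the seed)."""
    h = hashlib.blake2b(token.encode("utf-8"), digest_size=8, key=seed.to_bytes(8, "little"))
    return int.from_bytes(h.digest(), "little")


def minhash_signature(tokens, num_perm=128):
    """For each of num_perm hash functions, keep the minimum hash over the (non-empty) set."""
    return [min(_hash(t, i) for t in tokens) for i in range(num_perm)]


def estimate_jaccard(sig_a, sig_b):
    """The fraction of agreeing signature slots estimates |A ∩ B| / |A ∪ B|."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


a = {f"t{i}" for i in range(150)}
b = {f"t{i}" for i in range(50, 200)}  # true Jaccard = 100 / 200 = 0.5
print(estimate_jaccard(minhash_signature(a), minhash_signature(b)))
```

Each slot agrees with probability equal to the true Jaccard similarity, so the estimate's standard error shrinks as 1/√num_perm. LSH then groups signature slots into bands so that near-duplicate sets collide in at least one bucket.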

UReader

Language: Python · License: Apache-2.0 · 9900

MNBVC

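MNBVC (Massive Never-ending BT Vast Chinese corpus): an ultra-large-scale Chinese corpus, benchmarked against the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also various niche subcultures, and even "Martian script" (火星文, a stylized internet writing style). It includes news, essays, novels, books, magazines, papers, script dialogue, forum posts, wikis, classical poetry, lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and every other form of plain-text Chinese data.

License: MIT · 312500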

EasyLM

Large language models (LLMs) made easy: EasyLM is a one-stop solution for pre-training, fine-tuning, evaluating, and serving LLMs in JAX/Flax.

Language: Python · License: Apache-2.0 · 227400

transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Language: Python · License: Apache-2.0 · 12753200

open_clip

An open-source implementation of CLIP.

Language: Jupyter Notebook · License: NOASSERTION · 890500

llama.onnx

LLaMA/RWKV ONNX models, quantization, and test cases

Language: Python · License: GPL-3.0 · 33100