zhql (wofmanaf)

wofmanaf

Geek Repo

Location:Shanghai,China

Github PK Tool:Github PK Tool

zhql's starred repositories

Volcano

[NAACL 2024] Vision language model that reduces hallucinations through self-feedback guided revision. Visualizes attentions on image features for interpretation.

Language:PythonStargazers:36Issues:0Issues:0

Dataset-Pruning

Dataset pruning for ImageNet and LAION-2B.

Language:PythonLicense:MITStargazers:53Issues:0Issues:0

Pix2Text

An Open-Source Python3 tool for recognizing layouts, tables, math formulas (LaTeX), and text in images, converting them into Markdown format. A free alternative to Mathpix, empowering seamless conversion of visual content into text-based representations. 80+ languages are supported.

Language:Jupyter NotebookLicense:MITStargazers:1493Issues:0Issues:0

jepa

PyTorch code and models for V-JEPA self-supervised learning from video.

Language:PythonLicense:NOASSERTIONStargazers:2489Issues:0Issues:0

Video-ChatGPT

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.

Language:PythonLicense:CC-BY-4.0Stargazers:1022Issues:0Issues:0

OLMo

Modeling, training, eval, and inference code for OLMo

Language:PythonLicense:Apache-2.0Stargazers:4130Issues:0Issues:0
Language:HTMLStargazers:6Issues:0Issues:0

VILA

VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)

Language:PythonLicense:Apache-2.0Stargazers:817Issues:0Issues:0

TimeChat

[CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding

Language:PythonLicense:BSD-3-ClauseStargazers:221Issues:0Issues:0

GaLore

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Language:PythonLicense:Apache-2.0Stargazers:1203Issues:0Issues:0
Language:Jupyter NotebookLicense:Apache-2.0Stargazers:330Issues:0Issues:0

SPIN

The official implementation of Self-Play Fine-Tuning (SPIN)

Language:PythonLicense:Apache-2.0Stargazers:861Issues:0Issues:0

EfficientSAM

EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything

Language:Jupyter NotebookLicense:Apache-2.0Stargazers:1920Issues:0Issues:0

MobileAgent

Mobile-Agent: The Powerful Mobile Device Operation Assistant Family

Language:PythonLicense:MITStargazers:2119Issues:0Issues:0

UltraEval

An open source framework for evaluating foundation models.

Language:PythonLicense:Apache-2.0Stargazers:175Issues:0Issues:0

MiniCPM

MiniCPM-2B: An end-side LLM outperforming Llama2-13B.

Language:Jupyter NotebookLicense:Apache-2.0Stargazers:4265Issues:0Issues:0

Monkey

【CVPR 2024 Highlight】Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models

Language:PythonLicense:MITStargazers:1484Issues:0Issues:0

Data-for-LaTeX_OCR

LaTeX OCR 的数据仓库

Stargazers:86Issues:0Issues:0

TinyLlama

The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.

Language:PythonLicense:Apache-2.0Stargazers:7118Issues:0Issues:0

EmbodiedScan

[CVPR 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI

Language:PythonLicense:Apache-2.0Stargazers:345Issues:0Issues:0

donut

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022

Language:PythonLicense:MITStargazers:5451Issues:0Issues:0

alignment-handbook

Robust recipes to align language models with human and AI preferences

Language:PythonLicense:Apache-2.0Stargazers:4092Issues:0Issues:0

direct-preference-optimization

Reference implementation for DPO (Direct Preference Optimization)

Language:PythonLicense:Apache-2.0Stargazers:1764Issues:0Issues:0

datasketch

MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW

Language:PythonLicense:MITStargazers:2387Issues:0Issues:0
Language:PythonLicense:Apache-2.0Stargazers:99Issues:0Issues:0

MNBVC

MNBVC(Massive Never-ending BT Vast Chinese corpus)超大规模中文语料集。对标chatGPT训练的40T数据。MNBVC数据集不但包括主流文化,也包括各个小众文化甚至火星文的数据。MNBVC数据集包括新闻、作文、小说、书籍、杂志、论文、台词、帖子、wiki、古诗、歌词、商品介绍、笑话、糗事、聊天记录等一切形式的纯文本中文数据。

License:MITStargazers:3125Issues:0Issues:0

EasyLM

Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Flax.

Language:PythonLicense:Apache-2.0Stargazers:2274Issues:0Issues:0

transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

Language:PythonLicense:Apache-2.0Stargazers:127532Issues:0Issues:0

open_clip

An open source implementation of CLIP.

Language:Jupyter NotebookLicense:NOASSERTIONStargazers:8905Issues:0Issues:0

llama.onnx

LLaMa/RWKV onnx models, quantization and testcase

Language:PythonLicense:GPL-3.0Stargazers:331Issues:0Issues:0