zhql's starred repositories
Dataset-Pruning
Dataset pruning for ImageNet and LAION-2B.
Pix2Text
An Open-Source Python3 tool for recognizing layouts, tables, math formulas (LaTeX), and text in images, converting them into Markdown format. A free alternative to Mathpix, empowering seamless conversion of visual content into text-based representations. 80+ languages are supported.
Video-ChatGPT
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
EfficientSAM
EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything
MobileAgent
Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
Data-for-LaTeX_OCR
Data repository for LaTeX OCR
EmbodiedScan
[CVPR 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
alignment-handbook
Robust recipes to align language models with human and AI preferences
direct-preference-optimization
Reference implementation for DPO (Direct Preference Optimization)
datasketch
MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW
transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
llama.onnx
LLaMA/RWKV ONNX models, quantization, and test cases