AlexHT Hung's repositories
BLOOM-LORA
Because of LLaMA's license restrictions, we try to reimplement BLOOM-LoRA using Alpaca-LoRA and Alpaca_data_cleaned.json (the BLOOM license is much less restrictive; see https://huggingface.co/spaces/bigscience/license).
chineseocr_lite
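Ultra-lightweight Chinese OCR with support for vertical text recognition and for ncnn, mnn, and tnn inference (dbnet (1.8M) + crnn (2.5M) + anglenet (378KB)); the total model size is only 4.7M.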
client
Triton Python, C++, and Java client libraries, and GRPC-generated client examples for Go, Java, and Scala.
common
Common source, scripts and utilities shared across all Triton repositories.
data-preparation
Code used for sourcing and cleaning the BigScience ROOTS corpus
data_tooling
Tools for managing datasets for governance and training.
datasets
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
DB
A PyTorch implementation of "Real-time Scene Text Detection with Differentiable Binarization".
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
DeepSpeedExamples
Example models using DeepSpeed
DifferentiableBinarization
DB (Real-time Scene Text Detection with Differentiable Binarization) implementation in Keras and Tensorflow
evaluate
🤗 Evaluate: A library for easily evaluating machine learning models and datasets.
flash-attention
Fast and memory-efficient exact attention
GMAN
GMAN: A Graph Multi-Attention Network for Traffic Prediction (https://fanxlxmu.github.io/publication/aaai2020/), accepted at AAAI-2020.
googlesearch
A Python library for scraping the Google search engine.
langchain
⚡ Building applications with LLMs through composability ⚡
Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
minio-cpp
MinIO C++ Client SDK for Amazon S3 Compatible Cloud Storage
olm-datasets
Pipeline for pulling and processing online language model pretraining data from the web
PaddleOCR
Awesome multilingual OCR toolkits based on PaddlePaddle (a practical, ultra-lightweight OCR system that supports recognition of 80+ languages, provides data annotation and synthesis tools, and supports training and deployment on server, mobile, embedded, and IoT devices)
sentence-transformers
Multilingual Sentence & Image Embeddings with BERT
sgpt
SGPT: GPT Sentence Embeddings for Semantic Search
t-zero
Reproduce results and replicate training for T0 (Multitask Prompted Training Enables Zero-Shot Task Generalization)
TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
yolov7
Implementation of paper - YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors