qixuxiang's starred repositories
flash-attention
Fast and memory-efficient exact attention
Megatron-LM
Ongoing research training transformer models at scale
PowerInfer
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
llm-scraper
Turn any webpage into structured data using LLMs
Image-Downloader
Download images from Google, Bing, and Baidu.
stockbot-on-groq
StockBot powered by Groq: Lightning Fast AI Chatbot that Responds With Live Interactive Stock Charts, Financials, News, Screeners, and More. Powered by Llama3-70b on Groq, Vercel AI SDK, and TradingView Widgets.
extraction-framework
The software used to extract structured data from Wikipedia
ktransformers
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
TensorRT-Model-Optimizer
TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, sparsity, distillation, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed on NVIDIA GPUs.
Awesome-LLM-Tabular
Awesome-LLM-Tabular: a curated list of Large Language Models applied to Tabular Data
OmniCorpus
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
clip-pytorch
A clip-pytorch model that can be trained on your own dataset.
VoCo-LLaMA
VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".
MMLongBench-Doc
Official Repository of MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations
cutlass_flash_atten_fp8
An fp8 flash attention implementation for the Ada architecture, built with the cutlass repository
HistoricalDataForTradeBacktest
Historical Trade Data For Backtests