zye1996

Company: GMU

Location: Fairfax, VA

Home Page: zye1996.github.io

zye1996's starred repositories

immich

High performance self-hosted photo and video management solution.

Language: TypeScript | License: AGPL-3.0 | Stargazers: 47263 | Issues: 224 | Issues: 3837

EasySpider

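A visual no-code/code-free web crawler/spider. 易采集 (EasySpider): a visual browser-automation testing, data-collection, and crawler tool that lets you design and run crawling tasks graphically without writing code. Alias: ServiceWrapper, an intelligent service-wrapping system for web applications.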

Language: JavaScript | License: NOASSERTION | Stargazers: 34681 | Issues: 224 | Issues: 504

fastText

Library for fast text representation and classification.

ChatDev

Create Customized Software using Natural Language Idea (through LLM-powered Multi-Agent Collaboration)

Language: Shell | License: Apache-2.0 | Stargazers: 25298 | Issues: 308 | Issues: 258

pyright

Static Type Checker for Python

Language: Python | License: NOASSERTION | Stargazers: 13204 | Issues: 124 | Issues: 6156

MinerU

A one-stop, open-source, high-quality data extraction tool that supports PDF, web page, and multi-format e-book extraction.

Language: Python | License: AGPL-3.0 | Stargazers: 12259 | Issues: 68 | Issues: 411

storm

An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.

Language: Python | License: MIT | Stargazers: 11927 | Issues: 87 | Issues: 121

litgpt

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.

Language: Python | License: Apache-2.0 | Stargazers: 10198 | Issues: 92 | Issues: 760

PDF-Extract-Kit

A Comprehensive Toolkit for High-Quality PDF Content Extraction

Language: Python | License: AGPL-3.0 | Stargazers: 4977 | Issues: 36 | Issues: 104

AgentVerse

🤖 AgentVerse 🪐 is designed to facilitate the deployment of multiple LLM-based agents in various applications. It primarily provides two frameworks: task-solving and simulation.

Language: JavaScript | License: Apache-2.0 | Stargazers: 4080 | Issues: 62 | Issues: 78

crawlee-python

Crawlee: a web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP, in both headful and headless mode, with proxy rotation.

Language: Python | License: Apache-2.0 | Stargazers: 4037 | Issues: 25 | Issues: 192

h2o-llmstudio

H2O LLM Studio - a framework and no-code GUI for fine-tuning LLMs. Documentation: https://docs.h2o.ai/h2o-llmstudio/

Language: Python | License: Apache-2.0 | Stargazers: 3957 | Issues: 79 | Issues: 397

Alpaca-CoT

We unified the interfaces of instruction-tuning data (e.g., CoT data), multiple LLMs, and parameter-efficient methods (e.g., LoRA, P-Tuning) for easy use. We welcome open-source enthusiasts to open any meaningful PR on this repo and to integrate as many LLM-related technologies as possible. We have built a fine-tuning platform that makes it easy for researchers to get started with and use large models, and we welcome open-source enthusiasts to submit any meaningful PR!

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 2593 | Issues: 36 | Issues: 100

pyserini

Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.

Language: Python | License: Apache-2.0 | Stargazers: 1643 | Issues: 18 | Issues: 543

distilabel

Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.

Language: Python | License: Apache-2.0 | Stargazers: 1468 | Issues: 13 | Issues: 413

smol-vision

Recipes for shrinking, optimizing, customizing cutting edge vision models. 💜

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 737 | Issues: 11 | Issues: 12

T-MAC

Low-bit LLM inference on CPU with lookup table

Language: C++ | License: MIT | Stargazers: 463 | Issues: 11 | Issues: 40

AnglE

Train and Infer Powerful Sentence Embeddings with AnglE | 🔥 SOTA on STS and MTEB Leaderboard

Language: Python | License: MIT | Stargazers: 454 | Issues: 10 | Issues: 48

Language: Python | License: Apache-2.0 | Stargazers: 432 | Issues: 11 | Issues: 11

magpie

Official repository for "Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing". Your efficient and high-quality synthetic data generation pipeline!

Language: Python | License: MIT | Stargazers: 423 | Issues: 5 | Issues: 26

rank_llm

RankLLM is a Python toolkit for reproducible information retrieval research using rerankers, with a focus on listwise reranking.

Language: Python | License: Apache-2.0 | Stargazers: 311 | Issues: 10 | Issues: 44

MathCoder

Family of LLMs for mathematical reasoning.

Language: Python | License: Apache-2.0 | Stargazers: 213 | Issues: 4 | Issues: 3

CSIKit

Python CSI processing and visualisation tools for Atheros, Intel, Nexmon, ESP32, FeitCSI, and PicoScenes (USRP, etc.) formats.

Language: Python | License: MIT | Stargazers: 205 | Issues: 6 | Issues: 51

InsTag

InsTag: A Tool for Data Analysis in LLM Supervised Fine-tuning

InPars

Inquisitive Parrots for Search

Language: Python | License: Apache-2.0 | Stargazers: 176 | Issues: 8 | Issues: 10

flute

Fast Matrix Multiplications for Lookup Table-Quantized LLMs

Language: Cuda | License: Apache-2.0 | Stargazers: 168 | Issues: 5 | Issues: 9

BiLLM

Tool for converting LLMs from uni-directional to bi-directional by removing causal mask for tasks like classification and sentence embeddings. Compatible with 🤗 transformers.

Language: Python | License: MIT | Stargazers: 39 | Issues: 5 | Issues: 7
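The BiLLM description turns on one idea: a decoder-only LLM uses a causal (lower-triangular) attention mask, and dropping that mask lets every token attend to every other token. The framework-free sketch below illustrates only that idea. It is not BiLLM's code or API. The helper names (`causal_mask`, `bidirectional_mask`, `masked_attention_weights`) are hypothetical, and real models apply the mask inside every attention layer rather than to a toy score matrix.

```python
import math

def causal_mask(n):
    """Lower-triangular mask: token i may attend only to tokens 0..i."""
    return [[j <= i for j in range(n)] for i in range(n)]

def bidirectional_mask(n):
    """All-True mask: every token may attend to every token (no causal mask)."""
    return [[True] * n for _ in range(n)]

def masked_attention_weights(scores, mask):
    """Row-wise softmax over raw attention scores, ignoring masked-out positions."""
    weights = []
    for row_scores, row_mask in zip(scores, mask):
        # Subtract the max allowed score for numerical stability.
        m = max(s for s, allowed in zip(row_scores, row_mask) if allowed)
        exps = [math.exp(s - m) if allowed else 0.0
                for s, allowed in zip(row_scores, row_mask)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

if __name__ == "__main__":
    n = 3
    scores = [[0.0] * n for _ in range(n)]  # equal scores, so only the mask matters
    # Causal: the first token can only see itself.
    print(masked_attention_weights(scores, causal_mask(n))[0])  # [1.0, 0.0, 0.0]
    # Bidirectional: the first token spreads attention over all three tokens.
    print(masked_attention_weights(scores, bidirectional_mask(n))[0])
```

With the causal mask, the first token's representation cannot depend on later tokens. That is the limitation removing the mask is meant to lift for tasks such as classification and sentence embeddings.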