zye1996

Geek Repo

Company: GMU

Location: Fairfax, VA

Home Page: zye1996.github.io

zye1996's starred repositories

Alpaca-CoT

We unified the interfaces of instruction-tuning data (e.g., CoT data), multiple LLMs, and parameter-efficient methods (e.g., LoRA, P-tuning) for easy use. The result is a fine-tuning platform that makes it easy for researchers to get started with and use large models. We welcome open-source enthusiasts to open any meaningful PR on this repo and to integrate as many LLM-related technologies as possible.

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 2532 | Issues: 0

flute

Fast Matrix Multiplications for Lookup Table-Quantized LLMs

Language: Cuda | License: Apache-2.0 | Stargazers: 80 | Issues: 0

fastText

Library for fast text representation and classification.

Language: HTML | License: MIT | Stargazers: 25765 | Issues: 0

magpie

Official repository for "Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing". Your efficient and high-quality synthetic data generation pipeline!

Language: Python | License: MIT | Stargazers: 247 | Issues: 0

immich

High performance self-hosted photo and video management solution.

Language: TypeScript | License: AGPL-3.0 | Stargazers: 40650 | Issues: 0

Language: Python | License: Apache-2.0 | Stargazers: 369 | Issues: 0

storm

An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.

Language: Python | License: MIT | Stargazers: 8275 | Issues: 0

PDF-Extract-Kit

A Comprehensive Toolkit for High-Quality PDF Content Extraction

Language: Python | License: Apache-2.0 | Stargazers: 3116 | Issues: 0

crawlee-python

Crawlee: a web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP, in both headful and headless modes, with proxy rotation.

Language: Python | License: Apache-2.0 | Stargazers: 3281 | Issues: 0

BiLLM

Tool for converting LLMs from uni-directional to bi-directional by removing causal mask for tasks like classification and sentence embeddings. Compatible with 🤗 transformers.

Language: Python | License: MIT | Stargazers: 35 | Issues: 0

AnglE

Train and Infer Powerful Sentence Embeddings with AnglE | 🔥 SOTA on STS and MTEB Leaderboard

Language: Python | License: MIT | Stargazers: 414 | Issues: 0

pyserini

Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.

Language: Python | License: Apache-2.0 | Stargazers: 1568 | Issues: 0

rank_llm

Repository for prompt-decoding using LLMs (GPT3.5, GPT4, Vicuna, and Zephyr)

Language: Python | License: Apache-2.0 | Stargazers: 277 | Issues: 0

InPars

Inquisitive Parrots for Search

Language: Python | License: Apache-2.0 | Stargazers: 172 | Issues: 0

graphrag

A modular graph-based Retrieval-Augmented Generation (RAG) system

Language: Python | License: MIT | Stargazers: 12848 | Issues: 0

mistral.rs

Blazingly fast LLM inference.

Language: Rust | License: MIT | Stargazers: 3111 | Issues: 0

Scrapegraph-ai

Python scraper based on AI

Language: Python | License: MIT | Stargazers: 13452 | Issues: 0

llm2vec

Code for 'LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders'

Language: Python | License: MIT | Stargazers: 861 | Issues: 0

LASER

Language-Agnostic SEntence Representations

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 3568 | Issues: 0

GPTCache

Semantic cache for LLMs. Fully integrated with LangChain and llama_index.

Language: Python | License: MIT | Stargazers: 6950 | Issues: 0

tevatron

Tevatron - A flexible toolkit for neural retrieval research and development.

Language: Python | License: Apache-2.0 | Stargazers: 443 | Issues: 0

awesome-software-architecture

🚀 A curated list of awesome articles, videos, and other resources to learn and practice software architecture, patterns, and principles.

License: CC0-1.0 | Stargazers: 7432 | Issues: 0

ragflow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.

Language: Python | License: Apache-2.0 | Stargazers: 12584 | Issues: 0

ChatLaw

ChatLaw: A powerful LLM tailored for the Chinese legal domain (a Chinese legal large language model).

License: AGPL-3.0 | Stargazers: 6703 | Issues: 0

cobalt

save what you love

Language: JavaScript | License: AGPL-3.0 | Stargazers: 11698 | Issues: 0

EVA

EVA Series: Visual Representation Fantasies from BAAI

Language: Python | License: MIT | Stargazers: 2132 | Issues: 0

infinity

Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting a wide range of text-embedding models and frameworks.

Language: Python | License: MIT | Stargazers: 1079 | Issues: 0

reddit-dataset

Dataset of threads and comments from reddit

Stargazers: 171 | Issues: 0

LexLIP-ICCV23

Official Code for the ICCV23 Paper: "LexLIP: Lexicon-Bottlenecked Language-Image Pre-Training for Large-Scale Image-Text Sparse Retrieval"

Language: Python | License: Apache-2.0 | Stargazers: 37 | Issues: 0

omniglue

Code release for CVPR'24 submission 'OmniGlue'

Language: Python | License: Apache-2.0 | Stargazers: 466 | Issues: 0