Yingfei (Jeremy) Xiang's repositories
Cherry_LLM
[NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other models
data-juicer
A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
devika
Devika is an Agentic AI Software Engineer that can understand high-level human instructions, break them down into steps, research relevant information, and write code to achieve the given objective. Devika aims to be a competitive open-source alternative to Devin by Cognition AI.
EasyContext
Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.
GPU-Benchmarks-on-LLM-Inference
Multiple NVIDIA GPUs or Apple Silicon for Large Language Model Inference?
MambaInLlama
Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models
PentestGPT
A GPT-empowered penetration testing tool
persona-hub
Official repo for the paper "Scaling Synthetic Data Creation with 1,000,000,000 Personas"
simple-one-api
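An OpenAI API access adapter. Supports the Qianfan large-model platform, iFlytek Spark, Tencent Hunyuan, MiniMax, Deep-Seek, and other services through an OpenAI-compatible interface. Ships as a single executable with very simple configuration, one-click deployment, and works out of the box.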
small-LMs-Task-Planning
Can only LLMs do Reasoning?: Potential of Small Language Models in Task Planning
InternLM
Official release of InternLM2.5 7B base and chat models, with 1M context support.