paihengxu

Paiheng Xu's starred repositories

tuning_playbook

A playbook for systematically maximizing the performance of deep learning models.

License: NOASSERTION | 26086 | 0 | 0

An easy-to-use Python package for running quick, basic QA evaluations. It includes standardized QA evaluation metrics and semantic evaluation metrics: black-box and open-source large language model prompting and evaluation, exact match, F1 score, PEDANT semantic match, and transformer match. The package also supports prompting the OpenAI and Anthropic APIs.

Language: Python | License: MIT | 22 | 0 | 0

vaderSentiment

VADER Sentiment Analysis. VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media, and works well on texts from other domains.

Language: Python | License: MIT | 4344 | 0 | 0

lloom

Concept Induction: Analyzing Unstructured Text with High-Level Concepts Using LLooM (CHI 2024 paper). LLooM automatically surfaces high-level concepts to analyze unstructured text.

Language: Python | License: BSD-3-Clause | 47 | 0 | 0

sammo

A library for prompt engineering and optimization (SAMMO = Structure-aware Multi-Objective Metaprompt Optimization)

Language: Python | License: MIT | 286 | 0 | 0

edu-convokit

Edu-ConvoKit: An Open-Source Framework for Education Conversation Data

Language: Jupyter Notebook | License: MIT | 66 | 0 | 0

NLP4SocialGood_Papers

A reading list of up-to-date papers on NLP for Social Good.

270 | 0 | 0

tokreate

A minimal library to create tokens using LLMs.

Language: Python | 6 | 0 | 0

stanford_alpaca

Code and documentation to train Stanford's Alpaca models, and generate the data.

Language: Python | License: Apache-2.0 | 29246 | 0 | 0

LRV-Instruction

[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning

Language: Python | License: BSD-3-Clause | 239 | 0 | 0

ecco

Explain, analyze, and visualize NLP language models. Ecco creates interactive visualizations directly in Jupyter notebooks explaining the behavior of Transformer-based language models (like GPT2, BERT, RoBERTa, T5, and T0).

Language: Jupyter Notebook | License: BSD-3-Clause | 1948 | 0 | 0

transformers-interpret

Model explainability that works seamlessly with 🤗 transformers. Explain your transformers model in just 2 lines of code.

Language: Jupyter Notebook | License: Apache-2.0 | 1240 | 0 | 0

LLaMA-Factory

A WebUI for Efficient Fine-Tuning of 100+ LLMs (ACL 2024)

Language: Python | License: Apache-2.0 | 28541 | 0 | 0

multi-task-NLP

multi_task_NLP is a utility toolkit enabling NLP developers to easily train and infer a single model for multiple tasks.

Language: Python | License: Apache-2.0 | 364 | 0 | 0

Promptify

Prompt Engineering | Prompt Versioning | Use GPT or other prompt based models to get structured output. Join our discord for Prompt-Engineering, LLMs and other latest research

Language: Jupyter Notebook | License: Apache-2.0 | 3163 | 0 | 0

classroom-transcript-analysis

Language: Python | License: MIT | 25 | 0 | 0

dataset_difficulty

"Understanding Dataset Difficulty with V-Usable Information" (ICML 2022, outstanding paper)

Language: Jupyter Notebook | 76 | 0 | 0

zoe

Zero-Shot Open Entity Typing as Type-Compatible Grounding, EMNLP'18.

Language: Python | 43 | 0 | 0

vert-papers

This repository contains code and datasets related to entity/knowledge papers from the VERT (Versatile Entity Recognition & disambiguation Toolkit) project, by the Knowledge Computing group at Microsoft Research Asia (MSRA).

Language: Python | License: MIT | 265 | 0 | 0

COVID-19-TweetIDs

The repository contains an ongoing collection of tweet IDs associated with the novel coronavirus COVID-19 (SARS-CoV-2), which commenced on January 28, 2020.

Language: Python | License: NOASSERTION | 714 | 0 | 0