ShiyueZhang's starred repositories
stanford_alpaca
Code and documentation to train Stanford's Alpaca models, and generate the data.
gpt-2-output-dataset
Dataset of GPT-2 outputs for research in detection, biases, and more
LLMTest_NeedleInAHaystack
Doing simple retrieval from LLMs at various context lengths to measure accuracy
pytorch-struct
Fast, general, and tested differentiable structured prediction in PyTorch
berkeley-doc-summarizer
The Berkeley Document Summarizer is a learning-based, single-document summarization system that extracts source document content, exploits syntactic information to compress it, and uses coreference constraints to ensure clarity.
chatgpt-failures
Failure archive for ChatGPT and similar models
Multi-News
Large-scale multi-document summarization dataset and code
simple-web-audio-recorder-demo
A simple HTML/JS demo that uses WebAudioRecorder.js to record audio on a web page
longeval-summarization
Official repository for our EACL 2023 paper "LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization" (https://arxiv.org/abs/2301.13298).
SummarizationPrograms
[ICLR 2023] PyTorch code of Summarization Programs: Interpretable Abstractive Summarization with Neural Modular Trees
MixCE-acl2023
Implementation of the MixCE method described in the ACL 2023 paper by Zhang et al.
LitePyramids
Method for evaluating system summaries manually, via crowdsourcing, using a summarization dataset that includes reference summaries.
truncation-sampling
Codebase describing experiments in Truncation Sampling as Language Model Desmoothing