arian-askari

Arian Askari's starred repositories

qlora

QLoRA: Efficient Finetuning of Quantized LLMs

Language: Jupyter Notebook · MIT · 9762 84 247

arxiv-latex-cleaner

arXiv LaTeX Cleaner: Easily clean the LaTeX code of your paper to submit to arXiv

Language: Python · Apache-2.0 · 5085 31 52

RL4LMs

A modular RL library to fine-tune language models to human preferences

Language: Python · Apache-2.0 · 2141 26 54

direct-preference-optimization

Reference implementation for DPO (Direct Preference Optimization)

Language: Python · Apache-2.0 · 1914 19 77

awesome-twitter-data

A list of Twitter datasets and related resources.

CC0-1.0 · 932 25 5

attention_sinks

Extend existing LLMs way beyond the original training length with constant memory usage, without retraining

Language: Python · Apache-2.0 · 651 12 29

awesome-pretrained-models-for-information-retrieval

A curated list of awesome papers related to pre-trained models for information retrieval (a.k.a., pretraining for IR).

626 21 2

GraphGPT

[SIGIR'2024] "GraphGPT: Graph Instruction Tuning for Large Language Models"

Language: Python · Apache-2.0 · 493 4 74

QuIP

Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees"

Language: Python · 329 9 10

GenRead

Code and Checkpoints for "Generate rather than Retrieve: Large Language Models are Strong Context Generators" in ICLR 2023.

Language: Python · 273 13 7

Knowledge-Grounded-Conversation

A Knowledge Grounded Conversation (KGC) Paper Reading List Maintained by Chuan Meng.

261 15 1

The official repository for "Bridging the Gap Between Indexing and Retrieval for Differentiable Search Index with Query Generation", Shengyao Zhuang, Houxing Ren, Linjun Shou, Jian Pei, Ming Gong, Guido Zuccon and Daxin Jiang.

Language: Python · MIT · 105 1 16

RAGElo

RAGElo is a set of tools that helps you select the best RAG-based LLM agents by using an Elo ranker

Language: Python · Apache-2.0 · 93 7 11

Twitter-Follower-Count

Display the number of followers of Twitter users

Language: JavaScript · GPL-3.0 · 62 2 4

DukeNet

Code for SIGIR-2020 full paper: DukeNet: A Dual Knowledge Interaction Network for Knowledge-Grounded Conversation

Language: Python · 29 6 3

hagrid

A Human-LLM Collaborative Dataset for Generative Information-seeking with Attribution

Apache-2.0 · 28 4 2

RefNet

Code for AAAI-2020 oral paper: RefNet: A Reference-aware Network for Background Based Conversation

Language: Python · 26 2 2

reddit_collector

Reddit Collector and Text Processor

Language: Python · 20 20

MUSER

Language: Python · MIT · 20 1 1

MANtIS

MANtIS - a multi-domain information seeking dialogues dataset

Language: Python · 16 4 1

ranger

Ranger helps you see the forest among the trees - Ranger is an effect-size meta analysis library creating beautiful forest plots!

Language: Python · Apache-2.0 · 11 10

LLM-Misinfo-QA

This repository contains data and code used for On the Risk of Misinformation Pollution with Large Language Models (EMNLP 2023 Findings).

Language: Python · 10 2 3

Wikipedia_TF_IDF_Dataset

Pre-computed IDF stats over all EN Wiki articles

MIT · 9 130

conformal-factual-lm

Language: Python · 9 20

transformer-vs-bm25

ECIR'22 - How Different are Pre-trained Transformers for Text Ranking? D. Rau et al.

Language: Python · 5 1 1

HK-legalGPT

3 10

SIP

Code for the CIKM 2023 long paper: System Initiative Prediction for Multi-turn Conversational Information Seeking

Language: Python · 2 20

bem_score_pytorch

Answer Equivalence BEM score example in PyTorch using Huggingface Tokenizer

Language: Python · 1 10