VenkteshV's starred repositories
al-folio-homepage
A beautiful, simple, clean, and responsive Jekyll theme for academics
docprompting
Data and code for "DocPrompting: Generating Code by Retrieving the Docs" @ICLR 2023
interactive-clustering
Python package used to apply NLP interactive clustering methods.
Child-Vocab-Development
This project began as my term project for a computational linguistics course at Pitt. It later grew into a research project, and I am working on publishing the work.
MSMARCO-Passage-Ranking
MS MARCO (Microsoft Machine Reading Comprehension) is a large-scale dataset focused on machine reading comprehension, question answering, and passage ranking. A variant of this task will be part of TREC and AFIRM 2019; for updates about TREC 2019, please follow this repository. Passage Reranking task: given a query q and the 1000 most relevant passages P = p1, p2, p3, ..., p1000 as retrieved by BM25, a successful system is expected to rerank the most relevant passages as high as possible. Not every query has a human-labeled relevant passage among its 1000 candidates. Evaluation is done using MRR.
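The MRR metric mentioned above can be sketched in a few lines. This is a toy illustration, not the official MS MARCO evaluation script; the query/passage IDs below are made up, and queries with no judged relevant passage simply contribute 0, matching the task description.

```python
# Toy MRR@k scorer for passage reranking (illustration only, not the
# official MS MARCO evaluation script).

def mrr_at_k(rankings, qrels, k=10):
    """rankings: {qid: [pid, ...]} ordered best-first.
    qrels: {qid: set of relevant pids}.
    Queries with no relevant passage in the top k contribute 0."""
    total = 0.0
    for qid, ranked in rankings.items():
        relevant = qrels.get(qid, set())
        for rank, pid in enumerate(ranked[:k], start=1):
            if pid in relevant:
                total += 1.0 / rank  # reciprocal rank of first hit
                break
    return total / len(rankings)

# q1's relevant passage appears at rank 2; q2 has no hit in its list.
rankings = {"q1": ["p3", "p7", "p1"], "q2": ["p5", "p2"]}
qrels = {"q1": {"p7"}, "q2": {"p9"}}
print(mrr_at_k(rankings, qrels))  # (1/2 + 0) / 2 = 0.25
```

Reranking the truly relevant passage higher directly raises this score, which is why the task rewards systems that push it toward rank 1.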
ranking-utils
Miscellaneous utilities for ranking models
llm-efficiency-challenge.github.io
Website for NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1 GPU + 1 Day
sentence-transformers
Multilingual Sentence & Image Embeddings with BERT
asr-scoring
Common scripts for scoring JSALT 2023 ASR systems
jose-reviews
Reviews for the Journal of Open Source Education (JOSE)
LLM-Blender
[ACL2023] We introduce LLM-Blender, an ensembling framework that attains consistently superior performance by leveraging the diverse strengths of multiple open-source LLMs. LLM-Blender cuts out weaknesses through ranking and integrates strengths through fused generation to enhance the capability of LLMs.
Efficient-Fact-checking
Master's thesis on supporting fact extraction over large data collections for a more efficient fact-checking process in real-world applications.
joss-papers
Accepted JOSS papers
simple-llm-finetuner
Simple UI for LLM Model Finetuning
RWKV-LM-LoRA
RWKV is an RNN with transformer-level LLM performance that can be trained directly like a GPT (parallelizable). It combines the best of RNNs and transformers: great performance, fast inference, low VRAM use, fast training, "infinite" ctx_len, and free sentence embeddings.
FasterTransformer
Transformer-related optimizations, including BERT and GPT
removestar
Tool to automatically replace 'import *' in Python files with explicit imports
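The transformation removestar performs can be illustrated with a minimal sketch: the tool must discover which names a module actually uses so it can replace `from mod import *` with an explicit import list. The before/after comments and the `ast`-based name scan below are an assumption about the general approach, not removestar's actual implementation.

```python
# Before: a star import pulls in every public name from the module.
#   from os.path import *
# After a tool like removestar, only the names actually used remain:
#   from os.path import join

# Minimal sketch of the core idea: collect the names a source
# string references, so unused imports can be dropped.
import ast

source = "from os.path import *\nprint(join('a', 'b'))"
used = {node.id for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Name)}
print(sorted(used))  # every bare name referenced in the module body
```

In this snippet `used` contains `join` (and `print`), so `join` is the only name the star import needs to provide.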