ArvinZhuang

Shengyao Zhuang's repositories

DSI-transformers

A Hugging Face Transformers implementation of "Transformer Memory as a Differentiable Search Index"

Language: Python · License: MIT · 158 3 10

The official repository for "Bridging the Gap Between Indexing and Retrieval for Differentiable Search Index with Query Generation", Shengyao Zhuang, Houxing Ren, Linjun Shou, Jian Pei, Ming Gong, Guido Zuccon and Daxin Jiang.

Language: Python · License: MIT · 105 1 16

OLTR

An online learning to rank Python codebase.

Language: Python · 7 10

BiTAG

Language: Python · License: MIT · 3 10

LLM4IR-Survey

This is the repo for the survey of LLM4IR.

License: MIT · 100

vec2text

utilities for decoding deep representations (like sentence embeddings) back to text

Language: Python · License: NOASSERTION · 100

anserini

Anserini is a Lucene toolkit for reproducible information retrieval research

Language: Java · License: Apache-2.0 · 000

arvinzhuang.github.io

GitHub Pages template for academic personal websites, forked from mmistakes/minimal-mistakes

Language: JavaScript · License: MIT · 000

arxivscraper

A python module to scrape arxiv.org for specific date range and categories

Language: Python · License: MIT · 000

character-bert

Main repository for "CharacterBERT: Reconciling ELMo and BERT for Word-Level Open-Vocabulary Representations From Characters"

Language: Python · License: Apache-2.0 · 000

COIL

NAACL2021 - COIL Contextualized Lexical Retriever

Language: Python · License: Apache-2.0 · 000

DL-Hard

Deep Learning Hard (DL-HARD) is a new annotated dataset extending the TREC Deep Learning benchmark.

000

IR-Superproject-2023

License: Apache-2.0 · 000

markdown_readme

Markdown - you can mark up titles, lists, tables, etc., in a much cleaner, readable and accurate way if you do it with HTML.

000

MSMARCO-Document-Ranking-Submissions

Submission archive for the MS MARCO document ranking leaderboard

Language: Python · License: CC-BY-4.0 · 000

MSMARCO-Passage-Ranking

MS MARCO (Microsoft Machine Reading Comprehension) is a large-scale dataset focused on machine reading comprehension, question answering, and passage ranking. A variant of this task will be part of TREC and AFIRM 2019. For updates about TREC 2019, please follow this repository. Passage reranking task: given a query q and the 1000 most relevant passages P = p1, p2, p3, ..., p1000, as retrieved by BM25, a successful system is expected to rerank the most relevant passage as high as possible. For this task, not all 1000 relevant items have a human-labeled relevant passage. Evaluation will be done using MRR.

License: MIT · 000

natural-questions

Natural Questions (NQ) contains real user questions issued to Google search, and answers found from Wikipedia by annotators. NQ is designed for the training and evaluation of automatic question answering systems.

License: Apache-2.0 · 000

pygaggle

a gaggle of deep neural architectures for text ranking and question answering, designed for Pyserini

Language: Jupyter Notebook · License: Apache-2.0 · 000

pyserini

Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.

Language: Python · License: Apache-2.0 · 000

pyterrier

A Python framework for performing information retrieval experiments, building on http://terrier.org/

Language: Python · License: MPL-2.0 · 000

pytorch-lightning

The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate.

Language: Python · License: Apache-2.0 · 000

relevation

Information Retrieval Relevance Judging System

Language: HTML · License: GPL-3.0 · 000

Reranker

Build Text Rerankers with Deep Language Models

License: NOASSERTION · 000

sentence-transformers

Multilingual Sentence & Image Embeddings with BERT

Language: Python · License: Apache-2.0 · 000

stanford_alpaca

Code and documentation to train Stanford's Alpaca models, and generate the data.

License: Apache-2.0 · 000

tevatron

Tevatron - A flexible toolkit for dense retrieval research and development.

Language: Python · License: Apache-2.0 · 000

transformers

🤗 Transformers: State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0.

Language: Python · License: Apache-2.0 · 000

trl

Train transformer language models with reinforcement learning.

Language: Python · License: Apache-2.0 · 000

tydiqa

TyDi QA contains 200k human-annotated question-answer pairs in 11 Typologically Diverse languages, written without seeing the answer and without the use of translation, and is designed for the training and evaluation of automatic question answering systems. This repository provides evaluation code and a baseline system for the dataset.

Language: Python · License: Apache-2.0 · 000

typos-aware-BERT

Language: Python · 000