Sotaro Takeshita / 竹下 颯太郎 (sobamchan)



Location: Mannheim, Germany

Home Page: https://sotaro.io/about


Sotaro Takeshita / 竹下 颯太郎's starred repositories

Perplexica

Perplexica is an AI-powered search engine. It is an open-source alternative to Perplexity AI.

Language: TypeScript | License: MIT | Stargazers: 13478 | Issues: 0

datatrove

Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

Language: Python | License: Apache-2.0 | Stargazers: 1949 | Issues: 0

haiku-dpo

Using open source LLMs to build synthetic datasets for direct preference optimization

Language: Jupyter Notebook | Stargazers: 34 | Issues: 0

InPars

Inquisitive Parrots for Search

Language: Python | License: Apache-2.0 | Stargazers: 175 | Issues: 0

RAGElo

RAGElo is a set of tools that helps you select the best RAG-based LLM agents by using an Elo ranker

Language: Python | License: Apache-2.0 | Stargazers: 101 | Issues: 0

bibsearch

Download, manage, and search a BibTeX database.

Language: TeX | License: NOASSERTION | Stargazers: 64 | Issues: 0

semantic-grep

grep for words with similar meaning to the query

Language: Go | License: MIT | Stargazers: 1104 | Issues: 0

DSI-transformers

A huggingface transformers implementation of "Transformer Memory as a Differentiable Search Index"

Language: Python | License: MIT | Stargazers: 163 | Issues: 0

GenIR-Survey

This is the repository for the GenIR survey.

License: MIT | Stargazers: 107 | Issues: 0

overlapy

Python package developed to evaluate textual overlap (N-Grams) between two volumes of text.

Language: Python | License: MIT | Stargazers: 8 | Issues: 0
Language: Python | License: NOASSERTION | Stargazers: 3 | Issues: 0

summary-of-a-haystack

Codebase accompanying the Summary of a Haystack paper.

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 65 | Issues: 0

unarXive

A data set based on all arXiv publications, pre-processed for NLP, including structured full-text and citation network

Language: Python | License: MIT | Stargazers: 257 | Issues: 0

RaLLe

RaLLe: A Framework for Developing and Evaluating Retrieval-Augmented Large Language Models

Language: Python | License: MIT | Stargazers: 52 | Issues: 0

mbr-anomaly

Code for "On the True Distribution Approximation of Minimum Bayes-Risk Decoding," NAACL 2024

Language: Python | License: MIT | Stargazers: 4 | Issues: 0

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python | License: Apache-2.0 | Stargazers: 27227 | Issues: 0
Language: Python | Stargazers: 6 | Issues: 0

MSciNLI

The code and data for the NAACL 2024 paper "MSciNLI: A Diverse Benchmark for Scientific Natural Language Inference" will be released here.

Stargazers: 1 | Issues: 0

py-setproctitle

A Python module to customize the process title

Language: C | License: NOASSERTION | Stargazers: 493 | Issues: 0

PICL

Code for the ACL 2023 paper: Pre-Training to Learn in Context

Language: Python | License: MIT | Stargazers: 106 | Issues: 0

improved-t5

Experiments for efforts to train a new and improved t5

Language: Python | Stargazers: 76 | Issues: 0

zola

A fast static site generator in a single binary with everything built-in. https://www.getzola.org

Language: Rust | License: MIT | Stargazers: 13480 | Issues: 0

tasksource

Dataset collection and preprocessing framework for NLP extreme multitask learning

Language: Python | License: Apache-2.0 | Stargazers: 144 | Issues: 0

rank_bm25

A Collection of BM25 Algorithms in Python

Language: Python | License: Apache-2.0 | Stargazers: 991 | Issues: 0

pyserde

Yet another serialization library on top of dataclasses, inspired by serde-rs.

Language: Python | License: MIT | Stargazers: 714 | Issues: 0

WebVOWL

Visualizing ontologies on the Web

Language: JavaScript | License: MIT | Stargazers: 715 | Issues: 0

rdflib

RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.

Language: Python | License: BSD-3-Clause | Stargazers: 2146 | Issues: 0

nanoT5

Fast & Simple repository for pre-training and fine-tuning T5-style models

Language: Python | License: Apache-2.0 | Stargazers: 957 | Issues: 0

ocr-fileformat

Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)

Language: JavaScript | License: MIT | Stargazers: 177 | Issues: 0

orjson

Fast, correct Python JSON library supporting dataclasses, datetimes, and numpy

Language: Python | License: Apache-2.0 | Stargazers: 6073 | Issues: 0