Yuichi Tateno (secon)'s starred repositories
language-pretraining
Pre-training Language Models for Japanese
FlagEmbedding
Retrieval and Retrieval-augmented LLMs
RAGatouille
Easily use and train state-of-the-art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease of use, backed by research.
gpt-newspaper
GPT-based autonomous agent designed to create personalized newspapers tailored to user preferences.
CTranslate2
Fast inference engine for Transformer models
hf-hub-ctranslate2
Connecting Transformers on HuggingFace Hub with CTranslate2
wikipedia-utils
Utility scripts for preprocessing Wikipedia texts for NLP
JAQKET-dataset
JAQKET: JApanese Questions on Knowledge of EnTities, packaged for Hugging Face Datasets
SwinTextSpotter
PyTorch re-implementation of the paper "SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition" (CVPR 2022)
bitsandbytes
Accessible large language models via k-bit quantization for PyTorch.
laion-datasets
Descriptions of and pointers to the LAION datasets