Roman Solomatin's starred repositories
flash-attention
Fast and memory-efficient exact attention
lm-evaluation-harness
A framework for few-shot evaluation of language models.
matmulfreellm
Implementation for MatMul-free LM.
llama-tokenizer-js
JS tokenizer for LLaMA 1 and 2
llm_benchmarks
A collection of benchmarks and datasets for evaluating LLMs.
infini-transformer
PyTorch implementation of Infini-Transformer from "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" (https://arxiv.org/abs/2404.07143)
allainews_sources
A list of online news & info sources in the AI/ML/Data Science space