sdtblck's repositories
youtube_subtitle_dataset
YT_subtitles - extracts subtitles from YouTube videos to raw text for Language Model training
Opensubtitles_dataset
Downloads and parses the subtitle dataset from opensubtitles.org
PDFextract
Extracting PDFs using pdfminer.six and PyPDF2
lm_dataloader
Dataloader tools for language modelling
benchmarking
Tools for benchmarking clusters
example-mkdocs-basic
A basic MkDocs project for Read the Docs
example-sphinx-basic
A basic Sphinx project for Read the Docs
flash-attention
Fast and memory-efficient exact attention
guesslang
Detect the programming language of source code
Megatron-LM
Ongoing research training transformer models at scale
mesh-transformer-jax
Model parallel transformers in JAX and Haiku
mojo
The Mojo Programming Language
mup
maximal update parametrization (µP)
RealFakeAugment
Image augmentation functions for GAN training
transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
transformers-bloom-inference
Fast Inference Solutions for BLOOM
Yandex-Image-Scraper
Some tools for scraping images from Yandex image search