Sampo Pyysalo's repositories
lumi-llm-scaling
Scripts and documentation on scaling large language model training on the LUMI supercomputer
dl-binf-summer-school-2023
Material for the 2023 Summer School on Applied Deep Learning in Bioinformatics
keras-bert-ner
Named entity recognition built on top of BERT and keras-bert.
warc-tools
Tools for working with Web ARChive files.
consensus-pipeline
Annotation consensus processing pipeline
finnish-natural-instructions
Tools and data for a Finnish machine translation of Natural Instructions (https://github.com/allenai/natural-instructions)
generative-lm-server
Simple generative language model service
instruction-finetune
Finetune a language model on instruction data
lm-text-correction
Text correction using a language model
string-db-tools
Tools for working with STRING database text mining data
torch-transformers-text-classifier
Simple text classifier using Transformers with the Torch backend.
bert-span-classifier
Text span classifier using BERT
databricks-dolly-translation
Translation of the Databricks Dolly instruction dataset
gendemo
Minimal text generation demo using transformers
gpt-neox
An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.
gutenberg-tools
Tools for working with Project Gutenberg texts (https://www.gutenberg.org/)
instruction-generation
Tools for generating instruction data
lumi-causal-lm-finetune
Tools for finetuning large causal language models on LUMI
Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
Megatron-LM
Ongoing research training transformer models at scale
mt-quality-assessment
Tools and resources for learning to predict machine translation quality
nanotron
Minimalistic large language model 3D-parallelism training
ni-to-chatml
Generate ChatML from Natural Instructions data
onion-tools
Tools for text deduplication using the onion (ONe Instance ONly) tool
paraphrase-generation
Tools and resources for training a causal language model for paraphrase generation
suomi24-corpus
Tools for working with the Suomi24 corpus
xling-instructions
Generate instruction-formatted data from translation pairs