Huu4Ontocord's repositories
KeyedVectorsANN
Gensim word2vec + PySparNN ANN + trimmed GoogleNews vectors = a fast and lightweight NLP tool
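PySparNN speeds up nearest-neighbor lookup through cluster pruning. It picks a few "leader" points, assigns every vector to its most similar leader, and at query time scans only the members of the best-matching clusters. The sketch below is a standard-library-only illustration of that idea applied to word vectors like the ones Gensim's `KeyedVectors` holds. It is not the repository's code. The class `ClusterPrunedIndex`, its parameters, and the toy vectors are hypothetical.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class ClusterPrunedIndex:
    """Hypothetical toy cluster-pruning ANN index over word -> vector mappings."""

    def __init__(self, vectors, n_leaders=None, seed=0):
        self.vectors = vectors  # dict: word -> list[float]
        words = list(vectors)
        rng = random.Random(seed)
        # Common heuristic: about sqrt(N) randomly sampled leaders.
        k = n_leaders or max(1, int(math.sqrt(len(words))))
        self.leaders = rng.sample(words, k)
        self.clusters = {leader: [] for leader in self.leaders}
        # Assign every word to the cluster of its most similar leader.
        for w in words:
            best = max(self.leaders, key=lambda l: cosine(vectors[w], vectors[l]))
            self.clusters[best].append(w)

    def search(self, query, k=3, k_clusters=1):
        """Return up to k words, scanning only the k_clusters closest clusters."""
        ranked = sorted(self.leaders,
                        key=lambda l: cosine(query, self.vectors[l]),
                        reverse=True)
        candidates = [w for l in ranked[:k_clusters] for w in self.clusters[l]]
        candidates.sort(key=lambda w: cosine(query, self.vectors[w]), reverse=True)
        return candidates[:k]


# Made-up 3-d vectors standing in for trimmed word2vec embeddings.
toy = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.9, 0.15],
    "apple": [0.1, 0.2, 0.95],
    "banana": [0.15, 0.1, 0.9],
}
index = ClusterPrunedIndex(toy, n_leaders=2)
print(index.search(toy["queen"], k=2, k_clusters=2))
```

Raising `k_clusters` trades speed for recall. Scanning every cluster makes the search exact, while scanning one cluster touches only a fraction of the vocabulary.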
M3rlin-fmengine
M3 Training Using FMengine
data_tooling
How should we store and serve the dataset?
Language: HTML | License: Apache-2.0
hpj.py
Simple Python-to-JavaScript translator with an emphasis on the readability of the generated code.
Language: Python | License: MIT
Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
Language: Python | License: NOASSERTION
oftf
One File Text Filter
Language: Python | License: Apache-2.0
pii_processing
PII Processing code to clean up BigScience datasets. Reference implementation for the PII Hackathon
License: NOASSERTION
summarize
Summarize is a Streamlit application that performs automatic text summarization using both extractive and abstractive models.
Language: Python | License: Apache-2.0
tevatron
Tevatron - A flexible toolkit for dense retrieval research and development.
License: Apache-2.0
Viet-Mistral
Vietnamese Mistral
License: Apache-2.0