Huu4Ontocord's repositories

MDEL

Multi-Domain Expert Learning

Language:PythonLicense:Apache-2.0Stargazers:68Issues:21Issues:29

rio

Text pre-processing for NLP datasets

Language:PythonLicense:Apache-2.0Stargazers:11Issues:0Issues:0

aurora

Multilingual, Multimodal, Multidomain model based on Starcoderplus and Bakllava

Language:PythonLicense:Apache-2.0Stargazers:4Issues:0Issues:0

muliwai

experimental PII framework

Language:Jupyter NotebookLicense:Apache-2.0Stargazers:4Issues:0Issues:0

KeyedVectorsANN

Gensim word2vec + Pysparnn ANN + Trimmed GoogleNewsVec = Fast and lightweight NLP tool

Language:PythonStargazers:3Issues:2Issues:0

sungai

Sample multilingual data and tools for creating the data - used for multilingual NLP research

License:Apache-2.0Stargazers:3Issues:0Issues:0

aurora-m

Adapting Starcoderplus for Multimodal Experts

Language:PythonLicense:Apache-2.0Stargazers:2Issues:0Issues:0

M3rlin

Multilingual, Multimodal, Multidomain (M3) Model

Language:PythonLicense:Apache-2.0Stargazers:2Issues:1Issues:0

M3rlin-fmengine

M3 Training Using FMengine

Language:PythonLicense:Apache-2.0Stargazers:2Issues:0Issues:0
Language:PythonLicense:Apache-2.0Stargazers:0Issues:1Issues:1
Stargazers:0Issues:1Issues:0

data_tooling

How should we store and serve the dataset?

Language:HTMLLicense:Apache-2.0Stargazers:0Issues:0Issues:0

hpj.py

Simple Python to Javascript translator with an emphasis on readability of generated code.

Language:PythonLicense:MITStargazers:0Issues:0Issues:0

Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Language:PythonLicense:NOASSERTIONStargazers:0Issues:0Issues:0
Language:PythonLicense:Apache-2.0Stargazers:0Issues:0Issues:0

oftf

One File Text Filter

Language:PythonLicense:Apache-2.0Stargazers:0Issues:0Issues:0

pii_processing

PII Processing code to clean up BigScience datasets. Reference implementation for the PII Hackathon

License:NOASSERTIONStargazers:0Issues:0Issues:0

summarize

Summarize. is a Streamlit application that performs automatic text summarization using both extractive and abstractive models.

Language:PythonLicense:Apache-2.0Stargazers:0Issues:0Issues:0

tevatron

Tevatron - A flexible toolkit for dense retrieval research and development.

License:Apache-2.0Stargazers:0Issues:0Issues:0

Viet-Mistral

Vietnamese Mistral

License:Apache-2.0Stargazers:0Issues:0Issues:0