sidney_NLP's repositories
Awesome-Multi-label-Image-Recognition
Awesome Multi-label Image Recognition Paper List
Classical-Modern
A very comprehensive parallel corpus of Classical Chinese (ancient texts) and modern Chinese
clearml
ClearML - Auto-Magical CI/CD to streamline your ML workflow. Experiment Manager, MLOps and Data-Management
cmu_multilingual_speech
CMU multilingual speech repository
composer
A library of speed-up algorithms for model training
dataset_difficulty
"Understanding Dataset Difficulty with V-Usable Information" (ICML 2022, outstanding paper)
extend
Entity Disambiguation as text extraction (ACL 2022)
facestar
Facestar dataset. High quality audio-visual recordings of human conversational speech.
famie
FAMIE: A Fast Active Learning Framework for Multilingual Information Extraction
FREDA
Fast and Flexible Data Annotation for Relation Extraction
huggingsound
HuggingSound: A toolkit for speech-related tasks based on HuggingFace's tools
IndicLink
IndicLink is a Multilingual Fact Linking (MFL) dataset of sentences, each paired with the set of Wikidata facts (subject; relation; object) it contains. IndicLink covers English and 6 Indian languages: Hindi, Telugu, Tamil, Urdu, Gujarati and Assamese. The correct facts are chosen from an oracle of 4.7 million Wikidata facts with fact labels/descriptions available in these 7 languages. The dataset is intended only as a test set for evaluating models trained for the MFL task. For more details, please see https://arxiv.org/abs/2109.14364
lab-website-template
(Pre-release) An easy-to-use, flexible website template for labs, with automatic citations, GitHub tag imports, pre-built components, and more
lightning-hydra-template
PyTorch Lightning + Hydra. A very user-friendly template for rapid and reproducible ML experimentation with best practices. ⚡🔥⚡
lingfeat
LingFeat - A Comprehensive Linguistic Features Extraction ToolKit for Readability Assessment
NS-Dial
An Interpretable Neuro-Symbolic Framework for Task-Oriented Dialogue Generation
python_plot_utils
Simple Python code for plotting figures, adding colorbars, and cropping
SELFRec
An open-source framework for self-supervised recommender systems.
TempEL
Repository for Temporal Entity Linking (TempEL), accepted to the NeurIPS 2022 Datasets and Benchmarks track
timelms
TimeLMs: Diachronic Language Models from Twitter
tools
Practical tools: writing slide decks in Markdown, automated command-line demo tools, front-end component libraries, and more
txtai
💡 Build AI-powered semantic search applications
video2dataset
Easily create large video datasets from video URLs
wikipedia-utils
Utility scripts for preprocessing Wikipedia texts for NLP
yahp
Hyperparameter management