sidney1994

sidney_NLP's repositories

Paper_Writing_Tips

Awesome-Multi-label-Image-Recognition

Awesome Multi-label Image Recognition Paper List

Classical-Modern

A very comprehensive parallel corpus of Classical Chinese (ancient texts) and modern Chinese

License: MIT

clearml

ClearML - Auto-Magical CI/CD to streamline your ML workflow. Experiment Manager, MLOps and Data-Management

Language: Python, License: Apache-2.0

cmu_multilingual_speech

CMU multilingual speech repository

Language: Python, License: GPL-3.0

CNERTA

composer

Library of speed-up algorithms for model training

Language: Python, License: NOASSERTION

dataset_difficulty

"Understanding Dataset Difficulty with V-Usable Information" (ICML 2022, outstanding paper)

extend

Entity Disambiguation as text extraction (ACL 2022)

Language: Python, License: NOASSERTION

facestar

Facestar dataset. High quality audio-visual recordings of human conversational speech.

License: NOASSERTION

famie

FAMIE: A Fast Active Learning Framework for Multilingual Information Extraction

License: GPL-3.0

FREDA

Fast and Flexible Data Annotation for Relation Extraction

Language: Java, License: Apache-2.0

huggingsound

HuggingSound: A toolkit for speech-related tasks based on HuggingFace's tools

License: MIT

IndicLink

IndicLink is a Multilingual Fact Linking (MFL) dataset of sentences and a set of WikiData facts (subject; relation; object) contained in each sentence. IndicLink contains sentences from English and 6 Indian languages - Hindi, Telugu, Tamil, Urdu, Gujarati and Assamese. The correct facts are chosen from an oracle of 4.7 million Wikidata facts with fact labels/descriptions available in these 7 languages. The dataset is intended only to act as a test set to evaluate models trained for the task of MFL. For more details, please see https://arxiv.org/abs/2109.14364

License: NOASSERTION

lab-website-template

(Pre-release) An easy-to-use, flexible website template for labs, with automatic citations, GitHub tag imports, pre-built components, and more

License: BSD-3-Clause

lightning-hydra-template

PyTorch Lightning + Hydra. A very user-friendly template for rapid and reproducible ML experimentation with best practices. ⚡🔥⚡

lingfeat

LingFeat - A Comprehensive Linguistic Features Extraction ToolKit for Readability Assessment

Language: Python, License: CC-BY-SA-4.0

NeuralKG

Language: Python

NS-Dial

An Interpretable Neuro-Symbolic Framework for Task-Oriented Dialogue Generation

Language: Python, License: MIT

polytropon

Language: Python, License: MIT

python_plot_utils

Simple Python code for plotting figures, adding colorbars, and cropping

SELFRec

An open-source framework for self-supervised recommender systems.

TempEL

Repository for Temporal Entity Linking (TempEL), accepted to the NeurIPS 2022 Datasets and Benchmarks track

License: Apache-2.0

timelms

TimeLMs: Diachronic Language Models from Twitter

Language: Jupyter Notebook

tools

Practical tools: writing slide decks in Markdown, automated command-line demo tools, front-end component libraries, and more

torchstudio

txtai

💡 Build AI-powered semantic search applications

License: Apache-2.0

video2dataset

Easily create large video datasets from video URLs

Language: Python, License: MIT

wikipedia-utils

Utility scripts for preprocessing Wikipedia texts for NLP

Language: Python, License: Apache-2.0

yahp

hyperparameter management

License: NOASSERTION