UpLabel
UpLabel is a lightweight, modular, Python-based tool that supports machine learning tasks by making data labeling more efficient, automated, and structured. The current version focuses mainly on text classification.
Software Component Flow
User Flow
Setup
- Create a conda environment from environment.yml (conda env create -f environment.yml)
- Activate the environment, then open and run the 'Test - Pipeline' notebook
Authors
Timm Walz (@nonstoptimm)
Martin Kayser (@maknotavailable)
Glossary
Word | Description |
---|---|
label | target category that the model predicts and that is assigned during labeling |
pred(_id) | pred is the predicted label; pred_id is its numeric identifier |
split | a subset of the data that is distributed for manual labeling |
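
The glossary terms can be illustrated with a minimal sketch in plain Python. It is not UpLabel's actual implementation: the label set, the sample texts, and the helper names add_pred_id and make_splits are hypothetical, and the round-robin distribution is only one simple way a split could be formed.

```python
# Hypothetical label set and its numeric identifiers (assumption, not from UpLabel).
LABELS = ["billing", "shipping", "other"]
LABEL_TO_ID = {name: idx for idx, name in enumerate(LABELS)}

# Made-up records: "label" is still empty, "pred" is a model's predicted label.
records = [
    {"text": "Where is my parcel?", "label": None, "pred": "shipping"},
    {"text": "I was charged twice.", "label": None, "pred": "billing"},
    {"text": "Thanks for the help!", "label": None, "pred": "other"},
    {"text": "My invoice is wrong.", "label": None, "pred": "billing"},
    {"text": "The package arrived damaged.", "label": None, "pred": "shipping"},
]


def add_pred_id(rows):
    """Return copies of the rows with the numeric id of each predicted label."""
    return [{**row, "pred_id": LABEL_TO_ID[row["pred"]]} for row in rows]


def make_splits(rows, n_splits):
    """Distribute rows round-robin into n_splits subsets for manual labeling."""
    if n_splits < 1:
        raise ValueError("n_splits must be at least 1")
    return [rows[i::n_splits] for i in range(n_splits)]


rows = add_pred_id(records)
splits = make_splits(rows, n_splits=2)

for number, split in enumerate(splits):
    print(f"split {number}: {len(split)} rows")
```

Each split could then be handed to a different annotator, who fills in the label field; the pred and pred_id fields stay available for comparison with the manual labels.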
Open Tasks
In Progress
- Host as a service in Azure (via FA)
- Improve complexity calculation
TODO
- Integrate with neanno frontend (https://github.com/timoklimmer/neanno)
- Support for Named Entity Recognition tasks
- Support for Multi-Class Classification tasks
- Active learning: targeted false positives
- Smart join: label quality score
- Smart load: data integrity validation
- Auto-create labeling documentation