TheUnknownKeywords

Repo for the Hackathon Sk[AI] is the limit

Task 1 Predictions

We tried several BERT models for our predictions. Our best results were with dbmdz/bert-base-german-uncased from huggingface.co.

Our Pipeline is:

Clean the data:
1. Remove some JS and HTML tags
2. Remove mail headers from AW and FW mails (replies and forwards)
Train the model on the full data set; it may overfit, since no data is held out for validation
Predict classes with the trained model
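
The cleaning steps above could be sketched roughly as follows. This is a minimal illustration, not the code used in this repo: the function name clean_text, the regular expressions, and the list of header fields (Von, Gesendet, An, Cc, Betreff and their English counterparts) are assumptions about what the mails look like.

```python
import html
import re

# Hypothetical patterns; the real data may need different or additional rules.
SCRIPT_STYLE_RE = re.compile(
    r"<(script|style)\b[^>]*>.*?</\1\s*>", re.IGNORECASE | re.DOTALL
)
TAG_RE = re.compile(r"<[^>]+>")
# Assumed header fields of quoted reply/forward blocks (German and English).
HEADER_RE = re.compile(
    r"^\s*(Von|Gesendet|An|Cc|Betreff|From|Sent|To|Subject)\s*:.*$",
    re.IGNORECASE | re.MULTILINE,
)
# Reply/forward markers at the very start of the text, e.g. "AW: FW: ...".
SUBJECT_PREFIX_RE = re.compile(r"^\s*((AW|WG|FW|RE)\s*:\s*)+", re.IGNORECASE)


def clean_text(raw: str) -> str:
    """Strip JS/HTML remnants and mail header lines from one document."""
    text = SCRIPT_STYLE_RE.sub(" ", raw)    # drop <script>/<style> blocks with content
    text = TAG_RE.sub(" ", text)            # drop remaining tags, keep their text
    text = html.unescape(text)              # decode entities such as &amp;
    text = HEADER_RE.sub(" ", text)         # drop whole header lines
    text = SUBJECT_PREFIX_RE.sub("", text)  # drop leading AW:/FW: markers
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    sample = (
        "AW: Rechnung\n"
        "Von: Max <max@example.com>\n"
        "Gesendet: Montag\n"
        "<p>Hallo <b>Welt</b></p><script>alert('x');</script>"
    )
    print(clean_text(sample))
```

Note that the tag pattern also removes anything in angle brackets, including mail addresses written as <name@host>, and that header lines are only recognized when they start on their own line.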

We use the pre-trained BERT model dbmdz/bert-base-german-uncased.
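
Loading this checkpoint for classification might look like the sketch below, assuming the Hugging Face transformers library and PyTorch are installed. The helper names load_classifier and predict_classes, the num_labels argument, and max_length=256 are illustrative assumptions, not taken from this repo. The classification head is randomly initialized on load, so predictions are only meaningful after fine-tuning on the task data.

```python
MODEL_NAME = "dbmdz/bert-base-german-uncased"


def load_classifier(num_labels: int):
    """Load tokenizer and model with a fresh classification head (hypothetical helper)."""
    # Imported here so that nothing is loaded or downloaded until the function is called.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_NAME, num_labels=num_labels
    )
    return tokenizer, model


def predict_classes(texts, tokenizer, model, max_length=256):
    """Return the index of the highest-scoring class for each text."""
    import torch

    batch = tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=max_length,  # assumed limit; adjust to the data
        return_tensors="pt",
    )
    model.eval()
    with torch.no_grad():
        logits = model(**batch).logits
    return logits.argmax(dim=-1).tolist()
```

A call such as tokenizer, model = load_classifier(num_labels=5) downloads the checkpoint on first use; the value 5 is a placeholder for the actual number of classes in the task.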

Some results:

Corpus preprocessing:

