Install the dependencies: `pip install -r requirements.txt`
⚠️ The `transformers` library has an issue with truncation when used through `pipeline`. If the model fails to run with an error about exceeding the maximum number of tokens (> 512), you have to enable truncation manually in the library's source file (specifically, in `self.vectorizer`) by setting `truncate=True, model_max_length=512`.
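Instead of patching the library, the input can also be cut to the model's limit before inference. The sketch below is pure Python and assumes you already have a document's content token IDs (without special tokens); `truncate_token_ids`, `cls_id` and `sep_id` are hypothetical names, and the real [CLS]/[SEP] IDs must be taken from your RuBERT tokenizer.

```python
from typing import List


def truncate_token_ids(
    token_ids: List[int],
    cls_id: int,
    sep_id: int,
    model_max_length: int = 512,
) -> List[int]:
    """Wrap content token IDs in [CLS] ... [SEP], cutting the content so the
    final sequence is at most model_max_length tokens long.

    token_ids must NOT already contain the special tokens.
    """
    if model_max_length < 2:
        raise ValueError("model_max_length must leave room for [CLS] and [SEP]")
    budget = model_max_length - 2  # two positions are reserved for [CLS] and [SEP]
    return [cls_id] + token_ids[:budget] + [sep_id]
```

For instance, `truncate_token_ids(list(range(1000, 2000)), cls_id=101, sep_id=102)` returns exactly 512 IDs starting with 101 and ending with 102 (101/102 are placeholder IDs here). Depending on your `transformers` version, the text-classification pipeline may also forward tokenizer arguments, e.g. `classifier(text, truncation=True, max_length=512)`, which could make the manual library edit unnecessary; check the documentation for the version you have installed.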
Run the app: `python app.py`
To share the project through a public link, launch the demo with `demo.launch(share=True)`.
- RuBERT model: `report_bert.ipynb`
- BoW model: `report_v3.ipynb`
- Supported file extensions:
  - rtf
  - doc
  - docx
- Inference:
  - Predicted label
  - Model explainability: word weights/attention
- UI:
  - Allows users to upload their own files
  - Visualizes the predicted label
  - Visualizes the model's explanation of its prediction
- Model analysis using eli5:
  - Identified the keywords the model uses to classify documents. Only classes '1' and '2' have the bias as a top feature, which should probably be tackled in the next stage.
- SHAP word highlighting based on the BERT output
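Word-level explanations (SHAP values or attention scores) are usually visualized by shading each word according to its weight. A minimal pure-Python sketch, assuming the per-word weights have already been computed; `highlight_words` is a hypothetical helper, not part of SHAP or eli5, and the green/red colour scheme is an arbitrary choice.

```python
import html
from typing import List, Tuple


def highlight_words(word_weights: List[Tuple[str, float]]) -> str:
    """Render words as HTML spans whose background opacity reflects |weight|.

    Non-negative weights are shaded green, negative weights red.
    """
    # Normalise by the largest absolute weight; fall back to 1.0 for empty/all-zero input.
    max_abs = max((abs(w) for _, w in word_weights), default=0.0) or 1.0
    spans = []
    for word, weight in word_weights:
        alpha = abs(weight) / max_abs  # opacity in [0, 1]
        rgb = "0, 160, 0" if weight >= 0 else "200, 0, 0"
        spans.append(
            f'<span style="background-color: rgba({rgb}, {alpha:.2f})">'
            f"{html.escape(word)}</span>"  # escape so document text cannot inject HTML
        )
    return " ".join(spans)
```

For example, `highlight_words([("good", 2.0), ("bad", -1.0)])` shades "good" fully green and "bad" red at half opacity; the returned string can be displayed in any HTML output component of the UI.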
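The check behind the eli5 finding (bias as a top feature for classes '1' and '2') can be sketched for any linear bag-of-words classifier by ranking each class's weights together with its intercept, which eli5 reports as the `<BIAS>` feature. This assumes the BoW model is linear (e.g. a logistic regression with scikit-learn-style `coef_` and `intercept_`, passed here as plain lists); `top_features` and `classes_with_bias_on_top` are hypothetical helpers.

```python
from typing import Dict, List, Sequence, Tuple

BIAS = "<BIAS>"  # eli5's label for the intercept term


def top_features(
    coef: Sequence[Sequence[float]],
    intercept: Sequence[float],
    vocabulary: Sequence[str],
    class_labels: Sequence[str],
    k: int = 5,
) -> Dict[str, List[Tuple[str, float]]]:
    """Return the k highest-weight features per class of a linear BoW model,
    with the intercept included as an extra <BIAS> feature."""
    if not (len(coef) == len(intercept) == len(class_labels)):
        raise ValueError("need one coefficient row and one intercept per class")
    ranking: Dict[str, List[Tuple[str, float]]] = {}
    for label, row, bias in zip(class_labels, coef, intercept):
        if len(row) != len(vocabulary):
            raise ValueError("each coefficient row must match the vocabulary size")
        weighted = list(zip(vocabulary, row)) + [(BIAS, bias)]
        weighted.sort(key=lambda fw: fw[1], reverse=True)  # largest positive weight first
        ranking[label] = weighted[:k]
    return ranking


def classes_with_bias_on_top(ranking: Dict[str, List[Tuple[str, float]]]) -> List[str]:
    """Labels of the classes whose single strongest feature is the bias."""
    return [label for label, feats in ranking.items() if feats and feats[0][0] == BIAS]
```

This ranks by signed weight only; eli5's `show_weights` also lists the most negative features. A class topped by `<BIAS>` leans on the intercept more than on any single word, which is why that result is flagged for the next stage.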
Metric | Score |
---|---|
accuracy_score | 0.9583 |
precision_score | 0.9583 |
f1_score | 0.9583 |
recall_score | 0.9583 |
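All four scores in the table are identical. For single-label multiclass classification this is exactly what micro-averaging produces, because every misclassification counts as one false positive and one false negative, so micro precision, micro recall and micro F1 all equal accuracy. The table was therefore probably computed with micro averaging (e.g. scikit-learn's `average='micro'`), but that is an inference, not something stated here. The pure-Python sketch below (hypothetical `classification_scores` helper) shows the identity and also computes macro-averaged scores, which weight every class equally and would expose per-class differences.

```python
from collections import Counter
from typing import Dict, Sequence


def _safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def classification_scores(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Accuracy plus micro- and macro-averaged precision, recall and F1
    for single-label multiclass predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    tp: Counter = Counter()
    fp: Counter = Counter()
    fn: Counter = Counter()
    for true, pred in zip(y_true, y_pred):
        if true == pred:
            tp[true] += 1
        else:
            fp[pred] += 1  # the predicted class gets a false positive
            fn[true] += 1  # the true class gets a false negative

    total_tp, total_fp, total_fn = sum(tp.values()), sum(fp.values()), sum(fn.values())
    # Each error adds one FP and one FN, so micro P == micro R == micro F1 == accuracy.
    micro_p = _safe_div(total_tp, total_tp + total_fp)
    micro_r = _safe_div(total_tp, total_tp + total_fn)
    micro_f1 = _safe_div(2 * micro_p * micro_r, micro_p + micro_r)

    # Macro averaging: per-class scores first, then an unweighted mean.
    per_p = [_safe_div(tp[c], tp[c] + fp[c]) for c in labels]
    per_r = [_safe_div(tp[c], tp[c] + fn[c]) for c in labels]
    per_f1 = [_safe_div(2 * p * r, p + r) for p, r in zip(per_p, per_r)]
    n = len(labels)

    return {
        "accuracy": _safe_div(total_tp, len(y_true)),
        "micro_precision": micro_p,
        "micro_recall": micro_r,
        "micro_f1": micro_f1,
        "macro_precision": _safe_div(sum(per_p), n),
        "macro_recall": _safe_div(sum(per_r), n),
        "macro_f1": _safe_div(sum(per_f1), n),
    }
```

With toy labels `y_true = ["1", "1", "2", "2", "3", "3"]` and `y_pred = ["1", "2", "2", "2", "3", "1"]`, accuracy and all three micro scores equal 4/6, while macro precision is about 0.722.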