Yandex Programming Championship 2019, October

Machine Learning track

Stage 2

Topic model

Given the title and description of a document, develop a classification model over 100 topics (the task description is in Russian).

Approach

For each document, concatenate the title and description and extract TF-IDF features from words and character n-grams. Then train a one-vs-rest (OvR) multiclass linear model with the modified Huber loss, optimized by stochastic gradient descent.
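
The pipeline described above could be sketched with scikit-learn roughly as follows. This is a minimal illustration, not the code in train.py: the n-gram ranges, the `sublinear_tf` setting, the helper names `join_fields` and `build_model`, and the toy documents are all assumptions made for the example. `SGDClassifier` fits one binary classifier per class (one-vs-rest) when given more than two labels.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import FeatureUnion, make_pipeline


def join_fields(title, description):
    """Concatenate title and description into a single text (hypothetical helper)."""
    return f"{title} {description}"


def build_model(random_state=0):
    """Word + char n-gram TF-IDF features feeding a linear OvR classifier."""
    features = FeatureUnion([
        # Word-level TF-IDF; the (1, 2) n-gram range is an assumed setting.
        ("word", TfidfVectorizer(analyzer="word", ngram_range=(1, 2),
                                 sublinear_tf=True)),
        # Character n-grams within word boundaries; (2, 5) is an assumed setting.
        ("char", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 5),
                                 sublinear_tf=True)),
    ])
    # Modified Huber loss trained by SGD; multiclass is handled one-vs-rest.
    clf = SGDClassifier(loss="modified_huber", random_state=random_state)
    return make_pipeline(features, clf)


# Toy data standing in for the real (title, description, topic) records.
records = [
    ("Football final", "The team scored a late goal", "sport"),
    ("Tennis open", "The player won the match in three sets", "sport"),
    ("New smartphone", "The phone has a faster processor", "tech"),
    ("Laptop review", "The laptop ships with more memory", "tech"),
    ("Soup recipe", "Boil the vegetables and add salt", "food"),
    ("Baking bread", "Knead the dough and bake it in the oven", "food"),
]
texts = [join_fields(title, desc) for title, desc, _ in records]
labels = [topic for _, _, topic in records]

model = build_model()
model.fit(texts, labels)
prediction = model.predict([join_fields("Cup match", "The team played well")])
print(prediction[0])
```

On the real data the same `fit` and `predict` calls would be applied to the full training set with 100 topic labels; the hyperparameters would need tuning against a held-out split.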

This approach reaches an F1 score of 60.83 (see leaderboard).

Usage

pip install -r requirements.txt
python train.py
