wprazuch / stack-overflow-tags

Stack Overflow Tags

This repository contains a project for automatically tagging Stack Overflow questions. To help Stack Overflow, we can build a model that assigns tags to a question based on its content. This is a multi-label classification problem: each observation (question) can be assigned several labels (tags) at once. The repository provides a simple walkthrough of how to handle such data and how to build a model that produces tags for each question. A benchmark for comparing different model architectures and algorithms is planned.
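To make the multi-label setup concrete, here is a minimal sketch of how tag lists can be turned into multi-hot target vectors, with one column per tag and a 1 wherever the question carries that tag. The sample tags and the helper names `build_tag_index` and `encode_tags` are illustrative assumptions, not code from this repository.

```python
def build_tag_index(tag_lists):
    """Map each distinct tag to a column index, in sorted order."""
    vocabulary = sorted({tag for tags in tag_lists for tag in tags})
    return {tag: column for column, tag in enumerate(vocabulary)}


def encode_tags(tags, tag_index):
    """Return a multi-hot row: 1 for every tag present, 0 elsewhere."""
    row = [0] * len(tag_index)
    for tag in tags:
        row[tag_index[tag]] = 1
    return row


# Hypothetical sample: each question may carry several tags at once.
questions_tags = [["python", "pandas"], ["javascript"], ["python", "django"]]

tag_index = build_tag_index(questions_tags)
targets = [encode_tags(tags, tag_index) for tags in questions_tags]

print(tag_index)  # column order: django, javascript, pandas, python
print(targets)    # one multi-hot row per question
```

Unlike single-label classification, a row may contain more than one 1, which is why each tag is treated as its own yes/no decision.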

Data

You can download the dataset used in this project here.

Methods

The solution was implemented mainly with the scikit-learn and TensorFlow libraries. For tag prediction, a feedforward network was used initially. More models are planned.
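For multi-label problems, a feedforward network commonly ends in one sigmoid unit per tag, and every tag whose probability clears a threshold is emitted. The sketch below illustrates only that decoding step in plain Python. The sigmoid output layer, the 0.5 threshold, the tag names, and the raw scores are all assumptions for illustration and are not taken from the notebook.

```python
import math


def sigmoid(x):
    """Squash a raw score into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_tags(logits, tag_names, threshold=0.5):
    """Keep every tag whose independent probability reaches the threshold."""
    return [
        name
        for name, logit in zip(tag_names, logits)
        if sigmoid(logit) >= threshold
    ]


# Hypothetical raw outputs of the final layer for one question.
tag_names = ["django", "javascript", "pandas", "python"]
logits = [-2.0, -3.5, 1.2, 4.0]

predicted = decode_tags(logits, tag_names)
print(predicted)  # → ['pandas', 'python']
```

Because each tag is scored independently, the number of predicted tags varies per question. Raising the threshold trades recall for precision.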

To check the workflow, see 'notebook.ipynb'. For data sample generation, use 'prepare_data.ipynb'. Once you have generated all the necessary objects from the notebook (Tokenizer, CountVectorizer, and model), you can launch the REST API server, written in FastAPI, with:

python -m server.main

The server will be exposed on port 8000.
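Once the server is running, a client can send it a question over HTTP. The sketch below builds such a request with Python's standard library. The `/predict` route and the `text` field are hypothetical placeholders, because the actual route and payload shape are defined by the server code in this repository. FastAPI applications serve interactive documentation at `/docs` by default, which should list the real routes.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # port stated in this README


def build_request(question_text, base_url=BASE_URL):
    """Build a JSON POST request for a tag prediction.

    NOTE: the "/predict" path and the "text" key are assumptions made
    for illustration; replace them with the server's real route and schema.
    """
    payload = json.dumps({"text": question_text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_tags(question_text):
    """Send the request and return the decoded JSON response."""
    request = build_request(question_text)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request does not need a running server; fetch_tags does.
request = build_request("How do I merge two pandas DataFrames?")
print(request.get_method(), request.full_url)
```

Call `fetch_tags(...)` only after the server has been started and the route and field name have been adjusted to match it.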

This app can also be launched in Docker. To build the image, run the following command from the root directory of the repo:

docker build -t stacktags -f docker/development.dockerfile .

To launch the FastAPI server on port 8000, run:

docker run -p 8000:8000 stacktags python -m server.main

Next steps

An analysis comparing different models is planned and will be added as soon as possible.

Endnote

Feel free to check out my other projects on my GitHub Pages site.
