osmanuygar / turkish-text-classification-api

Turkish Text Classification API

This API lets you build custom Turkish text classifiers and use them to detect topics, sentiment, intent, and more.

Coded in 2019

Installation

Download the application with the following commands:

git clone https://github.com/osmanuygar/turkish-text-classification-api.git
cd turkish-text-classification-api

Create a Python virtual environment in a directory named venv, activate it, and install the required dependencies with pip:

cd <related-path>/turkish-text-classification-api/
virtualenv -p `which python3` venv
source venv/bin/activate
pip install -r requirements.txt

Start the app

Install packages

cd <related-path>/turkish-text-classification-api/
python setup.py develop

Activate the virtual environment and start the application with Gunicorn (the --workers and --threads options are optional and can be removed):

source <related-virtualenv-path>/venv/bin/activate
cd <related-path>/turkish-text-classification-api/sentiment/

mkdir -p /opt/log/advanced_analytic_platform
gunicorn \
--bind 0.0.0.0:5001 \
--error-logfile /opt/log/advanced_analytic_platform/error.log \
--access-logfile /opt/log/advanced_analytic_platform/access.log \
--log-level=info \
--timeout 7200 \
--workers 2 \
--threads 4 \
app:app &
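If you would rather not retype the long command, Gunicorn can also read its settings from a configuration file, which is a plain Python module of setting names. The sketch below mirrors the flags above; the file name gunicorn.conf.py is only a convention. Gunicorn treats --log-file as an alias for --error-logfile, so a single errorlog setting covers both.

```python
# gunicorn.conf.py: a sketch mirroring the command-line flags above.
# Each variable name is a Gunicorn setting; values are copied from the command.

bind = "0.0.0.0:5001"   # --bind
timeout = 7200          # --timeout (seconds)
workers = 2             # --workers (optional)
threads = 4             # --threads (optional)
loglevel = "info"       # --log-level
errorlog = "/opt/log/advanced_analytic_platform/error.log"    # --error-logfile
accesslog = "/opt/log/advanced_analytic_platform/access.log"  # --access-logfile
```

Start it from the sentiment/ directory with gunicorn -c gunicorn.conf.py app:app &.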

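Kill the application

Find the process ID of the Gunicorn master process, then stop it. Plain kill sends SIGTERM, which shuts Gunicorn down gracefully; fall back to kill -9 only if the process hangs.

ps -ef | grep "gunicorn"
kill <master-pid>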

Usage

The Swagger documentation describes every endpoint, with examples and an interactive console for trying requests:

http://localhost:5001/api/

Dataset

Start by adding labeled examples to the dataset at http://localhost:5001/api/db/dataset/. Each record looks like this:

{
  "text": "sinyal problemi yaşıyorum",
  "model": "chatbot",
  "category": "teknik problem",
  "label": "negative"
}
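Adding training examples one at a time through Swagger gets tedious. The sketch below uses only the Python standard library to send one request per record. It assumes the endpoint accepts a POST with the JSON body shown above (this README does not state the HTTP method), and the helper names dataset_request, load_records, and RECORDS are hypothetical.

```python
import json
import urllib.request

# Assumed base URL: the bind address from the Gunicorn command.
BASE_URL = "http://localhost:5001/api"


def dataset_request(text, model, category, label):
    """Build a request that adds one labeled example.

    Assumption: the endpoint accepts POST with the JSON body from this README.
    """
    body = {"text": text, "model": model, "category": category, "label": label}
    return urllib.request.Request(
        f"{BASE_URL}/db/dataset/",
        # ensure_ascii=False keeps Turkish characters readable in the payload
        data=json.dumps(body, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def load_records(records):
    """Send each (text, model, category, label) tuple and collect HTTP status codes.

    urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    """
    statuses = []
    for record in records:
        with urllib.request.urlopen(dataset_request(*record)) as response:
            statuses.append(response.status)
    return statuses


# Extend this list with your own labeled examples.
RECORDS = [
    ("sinyal problemi yaşıyorum", "chatbot", "teknik problem", "negative"),
]

# Usage (the server must be running):
# print(load_records(RECORDS))
```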

Model

Create a model from the added dataset records:

POST: http://localhost:5001/api/classification/create_subjectivity_model/

{
  "model_name": "chatbot",
  "model_type": "chatbot"
}

Predict

Classify any text with a created model:

POST: http://localhost:5001/api/classification/predict/

{
  "text": "Uygulamada problemler oluştu. Hiç bağlanamadım.",
  "model_name": "chatbot"
}
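To classify several texts in one go, a small standard-library client like the one below can loop over them. It is a sketch that reuses the documented request body; the response schema is not described in this README, so the sketch returns raw response bodies instead of guessing at field names. The names predict_request and predict_all are hypothetical.

```python
import json
import urllib.request

PREDICT_URL = "http://localhost:5001/api/classification/predict/"


def predict_request(text, model_name):
    """Build a prediction request with the documented JSON body."""
    body = {"text": text, "model_name": model_name}
    return urllib.request.Request(
        PREDICT_URL,
        data=json.dumps(body, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_all(texts, model_name="chatbot"):
    """Classify each text in turn and return the raw response bodies.

    The response format is not documented, so nothing is parsed out of it.
    """
    results = []
    for text in texts:
        with urllib.request.urlopen(predict_request(text, model_name)) as response:
            results.append(response.read().decode("utf-8"))
    return results


# Usage (the server must be running and the "chatbot" model must exist):
# print(predict_all(["Uygulamada problemler oluştu. Hiç bağlanamadım."]))
```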

Density

Find the terms most strongly correlated with each feature of the related dataset:

POST: http://localhost:5001/api/density/get_density/

{
  "model_name": "string",
  "quantity": 0
}

NLP

Turkish is an agglutinative language: suffixes mark features such as phrase type, person (subject), singular or plural number, tense, and mood. Because this makes Turkish sentences harder to analyze than sentences in many other languages, every word is lemmatized. The API's preprocessing steps are:

remove stop words (nlp/understandable_text) -> correct noisy words (nlp/clean) -> lemmatize (nlp/lemma)
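To show why lemmatization matters for Turkish, here is a deliberately naive suffix stripper with a tiny, hand-picked suffix list. It is a toy illustration only and not how Zemberek lemmatizes: Zemberek performs full morphological analysis, handling vowel harmony, consonant changes, and ambiguity that this sketch ignores. It simply makes visible how one surface word carries a stem plus a chain of suffixes.

```python
# Toy illustration only: a tiny, hypothetical suffix list.
# Real Turkish morphology is far richer than this.
SUFFIXES = ["den", "dan", "imiz", "ımız", "ler", "lar", "di", "dı", "il", "ıl"]


def strip_suffixes(word, min_stem=2):
    """Repeatedly strip the longest matching suffix from the end of word.

    Returns [stem, suffix1, suffix2, ...] in the order they appear in the word.
    A suffix is only removed if at least min_stem characters remain.
    """
    parts = []
    changed = True
    while changed:
        changed = False
        # Try longer suffixes first so "imiz" wins over shorter partial matches.
        for suffix in sorted(SUFFIXES, key=len, reverse=True):
            if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
                parts.insert(0, suffix)
                word = word[: -len(suffix)]
                changed = True
                break
    return [word] + parts


print(strip_suffixes("evlerimizden"))  # ['ev', 'ler', 'imiz', 'den']
print(strip_suffixes("kesildi"))       # ['kes', 'il', 'di']
```

A lemmatizer reduces all of these inflected forms to the stem ("ev", "kes"), so a classifier sees one feature instead of many sparse surface variants.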

POST: http://localhost:5001/api/nlp/understandable_text/

{
  "text": "yine yayin kesildi yine magdur edildik"
}

POST: http://localhost:5001/api/nlp/clean/

{
  "text": "yine yayin kesildi yine magdur edildik"
}

POST: http://localhost:5001/api/nlp/lemma/

{
  "text": "yine yayin kesildi yine magdur edildik"
}
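To compare what each preprocessing step does to the same sentence, the sketch below sends it to all three endpoints in pipeline order. Because the response format of these endpoints is not documented here, it returns raw response bodies rather than feeding one step's output into the next. The helper names nlp_request and compare_steps are hypothetical.

```python
import json
import urllib.request

NLP_URL = "http://localhost:5001/api/nlp"
# The three preprocessing endpoints, in pipeline order.
STEPS = ["understandable_text", "clean", "lemma"]


def nlp_request(step, text):
    """Build a POST request for one preprocessing endpoint."""
    return urllib.request.Request(
        f"{NLP_URL}/{step}/",
        data=json.dumps({"text": text}, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def compare_steps(text):
    """Send the same sentence to every step; map step name -> raw response body."""
    outputs = {}
    for step in STEPS:
        with urllib.request.urlopen(nlp_request(step, text)) as response:
            outputs[step] = response.read().decode("utf-8")
    return outputs


# Usage (the server must be running):
# for step, output in compare_steps("yine yayin kesildi yine magdur edildik").items():
#     print(step, output)
```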

Development Tools

  • Python - programming language
  • SQLite - SQL database engine
  • Scikit-learn - Python machine learning library
  • Flask - Python-based web microframework
  • Swagger - API development framework
  • NLTK - natural language processing library
  • Zemberek - Turkish natural language processing tool
  • Gunicorn - Python WSGI HTTP server for UNIX