With this API you can build custom Turkish text classifiers for more accurate insights, and start detecting topics, sentiment, intent, and more.
Developed in 2019.
To download the application, run the following commands:
git clone https://github.com/osmanuygar/turkish-text-classification-api.git
cd turkish-text-classification-api
Create a Python virtual environment in a directory named venv, activate it, and install the required dependencies with pip:
cd <related-path>/turkish-text-classification-api/
virtualenv -p `which python3` venv
source venv/bin/activate
pip install -r requirements.txt
Install the package in development mode:
cd <related-path>/turkish-text-classification-api/
python setup.py develop
Activate the virtual environment, then start the application with Gunicorn (you can drop the --workers and --threads options if you don't need them):
source <related-virtualenv-path>/venv/bin/activate
cd <related-path>/turkish-text-classification-api/sentiment/
gunicorn \
--bind 0.0.0.0:5001 \
--log-file /opt/log/advanced_analytic_platform.log \
--error-logfile /opt/log/advanced_analytic_platform/error.log \
--access-logfile /opt/log/advanced_analytic_platform/access.log \
--log-level=info \
--timeout 7200 \
--workers 2 \
--threads 4 \
app:app &
To kill the application, find its process ID and send it a signal:
ps -ef | grep "gunicorn"
kill -9 <pid>
The Swagger documentation helps you use the API, with examples and test screens.
Add datasets first.
POST: http://localhost:5001/api/db/dataset/
{
"text": "sinyal problemi yaşıyorum",
"model": "chatbot",
"category": "teknik problem",
"label": "negative"
}
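As a minimal sketch, a record like the one above could be submitted from Python using only the standard library. The helper name `post_json` is ours, not part of the API, and the request only succeeds with the API running on localhost:5001:

```python
import json
import urllib.request

def post_json(url, payload):
    """POST a JSON payload to the API and return the response body as text."""
    data = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")

record = {
    "text": "sinyal problemi yaşıyorum",
    "model": "chatbot",
    "category": "teknik problem",
    "label": "negative",
}
# Requires the API to be running locally:
# post_json("http://localhost:5001/api/db/dataset/", record)
```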
Create a model with the added datasets.
POST: http://localhost:5001/api/classification/create_subjectivity_model/
{
"model_name": "chatbot",
"model_type": "chatbot"
}
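A sketch of the same request from Python (standard library only; send it only once datasets have been added and the API is up):

```python
import json
import urllib.request

payload = {"model_name": "chatbot", "model_type": "chatbot"}
req = urllib.request.Request(
    "http://localhost:5001/api/classification/create_subjectivity_model/",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# Uncomment with the API running:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```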
Predict any text with the created models.
POST: http://localhost:5001/api/classification/predict/
{
"text": "Uygulamada problemler oluştu. Hiç bağlanamadım.",
"model_name": "chatbot"
}
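The prediction call can be sketched the same way; note the `ensure_ascii=False` so the Turkish characters survive encoding:

```python
import json
import urllib.request

payload = {
    "text": "Uygulamada problemler oluştu. Hiç bağlanamadım.",
    "model_name": "chatbot",
}
req = urllib.request.Request(
    "http://localhost:5001/api/classification/predict/",
    data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# With the API running, send the request and decode the JSON response:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read().decode("utf-8")))
```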
You can find the terms that are most correlated with each feature of the related dataset.
POST: http://localhost:5001/api/density/get_density/
{
"model_name": "string",
"quantity": 0
}
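Filling in the placeholders above, a sketch of the density request (here "chatbot" matches the model created earlier, and 10 is an arbitrary example quantity, not a documented default):

```python
import json
import urllib.request

payload = {"model_name": "chatbot", "quantity": 10}
req = urllib.request.Request(
    "http://localhost:5001/api/density/get_density/",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# Uncomment with the API running:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```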
Turkish is an agglutinative language, so suffixes are deterministic features for phrase type, subject type, singularity or plurality, tense, and modality. This makes Turkish sentences harder to analyze than those of many other languages, so we apply lemmatization to each word. The preprocessing steps of this API are listed below:
Stop words (nlp/understandable_text) --> correct noisy words (nlp/clean) --> lemmatize (nlp/lemma)
POST: http://localhost:5001/api/nlp/understandable_text/
{
"text": "yine yayin kesildi yine magdur edildik"
}
POST: http://localhost:5001/api/nlp/clean/
{
"text": "yine yayin kesildi yine magdur edildik"
}
POST: http://localhost:5001/api/nlp/lemma/
{
"text": "yine yayin kesildi yine magdur edildik"
}
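The three preprocessing endpoints above can be chained in pipeline order. A sketch, standard library only; we assume each endpoint returns JSON with a "text" field, so adjust to the actual response shape shown in Swagger:

```python
import json
import urllib.request

BASE = "http://localhost:5001/api/nlp"
STEPS = ["understandable_text", "clean", "lemma"]  # pipeline order from above

def run_pipeline(text):
    """Feed the text through each preprocessing endpoint in sequence."""
    for step in STEPS:
        req = urllib.request.Request(
            f"{BASE}/{step}/",
            data=json.dumps({"text": text}, ensure_ascii=False).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            # Assumed response shape: {"text": "..."}; fall back to the input.
            text = json.loads(resp.read().decode("utf-8")).get("text", text)
    return text

# Requires the API to be running:
# run_pipeline("yine yayin kesildi yine magdur edildik")
```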
- [Python] - Programming language
- [SQLite] - SQL database engine
- [Scikit-learn] - Python ML library
- [Flask] - Python based web development microframework
- [Swagger] - API development framework
- [NLTK] - Language processing library
- [Zemberek] - Language processing tool
- [Gunicorn] - Python WSGI HTTP Server for UNIX