leonardomra / topic-modelling

Small Flask API written in Python for topic modelling with Latent Dirichlet Allocation (LDA) on a collection of documents.

topic-modelling

The API offers two endpoints, namely:

  • POST /topics
      • Data param: file: dataset.csv
      • Example of success response:
[{"title": "Suggested Topic", "terms": ["cell", "epithelial", "type", "epithelium", "airway", "human", "tissue", "cf", "ifn", "expression"]}, {"title": "Suggested Topic", "terms": ["patient", "hospital", "study", "group", "icu", "care", "\u00b1", "result", "day", "ed"]},...]
  • POST /count
      • Data param: file: dataset.csv
      • Example of success response:
{"count": {"analysis": 81773, "case": 83358, "include": 84623, "group": 85330, "human": 85863, "gene": 92091, ...}}
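
As a rough illustration of how a client might call the two endpoints, here is a minimal sketch using only the Python standard library. It assumes the API runs on Flask's default development address (http://localhost:5000) and that the CSV is uploaded as multipart form data under the field name "file"; neither is confirmed by this README. The helper names build_multipart, post_csv and top_terms are hypothetical and not part of the repository.

```python
import json
import urllib.request
import uuid

# Assumption: Flask's default development host and port.
BASE_URL = "http://localhost:5000"


def build_multipart(field, filename, payload):
    """Encode one file as a multipart/form-data body.

    Returns a (body_bytes, content_type_header) tuple.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: text/csv\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def post_csv(endpoint, csv_path, base_url=BASE_URL):
    """POST a CSV file to the given endpoint and return the decoded JSON."""
    with open(csv_path, "rb") as fh:
        # Assumption: the server reads the upload from the form field "file".
        body, content_type = build_multipart("file", "dataset.csv", fh.read())
    request = urllib.request.Request(
        base_url + endpoint,
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def top_terms(count_response, n=3):
    """Return the n most frequent terms from a /count response."""
    counts = count_response["count"]
    return sorted(counts, key=counts.get, reverse=True)[:n]
```

With the server running, `post_csv("/topics", "dataset.csv")` would return the list of suggested topics, and `top_terms(post_csv("/count", "dataset.csv"))` would give the most frequent terms.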

When running for the first time, the model has to be trained, which can take quite some time. Because of this, the trained model can be stored and retrieved later, avoiding retraining. To train, set shouldUseDump to False in topicmodeller.py (this will be fixed later on).
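
The store-and-reload idea can be sketched as follows. This is only an illustration of the pattern, not the code in topicmodeller.py: the function load_or_train, its parameters, and the use of pickle as the storage format are all assumptions made for the example.

```python
import os
import pickle


def load_or_train(train_fn, dump_path, should_use_dump=True):
    """Return a stored model if allowed and present, otherwise train and store it.

    train_fn is any zero-argument callable that returns a picklable model
    (hypothetical stand-in for the actual LDA training step).
    """
    if should_use_dump and os.path.exists(dump_path):
        with open(dump_path, "rb") as fh:
            return pickle.load(fh)

    model = train_fn()  # the slow step that the dump is meant to avoid
    with open(dump_path, "wb") as fh:
        pickle.dump(model, fh)
    return model
```

In this sketch, passing should_use_dump=False forces retraining and overwrites the stored file, which mirrors setting shouldUseDump to False for the first run.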


The dataset can be either generated or acquired on GitHub; the linked repository also documents the format and structure of the dataset. This API is based on this notebook. For more information on topic modelling with Latent Dirichlet Allocation, check this article!

About

License: Apache License 2.0


Languages

Language: Python 100.0%