dezoito / flask_Summarizer

A Dockerized Flask based API to summarize documents and articles (Visual Interface and tests included).

flask_Summarizer

A very simple Flask based API to create a summary from documents and articles.

Updates

[Jan-2019] Improvements to the API, including auto-generated Swagger documentation, courtesy of Flask-RESTPlus:

Navigating to the root URL lets you see the automatically generated documentation for all endpoints and test them on the fly.

You can also go to <root URL>/form to test the API using an HTML client form.
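Besides the Swagger page and the HTML form, the API can be called from a script. The sketch below is only an illustration using the Python standard library: the endpoint path `/api/summarize` and the `{"text": ...}` payload are assumptions, not taken from this project, so check the generated documentation at the root URL for the real route and field names.

```python
import json
import urllib.request

# Assumed values: this README does not document the route or the payload shape.
BASE_URL = "http://localhost:5000"   # development server address from this README
SUMMARY_PATH = "/api/summarize"      # hypothetical path, confirm it on the Swagger page


def build_summary_request(text, base_url=BASE_URL, path=SUMMARY_PATH):
    """Build (but do not send) a JSON POST request carrying the text to summarize."""
    payload = json.dumps({"text": text}).encode("utf-8")  # hypothetical field name
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_summary(text, timeout=10):
    """Send the request to a running instance and return the decoded JSON body."""
    request = build_summary_request(text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the app running, `fetch_summary("some long text")` would return whatever JSON the real endpoint produces; its structure is not specified here.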

Dockerizing this app

You can use Docker to run the application in development or in production mode (assuming you have Docker and Docker Compose installed).

Development Mode

To build and run the app in development mode, cd to the project root and run:

    ./rundev.sh

If everything runs smoothly, you will be able to open the app from http://localhost:5000.

This will run the app using Flask's built-in server on port 5000 and install testing dependencies (such as PhantomJS).

Production Mode

To build and run the app in production mode, cd to the project root and run:

    ./runprod.sh

This will install gunicorn and nginx, and serve the app on port 80.

Note: If you want to run it in a virtualenv, you'll have to manually set up NLTK and install PhantomJS (see sh_scripts/install.sh to see how it's done in Ubuntu).

Running Tests

The functional tests still use PhantomJS's headless browser (I left them because some people might still find this useful).

From the tests/ directory, run:

    python3 -m unittest discover

You can also run individual test suites:

    python3 -m unittest tests_unit
    python3 -m unittest tests_views
    python3 -m unittest tests_api
    python3 -m unittest tests_phantomJS

References:

Flask Summarizer is based on the following projects:

Summarize.py: https://github.com/Rotten194/summarize.py

Python implementation of TextRank: http://www.davidadamojr.com/textrank-implementation-in-python-github-repo/

The TextRank algorithm seems to deliver a better summary, but it was taking too long to run, so I just use a customized version of summary.py (with Brazilian Portuguese stopwords and tokenizer).
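To give a feel for the cheaper, non-TextRank approach, here is a generic frequency-based extractive summarizer. It is a minimal sketch and not this project's code: it may differ from what summarize.py actually does, the tiny stopword set is a placeholder for a real Brazilian Portuguese list (such as the one shipped with NLTK), and the regex-based sentence splitter stands in for a proper tokenizer.

```python
import re
from collections import Counter

# Placeholder stopword set for illustration only; a real list would be much longer.
STOPWORDS = {"a", "o", "e", "de", "do", "da", "em", "um", "uma",
             "que", "para", "com", "os", "as", "é"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and keep the words that are not stopwords."""
    return [w for w in re.findall(r"\w+", sentence.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=2):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    # Word frequencies over the whole text, ignoring stopwords.
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        # A sentence scores the sum of the frequencies of its words.
        return sum(freq[w] for w in tokenize(sentence))

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return [sentences[i] for i in keep]
```

Scoring is a single pass over the text, which is why this style of summarizer runs much faster than a graph-based ranking. Note that summing frequencies favors longer sentences; normalizing by sentence length is a common adjustment.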
