Text-Summarization-with-Flask

This project compares the summaries of an online article produced by two methods: a "Word Ranking Algorithm" and the "summarize" function of the 'Gensim' package.

Step -I: This is "extraction-based summarization". Only the most important sentences are identified, and they can be reordered to form a summary of the entire content. Its usefulness may be limited to some extent, because the summary can only reuse sentences that already appear in the article.

1. Import the packages: Beautiful Soup (bs4), requests, regular expressions (re), Natural Language Toolkit (nltk), heapq, gensim, Flask.
2. Web scraping extracts the content from the web page using bs4 and requests.
3. Scraping depends on understanding how the web page is structured and styled.
4. The text of the article is cleaned using the (re) package.
5. To process the text of the article, we use the (nltk) package to split the entire content into sentences and words.
6. Stop words are removed using the "stopwords" list of (nltk).
7. We calculate the frequency of each unique valid word within the entire content.
8. We calculate the word-score of each unique word as the ratio of its frequency to the maximum word frequency within the content.
9. We calculate the sentence-score of every sentence by summing the word-scores of its valid words.
10. The sentences are ranked in descending order of sentence-score using the (heapq) package.
11. We obtain the summary by joining the required number of top-ranked sentences together.
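The scoring steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: to stay self-contained it uses a regex sentence splitter and a tiny hard-coded stop word set in place of nltk, and every function name here (clean_text, split_sentences, tokenize, word_scores, summarize_by_ranking) is hypothetical. The cleaning rule (dropping bracketed reference markers) is also an assumption about what the (re) step removes.

```python
import heapq
import re

# Assumption: a tiny stand-in for nltk.corpus.stopwords.words("english").
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "in", "to", "and", "it", "on", "for"}


def clean_text(text):
    """Drop bracketed reference markers such as [12] and collapse whitespace."""
    text = re.sub(r"\[[0-9]*\]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text):
    """Naive sentence splitter (stand-in for nltk.sent_tokenize)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def tokenize(sentence):
    """Lowercased words with stop words removed (stand-in for nltk.word_tokenize)."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]


def word_scores(sentences):
    """Word-score = word frequency / maximum word frequency in the content."""
    freq = {}
    for sentence in sentences:
        for word in tokenize(sentence):
            freq[word] = freq.get(word, 0) + 1
    if not freq:
        return {}
    top = max(freq.values())
    return {word: count / top for word, count in freq.items()}


def summarize_by_ranking(text, n_sentences=2):
    """Join the n highest-scoring sentences, best first."""
    sentences = split_sentences(clean_text(text))
    scores = word_scores(sentences)
    sentence_scores = {
        s: sum(scores.get(w, 0.0) for w in tokenize(s)) for s in sentences
    }
    best = heapq.nlargest(n_sentences, sentence_scores, key=sentence_scores.get)
    return " ".join(best)
```

Calling summarize_by_ranking(article_text, n_sentences=5) would return the five highest-scoring sentences, highest score first. If the original reading order matters, the selected sentences could be re-sorted by their position in the article before joining.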

Step -II: Here, we use the "summarize" function of the gensim package. The first 6 steps are the same as in Step -I. The processed content is then passed as a parameter to the summarize function. The length of the summary is controlled by the ratio that is also passed as a parameter to the summarize function.
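A minimal sketch of this call is shown below. Note that gensim.summarization was removed in gensim 4.0, so this assumes an older gensim release is installed. The wrapper name gensim_summary and the default ratio of 0.2 are illustrative choices, not taken from the project.

```python
def gensim_summary(text, ratio=0.2):
    """Summarize text with gensim, keeping roughly `ratio` of the sentences."""
    try:
        # Available only in gensim releases before 4.0.
        from gensim.summarization import summarize
    except ImportError as exc:
        raise RuntimeError("gensim.summarization requires gensim < 4.0") from exc
    return summarize(text, ratio=ratio)
```

A larger ratio keeps more of the original sentences. gensim expects the input to contain more than one sentence, and it logs a warning for very short texts.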

Web Development: We use "Flask" to serve the application, which runs on localhost at port 5000.
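A minimal sketch of such a Flask application is shown below. The route name /summarize, the form field text, the JSON response, and the two placeholder summarizer functions are all assumptions made for illustration. The actual project renders HTML pages, and its routes are not described here.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def word_ranking_summary(text):
    """Hypothetical placeholder for the Step -I summarizer."""
    return text.split(". ")[0]


def gensim_summary(text):
    """Hypothetical placeholder for the Step -II summarizer."""
    return text.split(". ")[-1]


@app.route("/summarize", methods=["POST"])
def summarize_route():
    # Return both summaries side by side so they can be compared.
    text = request.form.get("text", "")
    return jsonify(
        word_ranking=word_ranking_summary(text),
        gensim=gensim_summary(text),
    )


def run_local():
    """Start the development server on localhost:5000."""
    app.run(host="127.0.0.1", port=5000)
```

Calling run_local() starts Flask's built-in development server, which is suitable for local experiments only.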

Results: We can compare the results of the two methods to understand how text summarization is implemented.

Packages used in this project are: Beautiful Soup (bs4), requests, re, nltk, heapq, gensim, and Flask.

License: GNU General Public License v3.0
