To compare the summarized results of an online article produced by the "Word Ranking Algorithm" and by the "summarize" function of the 'Gensim' package.
Step I: This is "extraction-based summarization". Here, only the important sentences are identified, and they can be re-ordered to form a summary of the entire content. Because it only reuses sentences already present in the article, its usefulness may be limited to some extent.
- Import the packages: Beautiful Soup (bs4), requests, regular expressions (re), Natural Language Toolkit (nltk), heapq, gensim, and Flask
- Web scraping extracts the content from the web page using bs4 and requests
- Scraping relies on understanding the structure of the web page
- The article text is cleaned using the (re) package
- To process the text of the article, we use the (nltk) package, thereby partitioning the entire content into sentences and words
- Stop words are removed using the "stopwords" corpus of (nltk)
- We calculate the frequency of each unique valid word within the entire content
- We calculate the word-score (a normalized word frequency) as the ratio of each unique word's frequency to the maximum word frequency within the content
- We calculate the sentence-score by summing the word-scores of the valid words in each sentence
- All the sentences are sorted in descending order of sentence-score using the (heapq) package
- We obtain the summary by joining the required number of top-scoring sentences together
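The scoring steps above can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the project's actual code: the function names (split_sentences, word_scores, summarize_by_word_rank), the tiny hard-coded stop-word set, and the regex-based sentence splitter are all assumptions that stand in for the nltk tokenizers and the nltk "stopwords" corpus.

```python
import heapq
import re

# Assumption: a small hard-coded stop-word set keeps the sketch self-contained.
# The project itself relies on the "stopwords" corpus of nltk.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "is", "it", "of", "on", "or", "that", "the", "this", "to", "was", "with",
}


def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    # (A stand-in for nltk's sentence tokenizer.)
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def word_scores(text):
    # Count every non-stop word, then divide by the highest count so the
    # most frequent word gets a score of 1.0.
    freq = {}
    for word in re.findall(r"[a-z]+", text.lower()):
        if word not in STOP_WORDS:
            freq[word] = freq.get(word, 0) + 1
    if not freq:
        return {}
    max_freq = max(freq.values())
    return {word: count / max_freq for word, count in freq.items()}


def summarize_by_word_rank(text, n_sentences=2):
    scores = word_scores(text)
    sentences = split_sentences(text)
    # Sentence-score = sum of the word-scores of its valid words.
    sentence_scores = {}
    for idx, sentence in enumerate(sentences):
        tokens = re.findall(r"[a-z]+", sentence.lower())
        sentence_scores[idx] = sum(scores.get(t, 0.0) for t in tokens)
    # heapq.nlargest returns the indices in descending order of score.
    top = heapq.nlargest(n_sentences, sentence_scores, key=sentence_scores.get)
    return " ".join(sentences[i] for i in top)


if __name__ == "__main__":
    article = "Cats sleep. Cats and dogs play together. Birds sing."
    print(summarize_by_word_rank(article, n_sentences=2))
```

The sentences are joined in score order here; sorting the selected indices before joining would instead keep the article's original sentence order.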
Web Development: We use "Flask" to serve the application, which runs on localhost at port 5000.
Results: We can compare the results of the two methods to understand the implementation of text-based summarization.
Packages used in this project are:
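A minimal sketch of how such a Flask app might be wired up is shown below. The route name "/summarize", the form field "text", and the placeholder summarize_text function are all hypothetical and chosen only for illustration; in the project, the placeholder would be replaced by the word-ranking summarizer or the Gensim one.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def summarize_text(text, n_sentences=2):
    # Hypothetical placeholder: simply keeps the first n sentences.
    # It stands in for the project's real summarizer.
    parts = [p.strip() for p in text.split(".") if p.strip()]
    if not parts:
        return ""
    return ". ".join(parts[:n_sentences]) + "."


@app.route("/summarize", methods=["POST"])
def summarize_route():
    # Read the article text from a submitted form field (assumed name: "text").
    text = request.form.get("text", "")
    return jsonify({"summary": summarize_text(text)})


if __name__ == "__main__":
    # Serve on localhost, port 5000.
    app.run(host="127.0.0.1", port=5000)
```

With the server running, posting a form field named "text" to http://127.0.0.1:5000/summarize returns the summary as JSON.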