prak112 / document-wordcloud

Insights into the most commonly used words in a document, represented as a word cloud

Approach

Extraction of commonly used terms in the GYBN policy briefs

  • Use the PyPDF2 library to extract text from the .pdf file
  • Create a dictionary holding a count for each identified word
  • Filter out common stop words (chosen based on context)
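
The extraction steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function names, the tokenizing regex, and the `STOP_WORDS` set are assumptions, and the real context-specific stop-word list is not given here. It also assumes a PyPDF2 version that provides `PdfReader`.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the context-specific list used in the
# notebook is not shown in this README.
STOP_WORDS = {"the", "and", "of", "to", "in", "a", "for", "is", "on", "that"}


def extract_text(pdf_path):
    """Return the concatenated text of every page in the PDF."""
    # Imported lazily so the counting helper below works without PyPDF2.
    from PyPDF2 import PdfReader

    reader = PdfReader(pdf_path)
    # extract_text() may return None or "" for pages without a text layer.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def count_words(text, stop_words=STOP_WORDS):
    """Count lowercase alphabetic words, skipping stop words and single letters."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in stop_words and len(w) > 1)
```

With a brief saved locally (the file name is a placeholder), `count_words(extract_text("brief.pdf")).most_common(10)` would list the ten most frequent remaining terms.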

Evaluation of the terms to identify meaning

  • Load the data into a DataFrame
  • Visualize the word cloud from the DataFrame, following an approach learnt from a DataCamp tutorial
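
A possible shape for these two steps is sketched below, assuming pandas for the DataFrame and the `wordcloud` package for rendering. The column names, the `top_n` cutoff, the image size, and the output file name are illustrative choices, not taken from the notebook or the tutorial.

```python
import pandas as pd


def to_dataframe(counts, top_n=50):
    """Turn a word -> count mapping into a DataFrame sorted by frequency."""
    df = pd.DataFrame(list(counts.items()), columns=["word", "count"])
    return df.sort_values("count", ascending=False).head(top_n).reset_index(drop=True)


def render_wordcloud(df, out_path="wordcloud.png"):
    """Draw a word cloud from the DataFrame and save it as an image."""
    # Imported lazily so to_dataframe() works without the wordcloud package.
    from wordcloud import WordCloud

    frequencies = dict(zip(df["word"], df["count"]))
    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(frequencies)
    cloud.to_file(out_path)
    return out_path
```

Keeping the counts in a DataFrame before plotting makes it easy to inspect or filter the top terms before they reach the word cloud.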

Output

Most commonly occurring words in the document

License: MIT License


Languages

Language: Jupyter Notebook 100.0%