kennethjones17 / PDFparser_TextpropertyExtractor

A basic E-PDF parser that extracts all text properties: the text itself and its font, style, size, and color. The parser also performs data pre-processing by removing stopwords and punctuation.
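The pre-processing step described above can be sketched as follows. This is a minimal illustration, not the repository's actual code: the function name `preprocess` and the small `STOPWORDS` set are assumptions made for the example, and the project may well rely on a fuller stopword list from a library.

```python
import string

# Hypothetical, deliberately small stopword list (an assumption for this sketch);
# a real run would likely use a larger list.
STOPWORDS = {"a", "an", "and", "the", "is", "are", "of", "to", "in", "by", "that", "it"}


def preprocess(text: str) -> list[str]:
    """Lowercase the text, strip ASCII punctuation, and drop stopwords."""
    # Map every punctuation character to a space so "text,font" splits into two tokens.
    table = str.maketrans({ch: " " for ch in string.punctuation})
    tokens = text.lower().translate(table).split()
    return [tok for tok in tokens if tok not in STOPWORDS]


print(preprocess("The parser extracts the Text, Font, and Size of a PDF."))
# → ['parser', 'extracts', 'text', 'font', 'size', 'pdf']
```

Replacing punctuation with spaces rather than deleting it avoids gluing neighbouring words together when the extracted text has no whitespace around a comma or slash.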

How to Run code

Download the script and place the PDF document to be parsed in the same folder as the script. Then pass that document to the parser.
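As a rough sketch of what such a run might look like, the snippet below resolves a document name against the script's own folder and reads text, font, style, size, and color for each text span. It assumes the PyMuPDF library (`fitz`); the README does not say which PDF library the project actually uses. The names `resolve_document`, `style_from_flags`, `color_to_hex`, `extract_properties`, and `main` are all hypothetical.

```python
from pathlib import Path


def resolve_document(name: str, base_dir: Path) -> Path:
    """Return the path of `name` inside `base_dir`, or raise if it is missing."""
    path = base_dir / name
    if not path.is_file():
        raise FileNotFoundError(f"{name} not found in {base_dir}")
    return path


def style_from_flags(flags: int) -> str:
    """Decode PyMuPDF span flags: value 2 marks italic, value 16 marks bold."""
    parts = []
    if flags & 16:
        parts.append("bold")
    if flags & 2:
        parts.append("italic")
    return " ".join(parts) or "regular"


def color_to_hex(color: int) -> str:
    """Format PyMuPDF's integer sRGB color as a #rrggbb string."""
    return f"#{color:06x}"


def extract_properties(pdf_path: Path) -> list[dict]:
    """Collect text, font, style, size, and color for every text span."""
    import fitz  # PyMuPDF; an assumption, imported lazily so the helpers work without it

    rows = []
    doc = fitz.open(str(pdf_path))
    try:
        for page in doc:
            for block in page.get_text("dict")["blocks"]:
                # Image blocks carry no "lines" key, so default to an empty list.
                for line in block.get("lines", []):
                    for span in line["spans"]:
                        rows.append({
                            "text": span["text"],
                            "font": span["font"],
                            "style": style_from_flags(span["flags"]),
                            "size": span["size"],
                            "color": color_to_hex(span["color"]),
                        })
    finally:
        doc.close()
    return rows


def main(argv: list[str]) -> None:
    """Expect argv = [script_path, document_name]; look for the document next to the script."""
    pdf = resolve_document(argv[1], Path(argv[0]).resolve().parent)
    for row in extract_properties(pdf):
        print(row)
```

With this layout, calling `main(sys.argv)` at the bottom of the script would let the document name be given on the command line, and a missing file fails early with a clear message instead of an error from inside the PDF library.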

Languages

Language:Python 100.0%