aws-samples / textract-paragraph-identification

textract-paragraph-identification

This repository contains examples of how to extract insights from Amazon Textract output. The examples show how to identify headers, paragraphs, and footers based on font size, indentation, paragraph endings, and line separators.

***Note: This is not a solution for all types of paragraphs or text segments.***

The examples are drawn from some of the files we have worked with, and they only give guidance on how to use the metadata that Textract provides.
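As an illustration of using that metadata, the sketch below groups Textract `LINE` blocks into paragraphs using the bounding-box geometry in the response. It is not the repository's code: the function names, the `gap_factor` and `indent_tol` thresholds, and the use of line height as a stand-in for font size are all assumptions made for this example, and it assumes a single-page response.

```python
from statistics import median


def extract_lines(response):
    """Collect LINE blocks from a Textract response, sorted top to bottom."""
    lines = []
    for block in response.get("Blocks", []):
        if block.get("BlockType") != "LINE":
            continue  # skip PAGE, WORD and other block types
        box = block["Geometry"]["BoundingBox"]  # values are ratios of the page size
        lines.append({
            "text": block["Text"],
            "left": box["Left"],
            "top": box["Top"],
            "height": box["Height"],
        })
    return sorted(lines, key=lambda line: line["top"])


def group_paragraphs(lines, gap_factor=0.8, indent_tol=0.02):
    """Start a new paragraph on a large vertical gap or a fresh indent.

    gap_factor and indent_tol are illustrative thresholds (assumptions),
    not values taken from the repository.
    """
    if not lines:
        return []
    typical_height = median(line["height"] for line in lines)
    paragraphs = [[lines[0]]]
    for prev, cur in zip(lines, lines[1:]):
        gap = cur["top"] - (prev["top"] + prev["height"])
        indented = cur["left"] - prev["left"] > indent_tol
        if gap > gap_factor * typical_height or indented:
            paragraphs.append([cur])
        else:
            paragraphs[-1].append(cur)
    return [" ".join(line["text"] for line in p) for p in paragraphs]


def is_header(line, typical_height, size_factor=1.3):
    """Treat a line noticeably taller than the typical line as a header candidate."""
    return line["height"] > size_factor * typical_height
```

The thresholds would likely need tuning per document layout, and multi-column or multi-page documents need extra handling that this sketch leaves out.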

Once the segments are identified, we use Amazon Comprehend to detect their sentiment.
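A minimal sketch of that step is shown below, assuming the segments are plain strings and that a boto3 Comprehend client is passed in. The helper name `segment_sentiments` is hypothetical and not part of the repository; `detect_sentiment` is the Comprehend API call it wraps.

```python
def segment_sentiments(segments, client, language_code="en"):
    """Return (text, sentiment, scores) for each non-empty segment.

    `client` is expected to behave like boto3.client("comprehend").
    """
    results = []
    for text in segments:
        if not text.strip():
            continue  # Comprehend rejects empty text, so skip blank segments
        response = client.detect_sentiment(Text=text, LanguageCode=language_code)
        results.append((text, response["Sentiment"], response["SentimentScore"]))
    return results


# Example usage (requires AWS credentials and the boto3 package):
#   import boto3
#   comprehend = boto3.client("comprehend")
#   for text, sentiment, scores in segment_sentiments(paragraphs, comprehend):
#       print(sentiment, text[:60])
```

Passing the client in keeps the helper easy to test with a stub. Comprehend limits the size of the text in each request, so very long segments may need to be split before they are sent.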

License

This library is licensed under the MIT-0 License. See the LICENSE file.

Languages

Language:Python 100.0%