darwin403 / artist-cv-parser

A web application that identifies all of an artist's exhibition titles from a given CV. Text detection is handled by AWS Textract, and title detection is handled by AWS Comprehend.

Home Page: http://artbiogs.herokuapp.com/

Artbiogs - Artist CV Parser

A web application that uses AI/ML to extract all of an artist's exhibition details from a CV.

Deploy

Development

You need python3, google-chrome, and wkhtmltopdf installed. First, clone the repository and change into the project folder. Then create a virtual environment:

python3 -m virtualenv .venv
source .venv/bin/activate

Now, run the following:

python setup.py develop

This installs all the dependencies needed to run the Python project. Now, let's launch the web application by running:

python web/app.py

You can now view the application at http://localhost:5000
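
If any of the steps above fail, a quick check that the three prerequisites are on your PATH can narrow down the cause. The following is a minimal sketch and not part of the repository: the helper name `missing_tools` is hypothetical, and the executable names are assumptions taken from the prerequisites listed above (on some systems Chrome is installed under a different name, such as `google-chrome-stable`).

```python
import shutil

# Executable names are assumptions based on the prerequisites listed above;
# adjust them if your system installs the tools under different names.
REQUIRED_TOOLS = ("python3", "google-chrome", "wkhtmltopdf")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```

The `which` parameter exists only so the lookup can be swapped out in tests; in normal use the default `shutil.which` is enough.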

Technologies

Web Stack:

  • Python - Stack core
  • Flask + SocketIO - Web Server
  • Bulma - CSS Framework
  • Selenium - Remote Webpage to PDF conversion
  • wkhtmltopdf - HTML to PDF generation
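
As an illustration of the HTML to PDF step, the sketch below calls the wkhtmltopdf command-line tool through `subprocess`. How the project itself invokes wkhtmltopdf is not stated here, so treat this as one possible approach; the function names `build_wkhtmltopdf_command` and `html_to_pdf` are hypothetical.

```python
import subprocess


def build_wkhtmltopdf_command(source, output_pdf, binary="wkhtmltopdf"):
    """Build the argument list for: wkhtmltopdf <input> <output>.

    `source` may be a local HTML file path or a URL.
    """
    return [binary, str(source), str(output_pdf)]


def html_to_pdf(source, output_pdf, timeout=60):
    """Render `source` to `output_pdf`.

    Raises subprocess.CalledProcessError if wkhtmltopdf exits with an error,
    and FileNotFoundError if the binary is not installed.
    """
    command = build_wkhtmltopdf_command(source, output_pdf)
    subprocess.run(command, check=True, timeout=timeout)
    return output_pdf
```

Passing the arguments as a list, rather than a single shell string, avoids quoting problems when file names contain spaces.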

AI/ML Cloud Technologies used:

  • AWS S3 - File Storage
  • AWS Textract - Text extraction
  • AWS Comprehend - Exhibition title detection
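
The three services above can be chained roughly as follows. This is a hedged sketch, not the repository's actual code: the function names, the `min_score` threshold, and the choice of Comprehend's built-in `TITLE` entity type are all assumptions. The clients are passed in as arguments so the functions do not depend on how they are created.

```python
def extract_lines(textract, bucket, key):
    """Return the text lines Textract detects in a document stored in S3.

    Uses the synchronous detect_document_text call; multi-page PDFs may
    need Textract's asynchronous API instead.
    """
    response = textract.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    return [
        block["Text"]
        for block in response["Blocks"]
        if block["BlockType"] == "LINE"
    ]


def find_titles(comprehend, text, min_score=0.5):
    """Return entities Comprehend labels as TITLE with enough confidence.

    Assumption: the built-in entity recognizer is used. Comprehend limits
    the size of each request, so a long CV may need to be sent in chunks.
    """
    response = comprehend.detect_entities(Text=text, LanguageCode="en")
    return [
        entity["Text"]
        for entity in response["Entities"]
        if entity["Type"] == "TITLE" and entity["Score"] >= min_score
    ]


def parse_cv(s3, textract, comprehend, local_path, bucket, key):
    """Upload a CV to S3, extract its text, and return candidate titles."""
    s3.upload_file(local_path, bucket, key)
    lines = extract_lines(textract, bucket, key)
    return find_titles(comprehend, "\n".join(lines))
```

With boto3 installed and AWS credentials configured, the clients would typically be created with `boto3.client("s3")`, `boto3.client("textract")`, and `boto3.client("comprehend")`.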

Languages

Python 65.4%, HTML 34.1%, Shell 0.5%