miodeqqq / pdfs_performance

Testing Python PDF libraries and drawing conclusions...

Python-PDF libraries performance tests

Measures how quickly Python libraries read PDF files while:

  • gathering the number of pages in each document.
  • ... some day ...
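The page-count measurement described above can be sketched roughly as follows. This is a minimal illustration, not the code in run.py: `benchmark` and `naive_page_count` are hypothetical names, and the regex-based counter is only a standard-library stand-in so the sketch runs without third-party packages. It assumes simple PDFs with uncompressed page objects.

```python
import re
import time
from pathlib import Path

# Hypothetical stand-in counter (assumption, not from the repository):
# counts "/Type /Page" objects in the raw bytes and skips "/Type /Pages".
# Real libraries walk the page tree and also handle compressed objects.
_PAGE_RE = re.compile(rb"/Type\s*/Page(?![a-zA-Z])")


def naive_page_count(path):
    """Return the number of page objects found in the raw file bytes."""
    return len(_PAGE_RE.findall(Path(path).read_bytes()))


def benchmark(counters, pdf_paths):
    """Time each page-counting callable over the same list of PDF files.

    counters  -- dict mapping a label to a callable(path) -> int
    pdf_paths -- iterable of file paths
    Returns a dict: label -> {"seconds", "pages", "failures"}.
    """
    paths = list(pdf_paths)
    results = {}
    for name, count_pages in counters.items():
        pages = 0
        failures = 0
        start = time.perf_counter()
        for path in paths:
            try:
                pages += count_pages(path)
            except Exception:
                # A library that cannot read a file is recorded, not fatal.
                failures += 1
        results[name] = {
            "seconds": time.perf_counter() - start,
            "pages": pages,
            "failures": failures,
        }
    return results
```

In a real run, one callable per library under test would be registered in `counters`, and `pdf_paths` would come from the directory passed on the command line, for example `Path(directory).glob("*.pdf")`.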

Current stable version: v1.0

Release date: 07.08.2019

Author:

Maciej Januszewski (maciek@mjanuszewski.pl)

Prerequisites:

  • First, start the Apache Tika server (required for the Tika-based tests):
docker pull logicalspark/docker-tikaserver
docker run -d -p 9998:9998 logicalspark/docker-tikaserver
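Before starting a benchmark it can help to confirm that the container is actually answering on port 9998. The sketch below is an assumption about how such a check might look, not part of the repository: `tika_is_up` is a hypothetical helper, and it relies on the Tika server's `/version` endpoint responding with HTTP 200.

```python
import urllib.error
import urllib.request


def tika_is_up(base_url="http://localhost:9998", timeout=2.0):
    """Return True if GET <base_url>/version answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/version", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx response: treat as "not up".
        return False


if __name__ == "__main__":
    print("Tika server reachable:", tika_is_up())
```

The container may need a few seconds after `docker run` before it accepts connections, so a caller might retry this check a few times before giving up.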

Sample PDFs data:

https://drive.google.com/open?id=1Xb99gWgynHO02e2YvAyX0dsnfUmWJwJD

Running:

./run.py /path/to/pdfs_data/ > /dev/null 2>&1 #disable prints

Sample plots outputs:

- Final statistics - overall processing time:

https://maciekj.pl/media/plots/pdfs_performance_final_stats_bar.html (scatter plot generated by Plotly)

- Final statistics - bar chart:

https://maciekj.pl/media/plots/pdfs_performance_bar.html (box plot generated by Plotly)



Languages

Python 99.4%, Shell 0.6%