Performance check for reading PDF files and:
- gathering information about the number of pages using Python libraries.
- ... some day ...
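As a rough illustration of the page-counting task, here is a minimal stdlib-only sketch that counts `/Type /Page` objects in the raw PDF bytes. This heuristic is an assumption of the example, not one of the Python libraries benchmarked here, and `count_pages_naive` is a hypothetical name. It undercounts files whose page objects sit inside compressed object streams (PDF 1.5+) and may overcount files with incremental updates.

```python
import re

# Matches "/Type /Page" (or "/Type/Page") but not "/Type /Pages",
# which marks a page-tree node rather than an actual page.
_PAGE_RE = re.compile(rb"/Type\s*/Page(?![A-Za-z])")


def count_pages_naive(data: bytes) -> int:
    """Hypothetical baseline: count page objects by scanning raw PDF bytes.

    Rough heuristic only: it misses pages stored in compressed object
    streams and may count stale objects left by incremental updates.
    """
    return len(_PAGE_RE.findall(data))
```

Usage: `count_pages_naive(open(path, "rb").read())`. A real PDF library parses the cross-reference table and page tree instead, which is why it is slower but reliable.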
Current stable version: v1.0
Release date: 07.08.2019
Maciej Januszewski (maciek@mjanuszewski.pl)
- First, run the Apache Tika server (required for the Tika checks):
docker pull logicalspark/docker-tikaserver
docker run -d -p 9998:9998 logicalspark/docker-tikaserver
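Once the container is up, the server listens on port 9998. Below is a minimal stdlib-only sketch of asking it for a PDF's page count. It assumes the server's `/meta` endpoint (which accepts a `PUT` of the raw file and returns JSON when sent `Accept: application/json`) and the `xmpTPg:NPages` key that Tika reports for PDFs. The helper names are hypothetical, and this is not necessarily how `run.py` talks to Tika.

```python
import json
import urllib.request

# Port published by the "docker run -p 9998:9998" command above.
TIKA_URL = "http://localhost:9998/meta"


def tika_metadata(pdf_path: str, url: str = TIKA_URL) -> dict:
    """Hypothetical helper: PUT a PDF to Tika's /meta endpoint, return the JSON metadata."""
    with open(pdf_path, "rb") as fh:
        req = urllib.request.Request(
            url,
            data=fh.read(),
            method="PUT",
            headers={"Accept": "application/json", "Content-Type": "application/pdf"},
        )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def page_count(metadata: dict) -> int:
    """Hypothetical helper: read the page count Tika reports as "xmpTPg:NPages"."""
    value = metadata["xmpTPg:NPages"]
    if isinstance(value, list):  # multi-valued keys come back as JSON lists
        value = value[0]
    return int(value)
```

Usage: `page_count(tika_metadata("/path/to/file.pdf"))`. Every call is an HTTP round trip to the container, so timings for Tika include network and serialization overhead, not just parsing.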
https://drive.google.com/open?id=1Xb99gWgynHO02e2YvAyX0dsnfUmWJwJD
./run.py /path/to/pdfs_data/ > /dev/null 2>&1 # suppress stdout and stderr
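`run.py` is given a directory of PDFs. A hypothetical sketch of a per-library timing loop over such a directory follows, assuming each library is wrapped in a callable that takes a path and returns a page count. `time_page_counters` is an illustrative name, not `run.py`'s actual code.

```python
import time
from pathlib import Path
from typing import Callable, Dict


def time_page_counters(
    pdf_dir: str, counters: Dict[str, Callable[[Path], int]]
) -> Dict[str, float]:
    """Hypothetical harness: total wall-clock seconds each counter spends on every *.pdf in pdf_dir."""
    pdfs = sorted(Path(pdf_dir).glob("*.pdf"))
    totals: Dict[str, float] = {}
    for name, count_pages in counters.items():
        start = time.perf_counter()  # monotonic, high-resolution clock
        for pdf in pdfs:
            count_pages(pdf)  # result ignored; only elapsed time is recorded
        totals[name] = time.perf_counter() - start
    return totals
```

Usage: `time_page_counters("/path/to/pdfs_data/", {"library_a": count_with_library_a})`, where `count_with_library_a` is a placeholder for a wrapper around one of the benchmarked libraries. Timing the whole directory per library, rather than per file, keeps clock overhead out of the per-file measurements.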
- Final statistics - overall processing time:
https://maciekj.pl/media/plots/pdfs_performance_final_stats_bar.html
- Final statistics - bar chart: