Paperless
Just a small collection of scripts for working with scanned documents. More details about the scripts can be found in the blog (German): https://write.tchncs.de/~/Paperless
blank3.py
Remove blank pages from all scanned PDFs in the folder
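A common way to decide whether a scanned page is "blank" is to measure how much of it is ink. The sketch below shows that heuristic only; it is an assumption, not necessarily what blank3.py does. It also assumes each page has already been rendered to 8-bit grayscale pixels by some other tool (not shown). The names `is_blank_page` and `keep_non_blank` and both thresholds are hypothetical.

```python
from typing import List, Sequence


def is_blank_page(
    pixels: Sequence[int],
    dark_threshold: int = 128,     # hypothetical: values below this count as ink
    max_ink_ratio: float = 0.005,  # hypothetical: under 0.5 % ink means "blank"
) -> bool:
    """Return True if the share of dark pixels is below max_ink_ratio.

    pixels: flat sequence of 8-bit grayscale values (0 = black, 255 = white).
    """
    if not pixels:
        return True  # treat an empty image as blank
    dark = sum(1 for p in pixels if p < dark_threshold)
    return dark / len(pixels) < max_ink_ratio


def keep_non_blank(pages: Sequence[Sequence[int]]) -> List[int]:
    """Return the indices of the pages that should be kept (not blank)."""
    return [i for i, px in enumerate(pages) if not is_blank_page(px)]


if __name__ == "__main__":
    white = [255] * 10_000
    speckled = [255] * 9_990 + [0] * 10   # 0.1 % dark: typical scanner noise
    text = [255] * 9_000 + [0] * 1_000    # 10 % dark: real content
    print(keep_non_blank([white, text, speckled]))  # [1]
```

The ratio threshold tolerates dust and scanner noise on the back side of a single-sided page. A threshold of zero would keep almost every scanned page.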
count.py
Count the pages of all PDF files in the directory and all subdirectories
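Recursively counting pages comes down to walking the directory tree and asking a PDF library for each file's page count. This is a minimal sketch. It assumes pdfminer (the dependency listed below) is used via `PDFPage.get_pages`. The function names are hypothetical, and the counter is passed in so the traversal can be tried without real PDFs.

```python
from pathlib import Path
from typing import Callable, Dict


def count_pdf_pages(root: str, page_counter: Callable[[Path], int]) -> Dict[Path, int]:
    """Map every PDF under root (including subdirectories) to its page count."""
    counts: Dict[Path, int] = {}
    for path in sorted(Path(root).rglob("*")):
        # case-insensitive suffix check so scanner output like SCAN.PDF is included
        if path.is_file() and path.suffix.lower() == ".pdf":
            counts[path] = page_counter(path)
    return counts


def pdfminer_page_count(path: Path) -> int:
    """Count pages with pdfminer (assumes pdfminer is installed)."""
    from pdfminer.pdfpage import PDFPage  # imported lazily; only needed for real PDFs

    with open(path, "rb") as fp:
        # check_extractable=False: count pages even if text extraction is restricted
        return sum(1 for _ in PDFPage.get_pages(fp, check_extractable=False))
```

Usage: `counts = count_pdf_pages(".", pdfminer_page_count)` followed by `sum(counts.values())` gives the total for the current directory and all subdirectories.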
repair.py
Verify whether each PDF is a valid PDF/A and, if not, process it with OCRmyPDF to produce a valid PDF/A
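The check-then-repair step could look roughly like this sketch. `--output-type pdfa` and `--skip-text` are real OCRmyPDF options. Whether repair.py passes them, the `_pdfa.pdf` output name, and the function names are assumptions. The validity check and the command runner are injected so the logic can be exercised without OCRmyPDF installed.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Optional


def ocrmypdf_command(src: Path, dst: Path) -> List[str]:
    """Build an OCRmyPDF call that writes PDF/A output.

    --skip-text leaves pages that already have a text layer un-OCRed.
    """
    return ["ocrmypdf", "--output-type", "pdfa", "--skip-text", str(src), str(dst)]


def repair_if_needed(
    src: Path,
    is_valid_pdfa: Callable[[Path], bool],
    run: Callable[[List[str]], int] = lambda cmd: subprocess.run(cmd).returncode,
) -> Optional[Path]:
    """Return the path of a repaired copy, or None if src was already valid."""
    if is_valid_pdfa(src):
        return None
    dst = src.with_name(src.stem + "_pdfa.pdf")  # hypothetical naming scheme
    if run(ocrmypdf_command(src, dst)) != 0:
        raise RuntimeError(f"OCRmyPDF failed on {src}")
    return dst
```

Writing to a new file instead of overwriting the input keeps the original scan if OCRmyPDF fails halfway.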
verify.py
Verify all PDFs in the folder with veraPDF
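Verifying with veraPDF means calling its command-line tool and reading the report. This sketch assumes the CLI is on the PATH as `verapdf` and prints its default XML report to stdout, with one `<validationReport isCompliant="...">` element per file. Check that assumption against the veraPDF version you use. `--flavour` (e.g. `1b`, `2b`) selects the PDF/A profile. The function names are hypothetical.

```python
import subprocess
import xml.etree.ElementTree as ET
from typing import List, Optional


def verapdf_command(pdf: str, flavour: Optional[str] = None) -> List[str]:
    """Build the veraPDF call; without a flavour, veraPDF picks the profile itself."""
    cmd = ["verapdf"]
    if flavour:
        cmd += ["--flavour", flavour]
    return cmd + [pdf]


def is_compliant(report_xml: str) -> bool:
    """True if the report contains validationReport elements and all are compliant."""
    root = ET.fromstring(report_xml)
    reports = list(root.iter("validationReport"))
    return bool(reports) and all(r.get("isCompliant") == "true" for r in reports)


def verify(pdf: str, flavour: Optional[str] = None) -> bool:
    """Run veraPDF on one file and interpret its XML report."""
    result = subprocess.run(verapdf_command(pdf, flavour), capture_output=True, text=True)
    return is_compliant(result.stdout)
```

A report with no `validationReport` element (for example, a file veraPDF could not parse) is treated as not compliant rather than silently passing.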
setdate.py
Rename the PDF by prefixing its file name with the date found in its content (e.g. Test.pdf --> 2020-09-05 Test.pdf)
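Renaming comes down to finding the first plausible date in the extracted text and prepending it in ISO format. The sketch assumes German-style `DD.MM.YYYY` dates, since the scans are presumably German like the blog. It takes the already-extracted text as input, which with pdfminer.six could come from `pdfminer.high_level.extract_text`. The regex and the function names are hypothetical, not taken from setdate.py.

```python
import re
from datetime import date
from pathlib import Path
from typing import Optional

# Dates such as 05.09.2020 or 5.9.2020 (assumed German DD.MM.YYYY format).
DATE_RE = re.compile(r"\b(\d{1,2})\.(\d{1,2})\.(\d{4})\b")


def first_date(text: str) -> Optional[date]:
    """Return the first valid DD.MM.YYYY date in text, or None."""
    for m in DATE_RE.finditer(text):
        day, month, year = (int(g) for g in m.groups())
        try:
            return date(year, month, day)
        except ValueError:
            continue  # e.g. 31.02.2020: skip impossible dates and keep looking
    return None


def dated_name(path: Path, text: str) -> Optional[Path]:
    """Prefix the file name with the ISO date found in text, or return None."""
    d = first_date(text)
    if d is None:
        return None
    return path.with_name(f"{d.isoformat()} {path.name}")


if __name__ == "__main__":
    print(dated_name(Path("Test.pdf"), "Rechnung vom 05.09.2020"))  # 2020-09-05 Test.pdf
```

The actual rename would then be `path.rename(new_path)`. Using the *first* date is a simple rule. An invoice often also contains a due date or an order date, so a real script may need a smarter choice.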
requires: pdfminer
pip install pdfminer