scyv / SCArchive

Tool for archiving documents and pictures

SCArchive

  • SCArchive scans the configured local folders for supported file types (PDF and HTML so far, more are coming) and extracts metadata from each file.
  • PDF files are OCR'd and their text is extracted with the help of PDFBox (https://pdfbox.apache.org/), tesseract (https://github.com/tesseract-ocr/tesseract) and GraphicsMagick (http://www.graphicsmagick.org/).
  • The application uses Vaadin to provide a web UI where the user can search and edit the gathered metadata.
  • Because all files and the gathered metadata are stored as local files, they can be synchronized to other machines with tools such as rsync or Resilio Sync.
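
The folder scan described above can be pictured with a minimal Java 8 sketch. This is not SCArchive's actual scanner: the class name FolderScanSketch and its methods are hypothetical, and matching files by the extensions .pdf and .html is an assumption made for illustration only.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch, not SCArchive's real implementation.
final class FolderScanSketch {

    // Assumption: supported files are recognized by extension (PDF and HTML).
    static boolean isSupported(Path file) {
        String name = file.getFileName().toString().toLowerCase(Locale.ROOT);
        return name.endsWith(".pdf") || name.endsWith(".html");
    }

    // Walks the folder recursively and returns all supported regular files, sorted by path.
    static List<Path> findDocuments(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .filter(FolderScanSketch::isSupported)
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Scans the folder given as first argument, or the current folder if none is given.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path document : findDocuments(root)) {
            System.out.println(document);
        }
    }
}
```

A real scanner would additionally hand each PDF to PDFBox and tesseract for text extraction; that step is left out here.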

Technology Stack

  • Java 8
  • Spring-Boot
  • Vaadin
  • PDFBox
  • tesseract
  • GraphicsMagick

Getting Started

  1. Install the prerequisites (Java 8, tesseract and GraphicsMagick)
  2. Build from source (currently the only supported way to install)
    • Clone this repository: git clone git@github.com:scyv/SCArchive.git
    • Run the Maven wrapper: mvnw package (./mvnw package on Linux or macOS)
    • Navigate to ./target: cd target
    • Copy application.properties from src/main/resources: cp ../src/main/resources/application.properties .
    • Edit application.properties for your needs (see below)
    • Run java -jar server-0.0.1-SNAPSHOT.jar

Application.properties

| Property key | Possible values | Description |
|---|---|---|
| scarchive.documentpaths | e.g. /home/user/myFiles;/home/user/myOtherFiles | Semicolon-separated list of folders the application scans |
| scarchive.scheduler.pollingInterval | Integer, e.g. 10 | Time between two scans, in seconds |
| scarchive.tesseract.bin | e.g. /usr/bin/tesseract | Absolute path to the tesseract binary |
| scarchive.graphicsmagick.bin | e.g. /usr/bin/gm | Absolute path to the GraphicsMagick binary |
| scarchive.openlocal | true or false | When true, files are opened locally; when false, files are downloaded |
| scarchive.enablescan | true or false | When true, scanning of files is enabled; when false, no scanning takes place. This is especially useful if you want to provide the web UI without letting the host do the scanning |
| scarchive.maxfindings | e.g. 100 | Maximum number of findings shown when searching the metadata |
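
To make the property list above concrete, here is a small Java 8 sketch that loads an example configuration and splits scarchive.documentpaths into its folders. The values are only the "e.g." examples from the list, and the class ConfigSketch with its methods is hypothetical; how SCArchive itself reads these keys (presumably through Spring Boot) is not shown in this README.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical sketch, not SCArchive's real configuration code.
final class ConfigSketch {

    // Example application.properties content; all values are illustrative.
    static final String EXAMPLE = String.join("\n",
        "scarchive.documentpaths=/home/user/myFiles;/home/user/myOtherFiles",
        "scarchive.scheduler.pollingInterval=10",
        "scarchive.tesseract.bin=/usr/bin/tesseract",
        "scarchive.graphicsmagick.bin=/usr/bin/gm",
        "scarchive.openlocal=true",
        "scarchive.enablescan=true",
        "scarchive.maxfindings=100");

    // Parses properties text with the standard java.util.Properties reader.
    static Properties load(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    // Splits the semicolon-separated folder list, skipping empty entries.
    static List<String> documentPaths(Properties props) {
        List<String> result = new ArrayList<>();
        for (String part : props.getProperty("scarchive.documentpaths", "").split(";")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Properties props = load(EXAMPLE);
        System.out.println("Folders to scan: " + documentPaths(props));
        System.out.println("Polling interval (s): "
            + Integer.parseInt(props.getProperty("scarchive.scheduler.pollingInterval")));
    }
}
```

The lines inside EXAMPLE can be copied into application.properties as a starting point and then adjusted to the local paths.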

License: Apache License 2.0


Languages

Java 85.5%, Shell 7.5%, Batchfile 5.8%, CSS 1.2%