LAREX is a semi-automatic open-source tool for layout analysis on early printed books. It uses a rule-based connected components approach that is very fast, easy for the user to understand, and allows intuitive manual correction where necessary. The PageXML format is used to support integration into existing OCR workflows. Evaluations showed that LAREX provides an efficient and flexible way to segment pages of early printed books.
Please feel free to visit the tool homepage and the web application. A short user manual is available here.
This guide uses Tomcat version 7.
apt-get install tomcat7
apt-get install maven
apt-get install openjdk-8-jdk
git clone https://github.com/chreul/LAREX.git
Run: mvn clean install -f LAREX/Larex/pom.xml
Either: sudo ln -s $PWD/LAREX/target/Larex.war /var/lib/tomcat7/webapps/Larex.war
or cp LAREX/target/Larex.war /var/lib/tomcat7/webapps/Larex.war
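The copy variant above can be wrapped in a small guard so that a failed build or a wrong Tomcat path is reported instead of silently producing a broken deployment. This is only a sketch: the function name deploy_war is hypothetical, and both paths are passed in as arguments rather than assumed.

```shell
# Hypothetical helper: copy a built WAR into a Tomcat webapps directory,
# refusing to continue if either path does not exist.
deploy_war() {
    war_file="$1"
    webapps_dir="$2"
    if [ ! -f "$war_file" ]; then
        echo "WAR not found: $war_file (did the Maven build succeed?)" >&2
        return 1
    fi
    if [ ! -d "$webapps_dir" ]; then
        echo "Tomcat webapps directory not found: $webapps_dir" >&2
        return 1
    fi
    # Same effect as the cp command above.
    cp "$war_file" "$webapps_dir/Larex.war"
}
```

With the paths used in this guide, the call would be: sudo may be required, e.g. by running the script itself as root, and then deploy_war LAREX/target/Larex.war /var/lib/tomcat7/webapps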
systemctl start tomcat7
To restart: systemctl restart tomcat7
To start automatically at system boot: systemctl enable tomcat7
It is recommended to use Eclipse.
In Eclipse go to Help -> Install New Software -> Work with neon -> Install Web, XML, Java EE and OSGi Enterprise Development
Install Maven as seen above and build the project.
Download the most recent Tomcat version from http://tomcat.apache.org/download-90.cgi.
Select the web perspective and add the Tomcat server.
Install Homebrew (see https://brew.sh/).
Afterwards, install all required packages (Java, Tomcat, Git, and Maven):
brew cask install java
brew install tomcat git maven
To verify the Tomcat installation, use Homebrew's services utility. Tomcat should now appear in its list:
brew services list
In your desired project directory, clone the repository:
git clone https://github.com/chreul/LAREX.git
Run: mvn clean install -f LAREX/Larex/pom.xml
Either: sudo ln -s $PWD/LAREX/target/Larex.war /usr/local/Cellar/tomcat/[version]/libexec/webapps/Larex.war
or cp LAREX/target/Larex.war /usr/local/Cellar/tomcat/[version]/libexec/webapps/Larex.war
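The [version] placeholder in the paths above depends on the installed Tomcat release. One way to avoid looking it up by hand is to ask Homebrew for the formula's prefix, which points at the currently installed version. The sketch below assumes the libexec/webapps layout shown above; the function name tomcat_webapps_dir is hypothetical.

```shell
# Hypothetical helper: print the webapps directory of a Homebrew Tomcat.
# An explicit prefix may be passed as $1; otherwise Homebrew is queried.
tomcat_webapps_dir() {
    tomcat_prefix="${1:-$(brew --prefix tomcat)}"
    printf '%s\n' "$tomcat_prefix/libexec/webapps"
}
```

The copy step then becomes, for example: cp LAREX/target/Larex.war "$(tomcat_webapps_dir)/Larex.war"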
brew services start tomcat
To restart: brew services restart tomcat
Go to localhost:8080/Larex
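Tomcat may need a few seconds to unpack the WAR after it has been started, so the page is not always available immediately. The following sketch polls the URL with curl until it answers with a 2xx or 3xx status. The function name wait_for_larex, the default of 10 attempts and the one-second pause are illustrative choices, not part of LAREX.

```shell
# Hypothetical helper: poll the LAREX URL until it responds or we give up.
wait_for_larex() {
    larex_url="${1:-http://localhost:8080/Larex/}"
    max_tries="${2:-10}"
    try=0
    while [ "$try" -lt "$max_tries" ]; do
        # -w '%{http_code}' prints only the HTTP status; the body is discarded.
        code=$(curl -s -o /dev/null -w '%{http_code}' "$larex_url" || true)
        case "$code" in
            2??|3??)
                echo "LAREX reachable (HTTP $code)"
                return 0
                ;;
        esac
        try=$((try + 1))
        sleep 1
    done
    echo "LAREX not reachable at $larex_url" >&2
    return 1
}
```

Calling wait_for_larex without arguments checks the default address used in this guide.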
You can add your own books by copying them to src/webapp/resources/books
(or to an alternative directory set in the config file; see Configuration for more information).
Larex contains a configuration file (src/webapp/WEB-INF/larex.config) with a few settings that can be set before running the application.
The setting bookpath sets the file path of the books folder.
e.g. bookpath:/home/user/books (Linux)
e.g. bookpath:C:\Users\user\Documents\books (Windows)
Larex will load the books from this folder.
[default /src/main/webapp/resources/books]
The setting localsave tells the application how to handle results locally when saved.
<mode>=[bookpath|savedir|none]
bookpath: save the result in the bookpath
savedir: save the result in a defined savedir
none: do not save the result locally [default]
e.g. localsave:bookpath
The setting savedir is needed if localsave mode is set to "savedir".
e.g. savedir:/home/user/save (Linux)
e.g. savedir:C:\Users\user\Documents\save (Windows)
The setting websave tells the application how to handle results on the browser side when saved.
<value>=[true|false]
true: download the result after saving [default]
false: no action after saving
e.g. websave:true
The setting modes sets the accessible modes in the Larex GUI. <value>=[[segment][edit][lines][text]]
A combination of the modes "segment", "edit", "lines" and "text" can be set as
a space-separated string.
e.g. modes:segment lines
The order of those modes in the string also determines which mode is opened on startup, with the first in the list being opened as main mode. The mode "segment" can be replaced with "edit" in order to hide all auto segmentation features. ("edit" will be ignored if both are present)
[default modes:segment lines text]
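Putting the settings described so far together, a complete larex.config could look like the file written by the sketch below. The values are illustrative (Linux paths taken from the examples above), the function name write_larex_config is hypothetical, and the one key:value pair per line layout is inferred from the examples in this section. No comment lines are written because this guide does not state whether the config format supports them.

```shell
# Hypothetical helper: write an example larex.config to the given path.
# Results are saved to a separate directory and not downloaded by the browser.
write_larex_config() {
    cat > "$1" <<'EOF'
bookpath:/home/user/books
localsave:savedir
savedir:/home/user/save
websave:false
modes:segment lines text
EOF
}
```

For example, write_larex_config ./larex.config creates the file in the current directory so that it can be reviewed before replacing the real configuration.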
The setting directrequest enables or disables the direct open feature.
<value>=[enable|disable]
This feature allows users to load a book from anywhere on the server's drive as well as to alter the options websave, localsave and savedir.
enable: enable direct request
disable: disable direct request [default]
e.g. directrequest:enable
This feature should be used with caution, but it is very useful when using Larex in a workflow with other web applications (e.g. in Docker).
The simplest direct request is an HTML form with the values bookpath, bookname, websave (optional), localsave (optional) and savedir (optional).
<form action="http://localhost:8080/Larex/direct" method="POST">
bookpath: <input type="text" name="bookpath"/><br>
bookname: <input type="text" name="bookname"/><br>
websave: <input type="text" name="websave"/><br>
localsave: <input type="text" name="localsave"/><br>
savedir: <input type="text" name="savedir"/><br>
modes: <input type="text" name="modes"/><br>
<input type="submit"/>
</form>
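The same POST request can also be sent from a script, which is handy for checking that the direct endpoint is enabled before wiring it into another application. The sketch below mirrors the form fields with curl. The function name larex_direct_request is hypothetical, and the websave and localsave values are just examples of the documented options. The response is presumably the page a browser would display, so it is mainly useful as a connectivity check.

```shell
# Hypothetical helper: send a direct request to a LAREX instance.
# $1 = base URL (e.g. http://localhost:8080/Larex)
# $2 = bookpath, $3 = bookname
larex_direct_request() {
    base_url="$1"
    book_path="$2"
    book_name="$3"
    # --data-urlencode makes curl send a form-encoded POST body.
    curl -s "$base_url/direct" \
        --data-urlencode "bookpath=$book_path" \
        --data-urlencode "bookname=$book_name" \
        --data-urlencode "websave=false" \
        --data-urlencode "localsave=bookpath"
}
```

Example call: larex_direct_request http://localhost:8080/Larex /home/user/books mybook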
Reul, Christian; Springmann, Uwe; Puppe, Frank: LAREX – A semi-automatic open-source Tool for Layout Analysis and Region Extraction on Early Printed Books. In Proceedings of the 2nd International Conference on Digital Access to Textual Cultural Heritage (2017). ACM. Draft available at arXiv.
Reul, Christian; Dittrich, Marco; Gruner, Martin: Case Study of a highly automated Layout Analysis and OCR of an incunabulum: ‘Der Heiligen Leben’ (1488). In Proceedings of the 2nd International Conference on Digital Access to Textual Cultural Heritage (2017). ACM. Draft available at arXiv.