Jakob Kofoed Janot's repositories
ConceptualSearch
Train a Word2Vec model and implement Conceptual Search/Semantic Search in Solr/Lucene - Simon Hughes, Dice.com, Dice Tech Jobs
deskew
Library used to deskew a scanned document
DIVAServices
Repository of the back end implementation of DivaServices
image_text_reader
This module extracts text from images using the Tesseract OCR engine. Text in images is often blurry or unevenly sized, so the image is pre-processed to make it easier for the OCR engine to read. The module first draws bounding boxes around the text in the image and then normalizes it to 300 dpi, a resolution suitable for the OCR engine.
ldspider
A crawler for the Linked Data web
libxml-ruby
Libxml bindings for Ruby.
multimarkdown-ffi
A Multimarkdown wrapper for Ruby
NER-BERT-pytorch
PyTorch solution for the named entity recognition task using Google AI's pre-trained BERT model.
ocr-conversion
Conversions between various OCR formats
ocr_testing
Scripts and results from our OCR roundup, available on Source
presentations
Presentations
prima-page-converter
Command line tool to convert page layout files to the latest PAGE XML format. It supports all previous versions of the PAGE format as well as ALTO XML, FineReader XML, and hOCR.
PRLib
Pre-Recognize Library - library with algorithms for improving OCR quality.
ruby-chardet
Charset detector. A Ruby clone of Mozilla's chardet
saxon
Ruby wrapper for Saxon
servlex
Servlex, an implementation of the EXPath Webapp framework
SolrPlugins
Dice Solr Plugins from Simon Hughes Dice.com
stanford-core-nlp-jruby
A JRuby wrapper for the Stanford Core NLP package
style-guides-presentation
In Defence of Style Guides, presentation for Balisage 2018
tesseract-recognize
Tool for doing layout analysis and OCR using tesseract in Page XML format
XSDtoRNG
XSL stylesheet for XML Schema (XSD) to Relax NG (RNG) conversion.