Frank B.'s repositories
awesome-data-labeling
A curated list of awesome data labeling tools
Awesome-Table-Recognition
A curated list of resources dedicated to table recognition
BIG-bench-1
Beyond the Imitation Game collaborative benchmark for enormous language models
bonito
A lightweight library for generating synthetic instruction tuning datasets for your data without GPT.
CRASS-data-set
The data for the CRASS-benchmark. See: https://www.crass.ai for further information.
doc-hcii2022-slides
Slides for our HCII 2022 talk on "Putting users in the loop: How User Research Can Guide AI Development for a Consumer-Oriented Self-service Portal". Imported from https://git.informatik.uni-leipzig.de/smarthec/doc-hcii2022-slides
docquery
An easy way to extract information from documents
DocumentLayoutAnalysis
Document Layout Analysis resource repos for development with PdfPig.
GastCluster
A set of bash scripts to spread number crunching jobs across several machines and collect the results back into a single file
layout-parser
A Python Library for Document Layout Understanding
ocrd_segment
OCR-D-compliant page segmentation
pdfix_sdk_example_cpp
Make PDF Files Accessible, Extract Data from PDF, Convert PDF to HTML, Fill-in PDF Form, Stamp PDF and more...
pdfix_sdk_example_python
PDFix SDK samples for Python. PDF manipulation, content extraction, conversion, accessibility and more...
PLIX
PLIX (Pipeline for Information Extraction) is a Python package and command line tool for information extraction from (PDF) documents.
SciTSR
Table structure recognition dataset of the paper: Complicated Table Structure Recognition
todo.md
TODO.md file format - todomd.org