Beast code in Giters

Frank B.'s repositories

awesome-data-labeling

A curated list of awesome data labeling tools

Awesome-Table-Recognition

A curated list of resources dedicated to table recognition

BIG-bench-1

Beyond the Imitation Game collaborative benchmark for enormous language models

Language: Jupyter Notebook | License: Apache-2.0

bonito

A lightweight library for generating synthetic instruction tuning datasets for your data without GPT.

Language: Python | License: BSD-3-Clause

CRASS-data-set

The data for the CRASS-benchmark. See: https://www.crass.ai for further information.

Language: Jupyter Notebook | License: Apache-2.0

crass.ai-big-bench-contribution

000

Slides for our HCII 2022 talk, "Putting users in the loop: How User Research Can Guide AI Development for a Consumer-Oriented Self-service Portal". Imported from https://git.informatik.uni-leipzig.de/smarthec/doc-hcii2022-slides

docquery

An easy way to extract information from documents

Language: Python | License: MIT

DocumentLayoutAnalysis

Document layout analysis resources and repositories for development with PdfPig.

Language: C#

GastCluster

A set of Bash scripts that spread number-crunching jobs across several machines and collect the results back into a single file.

License: Apache-2.0

layout-parser

A Python Library for Document Layout Understanding

Language: Python | License: Apache-2.0

ocrd_segment

OCR-D-compliant page segmentation

Language: Python | License: MIT

pdfix_sdk_example_cpp

Make PDF Files Accessible, Extract Data from PDF, Convert PDF to HTML, Fill-in PDF Form, Stamp PDF and more...

pdfix_sdk_example_python

PDFix SDK samples for Python. PDF manipulation, content extraction, conversion, accessibility and more...

Language: Python

PLIX

PLIX (Pipeline for Information Extraction) is a Python package and command line tool for information extraction from (PDF) documents.

License: Apache-2.0

SciTSR

Table structure recognition dataset from the paper "Complicated Table Structure Recognition"

License: MIT

todo.md

TODO.md file format - todomd.org

frankiert