mnamysl's starred repositories
data-preparation
Code used for sourcing and cleaning the BigScience ROOTS corpus
stanford_alpaca
Code and documentation to train Stanford's Alpaca models, and generate the data.
NL-Augmenter
NL-Augmenter 🦎 → 🐍 A Collaborative Repository of Natural Language Transformations
pmc-downloader
Batch downloads full-text PDFs (or any other sources) from PubMed Central OA articles
ICDAR2019_cTDaR
The ICDAR 2019 cTDaR competition evaluates the performance of methods for table detection (TRACK A) and table recognition (TRACK B). For TRACK A, document images containing one or several tables are provided. TRACK B has two subtracks: the first (B.1) provides the table region, so only table structure recognition must be performed. The second (B.2) provides no a priori information, so both table region detection and table structure recognition must be performed.
ctdar_measurement_tool
Evaluation Tool for the ICDAR 2019 Competition on Table Detection and Recognition
soft-gazetteers
Code and data for the paper "Soft Gazetteers for Low-resource Named Entity Recognition"