openDiagram

Extraction of semantic data from diagrams in scientific and other technical/business documents.

Overview

In many documents the diagrams are a key component of the information. Data are created in semantic form and output as machine-readable files and then, in one of the great barbarisms of this century, are trashed into bitmaps and further degraded by JPEG compression. This lost data leads to irreproducible science and, in the worst cases, people die. (Clinical trials are often published as PDF, and data extraction is hard or near impossible.)

This project tackles the impossible - reconstituting semantic data for the world - "turning hamburgers into cows".

Subjects from which I have successfully extracted semantic data include:

phylogenetic trees
chemical structures and reactions
study baseline data
cyclic voltammograms
forest plots

Many of these share common semantic diagrammatic abstractions, which AMI builds up using heuristics.
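To give a flavour of what a heuristic over diagram primitives can look like, here is a minimal sketch in Python. It is not AMI's code: the `Segment` class, the `guess_axes` function and the pixel tolerance are all hypothetical names invented for this illustration. It assumes line segments have already been recovered from the image, and guesses that the longest horizontal and longest vertical segments are the axes of a plot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A straight line segment in pixel coordinates (hypothetical primitive)."""
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def length(self) -> float:
        return ((self.x2 - self.x1) ** 2 + (self.y2 - self.y1) ** 2) ** 0.5


def is_horizontal(seg: Segment, tol: float = 1.0) -> bool:
    # Horizontal if the two end points differ in y by at most `tol` pixels.
    return abs(seg.y2 - seg.y1) <= tol


def is_vertical(seg: Segment, tol: float = 1.0) -> bool:
    # Vertical if the two end points differ in x by at most `tol` pixels.
    return abs(seg.x2 - seg.x1) <= tol


def guess_axes(segments, tol: float = 1.0):
    """Assumed heuristic: the longest horizontal and the longest vertical
    segments are the axis candidates. Returns (x_axis, y_axis); either
    may be None if no suitable segment exists."""
    horizontals = [s for s in segments if s.length > 0 and is_horizontal(s, tol)]
    verticals = [s for s in segments if s.length > 0 and is_vertical(s, tol)]
    x_axis = max(horizontals, key=lambda s: s.length, default=None)
    y_axis = max(verticals, key=lambda s: s.length, default=None)
    return x_axis, y_axis


# Made-up example input: two long lines, one short tick, one diagonal stroke.
segments = [
    Segment(10, 100, 210, 100),  # long horizontal line
    Segment(10, 100, 10, 20),    # long vertical line
    Segment(50, 60, 60, 60),     # short horizontal tick or whisker
    Segment(30, 40, 80, 70),     # diagonal stroke, ignored by the heuristic
]
x_axis, y_axis = guess_axes(segments)
print("x axis candidate:", x_axis)
print("y axis candidate:", y_axis)
```

A rule this simple will misfire on diagrams with frames or grid lines, so in practice it would be one of several heuristics combined.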

Preprocessing with ami

See PREPROCESS.md

Creation of project

License

Apache License 2.0
