Extraction of semantic data from diagrams in scientific and other technical/business documents.
In many documents the diagrams are a key component of the information. Data are created in semantic form and output as machine-readable files and then, in one of the great barbarisms of this century, are trashed into bitmaps, further degraded by JPEG technology. This lost data leads to irreproducible science, and in the worst cases people die. (Clinical trials are often published as PDF, and data extraction is hard or nearly impossible.)
This project tackles the impossible - reconstituting semantic data for the world - "turning hamburgers into cows".
Among the subjects I have successfully extracted semantic data from:
- phylogenetic trees
- chemical structures and reactions
- study baseline data
- cyclic voltammograms
- forest plots
Many of these have common semantic diagrammatic abstractions and AMI builds these up using heuristics.
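To give a flavour of what such a heuristic can look like, here is a minimal sketch that guesses the axes of a plot from a bag of line segments. It is not AMI's actual code: the `Segment` class, the `find_axes` function and the one-pixel tolerance are all hypothetical names and values chosen for illustration. The assumption is simply that axes tend to be long horizontal and vertical lines that meet at a corner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A straight line segment in image coordinates (hypothetical type)."""
    x1: float
    y1: float
    x2: float
    y2: float

    def length(self) -> float:
        return ((self.x2 - self.x1) ** 2 + (self.y2 - self.y1) ** 2) ** 0.5


def orientation(seg: Segment, tol: float = 1.0) -> str:
    """Classify a segment as 'h', 'v' or 'other' within a pixel tolerance."""
    dx = abs(seg.x2 - seg.x1)
    dy = abs(seg.y2 - seg.y1)
    if dy <= tol and dx > tol:
        return "h"
    if dx <= tol and dy > tol:
        return "v"
    return "other"


def touches(a: Segment, b: Segment, tol: float = 1.0) -> bool:
    """True if any endpoint of a lies within tol of any endpoint of b."""
    ends_a = [(a.x1, a.y1), (a.x2, a.y2)]
    ends_b = [(b.x1, b.y1), (b.x2, b.y2)]
    return any(
        abs(pa[0] - pb[0]) <= tol and abs(pa[1] - pb[1]) <= tol
        for pa in ends_a
        for pb in ends_b
    )


def find_axes(segments, tol: float = 1.0):
    """Return (x_axis, y_axis) or None.

    Heuristic (an assumption, not AMI's rule): try horizontal segments
    from longest to shortest, and for each one return the longest
    vertical segment that shares a corner with it.
    """
    horiz = sorted(
        (s for s in segments if orientation(s, tol) == "h"),
        key=Segment.length, reverse=True,
    )
    vert = sorted(
        (s for s in segments if orientation(s, tol) == "v"),
        key=Segment.length, reverse=True,
    )
    for h in horiz:
        for v in vert:
            if touches(h, v, tol):
                return h, v
    return None


# Toy input: two axes, one tick mark and one sloping data line.
segments = [
    Segment(10, 100, 200, 100),  # candidate x axis
    Segment(10, 100, 10, 5),     # candidate y axis
    Segment(50, 100, 50, 95),    # tick mark
    Segment(20, 80, 180, 20),    # data line
]
axes = find_axes(segments)
```

Once the axes are found, tick marks and labels can be interpreted relative to them, which is the kind of shared abstraction that forest plots and voltammograms have in common.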
See PREPROCESS.md.