petermr / openDiagram

Extraction of semantic data from diagrams in scientific and other technical/business documents

openDiagram

Extraction of semantic data from diagrams in scientific and other technical/business documents.

Overview

In many documents the diagrams are a key component of the information. Data are created in semantic form and output as machine-readable files and then, in one of the great barbarisms of this century, are trashed into bitmaps and further degraded by JPEG technology. This lost data leads to irreproducible science and, in the worst cases, people die. (Clinical trials are often published as PDF, and data extraction is hard or near impossible.)

This project tackles the impossible - reconstituting semantic data for the world - "turning hamburgers into cows".

Among the subjects I have successfully extracted semantic data from:

  • phylogenetic trees
  • chemical structures and reactions
  • study baseline data
  • cyclic voltammograms
  • forest plots

Many of these share common semantic diagrammatic abstractions, and AMI builds these up using heuristics.
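As a rough illustration of what a diagram heuristic can look like, the sketch below picks candidate plot axes from a set of line segments. It is not AMI's actual code or API: the `Segment` class, the `orientation` and `find_axes` functions, the pixel tolerance, and the sample coordinates are all hypothetical, and the "longest horizontal and longest vertical line" rule is only one plausible assumption about how axes might be recognised.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A straight line segment in pixel coordinates (hypothetical type)."""
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def length(self) -> float:
        return ((self.x2 - self.x1) ** 2 + (self.y2 - self.y1) ** 2) ** 0.5


def orientation(seg: Segment, tol: float = 1.0) -> str:
    """Classify a segment as horizontal, vertical, or other within a tolerance."""
    if abs(seg.y2 - seg.y1) <= tol:
        return "horizontal"
    if abs(seg.x2 - seg.x1) <= tol:
        return "vertical"
    return "other"


def find_axes(segments, tol: float = 1.0):
    """Assumed heuristic: the longest horizontal and the longest vertical
    segments are treated as the axis candidates."""
    best = {"horizontal": None, "vertical": None}
    for seg in segments:
        kind = orientation(seg, tol)
        if kind in best and (best[kind] is None or seg.length > best[kind].length):
            best[kind] = seg
    return best


# Made-up input: two long lines, one tick mark, one piece of a data curve.
segments = [
    Segment(10, 100, 210, 100),  # long horizontal: likely x-axis
    Segment(10, 100, 10, 5),     # long vertical: likely y-axis
    Segment(50, 100, 50, 104),   # short vertical: tick mark
    Segment(20, 90, 60, 40),     # diagonal: part of a data curve
]
axes = find_axes(segments)
```

Higher-level abstractions (tick marks, scales, data series) could then be layered on top of such primitives, each with its own rule of thumb.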

preprocessing with ami

see PREPROCESS.md

creation of project


About

Extraction of semantic data from diagrams in scientific and other technical/business documents

License: Apache License 2.0


Languages

  • HTML 51.0%
  • Jupyter Notebook 48.2%
  • Python 0.8%
  • XSLT 0.0%