ContentMine / phylotree

A repository for ami-phylotree development

phylotree

ami-phylo analyses images and diagrams to extract phylogenetic trees. This is a complete repository of the analysis of approximately 4300 figure image files from the IJSEM journal, carried out as Open Notebook Science. The intention is that everything in the analysis is either accessible here or is Open and linked from here.

Main headings are:

description of the workflow

  • Scrape figure image content from the IJSEM journal website (note: this was originally performed on the older Highwire platform, not the new Ingenta platform)
  • Manually filter out figures that do not contain a phylogeny, using Shotwell
  • Pass each of these figures to our software for analysis with this bash loop:
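#!/bin/bash
# Run RunPhylo once per image path listed in list-of-input-images.txt (one path per line).
while IFS= read -r i ; do
      # Give up on any single image after 60 seconds; keep a per-image log.
      timeout 60s mvn exec:java -Dexec.mainClass='org.xmlcml.ami2.plugins.phylotree.RunPhylo' \
      -Dexec.args="$i ./all-output/$i" -e -X | tee "$i.log"
done < list-of-input-images.txt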
  • Check results for OCR errors and Newick structure errors
  • Standardise taxa across different studies
  • Feed cleaned Newick data to mrpmatrix to create a supertree matrix
  • Analyse supertree matrix with TNT
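
The Newick structure check in the workflow above could be partly automated. The following is a minimal sketch only, not the check actually used in this analysis: `check_newick` is a hypothetical helper name, and it tests just two structural properties (a terminating semicolon and balanced parentheses). It does not detect OCR errors in taxon names, and it ignores Newick quoting rules, so a quoted label containing parentheses would be misreported.

```shell
#!/bin/bash
# Hypothetical helper (not part of ami-phylotree): basic structural check
# of a single Newick string. Prints "ok" and returns 0 if the string ends
# with a semicolon and its parentheses are balanced; otherwise prints a
# short description of the problem and returns 1.
check_newick() {
    local tree="$1"
    local depth=0 i ch

    # A Newick tree must end with a semicolon.
    if [[ "$tree" != *";" ]]; then
        echo "missing terminating semicolon"
        return 1
    fi

    # Walk the string character by character, tracking nesting depth.
    for (( i = 0; i < ${#tree}; i++ )); do
        ch="${tree:i:1}"
        case "$ch" in
            "(") depth=$(( depth + 1 )) ;;
            ")") depth=$(( depth - 1 )) ;;
        esac
        # A negative depth means a ")" appeared before its matching "(".
        if (( depth < 0 )); then
            echo "unmatched closing parenthesis at position $i"
            return 1
        fi
    done

    # Any remaining depth means at least one "(" was never closed.
    if (( depth != 0 )); then
        echo "unclosed parenthesis (depth $depth)"
        return 1
    fi
    echo "ok"
}
```

Assuming each extracted tree is stored in its own file (the file name `tree.nwk` here is illustrative, not a name produced by the software), it could be checked with `check_newick "$(cat tree.nwk)"`. Trees that fail would still need manual inspection against the original figure.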

specification of files, errors, protocols

Figure images were obtained from IJSEM articles published from 2003 to 2014 (inclusive). This period covers 4705 articles, from which 4341 figures containing a dendrogram were extracted.

input and output files (large)

errors
