rdmpage / material-examined

Linking specimen codes to identifiers

Material examined

Experiments in “resolving” specimen codes to identifiers, with a view to linking specimens across GBIF, GenBank, BOLD, and BioStor. For background see “Some design notes on modelling links between specimens and other kinds of data”, “Linking GBIF and Genbank”, and “GBIF specimens in BioStor: who are the top ten museums with citable specimens?”. See also:

Guralnick, R. P., Cellinese, N., Deck, J., Pyle, R. L., Kunze, J., Penev, L., … Page, R. (2015, April 6). Community Next Steps for Making Globally Unique Identifiers Work for Biocollections Data. ZooKeys. Pensoft Publishers. http://doi.org/10.3897/zookeys.494.9352

Guralnick, R., Conlin, T., Deck, J., Stucky, B. J., & Cellinese, N. (2014, December 3). The Trouble with Triplets in Biodiversity Informatics: A Data-Driven Case against Current Identifier Practices. (D. P. Little, Ed.) PLoS ONE. Public Library of Science (PLoS). http://doi.org/10.1371/journal.pone.0114069

Related projects include sharifX/pid-specimen-genbank and AgentschapPlantentuinMeise/linkedTaxonomy.

Live demo

There is a live demo at https://material-examined.herokuapp.com. To use it, simply paste in a specimen code. Some examples include:

  • CAS:207283
  • MNHN 2003-1054
  • MCZ 24351
  • BMNH 1891.6.13.25
  • KU 3581
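The example codes above share a rough shape: an institution acronym, a space or colon, then a catalogue number. A minimal Python sketch of splitting a code into those two parts follows. It is not the repository's own (PHP) parser; the function name `split_specimen_code` and the regular expression are assumptions made for illustration.

```python
import re

# Assumed pattern: an uppercase institution acronym, one or more
# separators (space or colon), then everything else as the catalogue number.
CODE_PATTERN = re.compile(r"^\s*([A-Z]+)[\s:]+(\S.*?)\s*$")


def split_specimen_code(code):
    """Return (institution, catalogue_number), or None if the code does not fit."""
    match = CODE_PATTERN.match(code)
    if match is None:
        return None
    return match.group(1), match.group(2)


if __name__ == "__main__":
    for example in ["CAS:207283", "MNHN 2003-1054", "MCZ 24351",
                    "BMNH 1891.6.13.25", "KU 3581"]:
        print(example, "->", split_specimen_code(example))
```

Codes written without a separator would not match this pattern and would need an extra rule.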

Acknowledgements

Code uses the GBIF API; images of higher taxa are from http://phylopic.org

Notes on matching codes

  • BMNH seems to have "lots" in the catalogue for some records

  • BMNH 2005.8.9.46 (GenBank EF462257) is 2005.8.9.46-48 in NHM portal (found by searching on taxon Altolamprologus compressiceps)

  • NVMD74079 (GenBank HQ699006) is a typo in GenBank for NMVD74079

  • QM J59361 Matched to specimen cited by http://biostor.org/reference/105267

  • BMNH 3.2.8.15 matched http://gbif.org/occurrence/1056665719, a HOLOTYPE, not of Lemniscomys barbarus but of Arvicanthis dunni (described by Thomas 1903, http://biostor.org/reference/145601), which also cites the specimen as “B.M. No. 3.2.8.15”

  • BMNH 2005.8.9.105 is BMNH 2005.8.9.105-106 in portal (EF462248)

  • HUMZ 127670 cited by 10.11369/jji1950.42.39 [japanlinkcenter] http://bionames.org/references/8573d38fb5f77a72a7d05b71c457c643

  • BMNH 1886.7.8.1742 sequence AF317712, holotype, sequence not linked in GenBank, paper is 10.1046/j.1474-919X.2002.00036.x, see also

  • USNM 608672 not found by extending (too many), so will need to find from CouchDB database….

  • UF 121176 = EF185121 (type, georef in GBIF, not GenBank, GenBank hasn’t got publication updated)

  • UF:UF10868 needs extend 30 to find, matched to occurrence/899580307 (Raoulserenea oxyrhyncha), GenBank GQ260977 Raoulserenea komaii “Reef-associated crustacean fauna: biodiversity estimates using semi-quantitative sampling and DNA barcoding”

— will need CouchDB rules to generate matching codes…
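Several of the notes above involve "lots", where the portal records a range such as 2005.8.9.46-48 while GenBank cites a single number. One possible rule, sketched here in Python, expands a range into its member numbers so that a single code can be compared against them. The function name `expand_lot` and the `max_span` guard are hypothetical and are not taken from the repository.

```python
import re


def expand_lot(catalogue_number, max_span=100):
    """Expand a trailing numeric range, e.g. '2005.8.9.46-48', into its members.

    Numbers without a plausible trailing range are returned unchanged in a
    one-element list. max_span is an assumed guard against treating
    catalogue numbers that merely contain a hyphen as huge ranges.
    """
    match = re.match(r"^(.*?)(\d+)-(\d+)$", catalogue_number)
    if match is None:
        return [catalogue_number]
    prefix = match.group(1)
    first, last = int(match.group(2)), int(match.group(3))
    if last < first or last - first > max_span:
        return [catalogue_number]
    return [f"{prefix}{number}" for number in range(first, last + 1)]


if __name__ == "__main__":
    print(expand_lot("2005.8.9.46-48"))
    print("2005.8.9.105" in expand_lot("2005.8.9.105-106"))
```

A hyphenated catalogue number such as 2003-1054 is left alone because its second number is smaller than the first, but a rule like this could still misread other hyphenated numbers as lots.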

Example of type and taxa confusion

USNM 124888 (USNM 124888.7246730) is type of Mus clabatus http://collections.si.edu/search/results.htm?q=record_ID:nmnhvz_7246730 doi:10.5479/si.00963801.31-1498.575 http://biostor.org/reference/79057 http://biodiversitylibrary.org/page/7737758

KUT 5498 KU tissue, GBIF has three sequences, BLAST search brings up larval fish study

Hard to match sequence voucher

TRING 1877111743 => BMNH 1877.11.17.43 (voucher for KF281084)
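The rewrite above can be sketched in Python under a strong assumption: that the TRING digits are a four-digit year, a two-digit month, a two-digit day, and then the remaining digits. That layout is inferred from this single example only, and `tring_to_bmnh` is a hypothetical helper name.

```python
import re


def tring_to_bmnh(code):
    """Rewrite 'TRING 1877111743' as 'BMNH 1877.11.17.43'.

    Assumes the digits are YYYY MM DD followed by a serial number; this
    layout is a guess based on one example and may not hold in general.
    """
    match = re.fullmatch(r"TRING\s*(\d{4})(\d{2})(\d{2})(\d+)", code.strip())
    if match is None:
        return None
    # int() drops any leading zeros, matching codes such as 1891.6.13.25.
    year, month, day, serial = (int(part) for part in match.groups())
    return f"BMNH {year}.{month}.{day}.{serial}"


if __name__ == "__main__":
    print(tring_to_bmnh("TRING 1877111743"))
```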

NHM has as Myiagra oceania erythrops, which GBIF matches to “Myiagra oceania” KF281084 is “Myiagra erythrops”

Type examples

To search for types in GBIF http://api.gbif.org/v1/occurrence/search?scientificName=Reptilia&typeStatus=HOLOTYPE
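As a small illustration, the same query URL can be built in Python for any taxon name. Only the endpoint and the two parameters shown above are used; the function name `type_search_url` is an assumption for this sketch.

```python
from urllib.parse import urlencode

# Endpoint as given in the text above.
GBIF_OCCURRENCE_SEARCH = "http://api.gbif.org/v1/occurrence/search"


def type_search_url(scientific_name, type_status="HOLOTYPE"):
    """Build a GBIF occurrence search URL restricted to a type status."""
    params = {"scientificName": scientific_name, "typeStatus": type_status}
    return f"{GBIF_OCCURRENCE_SEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    print(type_search_url("Reptilia"))
```

The resulting URL can be opened in a browser or fetched with any HTTP client.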

USNM 534311

(bee, fuzzy match but not sure why as got name correct) See also http://collections.si.edu/search/record/nmnhentomology_9170726 note that GBIF has USNM 534311.9170726, so “.9170726” suffix is local URL id number (handy to know)
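Since the suffix is useful, a short Python sketch of separating it from the specimen code may help. It assumes the GBIF form is always "USNM", digits, a dot, then a numeric local id, which is inferred only from the examples in these notes; `split_usnm_local_id` is a hypothetical name.

```python
import re


def split_usnm_local_id(catalogue_number):
    """Split 'USNM 534311.9170726' into ('USNM 534311', '9170726').

    Returns (catalogue_number, None) when there is no numeric suffix.
    Restricted to USNM on purpose: dotted numbers from other museums
    (e.g. BMNH 1897.5.13.441) are not suffixes of this kind.
    """
    match = re.fullmatch(r"(USNM\s*\d+)\.(\d+)", catalogue_number.strip())
    if match is None:
        return catalogue_number, None
    return match.group(1), match.group(2)


if __name__ == "__main__":
    print(split_usnm_local_id("USNM 534311.9170726"))
```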

Local record also has “USNM Type Number : 12238”, which is how Cockerell refers to it http://bionames.org/references/ae4ba3c4e328ad798469c8aa5d27089c p. 416 doi:10.5479/si.00963801.36-1674.411

basionym Mesotrichia abbotti (GBIF doesn’t have this)

BMNH 1897.5.13.441

GBIF lists http://www.gbif.org/species/5789258 as holotype of Dicaeum haematostictum Sharpe, 1876, with a fuzzy match. The verbatim record has scientific name Dicaeum haematostictum whiteheadi Hachisuka, 1926, which is described as Dicæum hæmatostictum whiteheadi on p. 55 of http://biostor.org/reference/145719 (http://biodiversitylibrary.org/page/40499064)

Type in British Museum ... Registered No. 1897.5.13.441.

UMMZ 24847

GBIF has as holotype of Dicaeum trigonostigma, but http://www.lsa.umich.edu/ummz/birds/collections/result.asp?textfield=Dicaeum&Submit=Search says holotype of Dicaeum dorsale

BMNH 1886.7.8.1742

BMNH 1886.7.8.1742 sequence AF317712, holotype, sequence not linked in GenBank, paper is 10.1046/j.1474-919X.2002.00036.x

NHM has as type of Acrocephalus macrorhynchus, See paper 10.1046/j.1474-919X.2002.00036.x :

“This specimen (BMNH registration no. 1886.7.8.1742) was collected on 13 November 1867 in the Sutlej Valley near Rampoor (31°26′N, 77°37′E), Himachal Pradesh, by Allan Hume (Hume 1869). It remained in his collection until 1885 when this came in its entirety to the British Museum (BMNH). The specimen was first provisionally described as Phyllopneuste macrorhyncha (Hume 1869) but the name was changed two years later to Acrocephalus macrorhynchus Hume, 1871 when its generic affinity was established. However, Oberholser (1905) pointed out that this latter name was untenable because a specimen from Egypt, described by von Müller in 1853 as Calamoherpe macrorhyncha, appeared to be a synonym of Clamorous Reed Warbler Acrocephalus stentoreus. Hence, Acrocephalus macrorhynchus was abandoned in favour of the new name Acrocephalus orinus Oberholser, 1905.”

Hume 1871 is 10.1111/j.1474-919x.1871.tb05822.x (Biostor 145727) Hume 1869 is BioStor 145729

Oberholser is http://bionames.org/references/f6c016797010850e8631a037953fae2f

USNM 88378

Type of Dasyatis americana (has image)

Mentioned in http://www.gbif.org/species/131870491 (description from Plaza), dataset http://www.gbif.org/dataset/6753c178-d210-4076-bb3a-1fd1739f3120 (doi:10.15468/rzbk7s a GBIF DOI) “A new species of whiptail stingray of the genus Dasyatis Rafinesque, 1810 from the Southwestern Atlantic Ocean (Chondrichthyes: Myliobatiformes: Dasyatidae).”

I guess we could find this by full text search of Plaza DWCA, or just mine text of paper directly….

BMNH 28.5.3.1

GBIF matches higher taxon, doesn’t know about Talpa klossi (described by Thomas 1929 http://bionames.org/references/10.1080/00222932908672961 doi:10.1080/00222932908672961 )

USNM 631622

Type of Stiphrornis pyrrholaemus http://bionames.org/references/c6935643d03909f7db975df06faba49f (Zootaxa), specimen is sequenced http://www.ncbi.nlm.nih.gov/nuccore/JQ176287 (seq is known to GBIF, but GBIF doesn’t match name properly, sequence cited by Smithsonian barcoding project). Sequence also in BOLD http://www.boldsystems.org/connectivity/specimenlookup.php?processid=USNMJ331-11.COI-5P

ANSP 159261

Discussed in “DNA from a 100-year-old holotype confirms the validity of a potentially extinct hummingbird species” DOI: 10.1098/rsbl.2009.0545

BMNH 1920.6.26.42

GBIF maps to Spizaetus bartelsi Stresemann, 1924 but is Spizaetus batesi W. L. Sclater, 1919 (description http://biostor.org/reference/145776, see also http://www.zoonomen.net/cit/RI/SP/Sitt/sitt00626a.jpg ) Sclater then synonymises this with Limnaëtus africanus (= Spizaetus africanus) see “Remarks on Spizaëtus batesi” http://biostor.org/reference/145777

BMNH 8.4.3.73

GBIF maps to Aethomys chrysophilus (de Winton, 1897) whereas is holotype for Mus chrysophilus ineptus Thomas and Wroughton 1908, see http://bionames.org/references/10.1515/mamm.1998.62.3.427

CNMA 22439

GBIF has http://gbif.org/occurrence/370555719 identified as Peromyscus simulus (by Vargas Cuenca J 2004-7-15), but is holotype of Habromys delicatulus described in 2002 see http://biostor.org/reference/81388

Note that Peromyscus simulus has been placed in Habromys. Perhaps museum labels haven’t been updated?

Sequence data relevant to this taxon http://dx.doi.org/10.1016/j.ympev.2006.08.019

KU 161003

Holotype, GBIF matches to family “Cricetidae” whereas verbatim has “Tanyuromys aphrastus” http://www.gbif.org/occurrence/686488116/verbatim Tanyuromys is a genus described in 2012, http://bionames.org/references/376a83d5159fee72deb3f89e16b61c1b, the type species is Oryzomys aphrastus Harris, 1932

So GBIF doesn’t have this combination, and hence can’t match the name.

BMNH 1922.12.18.54

Holotype of Phascogale flavipes adusta Thomas, 1923 (doi:10.1080/00222932308632835) http://bionames.org/references/40b0f12c6d6879dcd3b2944c592d4d80 In GBIF is matched to Phascogale flavipes Le Souef & Burrell, 1926, whereas is currently Antechinus adustus (see, e.g. http://bionames.org/references/2391a7644dc28bdb6fb2f2a90dcbd625 )

So, GBIF matches to wrong taxon…

BMNH 1955.42.65

Search misses Serinus atrogularis seshekeensis http://www.gbif.org/occurrence/1057254282 http://data.nhm.ac.uk/specimen/6987b38e-ea8b-45ae-90ac-807db1d16c51 which has catalog number “1955.42.65. (Collector`s no. M56.)” (sigh)
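One way such a record might be caught is to strip trailing annotations from the catalogue number before comparing. The Python sketch below assumes the annotation is always a parenthetical at the end of the string; `strip_annotation` is a hypothetical helper, not part of the repository.

```python
import re


def strip_annotation(catalogue_number):
    """Remove a trailing parenthetical note and any trailing dots or spaces.

    Assumes the note, if present, is a single parenthetical at the very end.
    """
    without_note = re.sub(r"\s*\(.*\)\s*$", "", catalogue_number)
    return without_note.rstrip(". ").strip()


if __name__ == "__main__":
    print(strip_annotation("1955.42.65. (Collector`s no. M56.)"))
```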

RMCA A.78820

Currently (2015-08-30) exists in three copies:

http://gbif.org/occurrence/1024442512 http://gbif.org/occurrence/665465164 http://gbif.org/occurrence/317834695 (URN:catalog:RMCA:Aves:RMCA A.78820)

NHMUK 1870.10.26.176

http://gbif.org/occurrence/1057410476 http://data.nhm.ac.uk/specimen/492908bd-7e8f-433d-a1b9-067ca5ae8fd4

Bulimus angasianus Pfeiffer, 1864

Cited in http://dx.doi.org/10.3897/zookeys.194.2721

WAM:T63108

GBIF http://gbif.org/occurrence/500819095 OZCAM 1ef3cc48-c2ea-41bf-ab95-0bd6dbef748c GenBank https://www.ncbi.nlm.nih.gov/nuccore/JF749966
