gnames / gnfinder

GNfinder finds scientific names in UTF8 texts, PDF files, MS Word/Excel documents, URLs etc.


Lowercase input

git-arbitrarysystems opened this issue


This is an excellent tool. One issue I noticed, though: the finder returns a result for "Canis lupus" but none for "canis lupus". Is it possible to circumvent this?

Thank you, @git-arbitrarysystems! Unfortunately, it is not currently possible to find lowercase scientific names. There are two reasons for this:

  1. False positives. Scientific names are sometimes identical to ordinary words, for example the genera 'America', 'Cafeteria', and 'Cancer'. GNfinder tries to strike a balance between finding names and skipping words that merely look like names. By the rules of nomenclature, the generic part of a name must be capitalized, and GNfinder follows this rule to avoid a multi-fold increase in false positives.
  2. Speed. Name-finding would also slow down significantly. Currently the app has to check every capitalized word. If it tried to check every word, it would be noticeably slower. Speed is important because we use the app to traverse billions of pages.
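To make the two points above concrete, here is a toy sketch in Python. It is not GNfinder's actual algorithm; the four-entry `GENERA` set, the `candidates` and `genus_hits` helpers, and the sample sentence are all made up for illustration. It only shows how a capitalization filter shrinks the number of words that need a lookup and cuts down on ordinary words matching genus names.

```python
import re

# Hypothetical miniature lexicon; a real genus list is vastly larger.
GENERA = {"canis", "cancer", "cafeteria", "america"}

WORD = re.compile(r"[A-Za-z]+")


def candidates(text, require_capital=True):
    """Return the words that would have to be looked up as possible genera."""
    words = WORD.findall(text)
    if require_capital:
        # Only capitalized words are considered, as the nomenclature rule allows.
        words = [w for w in words if w[0].isupper()]
    return words


def genus_hits(text, require_capital=True):
    """Return the candidate words that match an entry in the toy lexicon."""
    return [w for w in candidates(text, require_capital) if w.lower() in GENERA]


text = "We met at the cafeteria to discuss cancer research on Canis lupus in America."

print(len(candidates(text)), genus_hits(text))
print(len(candidates(text, False)), genus_hits(text, False))
```

With the filter on, only 3 of the 14 words are looked up. With it off, all 14 are, and 'cafeteria' and 'cancer' show up as genus matches. 'America' matches either way, which is the kind of trade-off described above.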

Because of the large number of false positives and the loss of performance, finding names with non-capitalized genera is not a feature we want to implement. With the advent of AI and machine learning we might be able to address this issue, but that would require a complete rethink and rewrite of the app.
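If the input is already known to be a single name rather than running text (for example, a value from a species column), one possible workaround on the caller's side is to capitalize the first letter before handing the string to the finder. This is only a sketch under that assumption, and `capitalize_genus` is a hypothetical helper, not part of GNfinder. Applying it to arbitrary prose would bring back the false positives described above.

```python
def capitalize_genus(name: str) -> str:
    """Uppercase the first letter of a string assumed to hold one scientific name."""
    stripped = name.strip()
    if not stripped:
        return stripped
    # Leave the rest untouched so the specific epithet stays lowercase.
    return stripped[0].upper() + stripped[1:]


print(capitalize_genus("canis lupus"))
```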