Find a way to get a list of IUCN species

Question

Find a way to get a list of IUCN species

markmacgillivray opened this issue 9 years ago · comments

Where is this list? @blahah probably knows.

If there is an API we can query for species and pull from each day, that would be good. If not, we need a way to get a dump of the list and keep it up to date, or a way to scrape it off a web page somewhere. Whichever approach, a Python script that can be called as an exec by canary would be good.

Anusha Ranganathan · Answer 1 · Wed Oct 21 2015 17:32:24 GMT+0800 (China Standard Time)

Good question. I've asked in Slack.

Rik · Answer 2 · Thu Oct 22 2015 00:35:10 GMT+0800 (China Standard Time)

I have added a full export CSV and XML to the slack #development channel

Anusha Ranganathan · Answer 3 · Wed Oct 28 2015 17:16:50 GMT+0800 (China Standard Time)

From the IUCN search results, there should be 89,586 results. The export data has the following columns.

Species ID
Kingdom
Phylum
Class
Order
Family
Genus
Species
Authority
Infraspecific rank
Infraspecific name
Infraspecific authority
Stock/subpopulation
Synonyms
Common names (Eng)
Common names (Fre)
Common names (Spa)
Red List status
Red List criteria
Red List criteria version
Year assessed
Population trend
Petitioned

The red list status page lists the number of IUCN species as 6,260

The red list categories are (link to IUCN document)

Category	Code
Not Evaluated	NE
Data Deficient	DD
Least Concern	LC
Lower Risk	LR
Near Threatened	NT
Vulnerable	VU
Endangered	EN
Critically Endangered	CR
Extinct in the Wild	EW
Extinct	EX

There is an API which should make getting the complete list of species easier - http://rlapiv3-beta.iucnredlist.org/api/v3/docs.

At present, the API seems to be down (502 Bad gateway errors).

Anusha Ranganathan · Answer 4 · Wed Oct 28 2015 17:24:56 GMT+0800 (China Standard Time)

How best should this data be made available to Canary?

markmacgillivray · Answer 5 · Wed Oct 28 2015 18:53:16 GMT+0800 (China Standard Time)

If we have a way of retrieving the data programmatically, that would be nice. If it is a case of a manual download, that is fine too; in that case just write up how we get the data and how often we would have to download it manually to have the latest data.

Whichever way we get the data, ideally it would be loaded into an Elasticsearch index. So the code should iterate over every record found in the IUCN data and send each one to an ES index address such as http://localhost:9200/contentmine/iucn. Each record should have a UUID if there is no unique ID provided in the IUCN data. The method of updating the indexed data will depend on how we can retrieve data from IUCN: either update specific records by their IUCN ID, or blow away the whole index and rebuild each time.

There is a pretty useful generic mapping.json file that should be used whenever an index type is created; it is at http://static.cottagelabs.com/mapping.json. However, if upon looking at the data you find that there would be benefit in a custom mapping, then of course just make one and include it with the code.
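A minimal sketch of the indexing step described above, assuming the records have already been parsed into dictionaries. The function name `to_bulk_payload` and the `id` field are illustrative and not taken from any existing code in this thread; the index and type names come from the example address. The `_type` entry matches Elasticsearch versions of that period and would need to be dropped on recent releases, which no longer support mapping types.

```python
import json
import uuid


def to_bulk_payload(records, index="contentmine", doc_type="iucn"):
    """Build an Elasticsearch bulk-API body (newline-delimited JSON).

    Every record gets a freshly generated UUID as its document ID.
    Whatever ID IUCN supplies stays in the record as a normal field.
    """
    lines = []
    for record in records:
        doc = dict(record)  # copy so the caller's data is not modified
        doc_id = str(uuid.uuid4())
        doc["id"] = doc_id  # hypothetical field name for our own identifier
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"
```

The resulting string could be sent as the body of a POST to the `_bulk` endpoint of the cluster, for example http://localhost:9200/_bulk. Because the UUIDs are regenerated on every run, this sketch fits the "blow away the whole index and rebuild" option rather than per-record updates.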

Anusha Ranganathan · Answer 6 · Wed Oct 28 2015 19:02:48 GMT+0800 (China Standard Time)

For now I plan to retrieve the information using their API. I didn't find any way of getting updates using the API. I will index it in ES and send you a link to have a look at once I have something. We could periodically update the index. They do have identifiers for each of the species, but somewhere on their website I read that they are not persistent and we shouldn't use them as such. So we will create our own UUID.

markmacgillivray · Answer 7 · Wed Oct 28 2015 19:23:30 GMT+0800 (China Standard Time)

OK yes, create our own UUID for the records and store their species ID as
informational. RSU commented saying he provided a link to a data dump on
slack - did he get that via the API or does he have another method? The API
method should work fine, it is just annoying it is flaky, but still just
write some catches for their downtime and it would be OK.
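The "catches for their downtime" could look roughly like the sketch below: retry a failing call a few times with an increasing pause. The name `fetch_with_retry` and its parameters are made up for illustration; `fetch` is any zero-argument callable that raises an exception when the API is unavailable.

```python
import time


def fetch_with_retry(fetch, retries=5, delay=2.0, sleep=time.sleep):
    """Call fetch() until it succeeds or the retries run out.

    The wait doubles after each failure (delay, 2*delay, 4*delay, ...).
    The last error is re-raised if every attempt fails.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(retries):
        try:
            return fetch()
        except Exception as error:  # e.g. a 502 surfaced as an exception
            last_error = error
            if attempt < retries - 1:
                sleep(delay * (2 ** attempt))
    raise last_error
```

Passing `sleep` as a parameter keeps the function easy to test without real waiting. In practice `fetch` would wrap whichever HTTP call is used to reach the API.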


Anusha Ranganathan · Answer 8 · Wed Oct 28 2015 20:11:58 GMT+0800 (China Standard Time)

RSU downloaded the search results (linked above - IUCN search results, http://www.iucnredlist.org/search/link/5627b7b0-218891a4). He thought he had downloaded the full list, but it's just a small result set (480 rows). I think he also mentioned in Slack that he retried and that the downloads were timing out. That has been my experience too. The API doc does state that this is expected behaviour and to use the API for this. If the API continues to be down, the search-results download is the route I will be forced to take. It's a laborious manual process and one I will do my best to avoid.

markmacgillivray · Answer 9 · Wed Oct 28 2015 20:20:42 GMT+0800 (China Standard Time)

OK. Probably you could automate against the search results and just trick
it into thinking you are human, if that is necessary :)

Anusha Ranganathan · Answer 10 · Wed Oct 28 2015 20:21:56 GMT+0800 (China Standard Time)

I was just looking into the possibility of doing just that.

Rik · Answer 11 · Thu Oct 29 2015 05:00:05 GMT+0800 (China Standard Time)

The site is pretty slow, and I eventually gave up trying to get the full data download to work. Scraping might be the easiest way - or perhaps just emailing them about the issue?

Anusha Ranganathan · Answer 12 · Thu Oct 29 2015 19:51:41 GMT+0800 (China Standard Time)

The API site is still down. I have written to them about this. In the meantime I finally managed to download all of the data (8 searches by category) and have saved the CSV files. It is a manual process, but next time (when we need to update the data) it shouldn't be too time consuming. Scraping their site, or replicating human actions for the searches, doesn't look easy given their UI. I have for now saved these files in the CottageLabs/ContentMine Google drive.
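For the next manual refresh, combining the per-category exports into a single file could be scripted. This is only a sketch: it assumes every export shares the same header row and that the "Species ID" column listed earlier in the thread identifies a row; `merge_exports` is a made-up name.

```python
import csv
import io


def merge_exports(sources, key="Species ID"):
    """Merge CSV exports that share a header into one CSV string.

    sources is an iterable of open text files (or file-like objects).
    If the same key appears more than once, the first row is kept.
    """
    seen = set()
    fieldnames = None
    rows = []
    for source in sources:
        reader = csv.DictReader(source)
        if fieldnames is None:
            fieldnames = reader.fieldnames  # header of the first non-empty file
        for row in reader:
            if row[key] in seen:
                continue
            seen.add(row[key])
            rows.append(row)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames or [key])
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

The category searches should not overlap, so the duplicate check is a safety net rather than something the data is known to need.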

Anusha Ranganathan · Answer 13 · Fri Oct 30 2015 18:46:59 GMT+0800 (China Standard Time)

Code to index IUCN redlists data - https://github.com/anusharanganathan/redlist-indexer.

Will index the data in the file in Google drive - CottageLabs/ContentMine/IUCN-Redlist-Data/all.csv

Anusha Ranganathan · Answer 14 · Sat Oct 31 2015 00:56:01 GMT+0800 (China Standard Time)

Okay, heard back from the IUCN people. The link to the API I had found was to a beta version. The URL I should be using is http://apiv3.iucnredlist.org/api/v3/docs

Calls to make

Get number of species

http://apiv3.iucnredlist.org/api/v3/speciescount?token=YOUR_TOKEN_ID

Get list of species by page

_Need the species count to calculate the number of pages. The response for page 1 has the number of rows._

http://apiv3.iucnredlist.org/api/v3/species/page/1?token=YOUR_TOKEN_ID
http://apiv3.iucnredlist.org/api/v3/species/page/2?token=YOUR_TOKEN_ID

This will return the fields: species id (taxonid), scientific_name, infra_rank, infra_name, population and category.

Get information for each of the species by ID

http://apiv3.iucnredlist.org/api/v3/species/id/species_id?token=YOUR_TOKEN_ID

This will give us additional information regarding the species. The fields returned are: taxonid, scientific_name, kingdom, phylum, class, order, family, genus, main_common_name, authority, published_year, category, criteria, marine_system, freshwater_system, terrestrial_system, assessor, reviewer.

NOTE from the API page:
The species ID might change and should not be used as persistent identifier. To find the species ID, use the weblink api call with the species name

http://apiv3.iucnredlist.org/api/v3/weblink/loxodonta%20africana

The only issue I find with this is that, looking at the data, the species name does not seem to be unique. The species concolor is listed 5 times in the CSV data with different species IDs.

Oh, understood: the combination of "genus species_name" is what needs to be used to find the current ID. Maybe this combination can be used as a unique key in Elasticsearch.
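As a rough sketch of how the calls above fit together, the helpers below build the page URLs from the species count and the weblink URL from a binomial name. The helper names are invented for illustration. The rows-per-page value has to be read from the first response, and the default first page of 1 follows the example URLs above; both should be checked against the API docs.

```python
import math
from urllib.parse import quote

BASE = "http://apiv3.iucnredlist.org/api/v3"


def page_urls(species_count, rows_per_page, token, first_page=1):
    """Return one species-list URL per page needed to cover species_count."""
    if rows_per_page < 1:
        raise ValueError("rows_per_page must be at least 1")
    pages = math.ceil(species_count / rows_per_page)
    return [
        "{}/species/page/{}?token={}".format(BASE, number, token)
        for number in range(first_page, first_page + pages)
    ]


def binomial_key(genus, species):
    """Normalised 'genus species' string, a candidate unique key."""
    return "{} {}".format(genus.strip(), species.strip()).lower()


def weblink_url(genus, species):
    """URL of the weblink call for a binomial name (spaces become %20)."""
    return "{}/weblink/{}".format(BASE, quote(binomial_key(genus, species)))
```

For example, `weblink_url("Loxodonta", "africana")` produces the weblink URL shown above.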

petermr · Answer 15 · Sat Oct 31 2015 01:40:32 GMT+0800 (China Standard Time)

Thanks,

When you say the "species" is not unique, are you referring to the binomial
Latin name ( https://en.wikipedia.org/wiki/Binomial_nomenclature ) of two
words or just to the second word? (It's quite common for the second word,
identifying the species within the genus, to be found in many names, e.g.
"Passer domesticus" is the House Sparrow and "Acheta domesticus" is the
House Cricket.) Ross will probably give more examples. However, there are
cases where binomial names are not unique, where one is a plant and
another an animal, for example.

In either case I wouldn't worry - we can sort it out.


Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069

markmacgillivray · Answer 16 · Thu Nov 12 2015 21:40:14 GMT+0800 (China Standard Time)

I used the csv file that Anusha got but in the end just created a simple python script to upload it rather than the repo she put together, so I could create a dynamic mapping on the index easily. Species IDs do appear to be unique, as do species by binomial name. Website demo is now also up and running.
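A uniqueness check of this kind can be reproduced with a few lines of Python. This is a sketch, not the script actually used; it assumes the CSV has the column names listed earlier in the thread ("Species ID", "Genus", "Species"), and `duplicate_keys` is a made-up name.

```python
import csv
from collections import Counter


def duplicate_keys(source, columns):
    """Return {key_tuple: count} for column combinations seen more than once.

    source is an open CSV file (or file-like object) with a header row.
    Values are compared case-insensitively with surrounding spaces removed.
    """
    reader = csv.DictReader(source)
    counts = Counter(
        tuple(row[column].strip().lower() for column in columns)
        for row in reader
    )
    return {key: count for key, count in counts.items() if count > 1}
```

Calling it with `["Species"]` would be expected to report repeated epithets such as concolor, while `["Genus", "Species"]` tests the binomial name. An empty result means the chosen columns are unique in that file.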

Josh Gage · Answer 17 · Fri Sep 28 2018 03:28:30 GMT+0800 (China Standard Time)

Code to index IUCN redlists data - https://github.com/anusharanganathan/redlist-indexer.

Will index the data in the file in Google drive - CottageLabs/ContentMine/IUCN-Redlist-Data/all.csv

Hi! After being stumped on the IUCN website, I stumbled upon this thread in my search for a complete export of IUCN species. I am using the google doc @anusharanganathan posted. Do you all have a more recent, full export from the IUCN database?