
R package for accessing PPO data store

Home page: https://docs.ropensci.org/rppo

rfutres

The Functional Trait Resource for Environmental Studies project (FuTRES) aggregates vertebrate traits from VertNet and other research datasets. FuTRES uses the FuTRES Ontology for Vertebrate Traits (FOVT) to align trait terms across the various databases. The rfutres R package provides programmatic access to all data in the FuTRES data portal, including selected classes from the FOVT ontology.

For information on how data is assembled for the FuTRES data portal, visit the fovt-data-pipeline git repository.

Installation

The production version of rfutres will soon be available on CRAN. Until then, you can install the development version of rfutres from GitHub with:

# If you don't have devtools, install it first with install.packages("devtools")
# If you run into issues, try setwd(".")
library("devtools")
devtools::install_github("futres/rfutres")
library(rfutres)

Examples

The following brief examples show how to get started with rfutres. For full details, see the package help pages in R with ?futres_data and ?futres_traits.

More examples for each function are in the vignettes.

Downloading data

Download data using query parameters. Here we query all records collected within a year range and limit the results to 2 records:

r <- futres_data(fromYear = 2000, toYear = 2010, limit=2)
sending request for data ...
https://biscicol.org/futresapi/v3/download/?q=%2ByearCollected:>=2000+AND+%2ByearCollected:<=2010&source=latitude,longitude,yearCollected,termID&limit=2
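
The request URL printed above shows how the year filter and the limit are encoded. As a rough illustration, the hypothetical helper build_year_query_sketch() below (not part of rfutres) rebuilds that URL in base R, assuming the v3 download endpoint keeps exactly the layout shown. In normal use, futres_data() builds the request for you.

```r
# Hypothetical helper, not part of rfutres: rebuilds the year-range download URL
# printed above. "%2B" is the URL-encoded "+" that prefixes each query clause.
build_year_query_sketch <- function(fromYear, toYear, limit,
                                    source = "latitude,longitude,yearCollected,termID") {
  base <- "https://biscicol.org/futresapi/v3/download/"
  # In a sprintf() format string, "%%" produces a literal "%"
  q <- sprintf("%%2ByearCollected:>=%d+AND+%%2ByearCollected:<=%d",
               as.integer(fromYear), as.integer(toYear))
  sprintf("%s?q=%s&source=%s&limit=%d", base, q, source, as.integer(limit))
}

build_year_query_sketch(fromYear = 2000, toYear = 2010, limit = 2)
```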

Query by bounding box, given as "lat,lng,lat,lng" for any two opposite corners of the box:

r <- futres_data(bbox="37,-120,38,-119", limit=2)
sending request for data ...
https://biscicol.org/futresapi/v3/download/?q=%2BdecimalLatitude:>=37+AND+%2BdecimalLatitude:<=38+AND+%2BdecimalLongitude:>=-120+AND+%2BdecimalLongitude:<=-119&source=latitude,longitude,yearCollected,termID&limit=2
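
The URL above turns the two corners into a latitude range and a longitude range. The hypothetical helper build_bbox_query_sketch() below (not part of rfutres, and not how futres_data() is implemented) shows one way to do that in base R. It assumes each range is simply the min and max of the two corner values, which is why either pair of opposite corners, in either order, gives the same query.

```r
# Hypothetical helper, not part of rfutres: turns a "lat,lng,lat,lng" bounding
# box into the latitude/longitude range clauses seen in the URL above.
build_bbox_query_sketch <- function(bbox, limit,
                                    source = "latitude,longitude,yearCollected,termID") {
  coords <- as.numeric(strsplit(bbox, ",")[[1]])
  stopifnot(length(coords) == 4, !anyNA(coords))
  # range() returns c(min, max), so corner order does not matter
  lat <- range(coords[c(1, 3)])
  lng <- range(coords[c(2, 4)])
  q <- paste0("%2BdecimalLatitude:>=", lat[1], "+AND+%2BdecimalLatitude:<=", lat[2],
              "+AND+%2BdecimalLongitude:>=", lng[1], "+AND+%2BdecimalLongitude:<=", lng[2])
  paste0("https://biscicol.org/futresapi/v3/download/?q=", q,
         "&source=", source, "&limit=", limit)
}

build_bbox_query_sketch("37,-120,38,-119", limit = 2)
```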

View the returned data, the number of possible results (ignoring the limit), the citation, and the readme:

print(r$data)
print(r$number_possible)
print(r$citation)
print(r$readme)

Return a data frame of traits:

traits <- futres_traits()
sending request for terms ...
No encoding supplied: defaulting to UTF-8.

Print the second term returned:

print(traits[2,])
             termID       label                              definition
2 obo:OBA_VT0001253 body height The height of a multicellular organism.
                                           uri
2 http://purl.obolibrary.org/obo/OBA_VT0001253
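
The termID column is a compact form of the uri column: the obo: prefix stands for the OBO PURL base http://purl.obolibrary.org/obo/. A minimal sketch, assuming every termID uses that prefix (only the row above is shown here), with the hypothetical helper expand_obo_sketch() and a one-row stand-in for the futres_traits() result:

```r
# Hypothetical helper, not part of rfutres: expands the "obo:" prefix of a
# termID into the OBO PURL base used in the uri column.
expand_obo_sketch <- function(termID) {
  sub("^obo:", "http://purl.obolibrary.org/obo/", termID)
}

# One-row stand-in for futres_traits(), copied from the output above
traits <- data.frame(
  termID = "obo:OBA_VT0001253",
  label = "body height",
  definition = "The height of a multicellular organism.",
  uri = "http://purl.obolibrary.org/obo/OBA_VT0001253",
  stringsAsFactors = FALSE
)

# Look up a term by its label, then expand its termID
id <- traits$termID[traits$label == "body height"]
expand_obo_sketch(id)
```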

You can also download all publicly available data in the FuTRES data store:

r_all <- futres_data()
this may take awhile... time for some coffee?
sending request for data ...
https://biscicol.org/futresapi/downloadable/futres.zip

Checking FuTRES data

We also provide functions for error checking and for labeling records with unknown life stages.

See outlier.md in the vignettes folder for more examples.

The examples below use sample_flag, outlier_flag, quantile_flag, normal_flag, and logNormal_flag.

First, download a subset of the data, here all records for Puma concolor:

wildcat.store <- futres_data(scientificName = "Puma concolor")
wildcat <- wildcat.store$data

sample_flag counts the records of a trait for each species. It creates a "measurementStatus" column (if one does not already exist) and labels records as "too few records" when a species has fewer records of that trait than the minimum sample size.

The user specifies the trait to check and can also set the life stage and the minimum sample size per species. By default, at least 3 records are required.

wildcat.samp <- sample_flag(data = wildcat,
                            trait = "body mass")
unique(wildcat.samp$measurementStatus)
[1] ""  "too few records"
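
To make the counting rule concrete, here is a minimal base-R sketch of the behavior described above. sample_flag_sketch() is hypothetical, not the rfutres implementation. It assumes the data has scientificName and measurementType columns (measurementType is a guess at the FuTRES column name), and the toy values are made up.

```r
# Hypothetical sketch, not the rfutres implementation: label records of one
# trait as "too few records" when their species has fewer than min.samples.
sample_flag_sketch <- function(data, trait, min.samples = 3) {
  if (!"measurementStatus" %in% names(data)) {
    data$measurementStatus <- rep("", nrow(data))
  }
  # Row indices of the chosen trait (which() drops NA comparisons)
  rows <- which(data$measurementType == trait)
  # ave(..., FUN = length) gives each row the size of its species group
  n.per.species <- ave(rows, data$scientificName[rows], FUN = length)
  data$measurementStatus[rows[n.per.species < min.samples]] <- "too few records"
  data
}

# Made-up toy data: 3 records for one species, 2 for another
toy <- data.frame(
  scientificName = c(rep("Puma concolor", 3), rep("Lynx rufus", 2)),
  measurementType = "body mass",
  measurementValue = c(10, 12, 11, 5, 6),
  stringsAsFactors = FALSE
)
out <- sample_flag_sketch(toy, trait = "body mass")
out$measurementStatus
```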

outlier_flag flags whether each record of a trait for a species is an outlier, using Mahalanobis distance as computed by the R package OutlierDetection, or whether there are too few records to determine outliers. It creates a "measurementStatus" column (if one does not already exist) and labels records as "outlier" or "too few records".

The user specifies the trait to check and can also set the life stage and the minimum sample size per species. By default, at least 3 records are required.

wildcat.flag <- outlier_flag(data = wildcat,
                             trait = "body mass")
unique(wildcat.flag$measurementStatus)
[1] ""  "too few records" "outlier"
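
For a single trait, the squared Mahalanobis distance reduces to ((x - mean) / sd)^2. The sketch below uses base R's stats::mahalanobis() with a chi-squared cutoff at the 0.95 quantile; that cutoff is an assumption, and the hypothetical outlier_flag_sketch() is not how OutlierDetection or rfutres computes it. It works on one species' values at a time, with made-up toy values.

```r
# Hypothetical sketch, not the OutlierDetection or rfutres implementation:
# flags one species' trait values whose squared Mahalanobis distance from the
# mean exceeds the chi-squared quantile at `alpha` (an assumed cutoff).
outlier_flag_sketch <- function(values, min.samples = 3, alpha = 0.95) {
  if (length(values) < min.samples) {
    return(rep("too few records", length(values)))
  }
  m <- matrix(values, ncol = 1)  # one column per trait; here a single trait
  d2 <- stats::mahalanobis(m, center = colMeans(m), cov = stats::cov(m))
  ifelse(d2 > stats::qchisq(alpha, df = ncol(m)), "outlier", "")
}

body.mass <- c(10, 11, 9, 10, 12, 11, 10, 9, 100)  # made-up toy values
outlier_flag_sketch(body.mass)
```

The extreme value also inflates the standard deviation it is measured against, so with few records a genuine outlier can go undetected.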

quantile_flag flags whether each record of a trait for a species is an outlier based on quantiles, or whether there are too few records to determine outliers. It creates a "measurementStatus" column (if one does not already exist) and labels records as "outlier", "possible adult, possibly good", "possible juvenile", or "too few records".

The user specifies the trait to check and can also set the life stage and the minimum sample size per species. By default, at least 3 records are required and the quantile limits are set at +/- 0.05 (5%).

wildcat.quant <- quantile_flag(data = wildcat,
                               trait = "body mass")
unique(wildcat.quant$measurementStatus)
[1] ""  "too few records" "outlier" "possible adult, possibly good" "possible juvenile"
unique(wildcat.quant$limitMethod)
[1] "quantile"
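
The labeling rule for quantile_flag is not spelled out above, so the hypothetical quantile_flag_sketch() below assumes one plausible mapping: values below the lower 5% quantile are labeled "possible juvenile", values above the upper 95% quantile "outlier", and everything else "possible adult, possibly good". It is a sketch on one species' values, not the rfutres implementation.

```r
# Hypothetical sketch, not the rfutres implementation. Assumed labels: below the
# lower quantile -> "possible juvenile", above the upper quantile -> "outlier",
# otherwise "possible adult, possibly good".
quantile_flag_sketch <- function(values, min.samples = 3, prob = 0.05) {
  if (length(values) < min.samples) {
    return(rep("too few records", length(values)))
  }
  # Lower and upper cutoffs at prob and 1 - prob (R's default quantile type 7)
  lims <- stats::quantile(values, probs = c(prob, 1 - prob), names = FALSE)
  ifelse(values < lims[1], "possible juvenile",
         ifelse(values > lims[2], "outlier", "possible adult, possibly good"))
}

status <- quantile_flag_sketch(1:100)  # toy values 1, 2, ..., 100
table(status)
```

With the toy input 1:100, the cutoffs are 5.95 and 95.05, so 5 values are labeled "possible juvenile", 5 "outlier", and 90 "possible adult, possibly good".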

normal_flag first tests whether a species' trait values are normally distributed. For normally distributed values, it then uses upper and lower limits to flag whether each record is an outlier, or notes that there are too few records to determine outliers. It creates a "measurementStatus" column (if one does not already exist) and labels records as "outlier", "possible adult, possibly good", "possible juvenile", or "too few records".

The user specifies the trait to check and can also set the life stage and the minimum sample size per species. By default, at least 3 records are required and the upper and lower limits are 3 standard deviations above and below the mean (sigma.steps = 3).

First, trim the dataset to records with measurement values between 4000 and 10000 (exclusive):

wildcat.trim <- wildcat[wildcat$measurementValue > 4000 &
                        wildcat$measurementValue < 10000,]
wildcat.norm <- normal_flag(data = wildcat.trim,
                            trait = "body mass",
                            sigma.steps = 3)
unique(wildcat.norm$normality) 
[1] "normal"  "non-normal"
unique(wildcat.norm$limitMethod)
[1] "sd"
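
The text above does not say which normality test normal_flag uses. The hypothetical normal_flag_sketch() below assumes a Shapiro-Wilk test (stats::shapiro.test(), with p > 0.05 counted as normal) and assumes values below the lower limit are labeled "possible juvenile", values above the upper limit "outlier", and the rest "possible adult, possibly good". It is not the rfutres implementation. The toy inputs are exact normal and exponential quantiles, so the outcome is deterministic.

```r
# Hypothetical sketch, not the rfutres implementation. Assumes Shapiro-Wilk
# (p > 0.05 means "normal") and assumed labels for values outside the limits.
normal_flag_sketch <- function(values, min.samples = 3, sigma.steps = 3) {
  n <- length(values)
  out <- data.frame(value = values,
                    measurementStatus = rep("", n),
                    normality = rep(NA_character_, n),
                    limitMethod = rep(NA_character_, n),
                    stringsAsFactors = FALSE)
  # shapiro.test() needs at least 3 values (and at most 5000)
  if (n < max(min.samples, 3)) {
    out$measurementStatus[] <- "too few records"
    return(out)
  }
  if (stats::shapiro.test(values)$p.value <= 0.05) {
    out$normality[] <- "non-normal"
    return(out)
  }
  out$normality[] <- "normal"
  out$limitMethod[] <- "sd"
  lower <- mean(values) - sigma.steps * stats::sd(values)
  upper <- mean(values) + sigma.steps * stats::sd(values)
  out$measurementStatus <- ifelse(values < lower, "possible juvenile",
                                  ifelse(values > upper, "outlier",
                                         "possible adult, possibly good"))
  out
}

# Toy inputs: exact normal quantiles and exact exponential (right-skewed) quantiles
normal.vals <- stats::qnorm(stats::ppoints(50), mean = 100, sd = 10)
skewed.vals <- stats::qexp(stats::ppoints(50))
unique(normal_flag_sketch(normal.vals, sigma.steps = 3)$normality)
unique(normal_flag_sketch(skewed.vals, sigma.steps = 3)$normality)
```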

logNormal_flag works like normal_flag but on log-transformed values: it first tests whether a species' log-transformed trait values are normally distributed and, if they are, uses upper and lower limits to flag whether each record is an outlier, or notes that there are too few records to determine outliers. It creates a "measurementStatus" column (if one does not already exist) and labels records as "outlier", "possible adult, possibly good", "possible juvenile", or "too few records".

The user specifies the trait to check and can also set the life stage and the minimum sample size per species. By default, at least 3 records are required and the upper and lower limits are 3 standard deviations above and below the mean of the log-transformed values (sigma.steps = 3).

wildcat.logNorm <- logNormal_flag(data = wildcat.trim,
                                  trait = "body mass",
                                  sigma.steps = 3)
unique(wildcat.logNorm$normality)
[1]  "log-normal" "non-log normal"
unique(wildcat.logNorm$limitMethod)
[1] "log sd"
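
logNormal_flag applies the same kind of check to log-transformed values. The hypothetical logNormal_flag_sketch() below is not the rfutres implementation: it assumes strictly positive values, a Shapiro-Wilk test on log(values), and the same assumed labels ("possible juvenile" below the lower limit, "outlier" above the upper limit, "possible adult, possibly good" otherwise). The limits are computed on the log scale and mapped back with exp(), so in the original units they are no longer symmetric around the mean.

```r
# Hypothetical sketch, not the rfutres implementation. Assumes positive values,
# a Shapiro-Wilk test on log(values) (p > 0.05 means "log-normal"), and assumed
# labels for values outside the back-transformed limits.
logNormal_flag_sketch <- function(values, min.samples = 3, sigma.steps = 3) {
  stopifnot(all(values > 0))  # log() needs strictly positive measurements
  n <- length(values)
  out <- data.frame(value = values,
                    measurementStatus = rep("", n),
                    normality = rep(NA_character_, n),
                    limitMethod = rep(NA_character_, n),
                    stringsAsFactors = FALSE)
  if (n < max(min.samples, 3)) {
    out$measurementStatus[] <- "too few records"
    return(out)
  }
  logs <- log(values)
  if (stats::shapiro.test(logs)$p.value <= 0.05) {
    out$normality[] <- "non-log normal"
    return(out)
  }
  out$normality[] <- "log-normal"
  out$limitMethod[] <- "log sd"
  # mean +/- sigma.steps * sd on the log scale, mapped back to original units
  limits <- exp(mean(logs) + c(-1, 1) * sigma.steps * stats::sd(logs))
  out$measurementStatus <- ifelse(values < limits[1], "possible juvenile",
                                  ifelse(values > limits[2], "outlier",
                                         "possible adult, possibly good"))
  out
}

# Toy inputs: exactly log-normal quantiles, and values whose logs are skewed
lognormal.vals <- exp(stats::qnorm(stats::ppoints(50), mean = 8, sd = 0.3))
skewed.vals <- exp(stats::qexp(stats::ppoints(50)))
unique(logNormal_flag_sketch(lognormal.vals)$normality)
unique(logNormal_flag_sketch(skewed.vals)$normality)
```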

Citation

To cite the ‘rfutres’ R package in publications use:

   Meghan Balk, John Deck, Neeka Sewnath, Robert Guralnick (2021). rfutres: An interface to the Functional Trait Resource for Environmental Studies and associated data store. R package version 1.0.
   https://github.com/futres/rfutres

Code of Conduct

View our code of conduct

About

License: BSD 3-Clause "New" or "Revised" License