sebrauschert / OceanOmics-amplicon-paper-analysis

F.A.I.R. and reproducible analysis for the paper: "Exploring the data that explores the oceans: working towards robust eDNA workflows for ocean wildlife monitoring (submitted)"

by Jessica R. Pearce 1, Philipp E. Bayer 1,2, Adam Bennett 1, Eric J. Raes 1,2, Marcelle E. Ayad 1, Shannon Corrigan 1,2, Matthew W. Fraser 1,2, Denise Anderson 3, Priscila Goncalves 1,2, Benjamin Callahan 4, Michael Bunce 5, Stephen Burnell 1,2, Sebastian Rauschert 1,2,*

1 Minderoo Foundation, Perth 6000, WA
2 The UWA Oceans Institute, The University of Western Australia, Crawley 6009, WA
3 INSiGENe Pty Ltd.
4 North Carolina State University, Raleigh, 27606, USA
5 Department of Conservation, Wellington, New Zealand

*Corresponding author

Start here to immediately re-analyse the data
Launch analysis: Binder

Run with Docker / Run with Singularity

Analysis

This repository contains all data and code needed to generate the figures and statistics in the paper. Click the Binder button above to launch an RStudio session in the browser, with access to all code and data in this GitHub repository. There, the code can be changed interactively, and the plots and statistics can be (re-)created.

What is Binder?

For an overview of what Binder is, please check out this link.

Where does the data in this repo come from?

This repository contains the phyloseq objects for all three data sets analysed in the paper. The objects were generated with the Minderoo OceanOmics amplicon Nextflow pipeline. Below is a detailed description, including code, of how to recreate the phyloseq objects. The three data sets can be found here:

Additionally, this repository includes a list of Australian marine fish species, named Aust_fish_species_list.csv. It was manually curated by domain experts, with data drawn from the Atlas of Living Australia and the Global Biodiversity Information Facility.
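
A curated list like this lends itself to a simple cross-check of species-level assignments. The following is a minimal sketch in base R, not the analysis used in the paper. It assumes a column named species (the real column names in Aust_fish_species_list.csv may differ) and uses made-up stand-in data rather than any results from the repository:

```r
# Hypothetical stand-in for Aust_fish_species_list.csv.
# The column name "species" is an assumption; check the real file header.
fish_list <- data.frame(
  species = c("Chrysophrys auratus",
              "Pseudocaranx georgianus",
              "Sillago bassensis"),
  stringsAsFactors = FALSE
)
# With the repository checked out, the list could instead be read with:
# fish_list <- read.csv("Aust_fish_species_list.csv", stringsAsFactors = FALSE)

# Made-up species-level assignments, e.g. as they might appear in a
# taxonomy table; NA marks a sequence without a species-level assignment.
detected <- c("Chrysophrys auratus", "Gadus morhua", NA, "Sillago bassensis")

# Flag each detection by whether it appears in the curated list.
# %in% returns FALSE for NA here, so unassigned entries are flagged as absent.
check <- data.frame(
  species = detected,
  in_australian_list = detected %in% fish_list$species,
  stringsAsFactors = FALSE
)

print(check)
```

Detections flagged as absent from the list are candidates for manual review, not automatic removal, since the list itself is a curated snapshot.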

The files in metadata were generated as part of data collection and sequencing, and are downloaded as part of the steps in the "downloading the data" description.
Lastly, the read_qc folder contains read QC statistics produced by the seqkit step of the Nextflow pipeline.

Documentation: Generating the phyloseq objects

Everything in this section is optional and not required for re-analysing the results of the paper. It is documented for full transparency and reproducibility, should anyone wish to regenerate the results from scratch. Information on setting up the compute environment, downloading the data and creating the phyloseq objects can be found in the docs folder or via the clickable links.

About

License: MIT License

