breeDBase loading scripts

This repository contains general notes and scripts for loading data into T3 breeDBase instances.

Instance Setup

Docker Setup

Clone SGN and instance repositories for new instance

cd /opt/breedbase/repos

# Clone the repos
git clone git@github.com:TriticeaeToolbox/sgn ./sgn_avena
git clone git@github.com:TriticeaeToolbox/avena ./avena

# Switch to the SGN t3/master branch
cd sgn_avena
git checkout t3/master

Create instance config file and modify contents

cp /opt/breedbase/config/triticum.conf /opt/breedbase/config/avena.conf

Set dbname = cxgn_avena
Update path names s/triticum/avena/g
Set main_production_site_url = https://oat.triticeaetoolbox.org
Update ontology variables onto_root_namespaces, trait_cv_name, trait_variable_onto_root_namespace

Create docker mount points

cd /opt/breedbase/mnt
mkdir avena
mkdir avena/archive avena/prod avena/public

Add instance to /opt/breedbase/bin/breedbase script

Set instance-specific paths and variables

Setup NGINX configuration

cp /etc/nginx/sites-available/wheat /etc/nginx/sites-available/avena
ln -s /etc/nginx/sites-available/avena /etc/nginx/sites-enabled/avena

Set server_name = https://oat.triticeaetoolbox.org
Modify http redirect return URL
Modify port in proxy_pass
Remove ssl_certificate and ssl_certificate_key directives (added by certbot)

# Get SSL certificates via certbot and Let's Encrypt
certbot --nginx

Database Setup

Create database based off of the breedbase template

CREATE DATABASE cxgn_avena WITH TEMPLATE breedbase;

Set permissions for web_usr

See /sql/web_usr_grants.sql for permissions

Run Database Patches

# From avena web docker container
cd /home/production/cxgn/sgn/db
export PERL_USE_UNSAFE_INC=1 # Allow the current directory in INC
./run_all_patches.pl -u postgres -p "postgres_pass" -e admin -h breedbase_db -d cxgn_avena

Web Instance Setup

If the node JS modules don't install properly:

Example: if there is a permission error during the build process when performing a git clone

Install NodeJS on the docker host machine, if not already The web docker container currently has node version 10 installed

# From the docker host
curl -sL https://deb.nodesource.com/setup_10.x | bash -
apt-get install nodejs

Install the node packages from the docker host machine These will get mounted to the SGN repo via Docker

# From the docker host, DO NOT RUN AS ROOT
cd {BB_HOME}/repos/{SGN_REPO}/js
rm -rf node_modules
npm cache clean -f
npm install

Restart the web container

Load Trait Ontology

Follow the workflow described in ontology-scripts/WORKFLOW.md

If the CV composable_cvtypes is missing, make sure the DB patch 00073 has been run.

Load Breeding Programs

There are two perl scripts to aid the bulk loading of breeding programs.

The first ./bin/breeding_programs/add_breeding_program.pl script can be used to add a single Breeding Program to the specified database with a specified name and description.

# Add a single breeding program to the database
./bin/breeding_programs/add_breeding_program.pl -H localhost -D cxgn_avena -U postgres -P postgrespass -n "Test Breeding Program" -d "This is a Breeding Program used for testing"

The second ./bin/breeding_programs/add_breeding_programs_bulk.pl script can be used to parse a CSV file containing Breeding Program information (designed to use the T3 contributing data programs table). Each row of a breeding program's information will added to the database if the name does not yet exist.

Example CSV file:

# From ./data/breeding_programs/oat_breeding_programs.csv
Breeding Program,Code,Collaborator,Description,Institution
"AAES, Auburn University",AUB,Kathryn Glass,"Alabama Agricultural Experiment Station (AAES), Auburn University, AL-USA.",Auburn University
AAFC Agassiz,ABC,,"Agriculture and Agri-Food Canada (AAFC) Pacific Agri-Food Research Centre (PARC) in Agassiz, BC-CAN.",Agriculture and Agri-Food Canada
AAFC Brandon,MTB,Jennifer W. Mitchell-Fetch,"Agriculture and Agri-Food Canada (AAFC) Brandon Research Centre, MB-CAN.",Agriculture and Agri-Food Canada

# Parse a CSV file of breeding program information
./bin/breeding_programs/add_breeding_programs_bulk.pl -H localhost -D cxgn_avena -U postgres -P postgrespass -d ./data/breeding_programs/oat_breeding_programs.csv

Load Accessions

Accessions can be bulk loaded using an Excel template through the web site (Manage > Accessions).

The breedbase R package has helper functions to help create a breedbase accession template. The R script ./bin/accessions/add_accessions.R contains the createAccessions() function which will read the T3 line information file and create a list of Accessions. These Accessions can be passed to the breedbase::writeAccessionTemplate() function to create the breedbase accession template file(s).

# Create Accessions from the T3 line information file
a <- createAccessions(lines="./data/accessions/oat_line_records.csv", programs="./data/breeding programs/oat_breeding_programs.csv", genus="Avena")
# Create a breedbase acccession template table
t <- buildAccessionTemplate(a)
# Write the breedbase accession template table to files
writeAccessionTemplate(t, output="./templates/accessions/oat_accessions.xls", chunk=6000)

Once created, the accession template file(s) can be uploaded to the breedbase website.

Pedigrees

Pedigrees can be stored in two different ways:

The official Breedbase way is to assign two parents to each Accession entry. Each of the parents, in turn, need to be Accession entries themselves. This method allows the interactive pedigree viewer to work.
The simplified T3 way is to add a Purdy pedigree string as the pedigree property of the Accession. The pedigree property is not parsed by the database and is displayed on the Stock detail page. This method allows a user to search for Accessions on the text of the pedigree string.

Add Pedigrees the Breedbase way:

A pedigree file (a spreadsheet containing a table of Accessions and their parent names) can be uploaded through the website from the Manage > Accessions page. The breedbase R package has helper functions to help create the spreadsheet template.

Add Pedigrees the T3 way:

Purdy Pedigree strings no longer have to be added to the database separately. The T3 breedbase instances support the Accession properties of purdy_pedigree and filial_generation and can be added as an Accession property directly from the Accession upload template. The breedbase R package has helper functions to help create the Accession upload template.

~~First, the pedigree stock property needs to be added to the database. The add_stockprop_term.pl perl script can be used:~~

perl ./bin/accessions/add_stockprop_term.pl -H localhost -D cxgn_avena -n pedigree -d "Purdy pedigree string of an Accession"

~~Then, a file (following the T3 bulk line information format):~~

# pedigrees.csv
   Name         Species GRIN  Synonym Breeding Program Parent1 Parent2 Pedigree                        Description
 1 02-18228                                                            Pio25R26/ 9634-24437//95-4162
 2 02-194638-1                                                         Patton / Cardinal // 96-2550

~~can be used to define the pedigrees for each of the Accessions. The add_stock_pedigrees.pl perl script can be used to add the pedigrees as stock properties of the Accessions:~~

perl ./bin/accessions/add_stock_pedigrees.pl -H localhost -D cxgn_avena -d pedigrees.csv

~~Finally, in order for the pedigree property to be displayed on the Stock detail page, the editable_stock_props configuration variable (sgn_local.conf) needs to have pedigree appended to it.~~

Generate `web_usr` grants

To generate the sql statements for all of the permissions for the web_usr:

pg_dump -h localhost -U postgres -d cxgn_triticum_test --schema-only | grep -e '^\(ALTER\|GRANT\) .* TO web_usr;$' > grants.sql

TriticeaeToolbox / loading-scripts

breeDBase loading scripts

Instance Setup

Docker Setup

Database Setup

Web Instance Setup

Load Trait Ontology

Load Breeding Programs

Load Accessions

Pedigrees

Generate `web_usr` grants

About

Languages

breeDBase loading scripts

Instance Setup

Docker Setup

Database Setup

Web Instance Setup

Load Trait Ontology

Load Breeding Programs

Load Accessions

Pedigrees

Generate web_usr grants

About

Languages

Generate `web_usr` grants