B-Rich / OnlineAdapterDatabase

Linking publicly deposited data to sequencing adapters.

AdapterBase

Background

Adapters are short sequences that are attached to cDNA templates during preparation of next-generation sequencing (NGS) libraries. Depending on how the library is prepared and sequenced, the raw reads may be contaminated with adapter sequences. See Didion et al. 2017 for more details.

Adapter trimming is a critical component of NGS data preprocessing. To trim adapters appropriately, it is necessary to know the sequences of the adapters that were used. However, adapter sequences are poorly documented and often are not included in the metadata of public database submissions (SRA, ENA, and DDBJ).
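To make concrete why the adapter sequence must be known, here is a minimal sketch of 3' adapter trimming using exact matching only. The function name, the `min_overlap` parameter, and the example reads are made up for illustration; real trimmers such as Atropos also tolerate sequencing errors, which this sketch does not attempt.

```python
def trim_3prime_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a 3' adapter from a read, using exact matches only."""
    # Case 1: the complete adapter occurs inside the read.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the read ends partway through the adapter, so only a
    # prefix of the adapter is present at the 3' end of the read.
    longest = min(len(adapter) - 1, len(read))
    for overlap in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[: len(read) - overlap]
    # No adapter found: return the read unchanged.
    return read


# Example adapter: the prefix shared by Illumina TruSeq adapters.
ADAPTER = "AGATCGGAAGAGC"

print(trim_3prime_adapter("ACGTACGTTTGACCAGATCGGAAGAGCACACG", ADAPTER))  # ACGTACGTTTGACC
print(trim_3prime_adapter("ACGTACGTTTGACCAGATCGG", ADAPTER))             # ACGTACGTTTGACC
```

Without the correct adapter sequence, neither case can be detected, which is the gap AdapterBase aims to close.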

Target Users

AdapterBase is designed to make life easier for scientists who want to re-analyze data from the SRA. The goal is to be able to enter a run accession number (e.g., SRR123456) into either the web interface or the command-line API and get back the sequences of the adapters that were used to create the library. This information will also eventually be exposed via Python bindings, so that adapter trimming programs like Atropos can access it directly.
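As a rough illustration of an accession-based lookup, the sketch below validates a run accession and builds a request URL. The route `api/run/<accession>/` and the function name are assumptions made for this example; the actual AdapterBase API routes are not documented here.

```python
import re
from urllib.parse import urljoin

# Run accessions start with SRR (SRA), ERR (ENA), or DRR (DDBJ),
# followed by digits.
RUN_ACCESSION = re.compile(r"^[SED]RR\d+$")


def run_lookup_url(base_url: str, accession: str) -> str:
    """Build a lookup URL for a run accession (hypothetical route)."""
    accession = accession.strip().upper()
    if not RUN_ACCESSION.match(accession):
        raise ValueError(f"not a run accession: {accession!r}")
    # "api/run/<accession>/" is an assumed route, not the documented API.
    return urljoin(base_url.rstrip("/") + "/", f"api/run/{accession}/")


print(run_lookup_url("http://localhost:8000", "SRR123456"))
# http://localhost:8000/api/run/SRR123456/
```

Validating the accession before sending a request gives users an immediate error for identifiers of the wrong type, such as experiment or sample accessions.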

Because a database is only as useful as the data in it, we also provide the ability for the groups doing the sequencing to create entries for their data in AdapterBase at the same time as depositing it in the SRA/ENA/DDBJ. We have begun to prepopulate the database with annotations of existing data, generated by automatic detection of adapters using Atropos. Similarly, we have extracted lists of kits and adapter sequences from Illumina's documentation, and users can add data for other kits as it becomes available.

System Design

AdapterBase is implemented in SQLite3 and Django, with the primary API exposed as a REST interface. Access to the database is via web (URL TBD), command line, and/or Python bindings.

AdapterBase schema
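The sketch below shows one way the relationships described in this README (kits, adapter sequences, and runs) could be modeled in SQLite. It does not reproduce the actual AdapterBase schema: every table name, column name, and inserted row is a placeholder invented for illustration.

```python
import sqlite3

# Hypothetical tables: a kit has adapters, and a run references a kit.
SCHEMA = """
CREATE TABLE kit (
    id INTEGER PRIMARY KEY,
    vendor TEXT NOT NULL,
    name TEXT NOT NULL
);
CREATE TABLE adapter (
    id INTEGER PRIMARY KEY,
    kit_id INTEGER NOT NULL REFERENCES kit(id),
    name TEXT NOT NULL,
    sequence TEXT NOT NULL
);
CREATE TABLE run (
    accession TEXT PRIMARY KEY,
    kit_id INTEGER REFERENCES kit(id)
);
"""


def adapters_for_run(conn: sqlite3.Connection, accession: str):
    """Return (name, sequence) pairs for the kit linked to a run."""
    rows = conn.execute(
        "SELECT adapter.name, adapter.sequence FROM run "
        "JOIN adapter ON adapter.kit_id = run.kit_id "
        "WHERE run.accession = ? ORDER BY adapter.name",
        (accession,),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Placeholder rows, not real database content.
conn.execute("INSERT INTO kit VALUES (1, 'ExampleVendor', 'Example Kit')")
conn.execute("INSERT INTO adapter VALUES (1, 1, 'read1', 'AGATCGGAAGAGC')")
conn.execute("INSERT INTO run VALUES ('SRR123456', 1)")

print(adapters_for_run(conn, "SRR123456"))  # [('read1', 'AGATCGGAAGAGC')]
```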

Usage

Currently, AdapterBase can be accessed from the Hackathon AWS instance by mapping port 80 back to localhost. A permanent, publicly facing home will be determined later. To use the AWS instance, please see these instructions.

Local installation

If you want to spin up a local copy of AdapterBase:

  1. Make sure python3 is installed
  2. Clone or download this git repository
  3. From the oadb directory, enter ./buildoadb.sh -v venv
  4. From the oadb directory, enter ./runoadb.sh -v venv
  5. Open a new browser tab and navigate to http://localhost:8000

These scripts exist for convenience; you can examine them to see exactly what they do.
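Once the server has been started, a quick check that it is answering on port 8000 can be scripted. This is a minimal sketch using only the standard library; the function name and the `opener` parameter (which allows substituting a fake in tests) are choices made for this example and are not part of AdapterBase.

```python
import urllib.request


def is_server_up(url, timeout=2.0, opener=urllib.request.urlopen):
    """Return True if a GET request to url succeeds with HTTP 200."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError and HTTPError are subclasses of OSError, so this
        # covers refused connections, timeouts, and HTTP error statuses.
        return False
```

Calling `is_server_up("http://localhost:8000")` should return True when the development server is running and False when nothing is listening on that port.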

Docker installation

You can build a docker image using the following incantations:

sudo docker pull ubuntu:16.04
sudo docker build -t oadb:latest .

You can start the application in the background using the following incantation:

sudo docker run -d -p 8000:8000 --name oadb oadb:latest

As before, open a new browser tab and navigate to http://localhost:8000

Web interface vignettes

Get adapter sequences used by a run from the accession number

Deposit adapter information for a run

Adding new kits and/or adapter sequences

Using the Commandline API

Using Python bindings

Remaining Goals

  1. Complete implementation of website/API
  2. Pre-populate Run database from SRA using Atropos
  3. Find a home for web implementation and build Docker image
  4. User group implementation and security features

Stretch goals/post-hackathon

  1. Continue building out run database with manual curation of SRA datasets
  2. Integrate the AdapterBase API into Atropos
  3. Develop a script to scan a set of SRA accessions for adapters and match identified adapter names against the LIBRARY_CONSTRUCTION_PROTOCOL block in the SRA metadata.
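The metadata-matching step in the last stretch goal could look roughly like the sketch below. It assumes the adapter or kit names have already been identified (for example by Atropos) and that the experiment metadata is available as SRA-style XML. The function name, the example XML, and the case-insensitive substring matching are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def protocol_mentions(experiment_xml: str, adapter_names):
    """Return the names that appear in LIBRARY_CONSTRUCTION_PROTOCOL text."""
    root = ET.fromstring(experiment_xml)
    # Collect the text of every protocol block, lowercased for matching.
    text = " ".join(
        (node.text or "") for node in root.iter("LIBRARY_CONSTRUCTION_PROTOCOL")
    ).lower()
    return [name for name in adapter_names if name.lower() in text]


# Made-up example record; SRX000000 is a placeholder accession.
EXAMPLE_XML = """
<EXPERIMENT accession="SRX000000">
  <DESIGN>
    <LIBRARY_DESCRIPTOR>
      <LIBRARY_CONSTRUCTION_PROTOCOL>
        Libraries were prepared with the TruSeq Stranded mRNA kit.
      </LIBRARY_CONSTRUCTION_PROTOCOL>
    </LIBRARY_DESCRIPTOR>
  </DESIGN>
</EXPERIMENT>
""".strip()

detected = ["TruSeq", "Nextera"]  # names assumed to come from adapter detection
print(protocol_mentions(EXAMPLE_XML, detected))  # ['TruSeq']
```

Substring matching is deliberately crude; free-text protocol descriptions vary widely, so a real script would likely need a curated list of synonyms for each kit.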

Manuscript

A draft manuscript describing AdapterBase may be found here.

Project Team

AdapterBase was initially developed as part of an NCBI-sponsored hackathon at the National Library of Medicine, August 14-16, 2017.

About

License: MIT License

