barslmn / variantversioncontrol

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Variant Version Control

Variant version control (VVC) is a tool for tracking changes over time in variant annotation from source like Ensembl, NCBI, Clingen via their APIs’. VVC helps you to store the variant that you are interested in a git repository. Makes it possible to track changes in variant annotation with git commits. Informs you of the annotation changes by mail or notification.

TODO: Singularity image TODO: Email setup

Installation

git clone

Set up a cron job cron is a tool for periodic job execution. To update weekly run the following command.

https://crontab.guru/#0_9_*_*_3

Usage

API Table

Germline

pathurlhandlerlimit
ENSEMBL_RECODERhttps://rest.ensembl.org/variant_recoder/human/$spdirecoder_handler
ENSENBL_VEPhttps://rest.ensembl.org/vep/human/hgvs/$hgvsg
NCBI_RSIDhttps://api.ncbi.nlm.nih.gov/variation/v0/spdi/$spdi/rsidsrsid_handler
NCBI_ALFAhttps://api.ncbi.nlm.nih.gov/variation/v0/refsnp/$rsid/frequency
NCBI_LITVARhttps://ncbi.nlm.nih.gov/research/bionlp/litvar/api/v1/entity/litvar/rs$rsid%23%23
CLINGEN_ACMGhttps://erepo.genome.network/evrepo/api/interpretations?hgvs=$hgvsg
BROAD_SPLICEAIhttps://spliceailookup-api.broadinstitute.org/spliceai/\?hg\=38\&variant\=$broad
INTERVAR_ACMGhttp://wintervar.wglab.org/api_new.php?queryType=position&chr=$chrom&pos=$pos&ref=$ref&alt=$alt&build=hg3816
MYVARIANTINFOhttps://myvariant.info/v1/variant/$hgvsg

Somatic

  • CIVIC
  • Onkokb
  • CancerVar

Source

Logger script

This function is copied from the dep-get project. The function is used to print messages to the terminal. The function takes two arguments. The first argument is the type of message and the second argument is the message itself.

fancy_message() (
    if [ -z "${1}" ] || [ -z "${2}" ]; then
        return
    fi

    RED="\e[31m"
    GREEN="\e[32m"
    YELLOW="\e[33m"
    MAGENTA="\e[35m"
    RESET="\e[0m"
    MESSAGE_TYPE=""
    MESSAGE=""
    MESSAGE_TYPE="${1}"
    MESSAGE="${2}"

    case ${MESSAGE_TYPE} in
        info) printf "  [${GREEN}+${RESET}] %s\n" "${MESSAGE}" ;;
        progress) printf "  [${GREEN}+${RESET}] %s" "${MESSAGE}" ;;
        recommend) printf "  [${MAGENTA}!${RESET}] %s\n" "${MESSAGE}" ;;
        warn) printf "  [${YELLOW}*${RESET}] WARNING! %s\n" "${MESSAGE}" ;;
        error) printf "  [${RED}!${RESET}] ERROR! %s\n" "${MESSAGE}" ;;
        fatal)
            printf "  [${RED}!${RESET}] ERROR! %s\n" "${MESSAGE}"
            exit 1
            ;;
        *) printf "  [?] UNKNOWN: %s\n" "${MESSAGE}" ;;
    esac
)

Log function and log level

This function is used to print messages to the terminal. The function takes two arguments. The first argument is the type of message and the second argument is the message itself. The log level is set by the environment variable VVC_LOGLVL. The default log level is INFO.

get_log_level() {
    case "$1" in
        debug | DEBUG | d | D)
            lvl=0
            ;;
        info | INFO | I | i)
            lvl=1
            ;;
        warning | warn | WARNING | WARN | W | w)
            lvl=2
            ;;
        error | err | ERROR | ERR | E | e)
            lvl=3
            ;;
    esac
    echo "$lvl"
}

LOGLVL=$(get_log_level "$VVC_LOGLVL")
# if [ "$LOGLVL" = 0 ]; then set -xv; fi

log() {
    level=$1
    message=$2
    loglvl=$(get_log_level "$level")
    if [ "$loglvl" -ge "$LOGLVL" ]; then
        case $loglvl in
            0 | debug)
                fancy_message "info" "$level $message"
                ;;
            1 | info)
                fancy_message "info" "$level $message"
                ;;
            2 | warn)
                fancy_message "warning" "$level $message"
                ;;
            3 | err)
                fancy_message "error" "$level $message"
                ;;
        esac
    fi
}

VVC main script

This section sets up the environment variables. These variables are:

  • VVC_DIR: directory where the variant repository is stored. Example: $HOME/.local/share
  • VVC_REPOSITORY: name of the variant repository. Example: my_variants
  • VVC_LOGLVL: log level of the script options are: DEBUG, INFO, WARNING, ERROR
set -e
set -u
BASEDIR=$(dirname "$0")
cd "$BASEDIR" || exit

if [ -z "${VVC_DIR-}" ]; then
    VVC_DIR="$HOME/.local/share"
fi

if [ -z "${VVC_LOGLVL-}" ]; then
    VVC_LOGLVL="INFO"
fi
if [ -z "${VVC_REPOSITORY-}" ]; then
    VVC_REPOSITORY="my_variants"
fi

. ./logger.sh
TMPDIR=$(mktemp -d VVCp$$.XXXXXX -p /tmp)
log "debug" "TEMP DIRECTORY set to: $TMPDIR"
LC_ALL=C

VARIANT_REPOSITORY="$VVC_DIR/$VVC_REPOSITORY"
CURRENT_DIR="$PWD"
readonly VERSION="0.0.1"

Help function

Avaliable commands are:

  • update: update the variant repository.
    • example usage: vvc update
  • add: add a variant to the repository
    • example usage: vvc add rs1234
  • remove: remove a variant from the repository
    • example usage: vvc remove rs1234
  • search: search for a variant in the repository
    • example usage: vvc search rs1234
  • show: show information about a variant
    • example usage: vvc show rs1234
  • list: list all variants in the repository
    • example usage: vvc list
  • tsvlist: list all variants in the repository in tsv format
    • example usage: vvc tsvlist
  • help: show this help
  • version: show the version of vvc
usage() {
    cat <<HELP
Usage
vvc { update | add | remove | show | list | tsvlist | version }
vvc provides a high-level commandline interface for the keeping track of
variant annotation changes from sources like Ensembl, NCBI via their API's.

update
    update is used to resynchronize the variant annotations from their sources.
add
    add is followed by one variant identifier (rs number, SPDI, hgvs) desired to be annotated and keep track of
remove
    remove is identical to add except that variants are removed and no longer kept track of.
search
    search for the given regex(7) term(s) from the list of track variants and display matches.
show
    show information about the given variant including its install source and update mechanism.
history
    show change history of the given variant.
list
    list the variants.
tsvlist
    tsv formatted list the variants
help
    show this help
version
    show vvc version
HELP
}

Check variant repository

This is the first thing that is done when the script is run. The function checks if the variant repository exists. If it does not exist, it creates it. If it does exist, it changes the current directory to the variant repository.

check_variant_repository() {
    if [ -d "$VARIANT_REPOSITORY" ]; then
        log "debug" "Directory $VARIANT_REPOSITORY exists. Changing directory."
        cd "$VARIANT_REPOSITORY" || exit
        if [ "$(git rev-parse --is-inside-work-tree 2>/dev/null)" ]; then
            log "debug" "Variant repository at $VARIANT_REPOSITORY exists."
            return 0
        else
            log "info" "Variant repository at $VARIANT_REPOSITORY does not exist. Creating it for you."
            git init
            touch "$VARIANT_REPOSITORY/variants"
            mkdir -p "$VARIANT_REPOSITORY/annotations"
            git add variants annotations/
            git config --add --local user.name variantversioncontrol
            git config --add --local user.email variant@version.control
            git commit -m "Initial commit"
        fi
    else
        log "info" "Directory $VARIANT_REPOSITORY does not exist. Creating it for you."
        mkdir -p "$VARIANT_REPOSITORY/annotations"
        check_variant_repository
    fi
}

Validate variant

Currently only SPDI format is supported. TODO: add support for rs numbers and hgvs

validate_variant() {
    VARIANT="$1"
    if echo "$VARIANT" | grep -P '(chr|)([1-9]|1[1-9]|2[0-2]|X|Y):(\d+):(A|T|C|G)+:(A|T|C|G)+' >/dev/null; then
        log "debug" "$VARIANT variant passed the regex validation."
    else
        log "error" "$VARIANT variant needs to be in SPDI format."
    fi
}
curl -s 'https://rest.ensembl.org/map/human/GRCh38/18:36156575..36156575:1/GRCh37?' -H 'Content-type:application/json' | jq ".mappings[].mapped"
  • update variant
stepusingdescription
1ENSEMBLFirst run ensembl recoder.
2ENSEMBLGet VEP annotation
3JQGet HGVSg and refseq SPDI from ensembl/recoder.
4NCBIGet rsid from ncbi/variation using that HGVSg
5NCBIGet frequencies from alfa usind rsid.
6NCBIGet pmids from litvar using rsid.
# $1 path
# $2 data
write_data() {
    mkdir -p "annotations/$1"
    echo "$2" | jq -S '.' >"annotations/$1/data"
}

api_call() {
    url="$1"
    response_content="$2"
    method="GET"
    header="accept: application/json"
    log "debug" "function: api_call0 starting with url: $url and response_content: $response_content"
    response_code=$(curl --fail --silent --write-out "%{http_code}\n" --output "$response_content" --request "$method" --header "$header" "$url")
    log "debug" "function: api_call1 url: $url response_code: ${response_code-}"
    echo "$response_code"
}

update_variant() {
    identifiers="spdi=$1"
    sed '/^$/d;/^#/d;' "$CURRENT_DIR"/apitable.tsv | while IFS= read -r line; do
        url_tmp=$(mktemp -p "$TMPDIR")
        unset api
        unset url
        unset handler
        unset rate_limit
        log "debug" "Identifiers are: $(echo "$identifiers" | tr '\n' ' ')"
        log "debug" "url_tmp is: $url_tmp"
        api=$(echo "$line" | awk -F"\t" '{print $1}')
        url=$(echo "$line" | awk -F"\t" '{print $2}')
        handler=$(echo "$line" | awk -F"\t" '{print $3}')
        rate_limit=$(echo "$line" | awk -F"\t" '{print $4}')
        response_content=$(mktemp -p "$TMPDIR")
        printf '%s\nurl="%s"' "$identifiers" "$url" >"$url_tmp"

        # first check if the required identifiers are set
        if (. "$url_tmp" 2>/dev/null); then
            . "$url_tmp";
        else
            log "info" "Required identifiers were not set. Skipping $api.";
            continue;
        fi

        log "debug" "Calling $url"
        if [ -n "$rate_limit" ]; then
            log "debug" "Waiting for $rate_limit seconds before calling $url"
            sleep "$rate_limit"
        fi
        # log "debug" "function: update_variant0 api: $api url: $url handler: $handler rate_limit: $rate_limit response_content: $response_content"
        # response_code=$(api_call "$url" "$response_content")
        method="GET"
        header="accept: application/json"
        log "debug" "api_call starting with url: $url and response_content: $response_content"
        # TODO: insecure is required for spliceai.
        response_code=$(curl --insecure --silent --write-out "%{http_code}\n" --output "$response_content" --request "$method" --header "$header" "$url")
        log "debug" "api_call ended for $api response_code: ${response_code-}"

        log "debug" "function: update_variant1 response_code: $response_code"

        if [ "${response_code-}" -eq 200 ]; then
            response_content=$(cat "$response_content")
            if [ -n "$handler" ]; then
                log "debug" "running handler: $handler"
                identifiers=$(sh "$CURRENT_DIR/$handler" "$response_content" | sed -e 's/=\([^"][^ ]*\)/="\1"/g')
                export identifiers
            fi
            write_data "$1/$api" "$response_content"
        else
            log "info" "Call to $api returned $response_code not updating this file."
        fi
    done
    log "debug" "committing changes"
    git add .
    # pipe to true to avoid failing if there are no changes.
    git commit -m "updated $variant" || true
}
update_annotations() {
    log "info" "Updating all variants."
    sed '/^$/d;/^#/d;' variants | while IFS= read -r variant; do
        log "info" "Updating variant $variant"
        update_variant "$variant"
        log "info" "Finished updating variant $variant"
    done
}

add variant

add_variant() {
    variant="$1"
    if grep "$variant" variants >/dev/null; then
        log "info" "variant $variant already added! Skipping."
        return 0
    fi
    validate_variant "$variant"

    log "info" "Adding variant $variant"
    echo "$variant" >>variants
    mkdir -p "annotations/$variant/"
    git add variants "annotations/$variant/"
    git commit -m "added variant $variant"
    update_variant "$variant"
}

Main section

main() {
    check_variant_repository
    before_hash=$(git rev-parse --short HEAD)

    if [ -n "${1}" ]; then
        ACTION="$1"
        shift
    else
        log "error" "You must specify an action."
        # usage
        exit 1
    fi

    case ${ACTION} in
        add | remove | show)
            if [ -z "${1}" ]; then
                log "error" "You must specify a variant:\n"
                list_variants
                exit 1
            fi
            ;;
    esac

    case "${ACTION}" in
        show) ;;
        add)
            for variant in "$@"; do
                add_variant "$variant"
            done
            ;;
        list)
            list_variants
            ;;
        tsv_list | tsvlist | tsv)
            tsvlist_variants
            ;;
        remove) ;;

        search)
            list_variants | grep "${1}"
            ;;
        update)
            update_annotations
            ;;
        version) echo "${VERSION}" ;;
        help) usage ;;
        *) log "fatal" "Unknown action supplied: ${ACTION}" ;;
    esac

    # TODO: maybe do the updates async?
    # while ps $! >/dev/null; do
    #     sleep 5
    #     log "info" "Processing..."
    # done

    after_hash=$(git rev-parse --short HEAD)

    if [ "$before_hash" != "$after_hash" ]; then
        log "info" "Changes were made."
        git diff --stat "$before_hash" "$after_hash"
    fi

    cd "$CURRENT_DIR" || exit
    # if [ "$LOGLVL" = 0 ]; then set +xv; fi
    if [ "$LOGLVL" -ge 1 ]; then rm -rf "$TMPDIR"; fi
}

main "$@"
#!/usr/bin/env sh
set -e
set -u
echo "info" "test.sh started"
export VVC_REPOSITORY="test-repository"

BASEDIR=$(dirname "$0")
cd "$BASEDIR" || exit

# test adding variants
TEST_VARIANTS="18:36156575:G:A 18:36156575:G:T 18:36156575:G:C 18:36156575:G:AT 18:36156575:G:AC chr12:48968150:T:C chr12:48968150:T:G chr16:3254555:C:T chr16:3254555:C:G"
echo "info" "test.sh adding variants"
./vvc.sh add $TEST_VARIANTS
# for variant in $TEST_VARIANTS; do
#     ./vvc.sh add "$variant"
# done

# test updating variants
echo "info" "test.sh updating variants"
./vvc.sh update

# test listing variants
echo "info" "test.sh listing variants"
./vvc.sh list

Singularity Recipe

Bootstrap: docker
From: ubuntu:rolling

%post
  apt-get -y update
  apt-get -y install git jq curl

%files
  hello.py /

%runscript
  python /hello.py

About