SRI-pipeline

Introduction

The S(ynteny) R(elationship) I(ndex), or SRI, is a score that quantifies synteny between two genomes. SRI is calculated from the anchor links produced by JCVI. The pipeline consists of the following steps:

Synteny inference through JCVI

Get the longest unique transcripts


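# Convert the mRNA features of the GFF3 file to BED format
python -m jcvi.formats.gff bed --type=mRNA --key=ID ${gff3} > ${gff3/gff3/bed}
# Keep the longest unique transcripts
python -m jcvi.formats.bed uniq ${gff3/gff3/bed}
mv uniq.bed ${sample}.bed
# Extract the CDS sequences of the retained transcripts
cut -f 4 ${sample}.bed | seqkit grep -f - ${gff3/gff3/cds} > ${sample}.cds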

Synteny inference

# sample1 and sample2 are the pair of genomes to be compared.
python -m jcvi.compara.catalog ortholog --no_strip_names ${sample1} ${sample2}
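
When one genome is compared against several others, as in the syn.config example further down, the command above has to be repeated for each pair. The following is a minimal sketch that only builds the command lines. The helper name `ortholog_commands` is hypothetical and not part of this repository, and the sketch assumes the `.bed` and `.cds` files from the previous step are in the working directory.

```python
import shlex


def ortholog_commands(reference, queries):
    """Build one JCVI ortholog command line per (reference, query) pair.

    Hypothetical helper: it returns the commands as strings and does not
    run them, so they can be inspected or written to a shell script.
    """
    commands = []
    for query in queries:
        argv = [
            "python", "-m", "jcvi.compara.catalog", "ortholog",
            "--no_strip_names", reference, query,
        ]
        # shlex.join quotes any argument that needs it (Python 3.8+).
        commands.append(shlex.join(argv))
    return commands


if __name__ == "__main__":
    for command in ortholog_commands("A6-26", ["E4-63", "PG0009", "PG1008"]):
        print(command)
```

Printing the commands rather than executing them keeps the sketch free of side effects; the lines can be redirected to a file and run with a shell.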

SRI calculation

python syndex.py <syn.config> <output>

The syn.config file should have the following format:

## Anchor file from jcvi A.bed B.bed
A6-26.E4-63.lifted.anchors      A6-26.bed       E4-63.bed
A6-26.PG0009.lifted.anchors     A6-26.bed       PG0009.bed
A6-26.PG1008.lifted.anchors     A6-26.bed       PG1008.bed
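
Before running syndex.py it can be useful to check that every row of the config has the three expected columns. The sketch below is one way to do that. It assumes that columns are separated by whitespace and that lines starting with `#` are comments; how syndex.py itself treats such lines is not documented here. The names `SynEntry` and `parse_syn_config` are hypothetical.

```python
from typing import List, NamedTuple


class SynEntry(NamedTuple):
    """One row of syn.config: an anchors file and the two BED files."""
    anchors: str
    bed_a: str
    bed_b: str


def parse_syn_config(text: str) -> List[SynEntry]:
    """Parse the contents of a syn.config file into a list of entries.

    Assumption: blank lines and lines starting with '#' are skipped,
    and the remaining lines hold exactly three whitespace-separated columns.
    """
    entries = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(
                f"line {lineno}: expected 3 columns, got {len(fields)}"
            )
        entries.append(SynEntry(*fields))
    return entries


if __name__ == "__main__":
    example = (
        "## Anchor file from jcvi A.bed B.bed\n"
        "A6-26.E4-63.lifted.anchors      A6-26.bed       E4-63.bed\n"
        "A6-26.PG0009.lifted.anchors     A6-26.bed       PG0009.bed\n"
    )
    for entry in parse_syn_config(example):
        print(entry.anchors, entry.bed_a, entry.bed_b)
```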

Output

The output file looks like:

#sample_pair    SRI
An-1,C24        0.9769
An-1,Cvi        0.9757
An-1,Eri        0.9773
Kyo,An-1        0.9732
An-1,Ler        0.9754
Sha,An-1        0.9719
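
For downstream analysis the table can be loaded into a dictionary keyed by sample pair. This is a minimal sketch that assumes the layout shown above: a `#` header line, then a comma-joined pair and the score separated by whitespace. The function name `read_sri_table` is hypothetical, and sample names that themselves contain commas or spaces would need different handling.

```python
def read_sri_table(text):
    """Map (sample_a, sample_b) tuples to their SRI score.

    Assumption: header and comment lines start with '#', and each data
    line holds 'sampleA,sampleB' followed by the score.
    """
    scores = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        pair, value = line.split()
        sample_a, sample_b = pair.split(",")
        scores[(sample_a, sample_b)] = float(value)
    return scores


if __name__ == "__main__":
    example = (
        "#sample_pair    SRI\n"
        "An-1,C24        0.9769\n"
        "Kyo,An-1        0.9732\n"
        "Sha,An-1        0.9719\n"
    )
    scores = read_sri_table(example)
    # Report the pair with the lowest SRI among the loaded rows.
    lowest = min(scores, key=scores.get)
    print(lowest, scores[lowest])
```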

