zavolanlab / PAQR_KAPAC

Scripts, pipelines and documentation to run PAQR and KAPAC. KAPAC infers regulatory sequence motifs implicated in 3’ end processing changes; PAQR quantifies poly(A) site usage from standard RNA-seq data.

Further question about how to properly set up the sample relationships

aleighbrown opened this issue

I'm a bit confused about the appropriate way to set up the samples in config.yaml.

The config.yaml provided with the download currently looks like this:

#-------------------------------------------------------------------------------
# sample specific values:
# - name of samples per study
# - name of BAM file and condition per sample
#-------------------------------------------------------------------------------

HNRNPC_KD:
  samples: [ctl_rep1, ctl_rep2, HNRNPC_rep1, HNRNPC_rep2]

ctl_rep1: {bam: CTL_rep1, type: CNTRL}
ctl_rep2: {bam: CTL_rep2, type: CNTRL}
HNRNPC_rep1: {bam: KD_rep1, type: KD, control: ctl_rep1}
HNRNPC_rep2: {bam: KD_rep2, type: KD, control: ctl_rep2}
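To make the structure of this block explicit, here is a minimal sketch of how the `control:` keys define case/control pairs within a study. It is not code from PAQR_KAPAC: the config is transcribed into a Python dict (in practice it would be read from the YAML file), and the helper `case_control_pairs` is hypothetical.

```python
# Transcription of the config.yaml excerpt above (hypothetical in-memory form).
config = {
    "HNRNPC_KD": {"samples": ["ctl_rep1", "ctl_rep2", "HNRNPC_rep1", "HNRNPC_rep2"]},
    "ctl_rep1": {"bam": "CTL_rep1", "type": "CNTRL"},
    "ctl_rep2": {"bam": "CTL_rep2", "type": "CNTRL"},
    "HNRNPC_rep1": {"bam": "KD_rep1", "type": "KD", "control": "ctl_rep1"},
    "HNRNPC_rep2": {"bam": "KD_rep2", "type": "KD", "control": "ctl_rep2"},
}


def case_control_pairs(config, study):
    """Return (case, control) tuples for every sample of `study` that names a control."""
    samples = config[study]["samples"]
    pairs = []
    for name in samples:
        entry = config[name]
        if "control" in entry:
            control = entry["control"]
            # The referenced control must itself be listed as a sample of the study.
            if control not in samples:
                raise ValueError(f"{name}: control {control!r} is not in {study}")
            pairs.append((name, control))
    return pairs


print(case_control_pairs(config, "HNRNPC_KD"))
# [('HNRNPC_rep1', 'ctl_rep1'), ('HNRNPC_rep2', 'ctl_rep2')]
```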

Is HNRNPC_rep1 being compared directly to ctl_rep1?
What if my samples don't have such a clear-cut "this control goes with this case" relationship? For example, I've done 3 biological replicates in each condition, but they're not what I would call directly matched.

If my samples are MUT1, MUT2, MUT3 and WT1, WT2, WT3, would it make a difference to the final analysis whether I set up the relationship as

MUT1: {bam: MUT1, type: MUT, control: WT1}
MUT2: {bam: MUT2, type: MUT, control: WT2}

vs

MUT1: {bam: MUT1, type: MUT, control: WT2}
MUT2: {bam: MUT2, type: MUT, control: WT3}

And what if the sample sizes of the conditions weren't matched, for example 5 samples in one condition and 8 in the other?

Thanks!

PAQR works condition-wise, so for the inference of poly(A) site usage it makes no difference which sample you specify as the control for each mutant sample.
However, the KAPAC step needs a reference sample to compare against, so results may change depending on which of the wild-type samples you use as control. That said, it is not necessary to have matched treatment/control samples.
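The distinction above can be illustrated with the MUT/WT example from the question. This is a minimal sketch, not PAQR or KAPAC code: the helpers `group_by_condition` and `pairs` are hypothetical and only mimic the two views described, grouping samples by `type` (what a condition-wise step sees) versus listing case/control pairs (what a pairwise comparison depends on).

```python
from collections import defaultdict


def group_by_condition(samples):
    """Group sample names by their `type`; the `control` keys play no role here."""
    groups = defaultdict(list)
    for name, entry in samples.items():
        groups[entry["type"]].append(name)
    return dict(groups)


def pairs(samples):
    """List (case, control) pairs, which do depend on the `control` keys."""
    return sorted((name, e["control"]) for name, e in samples.items() if "control" in e)


wt = {f"WT{i}": {"bam": f"WT{i}", "type": "WT"} for i in (1, 2, 3)}
setup_a = {**wt,
           "MUT1": {"bam": "MUT1", "type": "MUT", "control": "WT1"},
           "MUT2": {"bam": "MUT2", "type": "MUT", "control": "WT2"}}
setup_b = {**wt,
           "MUT1": {"bam": "MUT1", "type": "MUT", "control": "WT2"},
           "MUT2": {"bam": "MUT2", "type": "MUT", "control": "WT3"}}

# Same conditions in both setups, but different case/control pairs.
print(group_by_condition(setup_a) == group_by_condition(setup_b))  # True
print(pairs(setup_a) == pairs(setup_b))  # False
```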

It would actually be of interest to us if you changed the control samples in two independent runs and got completely different results; we'd expect the results to be stable with respect to this type of alteration.
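One way to prepare the two independent runs suggested above, which also covers unequal group sizes such as 8 cases and 5 controls, is to assign controls round-robin with a different starting offset per run. This is a hypothetical sketch (the helper `rotated_controls` and the sample names are illustrative, not part of PAQR_KAPAC); each resulting mapping would be written into the `control:` fields of a separate config.yaml.

```python
def rotated_controls(cases, controls, offset):
    """Assign controls to cases round-robin, starting at position `offset` in `controls`."""
    return {case: controls[(i + offset) % len(controls)] for i, case in enumerate(cases)}


cases = [f"MUT{i}" for i in range(1, 9)]     # 8 case samples (illustrative)
controls = [f"WT{i}" for i in range(1, 6)]   # 5 control samples (illustrative)

run_1 = rotated_controls(cases, controls, offset=0)
run_2 = rotated_controls(cases, controls, offset=1)

for case in cases:
    print(f"{case}: run 1 -> {run_1[case]}, run 2 -> {run_2[case]}")
```

Because the offset differs between runs by less than the number of controls, every case gets a different control in the second run, while all controls are still used.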

Hope this helps for now.

Best,
Ralf

Just to tag onto this issue: it appears that the sample relationships defined in the config can affect whether samples pass the mTIN > 70 filter in part_one.Snakefile. In the case below, only pairs in which both samples have mTIN > 70 are considered valid, even though many of my HOM samples have mTIN > 70 on their own.

As I've defined the sample relationships here, only the HOM-3 : WT-3 pairing passes the filter.

bias.TIN.median_per_sample.tsv
sample median_TIN
IP-WT-D14-1 60.078931
IP-WT-D14-2 63.014136
IP-WT-D14-3 72.905163
IP-WT-D14-4 70.372223
IP-HOM-D14-1 71.532313
IP-HOM-D14-2 70.307760
IP-HOM-D14-3 74.176115
IP-HOM-D14-4 68.654441
IP-HOM-D14-5 70.127562
IP-HOM-D14-6 70.768449

(config.yaml)
IP-WT-D14-1: {bam: IP-WT-D14-1_unique_rg_fixed, type: IP_D14_CNTRL}
IP-WT-D14-2: {bam: IP-WT-D14-2_unique_rg_fixed, type: IP_D14_CNTRL}
IP-WT-D14-3: {bam: IP-WT-D14-3_unique_rg_fixed, type: IP_D14_CNTRL}
IP-WT-D14-4: {bam: IP-WT-D14-4_unique_rg_fixed, type: IP_D14_CNTRL}
IP-HOM-D14-1: {bam: IP-HOM-D14-1_unique_rg_fixed, type: IP_D14_HOM, control: IP-WT-D14-1}
IP-HOM-D14-2: {bam: IP-HOM-D14-2_unique_rg_fixed, type: IP_D14_HOM, control: IP-WT-D14-2}
IP-HOM-D14-3: {bam: IP-HOM-D14-3_unique_rg_fixed, type: IP_D14_HOM, control: IP-WT-D14-3}
IP-HOM-D14-4: {bam: IP-HOM-D14-4_unique_rg_fixed, type: IP_D14_HOM, control: IP-WT-D14-4}
IP-HOM-D14-5: {bam: IP-HOM-D14-5_unique_rg_fixed, type: IP_D14_HOM, control: IP-WT-D14-1}
IP-HOM-D14-6: {bam: IP-HOM-D14-6_unique_rg_fixed, type: IP_D14_HOM, control: IP-WT-D14-2}
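The pairwise behaviour described above can be reproduced from the median TIN table and this config. The sketch below encodes the rule as described in this comment (a case/control pair is kept only if both samples have median TIN > 70); it is an assumption about the filter's effect, not the actual code in part_one.Snakefile, and `passing_pairs` is a hypothetical helper.

```python
THRESHOLD = 70.0

# Values from bias.TIN.median_per_sample.tsv above.
median_tin = {
    "IP-WT-D14-1": 60.078931, "IP-WT-D14-2": 63.014136,
    "IP-WT-D14-3": 72.905163, "IP-WT-D14-4": 70.372223,
    "IP-HOM-D14-1": 71.532313, "IP-HOM-D14-2": 70.307760,
    "IP-HOM-D14-3": 74.176115, "IP-HOM-D14-4": 68.654441,
    "IP-HOM-D14-5": 70.127562, "IP-HOM-D14-6": 70.768449,
}

# Case -> control relationships from the config.yaml above.
controls = {
    "IP-HOM-D14-1": "IP-WT-D14-1", "IP-HOM-D14-2": "IP-WT-D14-2",
    "IP-HOM-D14-3": "IP-WT-D14-3", "IP-HOM-D14-4": "IP-WT-D14-4",
    "IP-HOM-D14-5": "IP-WT-D14-1", "IP-HOM-D14-6": "IP-WT-D14-2",
}


def passing_pairs(controls, median_tin, threshold=THRESHOLD):
    """Keep a (case, control) pair only if both samples exceed the median TIN threshold."""
    return [(case, ctl) for case, ctl in controls.items()
            if median_tin[case] > threshold and median_tin[ctl] > threshold]


print(passing_pairs(controls, median_tin))
# [('IP-HOM-D14-3', 'IP-WT-D14-3')]
```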

At this stage, our main interest in this data-set is the inference of poly(A) site usage. Following on from what you've said, would you say it's acceptable to change the control samples for my HOM set so they point to the WT-3 & WT-4 samples (the WT samples are biological replicates)?

Thanks,
Sam

Hi Sam,
yes, I think that is what I would suggest doing in this case.

Best
Ralf
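Following the suggestion to point the HOM samples at WT-3 and WT-4, the sketch below picks the WT samples that pass the threshold on their own, alternates them across the HOM samples, re-checks which pairs would pass under the same assumed pairwise rule (both samples > 70), and prints candidate config lines. Variable names are hypothetical and the rule is inferred from the thread, not taken from part_one.Snakefile.

```python
THRESHOLD = 70.0

# Values from bias.TIN.median_per_sample.tsv earlier in this thread.
median_tin = {
    "IP-WT-D14-1": 60.078931, "IP-WT-D14-2": 63.014136,
    "IP-WT-D14-3": 72.905163, "IP-WT-D14-4": 70.372223,
    "IP-HOM-D14-1": 71.532313, "IP-HOM-D14-2": 70.307760,
    "IP-HOM-D14-3": 74.176115, "IP-HOM-D14-4": 68.654441,
    "IP-HOM-D14-5": 70.127562, "IP-HOM-D14-6": 70.768449,
}

hom = [f"IP-HOM-D14-{i}" for i in range(1, 7)]

# WT samples that pass the threshold by themselves (here WT-3 and WT-4).
good_wt = [s for s in sorted(median_tin)
           if s.startswith("IP-WT") and median_tin[s] > THRESHOLD]

# Alternate the passing WT samples across the HOM samples.
controls = {case: good_wt[i % len(good_wt)] for i, case in enumerate(hom)}

# Cases whose pair would pass; HOM-4 still fails on its own median TIN.
passing = [case for case, ctl in controls.items()
           if median_tin[case] > THRESHOLD and median_tin[ctl] > THRESHOLD]

for case, ctl in controls.items():
    print(f"{case}: {{bam: {case}_unique_rg_fixed, type: IP_D14_HOM, control: {ctl}}}")
print("passing:", passing)
```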