YeoLab / skipper

Skip the peaks and expose RNA-binding in CLIP data

Error in rule get_nt_coverage:

byee4 opened this issue

I'm hitting an error in Skipper on mouse data, but I don't know whether the mouse annotations are ultimately causing the overflow. Is there anything obviously wrong with the command or the annotations?

zcat output/reproducible_enriched_windows/x_eCLIP_CA3-26-02-2024-20-42-57-29-02-2024-21-51-06.reproducible_enriched_windows.tsv.gz | tail -n +2 | sort -k1,1 -k2,2n | awk -v OFS="  " '{print $1, $2 -37, $3+37,$4,$5,$6}' | bedtools merge -i - -s -c 6 -o distinct | awk -v OFS=" " '{for(i=$2;i< $3;i++) {print $1,i,i+1,"MW:" NR ":" i - $2,0,$4, NR} }' > output/finemapping/nt_coverage/x_eCLIP_CA3-26-02-2024-20-42-57-29-02-2024-21-51-06.nt_census.bed;
samtools cat output/bams/dedup/genome/CA3_IN_1.genome.Aligned.sort.dedup.bam output/bams/dedup/genome/CA3_IN_2.genome.Aligned.sort.dedup.bam | bedtools intersect -s -wa -a - -b output/finemapping/nt_coverage/x_eCLIP_CA3-26-02-2024-20-42-57-29-02-2024-21-51-06.nt_census.bed | bedtools bamtobed -i - | awk '($1 != "chrEBV") && ($4 !~ "/2$")' | bedtools flank -s -l 1 -r 0 -g /tscc/projects/ps-yeolab3/bay001/annotations/mm10/star_2_7_6a_gencode25_sjdb/chrNameLength.txt -i - | bedtools shift -p 1 -m -1 -g /tscc/projects/ps-yeolab3/bay001/annotations/mm10/star_2_7_6a_gencode25_sjdb/chrNameLength.txt -i - | bedtools sort -i - | bedtools coverage -counts -s -a output/finemapping/nt_coverage/x_eCLIP_CA3-26-02-2024-20-42-57-29-02-2024-21-51-06.nt_census.bed -b - | awk '{print $NF}' > output/finemapping/nt_coverage/x_eCLIP_CA3-26-02-2024-20-42-57-29-02-2024-21-51-06.nt_coverage.input.counts;
samtools cat output/bams/dedup/genome/CA3_IP_1.genome.Aligned.sort.dedup.bam output/bams/dedup/genome/CA3_IP_2.genome.Aligned.sort.dedup.bam | bedtools intersect -s -wa -a - -b output/finemapping/nt_coverage/x_eCLIP_CA3-26-02-2024-20-42-57-29-02-2024-21-51-06.nt_census.bed | bedtools bamtobed -i - | awk '($1 != "chrEBV") && ($4 !~ "/2$")' | bedtools flank -s -l 1 -r 0 -g /tscc/projects/ps-yeolab3/bay001/annotations/mm10/star_2_7_6a_gencode25_sjdb/chrNameLength.txt -i - | bedtools shift -p 1 -m -1 -g /tscc/projects/ps-yeolab3/bay001/annotations/mm10/star_2_7_6a_gencode25_sjdb/chrNameLength.txt -i - | bedtools sort -i - | bedtools coverage -counts -s -a output/finemapping/nt_coverage/x_eCLIP_CA3-26-02-2024-20-42-57-29-02-2024-21-51-06.nt_census.bed -b - | awk '{print $NF}' > output/finemapping/nt_coverage/x_eCLIP_CA3-26-02-2024-20-42-57-29-02-2024-21-51-06.nt_coverage.clip.counts;
paste output/finemapping/nt_coverage/x_eCLIP_CA3-26-02-2024-20-42-57-29-02-2024-21-51-06.nt_census.bed output/finemapping/nt_coverage/x_eCLIP_CA3-26-02-2024-20-42-57-29-02-2024-21-51-06.nt_coverage.input.counts output/finemapping/nt_coverage/x_eCLIP_CA3-26-02-2024-20-42-57-29-02-2024-21-51-06.nt_coverage.clip.counts > output/finemapping/nt_coverage/x_eCLIP_CA3-26-02-2024-20-42-57-29-02-2024-21-51-06.nt_coverage.bed
Activating singularity image /tscc/projects/ps-yeolab4/software/skipper/d0055ff/singularity/3a69c84662a103b04ab9cb379236f2d6.simg
Error: Invalid record in file -. Record is 
chrM    -37     105     12146428        0       +
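
The rejected record has the six-column shape printed by the first awk in the command ($1, $2 -37, $3+37, $4, $5, $6). Assuming it came from that step, undoing the padding (-37 + 37 = 0, 105 - 37 = 68) points to a window at chrM:0-68 in the reproducible enriched windows file. In that case the negative start would be created before bedtools merge, which is the first tool to read it from stdin. Below is a minimal sketch of the two awk steps from that first command. The input line is reconstructed from the error message, not read from the actual file. pad_windows and expand_nt are hypothetical helper names, and the output is tab-separated.

```shell
#!/usr/bin/env bash
# Sketch of the two awk steps in the first command of get_nt_coverage.
# pad_windows and expand_nt are hypothetical names; output is tab-separated.

# Pad each window by 37 nt on both sides; nothing stops the start going below 0.
pad_windows() {
  awk -v OFS='\t' '{print $1, $2 - 37, $3 + 37, $4, $5, $6}'
}

# Expand merged intervals (chrom, start, end, strand) into one record per
# nucleotide: chrom, pos, pos+1, "MW:<interval>:<offset>", 0, strand, interval.
expand_nt() {
  awk -v OFS='\t' '{for (i = $2; i < $3; i++) print $1, i, i + 1, "MW:" NR ":" i - $2, 0, $4, NR}'
}

# Window reconstructed from the error record: chrM, start 0, end 68.
printf 'chrM\t0\t68\t12146428\t0\t+\n' | pad_windows
# prints chrM -37 105 12146428 0 + (tab-separated)

# A 3-nt merged interval becomes three per-nucleotide census records.
printf 'chrM\t100\t103\t+\n' | expand_nt
```

The expansion loop starts from the merged start coordinate. Any negative start that got past bedtools merge would therefore be carried straight into nt_census.bed.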

Here are the first few chrM lines of the BAM files:

[bay001@login1 x_mouse_hippocampus_29-02-2024-21-52-22]$ samtools view output/bams/dedup/genome/CA3_IN_1.genome.Aligned.sort.dedup.bam | grep chrM | less
VH01429:45:AACHKTTHV:1:2507:34099:18228:AGCGCACTTA      16      chrM    6       255     31M     *       0       0       TGTAGCTTAATAACAAAGCAAAGCACTGAAA CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC NH:i:1  HI:i:1  AS:i:30 nM:i:0  NM:i:0  MD:Z:31 jM:B:c,-1       jI:B:i,-1       RG:Z:CA3_IN_1
VH01429:45:AACHKTTHV:2:2601:21545:10674:AAAGGCAGGG      0       chrM    10      255     41M     *       0       0       GCTTAATAACAAAGCAAAGCACTGAAAATGCTTAGATGGAT       CCCCCCCCCCCC;CCCCCCCCCCCCCC;CCCCCCC;CCCC;       NH:i:1  HI:i:1  AS:i:40 nM:i:0  NM:i:0  MD:Z:41 jM:B:c,-1       jI:B:i,-1       RG:Z:CA3_IN_1
VH01429:45:AACHKTTHV:1:1406:59151:53687:GCTTGATATC      0       chrM    14      255     37M     *       0       0       AATAACAAAGCAAAGCACTGAAAATGCTTAGGGATAA   CCC-CCCC-CCCCC;CCC;C;CCCC-C-CCCC-C;-C   NH:i:1  HI:i:1  AS:i:26 nM:i:5  NM:i:5  MD:Z:31A0T0G0G1T0       jM:B:c,-1       jI:B:i,-1       RG:Z:CA3_IN_1
VH01429:45:AACHKTTHV:2:2211:30293:52968:AGCTCTCCAT      0       chrM    17      255     41M     *       0       0       AACAAAGCAAAGCACTGAAAATGCTTAGATGGATAATTGTA       CCCCCCC-CC;-CCC-CCCCCCCCCCCC;;CCCCCCCCCC;       NH:i:1  HI:i:1  AS:i:40 nM:i:0  NM:i:0  MD:Z:41 jM:B:c,-1       jI:B:i,-1       RG:Z:CA3_IN_1
VH01429:45:AACHKTTHV:1:2302:52694:24741:ACAATTATCG      0       chrM    18      255     41M     *       0       0       ACAAAGCAAAGCACTGAAAATGCTTAGATGGATAATTGTAT       CCCC;CCCC;CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC       NH:i:1  HI:i:1  AS:i:40 nM:i:0  NM:i:0  MD:Z:41 jM:B:c,-1       jI:B:i,-1       RG:Z:CA3_IN_1
VH01429:45:AACHKTTHV:1:2614:69357:10731:TCCCATCTAT      0       chrM    18      255     34M     *       0       0       ACAAAGCAAAGCACTGAAAATGCTTAGATGGATA      CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC      NH:i:1  HI:i:1  AS:i:33 nM:i:0  NM:i:0  MD:Z:34 jM:B:c,-1       jI:B:i,-1       RG:Z:CA3_IN_1
VH01429:45:AACHKTTHV:2:1205:26961:50280:GGAGTTTCAC      0       chrM    18      255     41M     *       0       0       ACAAAGCAAAGCACTGAATATGCTTAGATGGATAATTGTAT       CCCC;CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC       NH:i:1  HI:i:1  AS:i:38 nM:i:1  NM:i:1  MD:Z:18A22      jM:B:c,-1       jI:B:i,-1       RG:Z:CA3_IN_1
VH01429:45:AACHKTTHV:2:1210:40462:16941:GGTGGACCAC      0       chrM    18      255     41M     *       0       0       ACAAAGCAAAGCACTGAAAATGCTTAGATGGATAATTGTAT       CCCC;CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC       NH:i:1  HI:i:1  AS:i:40 nM:i:0  NM:i:0  MD:Z:41 jM:B:c,-1       jI:B:i,-1       RG:Z:CA3_IN_1
VH01429:45:AACHKTTHV:2:1506:73902:28678:GATTTAACTC      0       chrM    18      255     34M     *       0       0       ACAAAGCAAAGCACTGAAAATGCTTAGATGGATA      CCCC-CCCC;CCCCCCCCCCCCCCCCCCCCCCCC      NH:i:1  HI:i:1  AS:i:33 nM:i:0  NM:i:0  MD:Z:34 jM:B:c,-1       jI:B:i,-1       RG:Z:CA3_IN_1
VH01429:45:AACHKTTHV:1:2506:46350:14120:GTGGAGTTGT      0       chrM    20      255     41M     *       0       0       AAAGCAAAGCACTGAAAATGCTTAGATGGATAATTGTATCC       CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC       NH:i:1  HI:i:1  AS:i:40 nM:i:0  NM:i:0  MD:Z:41 jM:B:c,-1       jI:B:i,-1       RG:Z:CA3_IN_1
VH01429:45:AACHKTTHV:1:1213:31941:18039:GGCCACCTGG      0       chrM    22      255     41M     *       0       0       AGCAAAGCACTGAATATGCTTAGATGGATAATTGTATCCCA       CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC;CCCCCCCC       NH:i:1  HI:i:1  AS:i:38 nM:i:1  NM:i:1  MD:Z:14A26      jM:B:c,-1       jI:B:i,-1       RG:Z:CA3_IN_1
VH01429:45:AACHKTTHV:1:2514:49418:9463:CCCAACAAAG       0       chrM    24      255     41M     *       0       0       CAAAGCACTGAAAATGCTTAGATGGATAATTGTATCCCATA       CCC;CCC;CC;CCCCCCCCCCCCCCCCCCCCCCCCC-CCCC       NH:i:1  HI:i:1  AS:i:40 nM:i:0  NM:i:0  MD:Z:41 jM:B:c,-1       jI:B:i,-1       RG:Z:CA3_IN_1
VH01429:45:AACHKTTHV:1:1302:10600:34755:GATACATAAC      0       chrM    34      255     37M     *       0       0       AAAATGCTTAGATGGATAATTGTATCCCATAAACACC   CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC-CCCC   NH:i:1  HI:i:1  AS:i:34 nM:i:1  NM:i:1  MD:Z:36A0       jM:B:c,-1       jI:B:i,-1       RG:Z:CA3_IN_1
VH01429:45:AACHKTTHV:1:2304:62446:34869:AGTGTAGATG      0       chrM    34      255     38M     *       0       0       AAAATGCTTAGATGGATAATTGTATCCCATAAACACCA  CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC-CC;CC  NH:i:1  HI:i:1  AS:i:35 nM:i:1  NM:i:1  MD:Z:36A1       jM:B:c,-1       jI:B:i,-1       RG:Z:CA3_IN_1
VH01429:45:AACHKTTHV:2:1606:60230:37614:GGCCCCTCAG      0       chrM    35      255     36M     *       0       0       AAATGCTTAGATGGATAATTGTATCCCATAAACACC    CCCCCC-CC;;CC;CCCC;CCCCCCCC-CCC-CCCC    NH:i:1  HI:i:1  AS:i:33 nM:i:1  NM:i:1  MD:Z:35A0       jM:B:c,-1       jI:B:i,-1       RG:Z:CA3_IN_1
VH01429:45:AACHKTTHV:1:1306:9540:25744:AAGCATAAAC       0       chrM    36      255     36M     *       0       0       AATGCTTAGATGGATAATTGTATCCCATAAACACCA    CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC    NH:i:1  HI:i:1  AS:i:33 nM:i:1  NM:i:1  MD:Z:34A1       jM:B:c,-1       jI:B:i,-1       RG:Z:CA3_IN_1
...
[bay001@login1 x_mouse_hippocampus_29-02-2024-21-52-22]$ samtools view output/bams/dedup/genome/CA3_IN_2.genome.Aligned.sort.dedup.bam | grep chrM | less
VH01429:45:AACHKTTHV:1:2210:56235:44070:AATACCCAGT      0       chrM    14      255     41M     *       0       0       AATAACAACGCAAAGCACTGAAAATGCTTAGATGGATAATT       CCCCCCCC-CCCCCCCCCCCCCCCCCCCCCCCCCCCCC-CC       NH:i:1  HI:i:1  AS:i:38 nM:i:1  NM:i:1  MD:Z:8A32       jM:B:c,-1       jI:B:i,-1       RG:Z:CA3_IN_2
VH01429:45:AACHKTTHV:2:2505:24442:15085:GTAATGCATA      0       chrM    14      255     41M     *       0       0       AATAACAAAGCAAAGCACTGAAAATGCTTAGATGGATAATT       CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC       NH:i:1  HI:i:1  AS:i:40 nM:i:0  NM:i:0  MD:Z:41 jM:B:c,-1       jI:B:i,-1       RG:Z:CA3_IN_2
VH01429:45:AACHKTTHV:2:1314:21091:42328:GAATATTAGT      0       chrM    16      255     41M     *       0       0       TAACAAAGCAAAGCACTGAAAATGCTTAGATGGATAATTGT       CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC       NH:i:1  HI:i:1  AS:i:40 nM:i:0  NM:i:0  MD:Z:41 jM:B:c,-1       jI:B:i,-1       RG:Z:CA3_IN_2
VH01429:45:AACHKTTHV:2:2204:70001:7550:CACAAGGCCA       0       chrM    16      255     41M     *       0       0       TAACAAAGCAAAGCACTGAAAATGCTTAGATGGATAATTGT       C;CCCC-CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC       NH:i:1  HI:i:1  AS:i:40 nM:i:0  NM:i:0  MD:Z:41 jM:B:c,-1       jI:B:i,-1       RG:Z:CA3_IN_2
VH01429:45:AACHKTTHV:1:2109:16716:21446:TTCTTCGAGG      0       chrM    17      255     41M     *       0       0       AACAAAGCAAAGCACTGAAAATGCTTAGATGGATAATTGTA       CC;CCCCCCCCC;CCCCCCCCCC;CCCCCCCCCCCCCCCCC       NH:i:1  HI:i:1  AS:i:40 nM:i:0  NM:i:0  MD:Z:41 jM:B:c,-1       jI:B:i,-1       RG:Z:CA3_IN_2
VH01429:45:AACHKTTHV:1:2606:62919:8648:ATGGTATCCG       0       chrM    17      255     35M     *       0       0       AACAAAGCAAAGCACTGAAAATGCTTAGATGGATA     CCCCCCCCCC;CCCC;CCCCCCCCCCCCCCCCCCC     NH:i:1  HI:i:1  AS:i:34 nM:i:0  NM:i:0  MD:Z:35 jM:B:c,-1       jI:B:i,-1       RG:Z:CA3_IN_2
VH01429:45:AACHKTTHV:1:1205:17985:41230:TTAACACACA      0       chrM    18      255     34M     *       0       0       ACAAAGCAAAGCACTGAAAATGCTTAGATGGATA      CCCC;CCCCCCCCCCCCCCCCCCCCCCCCCCCCC      NH:i:1  HI:i:1  AS:i:33 nM:i:0  NM:i:0  MD:Z:34 jM:B:c,-1       jI:B:i,-1       RG:Z:CA3_IN_2
VH01429:45:AACHKTTHV:1:2113:12399:44316:GATACAATAC      0       chrM    18      255     41M     *       0       0       ACAAAGCAAAGCACTGAAAATGCTTAGATGGATAATTGTAT       CCCC-CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC       NH:i:1  HI:i:1  AS:i:40 nM:i:0  NM:i:0  MD:Z:41 jM:B:c,-1       jI:B:i,-1       RG:Z:CA3_IN_2
VH01429:45:AACHKTTHV:2:1604:42961:53252:GGTGTGTGAA      0       chrM    18      255     41M     *       0       0       ACAAAGCAAAGCACTGAAAATGCTTAGATGGATAATTGTAT       CCCCCCCCCCCCCCCCCCCCCCCCCCC;CCCCCCCCCCCCC       NH:i:1  HI:i:1  AS:i:40 nM:i:0  NM:i:0  MD:Z:41 jM:B:c,-1       jI:B:i,-1       RG:Z:CA3_IN_2
VH01429:45:AACHKTTHV:2:2207:43643:14934:TGCGGAGCAG      0       chrM    18      0       16M     *       0       0       ACAAAGCAAAGCACTG        CCCC;CCCCCCC-C;C        NH:i:7  HI:i:1  AS:i:15 nM:i:0  NM:i:0  MD:Z:16 jM:B:c,-1       jI:B:i,-1       RG:Z:CA3_IN_2
VH01429:45:AACHKTTHV:2:2202:27396:30799:GTCAGTACGC      0       chrM    21      255     41M     *       0       0       AAGCAAAGCACTGAAAATGCTTAGATGGATAANTGTATCCC       CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC#CCCCCCCC       NH:i:1  HI:i:1  AS:i:39 nM:i:0  NM:i:1  MD:Z:32T8       jM:B:c,-1       jI:B:i,-1       RG:Z:CA3_IN_2
VH01429:45:AACHKTTHV:1:2508:62332:19156:GGAAACAACT      16      chrM    29      255     41M     *       0       0       CACTGAAAATGCTTAGATGGATAATTGTATCCCATAAACAC       CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC       NH:i:1  HI:i:1  AS:i:40 nM:i:0  NM:i:0  MD:Z:41 jM:B:c,-1       jI:B:i,-1       RG:Z:CA3_IN_2
VH01429:45:AACHKTTHV:2:1508:56216:10504:AATGCCATAC      0       chrM    33      255     19M     *       0       0       GAAAATGCTTAGATGGATA     CCCCCCCCCCCC-CCCCCC     NH:i:1  HI:i:1  AS:i:18 nM:i:0  NM:i:0  MD:Z:19 jM:B:c,-1       jI:B:i,-1       RG:Z:CA3_IN_2
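
In these records, POS is 1-based and is always at least 1 for a mapped read. After bedtools bamtobed converts reads to 0-based, half-open BED, a read start therefore can't fall below 0. The rough sketch below approximates that conversion for plain SAM text so the arithmetic can be checked against the records above; it is not how bedtools implements it. It handles mapped reads only and takes the reference span as the sum of the M, D, N, = and X CIGAR operations. sam_to_bed is a hypothetical name.

```shell
#!/usr/bin/env bash
# Approximate SAM -> BED6 conversion (chrom, start, end, name, MAPQ, strand),
# mirroring the default (non -split) span that bedtools bamtobed reports.
# sam_to_bed is a hypothetical helper; unmapped reads are not handled.
sam_to_bed() {
  awk -v OFS='\t' '{
    cigar = $6; reflen = 0
    # Sum the lengths of CIGAR operations that consume the reference.
    while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
      n = substr(cigar, 1, RLENGTH - 1) + 0
      op = substr(cigar, RLENGTH, 1)
      if (op ~ /[MDN=X]/) reflen += n
      cigar = substr(cigar, RLENGTH + 1)
    }
    # FLAG bit 0x10 marks a reverse-strand alignment.
    strand = (int($2 / 16) % 2 == 1) ? "-" : "+"
    # SAM POS is 1-based; BED start is 0-based and end is exclusive.
    print $3, $4 - 1, $4 - 1 + reflen, $1, $5, strand
  }'
}

# First CA3_IN_1 record above (flag 16, POS 6, 31M), SEQ/QUAL replaced by *.
printf 'read1\t16\tchrM\t6\t255\t31M\t*\t0\t0\t*\t*\n' | sam_to_bed
# prints chrM 5 36 read1 255 - (tab-separated)
```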

Contents of the chrom-sizes file:

chr1    195471971
chr10   130694993
chr11   122082543
chr12   120129022
chr13   120421639
chr14   124902244
chr15   104043685
chr16   98207768
chr17   94987271
chr18   90702639
chr19   61431566
chr1_GL456210_random    169725
chr1_GL456211_random    241735
chr1_GL456212_random    153618
chr1_GL456213_random    39340
chr1_GL456221_random    206961
chr2    182113224
chr3    160039680
chr4    156508116
chr4_GL456216_random    66673
chr4_JH584292_random    14945
chr4_GL456350_random    227966
chr4_JH584293_random    207968
chr4_JH584294_random    191905
chr4_JH584295_random    1976
chr5    151834684
chr5_JH584296_random    199368
chr5_JH584297_random    205776
chr5_JH584298_random    184189
chr5_GL456354_random    195993
chr5_JH584299_random    953012
chr6    149736546
chr7    145441459
chr7_GL456219_random    175968
chr8    129401213
chr9    124595110
chrM    16299
chrX    171031299
chrX_GL456233_random    336933
chrY    91744698
chrY_JH584300_random    182347
chrY_JH584301_random    259875
chrY_JH584302_random    155838
chrY_JH584303_random    158099
chrUn_GL456239  40056
chrUn_GL456367  42057
chrUn_GL456378  31602
chrUn_GL456381  25871
chrUn_GL456382  23158
chrUn_GL456383  38659
chrUn_GL456385  35240
chrUn_GL456390  24668
chrUn_GL456392  23629
chrUn_GL456393  55711
chrUn_GL456394  24323
chrUn_GL456359  22974
chrUn_GL456360  31704
chrUn_GL456396  21240
chrUn_GL456372  28664
chrUn_GL456387  24685
chrUn_GL456389  28772
chrUn_GL456370  26764
chrUn_GL456379  72385
chrUn_GL456366  47073
chrUn_GL456368  20208
chrUn_JH584304  114452
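
In the command, a chrom-sizes file reaches only bedtools flank and bedtools shift, through -g. The padding awk at the start of the command never reads one. One way to keep padded windows inside chrM (16299 nt here) and every other sequence is to clamp them against this file. Below is a minimal sketch that assumes clamping edge windows, rather than dropping them, is acceptable. pad_and_clamp is a hypothetical helper and not necessarily how Skipper addresses this. `bedtools slop -b 37 -g <chrom sizes>` is a stock alternative that likewise keeps resized intervals within chromosome bounds.

```shell
#!/usr/bin/env bash
# Sketch: pad BED6 windows by a fixed margin, clamped to [0, chromosome length].
# pad_and_clamp is a hypothetical helper, not part of Skipper.
# Usage: pad_and_clamp <chrom_sizes> <pad> < windows.bed
pad_and_clamp() {
  awk -v OFS='\t' -v pad="$2" '
    NR == FNR { len[$1] = $2; next }   # first file: chrom sizes (name, length)
    {
      start = $2 - pad; if (start < 0) start = 0
      end = $3 + pad
      if (($1 in len) && end > len[$1]) end = len[$1]
      print $1, start, end, $4, $5, $6
    }' "$1" -
}

# chrM length taken from the chrom-sizes listing above.
sizes=$(mktemp)
printf 'chrM\t16299\n' > "$sizes"
# The window behind the error record, plus a hypothetical window at the chrM end.
printf 'chrM\t0\t68\t12146428\t0\t+\nchrM\t16250\t16299\tw2\t0\t-\n' | pad_and_clamp "$sizes" 37
# prints chrM 0 105 12146428 0 + and chrM 16213 16299 w2 0 - (tab-separated)
rm -f "$sizes"
```

Clamping keeps edge windows but gives them less than 37 nt of padding on the clipped side. Dropping such windows would be the other option.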

There don't appear to be any negative start coordinates in the BAM files (output/bams/dedup/genome/*.genome.Aligned.sort.dedup.bam).
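
Since the BAMs look clean, one way to find where a negative coordinate first appears is to run each intermediate BED stream through a bounds check. Below is a minimal sketch that takes a chrom-sizes file like the one above as its first argument. check_bed is a hypothetical name. It flags records with start < 0, end < start, or end beyond the chromosome length, and it returns a non-zero status if anything is flagged.

```shell
#!/usr/bin/env bash
# Sketch: report BED records that fall outside chromosome bounds.
# check_bed is a hypothetical helper. Usage: check_bed <chrom_sizes> < file.bed
check_bed() {
  awk '
    NR == FNR { len[$1] = $2; next }   # first file: chrom sizes (name, length)
    $2 < 0 || $3 < $2 || (($1 in len) && $3 > len[$1]) {
      print "line " FNR ": " $0
      bad++
    }
    END { exit (bad > 0 ? 1 : 0) }' "$1" -
}

sizes=$(mktemp)
printf 'chrM\t16299\n' > "$sizes"
# The padded record from the error message is flagged; the second line passes.
printf 'chrM\t-37\t105\t12146428\t0\t+\nchrM\t100\t200\tw2\t0\t+\n' | check_bed "$sizes" || echo "invalid records found"
rm -f "$sizes"
```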

My best guess is that bedtools shift or bedtools flank in the command above is doing something strange, but I can't tell what. Those commands shouldn't be causing the overflow, since a chrom-sizes file is passed to both with -g, so I'm also confused about why I see negative values here.
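
For comparison, flank and shift operate on per-read intervals from bamtobed, whose fourth column is a read name like those in the BAM records above. The rejected record instead carries a numeric value, 12146428, in that column. According to the bedtools documentation, flank -s -l 1 -r 0 takes the 1 nt immediately upstream of each read on its own strand. shift -p 1 -m -1 then moves + strand intervals right by 1 and - strand intervals left by 1. For reads away from chromosome ends, the pair should therefore land on the read's 5'-most aligned base. Below is a minimal arithmetic sketch of that composition. five_prime_base is a hypothetical name, and the sketch ignores the boundary clamping that -g adds.

```shell
#!/usr/bin/env bash
# Arithmetic sketch of `bedtools flank -s -l 1 -r 0` followed by
# `bedtools shift -p 1 -m -1` on BED6 read intervals (no -g clamping).
# five_prime_base is a hypothetical helper.
five_prime_base() {
  awk -v OFS='\t' '{
    if ($6 == "+") { fstart = $2 - 1; fend = $2 }      # 1 nt upstream of a + read
    else           { fstart = $3;     fend = $3 + 1 }  # upstream of a - read lies to its right
    d = ($6 == "+") ? 1 : -1                           # shift -p 1 -m -1
    print $1, fstart + d, fend + d, $4, $5, $6
  }'
}

# Reverse-strand read spanning chrM:5-36 ends up on its 5'-most base, 35-36.
printf 'chrM\t5\t36\tread1\t255\t-\n' | five_prime_base
# Forward-strand read spanning chrM:9-50 ends up on 9-10.
printf 'chrM\t9\t50\tread2\t255\t+\n' | five_prime_base
```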

Thank you! This makes sense. If you want to update GitHub, I can pull the changes into whichever branch we're currently working from.