liulab-dfci / TRUST4

TCR and BCR assembly from RNA-seq data

TRUST4 barcode parameter for BAM files

Antibuy opened this issue · comments

Dear TRUST4 team,

I am assembling TCRs and BCRs from bulk 10x scRNA-seq data.

TRUST4 runs well on FASTQ input with the parameters "-1 read1.fq.gz -2 read2.fq.gz --barcode read1.fq.gz --read-format bc:0:15,r1:16:-1", but I am a little confused about the barcode parameter for BAM input. The TRUST4 documentation has only a brief description: you can use "--barcode" to specify the field in the BAM file that holds the barcode, e.g. "--barcode CB". However, "--barcode CB" did not work for me.

I now have custom BAM files that include the barcode information, as well as the read1.fq.gz file containing the barcodes. Do you have any idea how to specify the barcode when running on BAM files? Or could you explain the --barcode parameter for BAM files in more detail?

Looking forward to your reply!

Thanks
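
For readers unfamiliar with the "--read-format bc:0:15,r1:16:-1" string in the question, here is a minimal Python sketch of how such a specification could be interpreted. It assumes each entry is NAME:START:END with 0-based, inclusive coordinates and -1 meaning "to the end of the read"; that is an assumption about the semantics, not TRUST4's own code. The function names and the example read are made up for illustration.

```python
def parse_read_format(spec):
    """Parse a string like 'bc:0:15,r1:16:-1' into {name: (start, end)}."""
    layout = {}
    for part in spec.split(","):
        name, start, end = part.split(":")
        layout[name] = (int(start), int(end))
    return layout


def extract(seq, start, end):
    # Assumed semantics: 0-based start, inclusive end; -1 means "to the last base".
    return seq[start:] if end == -1 else seq[start:end + 1]


layout = parse_read_format("bc:0:15,r1:16:-1")

# Hypothetical read 1: a 16-base barcode followed by the remaining bases.
read1 = "ACACAGAAAGTACAAG" + "TTGTTTAGTA" + "TTTTTTTT"
barcode = extract(read1, *layout["bc"])
rest = extract(read1, *layout["r1"])
print(barcode, rest)
```

Under this reading, bc:0:15 selects the first 16 bases of read 1 as the barcode and r1:16:-1 keeps everything after them as the read sequence.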

The "--barcode" parameter looks for the optional fields (typical ones are NM:i:..., MD:Z:...) in the BAM/SAM format. In 10x Genomics data, the barcode for the read is in the CB field. Do you know which field contains the barcode in your pipeline?

In my BAM files, the barcodes are stored as "B0:Z:ACACAGAAAGTACAAG". Does that mean I should use the parameter "--barcode B0"?

Here is an example from my BAM file:

A00742:661:H75MGDSX7:1:1133:10194:18646 16 1 10001 255 12S109M26S * 0 0 TATCCCTAACCCTATCCCTAACCCTAACCCTCACCCTTCCCCTATCCCTAACCCTAACCCTAACCCTAACCCTATCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCATTCACTCTGCGTTGATACCACTGCTT F:,,F:F,,F,FFF,,::,:F,:F:,:,F:F,:F,FF,,,,F:F,,::F,:F,,FFF,:,F,,,F::F:,F,F:,FFFF,,,F,F,F::FFF:FFF::,FFF,:F:F:,:FF:F,:FF::F,FFFFFFFFFFFFFFFFFFFFFFFFF NH:i:1 HI:i:1 AS:i:95 nM:i:6 B0:Z:ACTATGCAAACAACCA B3:Z:CGTGTTGAGT RG:Z:0
A00742:661:H75MGDSX7:1:1369:23222:19163 16 1 10001 255 113M35S * 0 0 TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCATTCACTCTGCGTTGATACCACTGCTTT FFF:FFFFFFFFFFFFFFFFFFFFF:FFFF:F,FFFFFFFFFFF:FFFFFFFFFF::FFF:F:FFFFF,FFFF:FFFFFF,FFFFF:FFFF:FFFF,,FFFFFF,FFF,F,FFFFF,FFF:FFFFFFFFFFFFFFFFFFFFFFFFF:F NH:i:1 HI:i:1 AS:i:107 nM:i:2 B0:Z:CTGTAGCCAACGCTTA B3:Z:AAAGAAAATA RG:Z:0
A00742:661:H75MGDSX7:1:1560:22318:17190 16 1 10001 255 113M34S * 0 0 TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCATTCACTCTGCGTTGATACCACTGCTT FFFFFFF:FFFF:,,FFF,,FFFFF,FFFF:,FFFF::FFFF:F,FFFF:,FFFFF,FFFFFFFFF:FFFFFFFFFFFFF:FFFFFFFFF:F,FFFF,,FFF:F,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:1 HI:i:1 AS:i:107 nM:i:2 B0:Z:ACACAGAAAGTACAAG B3:Z:TTGTTTAGTA RG:Z:0
A00742:661:H75MGDSX7:1:2237:10312:29841 16 1 10001 255 113M34S * 0 0 TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCATTCACTCTGCGTTGATACCACTGCTT FFFFFFFFFFFFF:FFFFFF,FFF:,FFFFFFFFFFFFFFFFFFFFFFF:FFFFFF:FFFFFFFFFFFFFFFFF:FFFF,:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFF NH:i:1 HI:i:1 AS:i:107 nM:i:2 B0:Z:CAAGACTAACACAGAA B3:Z:TACAATGTTC RG:Z:0

Yes, you should use B0 for the --barcode option.
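
If it is unclear which optional field carries the barcode in a custom BAM file, one way to find out is to tally the tags that appear after the 11 mandatory SAM columns. The Python sketch below does this for tab-separated SAM text; the function name is hypothetical and the two records are shortened, made-up stand-ins modeled on the alignments shown above.

```python
from collections import Counter


def tag_counts(sam_lines):
    """Count how often each optional tag appears across SAM alignment records."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blank and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        # The first 11 columns are mandatory; optional TAG:TYPE:VALUE fields follow.
        for field in fields[11:]:
            tag = field.split(":", 1)[0]
            counts[tag] += 1
    return counts


# Shortened, made-up records (sequences and qualities truncated for illustration).
records = [
    "read1\t16\t1\t10001\t255\t4M\t*\t0\t0\tTAAC\tFFFF\tNH:i:1\tB0:Z:ACTATGCAAACAACCA\tB3:Z:CGTGTTGAGT\tRG:Z:0",
    "read2\t16\t1\t10001\t255\t4M\t*\t0\t0\tTAAC\tFFFF\tNH:i:1\tB0:Z:CTGTAGCCAACGCTTA\tB3:Z:AAAGAAAATA\tRG:Z:0",
]
counts = tag_counts(records)
for tag, n in sorted(counts.items()):
    print(tag, n)
```

On a real file, the same function could read from the output of "samtools view" (for example via sys.stdin) instead of the hard-coded list. A tag whose values look like cell barcodes, such as B0 here, is the one to pass to --barcode.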

Awesome! It works now, I appreciate it.

Hello,
I processed my 10x 3' data with Cell Ranger ARC and I am using the BAM file as input for TRUST4. Can you point out which field holds the barcode in this BAM file? It is neither CB nor B0.

:~/data/sc_counts/MM1_counts/outs$ samtools view possorted_genome_bam.bam | head -n 2
VH00461:12:AAAHWJNHV:2:1109:12134:36477 16 chr1 10017 1 90M * 0 0 CCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACACTAAC CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC-C NH:i:3 HI:i:1 AS:i:86 nM:i:1 RG:Z:MM_no_HR_937961_counts:0:1:AAAHWJNHV:2 RE:A:I xf:i:0 CR:Z:AACCTAACCATCTAAC CY:Z:CCCCCCCCCCCCCCCC UR:Z:CTAACCCTAACC UY:Z:CCCCCCCCCCCC UB:Z:CTAACCCTAACC
VH00461:12:AAAHWJNHV:1:1205:42885:1530 0 chr1 10022 1 5S85M * 0 0 GACAGCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAC CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC;C;CCCCCCCCCCC;CCCCCCCCCCCCCCCCC--CCCCCCC-CCCCCCCC-CCC--CC NH:i:4 HI:i:1 AS:i:83 nM:i:0 RG:Z:MM_no_HR_937961_counts:0:1:AAAHWJNHV:1 RE:A:I xf:i:0 CR:Z:CCTTTAGGTAAACTTT CY:Z:CCCC;;CCC-;-CCC- UR:Z:AAGGGTAATATA UY:Z:C-CCC;;CCCC- UB:Z:AAGGGTAATATA

Hi @RaghadShu, is your BAM file from 10x Cell Ranger? It usually puts the corrected barcode in the CB field. I guess that for the entries you show, the barcodes could not be corrected, so you don't see the CB:Z:XXX field. If you check more alignments, you will probably see it.
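
To check how many alignments carry a corrected barcode rather than only a raw one, a short scan over the records is enough. The Python sketch below counts records that have a CB tag versus records that have only CR; the function name and the three shortened records are made up for illustration, and the tag names are parameters in case a pipeline uses different ones.

```python
def barcode_tag_summary(sam_lines, corrected="CB", raw="CR"):
    """Return (total, with_corrected, raw_only) over SAM alignment records."""
    total = with_corrected = raw_only = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blank and header lines
            continue
        # Optional fields start after the 11 mandatory SAM columns.
        tags = {f.split(":", 1)[0] for f in line.rstrip("\n").split("\t")[11:]}
        total += 1
        if corrected in tags:
            with_corrected += 1
        elif raw in tags:
            raw_only += 1
    return total, with_corrected, raw_only


# Made-up records: two with only a raw barcode (CR), one with a corrected one (CB).
records = [
    "r1\t16\tchr1\t10017\t1\t4M\t*\t0\t0\tCCTA\tCCCC\tCR:Z:AACCTAACCATCTAAC\tUR:Z:CTAACCCTAACC",
    "r2\t0\tchr1\t10022\t1\t4M\t*\t0\t0\tGACA\tCCCC\tCR:Z:CCTTTAGGTAAACTTT\tUR:Z:AAGGGTAATATA",
    "r3\t0\tchr1\t10030\t1\t4M\t*\t0\t0\tGACA\tCCCC\tCR:Z:ACACAGAAAGTACAAG\tCB:Z:ACACAGAAAGTACAAG-1",
]
summary = barcode_tag_summary(records)
print(summary)
```

If the corrected count is zero across a large sample of alignments, the barcode tag is probably named differently in that file; if it is nonzero, passing CB to --barcode should pick up the reads that have it.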

Hi @mourisl, yes, it's from Cell Ranger ARC. It seems you're right: I ran head -n 10 and it showed the CB:Z:xxxx field.
Another trivial issue was that I was running with -barcode rather than --barcode. It seems to be recognizing the CB tags now. Many thanks!!!