liulab-dfci / TRUST4

TCR and BCR assembly from RNA-seq data

Issue with Running BuildDatabaseFa.pl Script for TCR and BCR Analysis in Vicugna pacos (Alpaca)

bigcat1001 opened this issue · comments

I am attempting to use the BuildDatabaseFa.pl script from TRUST4 to analyze T-cell and B-cell receptor sequences in Vicugna pacos (alpaca). I have encountered an error during script execution and am seeking help resolving it.
1. I ran BuildImgtAnnot.pl and got "bcr_tcr_gene_name.txt", which looks like:
IGHA
IGHD1
...
IGHV4S7
IGHV4S8
IGHV4S9
2. I downloaded reference.fa and the GTF file from Ensembl: https://useast.ensembl.org/info/data/ftp/index.html
3. I ran BuildDatabaseFa.pl; however, it reported:
No transcript_nameGeneScaffold_89 ensembl exon 536723 536788 . - . gene_id "ENSVPAG00000000584"; gene_version "1"; transcript_id "ENSVPAT00000000584"; transcript_version "1"; exon_number "1"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_source "ensembl"; transcript_biotype "protein_coding"; exon_id "ENSVPAE00000006748"; exon_version "1"; tag "Ensembl_canonical";
I suppose the format or content of the GTF file might be incompatible with the script's requirements? In addition, the gene names in bcr_tcr_gene_name.txt are not found in the GTF file I am using (Vicugna_pacos.vicPac1.110.gtf). Does this mean I have to create the bcrtcr.fa file manually?
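Both symptoms described above can be checked directly against the GTF before running anything else. The following is a minimal sketch, not part of TRUST4: the function names `parse_gtf_line` and `summarize_gtf` are hypothetical, and it assumes a standard tab-separated GTF whose ninth column holds `key "value";` attributes. It counts exon records that lack a `transcript_name` attribute and reports which names from the gene list appear as `gene_name`.

```python
import re

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')

def parse_gtf_line(line):
    """Return (feature_type, attribute_dict) for a GTF record, or None for
    comment lines and lines with fewer than nine tab-separated columns."""
    fields = line.rstrip("\n").split("\t")
    if line.startswith("#") or len(fields) < 9:
        return None
    # For repeated keys (e.g. "tag"), the last value wins.
    return fields[2], dict(ATTR_RE.findall(fields[8]))

def summarize_gtf(lines, wanted_names):
    """Count exon records without transcript_name and split the wanted
    gene names into those found / not found as gene_name in exon records."""
    wanted = set(wanted_names)
    exons = 0
    without_transcript_name = 0
    found = set()
    for line in lines:
        parsed = parse_gtf_line(line)
        if parsed is None or parsed[0] != "exon":
            continue
        attrs = parsed[1]
        exons += 1
        if "transcript_name" not in attrs:
            without_transcript_name += 1
        if attrs.get("gene_name") in wanted:
            found.add(attrs["gene_name"])
    return {
        "exons": exons,
        "without_transcript_name": without_transcript_name,
        "names_found": sorted(found),
        "names_missing": sorted(wanted - found),
    }

# Hypothetical usage with the files named in this thread:
# with open("bcr_tcr_gene_name.txt") as names, open("Vicugna_pacos.vicPac1.110.gtf") as gtf:
#     print(summarize_gtf(gtf, [n.strip() for n in names if n.strip()]))
```

If most exon records lack `transcript_name`, or `names_missing` covers nearly the whole list, the annotation simply does not carry the IMGT gene names, which would be consistent with the error shown above.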

If your input data to TRUST4 is FASTQ files, you can directly use the FASTA file created by the BuildImgtAnnot.pl script as the input for both the "-f" and "--ref" options. "BuildDatabaseFa.pl" is mainly for creating the file that is required for BAM input. I will clarify this in the README later.
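As a rough illustration of this advice, the sketch below assembles a `run-trust4` command line that passes the same IMGT FASTA to both `-f` and `--ref` for FASTQ input. The helper name `trust4_fastq_command` and all file names are hypothetical; the `-1`/`-2` (paired-end), `-u` (single-end), `-o` and `-t` flags are assumed to behave as described in the TRUST4 README and should be checked against the installed version.

```python
import shlex

def trust4_fastq_command(imgt_fasta, read1, read2=None, out_prefix=None, threads=None):
    """Build a run-trust4 argument list for FASTQ input, using the same
    IMGT-derived FASTA for both -f and --ref."""
    cmd = ["run-trust4", "-f", imgt_fasta, "--ref", imgt_fasta]
    if read2 is None:
        cmd += ["-u", read1]               # single-end reads
    else:
        cmd += ["-1", read1, "-2", read2]  # paired-end reads
    if out_prefix is not None:
        cmd += ["-o", out_prefix]
    if threads is not None:
        cmd += ["-t", str(threads)]
    return cmd

# Hypothetical file names; print the command for inspection before running it,
# e.g. with subprocess.run(cmd, check=True).
cmd = trust4_fastq_command("alpaca_IMGT+C.fa", "sample_R1.fq.gz", "sample_R2.fq.gz",
                           out_prefix="alpaca_sample", threads=4)
print(shlex.join(cmd))
```

Returning an argument list rather than a single string avoids shell-quoting problems if a path contains spaces.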

Thanks, I will try that.