xunchen85 / ERVcaller

ERVcaller is a tool designed to accurately detect and genotype non-reference unfixed endogenous retroviruses (ERVs) and other transposable elements (TEs) in the human genome using next-generation sequencing (NGS) data. We evaluated the tool using both simulated and real benchmark whole-genome sequencing (WGS) datasets. ERVcaller can accurately detect TE insertions of any length, particularly ERVs. It allows the use of any TE reference library regardless of sequence complexity, such as the entire RepBase database. It is easy to install and use from the command line.

Home Page: http://www.uvm.edu/genomics/software/ERVcaller.html


Error of Step 3

zhutao1009 opened this issue

This is the command I used:
perl /vol6/home/quluj/zt/software/ERVcaller_v1.4/ERVcaller_v1.4.pl \
-i MD4 \
-f .fastq.gz \
-H /vol6/home/quluj/zt/pkduck_ref/PK_ref.fa \
-T /vol6/home/quluj/zt/DUCK/teseq/LTR.fasta \
-t 12 -S 20 -G
However, it fails at Step 3 every time, and I only get an empty VCF file.
Step 3: Validation...

Converting SAM to BAM file, and then Sort and index the BAM file......

[bam_sort_core] merging from 11 files and 11 in-memory blocks...
[bwa_index] Pack FASTA... [bns_fasta2bntseq] Failed to allocate 0 bytes at bntseq.c line 303: Success
[E::bwa_idx_load_from_disk] fail to locate the index files
[E::bwa_idx_load_from_disk] fail to locate the index files

Please check your indexed human and TE reference files. You can also check if the aligned BAM file is correctly indexed and sorted, if not, you can check your installed SAMtools version which should be higher than v1.5.
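The message above asks for a SAMtools version higher than v1.5. As a rough sketch, the installed version could be compared against that threshold as follows. The helper name `version_gt` is hypothetical (it is not part of ERVcaller or SAMtools), and the sketch assumes a `sort` that supports `-V`, as in GNU coreutils.

```shell
#!/bin/bash
# Hypothetical helper: succeeds when version $1 is strictly higher than version $2.
# Assumes `sort -V` (version sort) is available.
version_gt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Example use (left commented out so the block runs without SAMtools installed).
# The first line of `samtools --version` has the form "samtools <version>".
# installed=$(samtools --version | head -n 1 | awk '{print $2}')
# version_gt "$installed" 1.5 || echo "SAMtools $installed may be too old"
```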

I ran ERVcaller with your test data on different servers. It reported the same error and generated an empty VCF file. Maybe you should check the file '${input_sampleID}_ERV.output', which was not generated as expected.
This is my code:
#!/bin/bash
conda activate R3 # activate the R3.3.2 environment
perl /home/software/ERVcaller/ERVcaller_v1.4.pl \
-i TE_seq \
-I /home/software/ERVcaller/test/BWA/ \
-f .bam \
-H human.fa \
-T /home/software/ERVcaller/Database/HERVK.fa \
-t 2 -S 20 -G -BWA_MEM \
-l 500 \
-L 100

It works correctly on the servers of many other users as well as on mine. In my experience, this error is usually caused by incorrect inputs. If it is easier for you, can you share screenshots of: 1) a list of the intermediate files produced (using the command ls -lh); 2) your input paired-end reads (using the command cat); and 3) a list of the generated index files for both human.fa and HERVK.fa (using the command ls -lh).

I would suggest using full paths and as few parameters as possible for now, following the ERVcaller manual. For example:
perl /home/software/ERVcaller/ERVcaller_v1.4.pl \
-i TE_seq \
-I /home/software/ERVcaller/test/BWA/ \
-f .bam \
-H /full/path/to/human.fa \
-T /home/software/ERVcaller/Database/HERVK.fa \
-BWA_MEM

(base) root@zhu-PC:/media/zhu/A64E22B94E228263/clean/humman# ll
total 6.9G
-rwxrwxrwx 1 zhu zhu 445 Sep 4 10:52 ervcaller.sh
-rwxrwxrwx 1 zhu zhu 3.1G Sep 2 17:10 human.fa
-rwxrwxrwx 1 zhu zhu 22K Sep 4 00:20 human.fa.amb
-rwxrwxrwx 1 zhu zhu 83K Sep 4 00:20 human.fa.ann
-rwxrwxrwx 1 zhu zhu 3.1G Sep 4 00:19 human.fa.bwt
-rwxrwxrwx 1 zhu zhu 781M Sep 4 00:20 human.fa.pac
-rwxrwxrwx 1 zhu zhu 11K Sep 4 00:20 nohup.out
drwxrwxrwx 1 zhu zhu 48 Sep 4 12:29 TE_seq_subgenome
drwxrwxrwx 1 zhu zhu 408 Sep 4 12:29 TE_seq_temp
-rwxrwxrwx 1 zhu zhu 0 Sep 4 12:29 TE_seq.vcf
###############################################
(base) root@zhu-PC:/media/zhu/A64E22B94E228263/clean/humman/TE_seq_subgenome# ll
total 0
############################################################
(base) root@zhu-PC:/media/zhu/A64E22B94E228263/clean/humman/TE_seq_temp# ll
total 4.5K
-rwxrwxrwx 1 zhu zhu 0 Sep 4 10:54 TE_seq_ERV_1.1fuq
-rwxrwxrwx 1 zhu zhu 0 Sep 4 10:54 TE_seq_ERV1.bian
-rwxrwxrwx 1 zhu zhu 0 Sep 4 10:54 TE_seq_ERV_1sf.fuq
-rwxrwxrwx 1 zhu zhu 0 Sep 4 10:54 TE_seq_ERV_2.1fuq
-rwxrwxrwx 1 zhu zhu 0 Sep 4 10:54 TE_seq_ERV.fine_mapped
-rwxrwxrwx 1 zhu zhu 0 Sep 4 10:54 TE_seq_ERV.hf
-rwxrwxrwx 1 zhu zhu 0 Sep 4 10:54 TE_seq_ERV.output
-rwxrwxrwx 1 zhu zhu 925 Sep 4 12:28 TE_seq_ERV.output2
-rwxrwxrwx 1 zhu zhu 0 Sep 4 12:29 TE_seq_ERV.output2.1
-rwxrwxrwx 1 zhu zhu 0 Sep 4 10:54 TE_seq_ERV.TE_f
-rwxrwxrwx 1 zhu zhu 0 Sep 4 10:54 TE_seq_ERV.TE_f2
-rwxrwxrwx 1 zhu zhu 0 Sep 4 10:54 TE_seq_ERV.visualization
-rwxrwxrwx 1 zhu zhu 32 Sep 4 10:54 TE_seq.type.gz
############################################
(base) root@zhu-PC:/home/software/ERVcaller/Database# ll
total 976K
-rw-r--r-- 1 root root 814K Sep 1 16:37 ERV_library.fa
-rw-r--r-- 1 root root 8.2K Sep 1 16:37 HERVK.fa
-rw-r--r-- 1 root root 9 Sep 2 17:20 HERVK.fa.amb
-rw-r--r-- 1 root root 42 Sep 2 17:20 HERVK.fa.ann
-rw-r--r-- 1 root root 8.1K Sep 2 17:20 HERVK.fa.bwt
-rw-r--r-- 1 root root 2.0K Sep 2 17:20 HERVK.fa.pac
-rw-r--r-- 1 root root 4.1K Sep 2 17:20 HERVK.fa.sa
-rw-r--r-- 1 root root 115K Sep 1 16:37 Human_TE_library.fa
#####################################################
(base) root@zhu-PC:/home/software/ERVcaller/test/BWA# ll
total 15M
-rw-r--r-- 1 root root 13M Sep 1 16:37 TE_seq.bam
-rw-r--r-- 1 root root 1.2M Sep 1 16:37 TE_seq.bam.bai

It looks like it stopped very early, during the alignment or format conversion steps, which mainly use BWA and SAMtools.

Can you specify the full path to your indexed human reference genome and try again? I guess it should be /media/zhu/A64E22B94E228263/clean/humman/ on your PC.

Let me know if it still does not work, and please attach the log file and similar screenshots.
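Before rerunning, it may help to confirm that the index sits next to the reference FASTA. `bwa index` normally writes five files beside the FASTA (.amb, .ann, .bwt, .pac and .sa); in the listings above, HERVK.fa shows all five, while human.fa shows four and no .sa file. The sketch below only checks that these files exist and are non-empty. The function name `check_bwa_index` is hypothetical and not part of ERVcaller or BWA.

```shell
#!/bin/bash
# Hypothetical helper: report which BWA index files are missing or empty
# for the reference FASTA given as $1. Returns 0 only if all five are present.
check_bwa_index() {
    local ref="$1" missing=0 ext
    for ext in amb ann bwt pac sa; do
        if [ ! -s "${ref}.${ext}" ]; then
            echo "missing or empty: ${ref}.${ext}"
            missing=1
        fi
    done
    return "$missing"
}

# Example use (commented out; the path is the one guessed in this thread):
# check_bwa_index /media/zhu/A64E22B94E228263/clean/humman/human.fa \
#     || echo "consider re-running: bwa index <reference.fa>"
```

If any file is reported, rebuilding the index with `bwa index` on that FASTA and passing its full path to `-H` is a reasonable next step.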

It worked with your test data and with part of my data. Here is the output; the TSD sequence is too long.
MDM1.txt

It is good to hear that ERVcaller works. ERVcaller outputs all results, and users can filter them based on the reported genotype quality and likelihood. This is very useful when working on a population. As for the TSD sequence, long TSDs have been reported in the literature. We currently use 500 bp to keep high sensitivity and accuracy, and we may improve this in the next version.
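As an illustration of the filtering mentioned above, here is a minimal sketch that keeps only VCF records whose first sample has a genotype quality at or above a threshold. It assumes the FORMAT column contains a `GQ` key, as in the standard VCF layout; whether the ERVcaller output uses exactly that key should be checked against the actual file. The function name `filter_vcf_by_gq` and the threshold of 20 are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read a VCF on stdin and print header lines plus the
# records whose first sample has GQ >= $1. Assumes a GQ key in FORMAT (column 9).
filter_vcf_by_gq() {
    local min_gq="$1"
    awk -F '\t' -v min="$min_gq" '
        /^#/ { print; next }            # pass header lines through unchanged
        {
            n = split($9, keys, ":")    # FORMAT column lists the per-sample keys
            split($10, vals, ":")       # first sample column holds the values
            for (i = 1; i <= n; i++)
                if (keys[i] == "GQ" && vals[i] != "." && vals[i] + 0 >= min) {
                    print
                    break
                }
        }'
}

# Example use (commented out; file names are placeholders):
# filter_vcf_by_gq 20 < sample.vcf > sample.GQ20.vcf
```

Records without a `GQ` value are dropped by this sketch, so the threshold and the handling of missing values should be adapted to the dataset.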

Thank you for your work, you helped me a lot.