xunchen85 / ERVcaller

ERVcaller is a tool designed to accurately detect and genotype non-reference unfixed endogenous retroviruses (ERVs) and other transposable elements (TEs) in the human genome using next-generation sequencing (NGS) data. We evaluated the tool using both simulated and real benchmark whole-genome sequencing (WGS) datasets. ERVcaller can accurately detect TE insertions of any length, particularly ERVs. It allows the use of a TE reference library regardless of sequence complexity, such as the entire RepBase database. It is easy to install and use from the command line.

Home Page: http://www.uvm.edu/genomics/software/ERVcaller.html

[E::bwa_idx_load_from_disk] fail to locate the index files

xxYaaoo opened this issue

Hello, Dr. Chen~! I ran into a problem when I tried to run the test data. The .err log showed the detailed information below:
image
And here is my command line:
image
How can I solve these problems?
Thank you for your help!

Hi,

If you correctly indexed the human genome and TE consensus sequences, the error from the very early bwa alignment step may be because no chimeric/split reads were identified.

May I get a list of the intermediate output files, produced with "ls -lh" in your output folder?

Best,
Xun
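
For readers following along, the listing requested above could be gathered with something like the sketch below. The function names `list_empty_files` and `report_outputs` are illustrative and are not part of ERVcaller. The sketch prints `ls -lh` for the folder and then names any zero-byte files, which can be a quick hint that an upstream step produced nothing.

```shell
# Hypothetical helpers, not part of ERVcaller.

# Print every regular file under the given folder that has zero bytes.
list_empty_files() {
    find "$1" -type f -size 0
}

# Show the folder with human-readable sizes, then list the empty files.
report_outputs() {
    local dir="$1"
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi
    ls -lh "$dir"
    echo "Empty files:"
    list_empty_files "$dir"
}
```

Usage would be `report_outputs path/to/output_folder`, where the path is whatever output folder was given to ERVcaller.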

Thank you for your reply!
My output folder:
image
Besides, I am wondering what you mean by "correctly indexed the human genome and TE consensus sequences".
Are there any steps that need to be done on the human genome and TE consensus sequences before running ERVcaller_v1.4.pl?

Yes, when you prepare the reference files before running ERVcaller.pl, you need to run the "bwa index" command for both the human genome and the TE consensus sequences. You could try that first and then let me know if you still have the same issue.

Best,
Xun
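
The preparation step described above can be sketched as follows. `hg38.fa` and `TE_consensus.fa` are placeholder file names, and `has_bwa_index` and `ensure_bwa_index` are hypothetical helpers, not ERVcaller or BWA commands. The check looks for the five files that classic `bwa index` writes next to the FASTA file (`.amb`, `.ann`, `.bwt`, `.pac`, `.sa`); when they are missing, BWA typically reports `[E::bwa_idx_load_from_disk] fail to locate the index files`.

```shell
# Hypothetical helper: returns 0 only if all five BWA index files
# exist and are non-empty next to the given FASTA file.
has_bwa_index() {
    local fasta="$1" ext
    for ext in amb ann bwt pac sa; do
        if [ ! -s "$fasta.$ext" ]; then
            echo "missing index file: $fasta.$ext" >&2
            return 1
        fi
    done
    return 0
}

# Build the index only when it is missing (requires bwa on PATH).
ensure_bwa_index() {
    local fasta="$1"
    has_bwa_index "$fasta" 2>/dev/null || bwa index "$fasta"
}

# Example calls with placeholder file names (uncomment to run):
# ensure_bwa_index hg38.fa
# ensure_bwa_index TE_consensus.fa
```

Both references need an index, so the check is worth running once per FASTA file before launching ERVcaller.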

Hi, Dr. Chen! I think I forgot to run "bwa index hg38.fa" previously. After running this command, I ran the ERVcaller command line again.
This is my output folder (the .vcf file is still empty). I feel there might still be some problems...
image
image
Thank you for your help!

Hi,

Have you also indexed your TE consensus sequences? Can you also share the log file and the file sizes under the temp folder?

Xun

Yes, I checked my notes, and I had indexed the TE consensus sequences before.
image
This is my temp folder:
image
My slurm.out file:
image
Head part of my slurm.err file:
image
Thank you so much!!!

I can't find any problem with your log and temp files.

Could you try using the BAM file or the TE_seq FASTQ file as the input? (Not TE_seq2, which may not contain simulated insertions and is used for testing separate FASTQ inputs.)

Best,
Xun

YEAH, Dr. Chen!~ I used the BAM file and it seems to have succeeded!
image
I am also curious: is it normal for the slurm.err file to be non-empty even when the run is successful?
(I will try to figure out the problems related to the .fq.gz files and then run my own data.)

Hi,

I am glad that it works!

I don't know what is in your slurm.err file, but sometimes it is just the log output from BWA or samtools running, which should be fine.

Sure, let me know if you have other questions. As I suggested, the TE_seq files contain the simulated integration sites but TE_seq2 may not, which could be the issue.

Best,
Xun
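
Because tools such as BWA and samtools write progress messages to standard error, a non-empty `.err` file is not by itself a sign of failure. One way to tell the two cases apart is to look at the exit status and scan the log for error-level lines. The sketch below assumes BWA-style `[E::` prefixes, as in the title of this issue; `count_error_lines` and `run_and_check` are hypothetical names, not part of ERVcaller.

```shell
# Hypothetical helpers, not part of ERVcaller.

# Print the number of lines in a log that start with "[E::".
# grep -c exits non-zero when the count is 0, so that status is ignored.
count_error_lines() {
    grep -c '^\[E::' "$1" || true
}

# Run a command with stderr redirected to a log file, then succeed only
# if the command exited with 0 and the log has no "[E::" lines.
run_and_check() {
    local errlog="$1" rc=0 n
    shift
    "$@" 2> "$errlog" || rc=$?
    n=$(count_error_lines "$errlog")
    echo "exit status: $rc, [E:: lines: $n"
    [ "$rc" -eq 0 ] && [ "$n" -eq 0 ]
}
```

A call would look like `run_and_check run.err some_command arg1 arg2`, where the command and log name are placeholders. Informational lines such as BWA's `[M::` messages do not count as errors here.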

Dear Dr. Chen,

I have run my own data successfully using ERVcaller! You really made my day! Thank you so much, I appreciate it~

Best wishes,
Yaaoo