holtjma / fmlrc

A long-read error correction tool using the multi-string Burrows-Wheeler Transform

Segmentation fault: 11

yukicyk opened this issue

Hi there,

I got the following error message after running fmlrc. How should I deal with it?

    loaded bwt with 87963178 compressed values
    Segmentation fault: 11

Thank you!

yukicyk,

Sorry you're having this problem. Can you answer the following questions so we can try to diagnose what may be happening?

  1. Can you provide the command you used to run fmlrc?

  2. It's possible that something went wrong during BWT construction. Can you post how you constructed your BWT beforehand? Additionally, posting the output of the following command will let me know if something is wrong with the BWT header:

    hexdump -C comp_msbwt.npy | head -n 10

  3. This is a bit of a hack, but can you try setting fmlrc to skip 1 read in your input FASTA file? You do this by including "-b 1" before any of the required files. If you can post the output from that command, I'll be able to verify whether fmlrc is having BWT loading issues or if it's something else.
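
The header check from question 2 can also be scripted. Below is a minimal, stdlib-only Python sketch that parses the header of a version 1.0 `.npy` file and checks that it describes a flat array of unsigned bytes (`'|u1'`, which is assumed here to be the expected layout of the compressed BWT). It returns the array length, which should match the "compressed values" count fmlrc prints. The function names are hypothetical, and the sketch assumes the version 1.0 layout (6-byte magic string, 2 version bytes, 2-byte little-endian header length). It also strips Python 2 long-integer suffixes such as `87963178L`, because a header written by Python 2 can contain them and Python 3's `ast.literal_eval` rejects them.

```python
import ast
import re
import struct

NPY_MAGIC = b"\x93NUMPY"


def parse_npy_header(data: bytes) -> dict:
    """Parse a version 1.0 .npy header from the first bytes of a file."""
    if data[:6] != NPY_MAGIC:
        raise ValueError("not a .npy file (bad magic string)")
    major, minor = data[6], data[7]
    if (major, minor) != (1, 0):
        raise ValueError(f"unsupported .npy format version {major}.{minor}")
    # Version 1.0 stores the header length as a little-endian uint16.
    (header_len,) = struct.unpack("<H", data[8:10])
    header = data[10:10 + header_len].decode("latin1")
    # Python 2 writers emit long literals such as "87963178L"; drop the "L"
    # so that Python 3's ast.literal_eval can parse the dict.
    header = re.sub(r"(\d+)L\b", r"\1", header)
    info = ast.literal_eval(header)
    info["data_offset"] = 10 + header_len  # where the array bytes begin
    return info


def check_msbwt_header(data: bytes) -> int:
    """Return the array length if the header describes a flat uint8 array."""
    info = parse_npy_header(data)
    if info["descr"] != "|u1" or info["fortran_order"] or len(info["shape"]) != 1:
        raise ValueError(f"unexpected array layout: {info}")
    return info["shape"][0]
```

For example, `check_msbwt_header(fh.read(4096))` on a file opened with `open("comp_msbwt.npy", "rb")` either returns the length or raises `ValueError` describing what looks wrong.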

Thanks!

Dear holtjma,

Thank you for your quick reply and help; I really appreciate it.

  1. Can you provide the command you used to run fmlrc?

./fmlrc 835.npy -p 4 835.all.fa 835.all.fmlrc.fa
I had successfully run fmlrc with the same BWT file before. The problem occurred when I ran it a second time with a much larger long-read file.

  2. It's possible that something went wrong during BWT construction. Can you post how you constructed your BWT beforehand?

I constructed the BWT with the msbwt package (msbwt-0.3.0, pysam-0.9.1.4) using the following command:

    msbwt cffq --uniform --compressed ./835bwt 835_1.fq 835_2.fq

Additionally, posting the output of the following command will let me know if something is wrong with the BWT header:

    hexdump -C comp_msbwt.npy | head -n 10

Here is the output:
    00000000  93 4e 55 4d 50 59 01 00  46 00 7b 27 64 65 73 63  |.NUMPY..F.{'desc|
    00000010  72 27 3a 20 27 7c 75 31  27 2c 20 27 66 6f 72 74  |r': '|u1', 'fort|
    00000020  72 61 6e 5f 6f 72 64 65  72 27 3a 20 46 61 6c 73  |ran_order': Fals|
    00000030  65 2c 20 27 73 68 61 70  65 27 3a 20 28 38 37 39  |e, 'shape': (879|
    00000040  36 33 31 37 38 4c 2c 29  2c 20 7d 20 20 20 20 0a  |63178L,), }    .|
    00000050  89 19 0a 31 0b 21 0a 21  0d 39 0a 39 0d 19 0a 61  |...1.!.!.9.9...a|
    00000060  0d 29 0d 11 0b 09 0d 11  0d 59 0d 39 0d 0a 11 0d  |.).......Y.9....|
    00000070  09 15 19 0d 39 0d 61 0a  51 1d 21 0d 19 0d 11 0d  |....9.a.Q.!.....|
    00000080  51 0d 09 0d 09 0b 51 0a  11 0b 09 0a 09 0d 11 0d  |Q.....Q.........|
    00000090  09 0d 29 0d 49 15 0a 15  11 15 19 1d 41 0d 29 0d  |..).I.......A.).|
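
In the dump above, the header ends at offset `0x50` (10 prefix bytes plus a 70-byte header, `0x46`), and everything after it is the run-length-encoded BWT, one byte per "compressed value". The sketch below decodes those bytes under an assumed msbwt layout that this thread does not confirm: the low 3 bits of each byte index into the alphabet `$ACGNT`, the high 5 bits hold a run length, and consecutive bytes for the same symbol are base-32 digits of one longer run, least significant digit first. Treat the bit layout and alphabet order as assumptions, not as a specification of fmlrc's loader.

```python
from __future__ import annotations

# Assumed msbwt run-length layout (not confirmed in this thread):
ALPHABET = "$ACGNT"                    # symbol index 0..5
LETTER_BITS = 3                        # low 3 bits: symbol index
NUMBER_BITS = 8 - LETTER_BITS          # high 5 bits: run-length digit
LETTER_MASK = (1 << LETTER_BITS) - 1   # 0b111


def decode_runs(payload: bytes) -> list[tuple[str, int]]:
    """Decode run-length-encoded BWT bytes into (symbol, run_length) pairs.

    Consecutive bytes with the same symbol are combined as base-32 digits of
    a single run, least significant digit first.
    """
    runs: list[tuple[str, int]] = []
    prev_sym = None
    shift = 0
    for byte in payload:
        sym = byte & LETTER_MASK
        digit = byte >> LETTER_BITS
        if sym >= len(ALPHABET):
            raise ValueError(f"invalid symbol index {sym} in byte {byte:#04x}")
        if sym == prev_sym:
            # Same symbol as the previous byte: this byte is a higher-order digit.
            letter, length = runs[-1]
            runs[-1] = (letter, length + (digit << shift))
            shift += NUMBER_BITS
        else:
            runs.append((ALPHABET[sym], digit))
            prev_sym = sym
            shift = NUMBER_BITS
    return runs


def bwt_length(payload: bytes) -> int:
    """Total number of BWT symbols represented by the encoded bytes."""
    return sum(length for _, length in decode_runs(payload))
```

Under these assumptions, `decode_runs(bytes.fromhex("89 19 0a 31"))` gives `[('A', 113), ('C', 1), ('A', 6)]`: `0x89` and `0x19` are both "A" bytes, so their digits combine as 17 + 3 × 32. A symbol index of 6 or 7 would indicate a corrupted file under this layout.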

  3. This is a bit of a hack, but can you try setting fmlrc to skip 1 read in your input FASTA file? You do this by including "-b 1" before any of the required files. If you can post the output from that command, I'll be able to verify whether fmlrc is having BWT loading issues or if it's something else.

Here's the output from the command:

    OBs-iMac:fmlrc-0.1.2 OB$ ./fmlrc 835.npy -p 4 -b 1 835.all.fa 835.all.fmlrc.fa
    loaded bwt with 87963178 compressed values
    Finished processing reads [0, 0)

The segmentation fault is gone now, but the run finished instantly without writing the output file, and it created an empty file called "4".

Thank you so much!

Cheers,
yuki

yukicyk,

It looks like the arguments on your command line may be in the wrong order. The normal usage for fmlrc is as follows:

    fmlrc [options] <comp_msbwt.npy> <long_reads.fa> <corrected_reads.fa>

So try running your command with all options placed before the required inputs, as follows:

    ./fmlrc -p 4 835.npy 835.all.fa 835.all.fmlrc.fa
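
Why the order matters, and likely why an empty file named "4" appeared, can be illustrated with Python's `getopt` module, which offers both common parsing styles. This assumes fmlrc parses its options with a POSIX-style `getopt(3)` loop, which this thread does not confirm. The BSD `getopt` in macOS's libc stops at the first non-option argument, so in `./fmlrc 835.npy -p 4 ...` the strings `-p` and `4` would be taken as the long-read input and the output file; glibc's `getopt` permutes arguments by default and would accept either order. The option string `"p:b:e:"` and the helper names are hypothetical and cover only the flags mentioned here.

```python
import getopt

# Hypothetical subset of fmlrc's flags: each of -p, -b, -e takes a value.
OPTSTRING = "p:b:e:"


def split_args(argv, permute):
    """Split argv into (options, positionals) the way getopt-style parsers do.

    permute=False mimics POSIX/BSD getopt: parsing stops at the first
    non-option argument. permute=True mimics GNU getopt's default behaviour.
    """
    parse = getopt.gnu_getopt if permute else getopt.getopt
    return parse(argv, OPTSTRING)


def positional_roles(positionals):
    """Map the first three positionals onto fmlrc's documented usage order."""
    roles = ("comp_msbwt.npy", "long_reads.fa", "corrected_reads.fa")
    return dict(zip(roles, positionals))


misordered = ["835.npy", "-p", "4", "835.all.fa", "835.all.fmlrc.fa"]
opts, rest = split_args(misordered, permute=False)
print(opts)                    # []
print(positional_roles(rest))  # '-p' becomes the read file, '4' the output file
```

With the options moved to the front, both parsing styles produce the same result, which is why "options first" is the portable order.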

I think this is likely the issue, but if not, let me know and I'll try to help further.

@holtjma,
Oops, the error still occurs:

    OBs-iMac:fmlrc-0.1.2 OB$ ./fmlrc -p 4 835.npy 835.all.fa 835.all.fmlrc.fa
    loaded bwt with 87963178 compressed values
    Segmentation fault: 11

@yukicyk

Since you have successfully run fmlrc with this BWT before and your command-line options are now in order, I suspect that something unexpected is happening with the input long-read FASTA file. Unfortunately, this could be a number of things, and it's hard to debug without looking at the actual file. With that in mind, here are some things that can cause a seg fault in fmlrc:

  1. non-standard FASTA format: the code always expects "labels" to start on a new line with '>', and it treats all other lines as "sequences"
  2. unsupported characters: the only supported characters in the sequences are [A, C, G, N, T]. Generally, unsupported characters just get replaced with N, but a character like "$" will actually cause a seg fault like the one you're seeing, because the generic BWT implementation supports that character
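
Both checks can be run over a file before handing it to fmlrc. The sketch below is a hypothetical helper, not part of fmlrc: it reports sequence lines that appear before any '>' label, lines containing '$' (the crashing case), and other characters outside [A, C, G, N, T], which are only warnings since fmlrc generally replaces them with N. As an extra assumption, it stops early if the first line starts with '@', because that usually means the file is FASTQ.

```python
ALLOWED = set("ACGNT")  # the sequence characters fmlrc supports


def check_fasta(lines):
    """Return (line_number, problem) pairs for lines of a would-be FASTA file."""
    problems = []
    seen_label = False
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if lineno == 1 and line.startswith("@"):
            # FASTQ records start with '@'; report once and stop scanning.
            return [(1, "starts with '@': this looks like FASTQ, not FASTA")]
        if line.startswith(">"):
            seen_label = True
            continue
        if not seen_label:
            problems.append((lineno, "sequence data before the first '>' label"))
        if "$" in line:
            problems.append((lineno, "contains '$', which can crash fmlrc"))
        others = sorted(set(line) - ALLOWED - {"$"})
        if others:
            problems.append(
                (lineno, "unsupported characters (usually replaced with N): " + "".join(others))
            )
    return problems
```

Since the function accepts any iterable of lines, an open file handle works directly, e.g. `check_fasta(fh)[:20]` to see the first few problems.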

If nothing obvious stands out, try running a subset of the reads, either by restricting the program input with the "-b" and "-e" parameters or by creating a smaller FASTA file, to narrow down where the problem is. Alternatively, if you'd like me to look at the FASTA file, I'm willing to do that as well.
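
One way to narrow things down with "-b" and "-e" is a binary search over read indices. This sketch assumes (1) that "-b"/"-e" select reads in the half-open range [begin, end), which is how the "Finished processing reads [0, 0)" message reads, and (2) that a crash is caused by individual reads, so any range containing a bad read fails. `find_first_bad_read` and `fmlrc_runs_ok` are hypothetical names, and the paths in `fmlrc_runs_ok` are placeholders.

```python
import subprocess
from typing import Callable, Optional


def find_first_bad_read(num_reads: int, runs_ok: Callable[[int, int], bool]) -> Optional[int]:
    """Binary-search for the first read index that makes a run fail.

    runs_ok(begin, end) must return True if processing reads [begin, end)
    succeeds. Assumes failures are caused by individual reads.
    """
    if num_reads <= 0 or runs_ok(0, num_reads):
        return None
    # Invariant: no bad read in [0, lo); at least one bad read in [lo, hi).
    lo, hi = 0, num_reads
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if runs_ok(lo, mid):
            lo = mid  # first half is clean, so the bad read is in [mid, hi)
        else:
            hi = mid  # the first half already fails
    return lo


def fmlrc_runs_ok(begin: int, end: int) -> bool:
    """Hypothetical predicate: run fmlrc on reads [begin, end) and report success."""
    cmd = ["./fmlrc", "-b", str(begin), "-e", str(end),
           "835.npy", "835.all.fa", "subset_out.fa"]  # placeholder paths
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0  # a segfault gives a negative return code on POSIX
```

Each step reruns fmlrc on only half of the remaining range, so the reads processed add up to roughly one pass over the file, plus one BWT load per step (about log2 of the read count).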

@holtjma
Yes, the problem was caused by the input file: I had been passing a FASTQ file as input. Once I converted it to FASTA format, fmlrc ran smoothly and finished with a proper output file.
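
Converting FASTQ to FASTA only needs the header and sequence line of each record. Below is a minimal sketch, assuming standard 4-line FASTQ records (no line-wrapped sequences or qualities); `fastq_to_fasta` is a hypothetical name. One plausible reason the FASTQ file crashed fmlrc is that its '@' header, '+' separator, and quality lines would all be read as sequence lines, and Phred+33 quality strings can contain '$' (quality 3). That is a guess, not something confirmed in this thread.

```python
def fastq_to_fasta(lines):
    """Yield FASTA lines from 4-line FASTQ records (header, sequence, '+', quality)."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        rest = [next(it, None) for _ in range(3)]
        if None in rest:
            raise ValueError(f"truncated FASTQ record starting {header!r}")
        seq, plus, qual = (line.rstrip("\r\n") for line in rest)
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record starting {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # Keep the read name (and any description) with '>' in place of '@'.
        yield ">" + header[1:] + "\n"
        yield seq + "\n"
```

Because it is a generator over lines, it can stream a large file: open the FASTQ for reading, open the FASTA for writing, and pass `fastq_to_fasta(src)` to `dst.writelines`.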

Thank you for going through this with me.

Cheers,
yuki