jeffdaily / parasail

Pairwise Sequence Alignment Library

Attempt to align an empty string appears to cause a Floating point exception

rsharris opened this issue · comments

I'm posting this so that others who encounter a "Floating point exception" will understand that it may mean their input sequences were empty. I wouldn't say that it needs to be fixed, just that users will want to be aware.

I am using sg_trace_striped_16 and in some cases I was passing a pair of empty strings to it. And my program halted with Floating point exception.

Specifically, in an interactive python3 shell (Python 3.7.3, parasail v2.4.3, parasail-bindings v1.2.3):

import parasail
subsMatrix = parasail.matrix_create("ACGT",2,-1)
parasail.sg_trace_striped_16("","",11,1,subsMatrix)

results in the message "Floating point exception" and kicks me out of python back to bash.

import parasail
subsMatrix = parasail.matrix_create("ACGT",2,-1)
parasail.sg_trace_striped_16("","AAAA",11,1,subsMatrix)

had the same failure.

But

import parasail
subsMatrix = parasail.matrix_create("ACGT",2,-1)
parasail.sg_trace_striped_16("AAAA","",11,1,subsMatrix)

gave me this error instead:

python3: src/memory.c:245: parasail_result_new_trace: Assertion `b > 0' failed.
Aborted

Obviously the simplest workaround is for me to make sure I don't try to align empty sequences. For context, I have a separate process (using winnowing) that identifies probable homologous segment pairs, each segment having an anchor point that I want to include in the final alignment (similar to figure 7 here: http://www.bx.psu.edu/~rsharris/lastz/README.lastz-1.04.03.html#stage_gapped). I am using parasail to align in both directions from the anchor point. The problem arises when my anchor is at either end of the interval, because one of the two flanks is then empty.
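A minimal sketch of that workaround, assuming the anchor is given as a pair of indices into the two sequences. The names `extend_from_anchor` and `align_fn` are hypothetical, not part of parasail; `align_fn` stands for any two-argument callable that wraps the real alignment call.

```python
def extend_from_anchor(align_fn, query, ref, q_anchor, r_anchor):
    """Align the flanks on each side of an anchor, skipping empty flanks.

    align_fn is a hypothetical callable taking (query, ref), for example:
        import parasail
        matrix = parasail.matrix_create("ACGT", 2, -1)
        align_fn = lambda q, r: parasail.sg_trace_striped_16(q, r, 11, 1, matrix)

    Returns (left_result, right_result). A side is None when either of its
    flanks is empty, so the aligner is never called with an empty string.
    """
    def guarded(q, r):
        # Only call the aligner when both sequences are non-empty.
        return align_fn(q, r) if q and r else None

    left = guarded(query[:q_anchor], ref[:r_anchor])
    right = guarded(query[q_anchor:], ref[r_anchor:])
    return left, right
```

With an anchor at position 0 or at the end of the interval, the corresponding side comes back as `None` and the caller can treat it as "nothing to extend".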

Thanks for the report. No spoilers, but I'm finding some time to work on a different user request, and a general lack of verifying safe inputs was brought to my attention. Not only that, but the use of assert as a way of checking inputs to functions is bad practice, and as you've observed, causes python to abort. In addition to replacing assert with more friendly alternatives, I'm adding more input-checking to the C library. But it sounds like I should do the same for the python bindings.

Curious why you're using the v1 python bindings. Are you finding a reason to use an older C library with your python bindings?

I wasn't really offended (for lack of a better word) by it exiting python. I presume the pythonic solution would be to raise an exception so the caller could trap/handle that if they choose. I was a little baffled by it being a "floating point" exception, as I expected striped_16 would be using 16-bit ints.
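As a rough illustration of what "raise an exception" could look like on the caller's side, here is a small decorator that checks for empty input before the call reaches the C library. This is only a sketch: `reject_empty` is a hypothetical helper, not something the parasail bindings provide.

```python
import functools


def reject_empty(align_fn):
    """Wrap an alignment function so that empty input raises ValueError.

    Assumes the wrapped function takes the two sequences as its first two
    positional arguments, as parasail.sg_trace_striped_16 does in this thread.
    """
    @functools.wraps(align_fn)
    def wrapper(query, ref, *args, **kwargs):
        if len(query) == 0:
            raise ValueError("query sequence is empty")
        if len(ref) == 0:
            raise ValueError("reference sequence is empty")
        return align_fn(query, ref, *args, **kwargs)
    return wrapper


# Possible usage (requires parasail to be installed):
#   import parasail
#   safe_align = reject_empty(parasail.sg_trace_striped_16)
#   safe_align("", "AAAA", 11, 1, matrix)   # raises ValueError, no crash
```

The caller can then trap `ValueError` with an ordinary `try`/`except` instead of losing the interpreter.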

As to why I'm using "the v1 python bindings", it has to simply be ignorance: I wasn't aware of which bindings I'm using or what else might exist. I cloned parasail v2.4.3 since that's the latest tagged release in the repo. I built it using "autoreconf -fi" and so on. I installed parasail-python using "python3 -m pip install --user parasail" and, inspecting the log of that, I see that it installed v1.2.3, which is the latest tagged release in the repo. I think I'm using the alignment calls that are described in the readme.

Should I have installed something else?

I think the floating point error might have been a generic divide by zero error. Not 100% sure.
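Some background on the message itself: on Linux, "Floating point exception" is how the shell reports a process killed by the SIGFPE signal, and on x86 that signal is also delivered for integer division by zero. The sketch below only demonstrates the signal-to-exit-status mapping on a POSIX system; it does not show where, or whether, parasail divides by zero. The function name is hypothetical.

```python
import signal
import subprocess
import sys


def signal_after_self_sigfpe():
    """Start a child Python that sends itself SIGFPE; return the killing signal.

    POSIX only. subprocess reports a child killed by signal N as return
    code -N, which is converted back to a signal.Signals member here.
    """
    child_code = "import os, signal; os.kill(os.getpid(), signal.SIGFPE)"
    proc = subprocess.run([sys.executable, "-c", child_code])
    return signal.Signals(-proc.returncode)


if __name__ == "__main__":
    print(signal_after_self_sigfpe().name)
```

Running the same child command directly from bash would make the shell print its own description of that signal, which is the message seen in the original report.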

I haven't done a two-step build in quite some time, building the C library first and then building the python bindings. Embarrassingly, I can't remember what the python project will do -- find the system-installed libparasail.so? Or re-download and build the C lib master?

The v1 python bindings are for version 1.x of the C library. Pretty much obsolete at this point, but I keep them around for backwards-compatibility reasons. If I were to add any input validation, it would be in the v2 bindings only. But I would assume most everyone these days is using those by default anyhow. The version check and bindings import happen at runtime in parasail/__init__.py.

The versioning between the two projects is out of sync, intentionally, but it is confusing. This allows the two projects to develop independently. I try to follow semantic versioning in both cases, and for the C library, appropriate libtool versioning.

This has to be ignorance on my part. How would I install this as a one-step build and get the v2 bindings?

Basically I stopped reading the parasail-python installation instructions once I got to "Using pip" and had success installing it that way. I didn't read "building from source" and skipped ahead to "quick example."

I failed to ask which OS you're using. Windows, Mac, or Linux? I generally have the most experience on Linux.

For Mac and Linux, pip install parasail should work to get the latest python bindings and the latest underlying C library.

An easy way to confirm whether you're using v1 or v2 bindings: print the version of the C library from python. If it is (2,x,x), you're using v2 bindings.

~/parasail-python$ python -c 'import parasail; print(parasail.version())'
(2, 4, 3)

Don't confuse this with the version of the python bindings. That would instead be parasail.__version__.
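To keep the two version numbers apart, a small helper along these lines may be useful. It is a sketch: `describe_parasail` is a hypothetical name, and treating any C library major version of 2 or higher as "v2 bindings" is an assumption that extends the (2,x,x) rule above.

```python
def describe_parasail(module):
    """Summarize versions for a parasail-like module.

    Returns (c_library_version, bindings_version, bindings_generation), where
    c_library_version comes from module.version() and bindings_version from
    module.__version__.
    """
    c_version = tuple(module.version())
    # Assumption: major version 1 means v1 bindings, 2 or higher means v2.
    generation = "v2" if c_version[0] >= 2 else "v1"
    return c_version, module.__version__, generation


# Possible usage (requires parasail to be installed):
#   import parasail
#   print(describe_parasail(parasail))
```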

On Linux, a more advanced use case is when someone has already installed the C library to a system location like /usr/local. In that case, if you set the environment variable PARASAIL_SKIP_BUILD it will just install the python bindings without (re)building C parasail. In that case, at run time parasail-python will search your system locations for libparasail.so.
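Before relying on a system-installed C library, it can help to check whether the dynamic linker can see it at all. The sketch below uses the standard library's `ctypes.util.find_library`; it is not necessarily the search logic parasail-python itself uses, and `locate_shared_library` is a hypothetical helper name.

```python
import ctypes.util


def locate_shared_library(name="parasail"):
    """Return what the system reports for lib<name>, or None if not found.

    On Linux the result is typically a file name such as 'libparasail.so.3'
    rather than a full path; the exact form depends on the platform.
    """
    return ctypes.util.find_library(name)


if __name__ == "__main__":
    found = locate_shared_library()
    print(found if found is not None else "libparasail not found on system paths")
```

A `None` result suggests the library is not in a location the linker searches, for example because it was installed under a prefix that is not on the library path.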

I should have mentioned the OS. It's some brand of linux, I'm not sure which at the moment.

I actually tried to install on 4 different machines, and failed on 3. The first two were desktop macs here at home (the autoconf suite has been hit or miss for me on my macs, probably I don't have that suite installed right). The third was a server in our lab on campus, running linux of some sort. The fourth is a different server in our lab.

Looks like I am using v2 bindings:

python3
Python 3.7.3 (default, Jul 25 2020, 13:03:44) 
[GCC 8.3.0] on linux
>>> import parasail
>>> parasail.version()
(2, 4, 3)
>>> parasail.__version__
'1.2.3'

(At some point I will get back to trying an install on the mac under conda. Currently I've got what (I guess) is a working install on one machine, and it's more important to me to get up to speed using that than to get it working on my desktop.)