OSError when computing transfer entropy

Question

jonayoung2003 opened this issue 4 years ago

jonayoung2003 commented 4 years ago

Hi, I'm trying to compute the transfer entropy using PyInform with background processes. The data I have is quite large (many random boolean networks with many initial conditions, trying to reproduce the results in this paper) and above a certain amount of data I get this kind of error on my Windows 10 laptop:

OSError: exception: access violation reading 0x0000025CD86EF000

and on Linux (on my university's high performance computing cluster) I get the following kind of error:

207277 Segmentation fault python inform_transfer_entropy_issue.py

I have written a bit of code to reproduce this error:

import numpy as np
from pyinform import transfer_entropy

# Mock data: one (400 time steps x 250 nodes) boolean array per initial condition.
init_conditions = [np.random.randint(0, 2, (400, 250)).astype(bool) for _ in range(4480)]

# Rows are initial conditions, columns are time steps.
sources = [states[:, 1] for states in init_conditions]
receivers = [states[:, 2] for states in init_conditions]
apparent_transfer = transfer_entropy(sources, receivers, k=13)

# Background processes: one list of time series per conditioning node.
other_inputs = [3, 4]
conditions = []
for other_inp in other_inputs:
    conditions.append([states[:, other_inp] for states in init_conditions])
# The crash is triggered by this call.
complete_transfer = transfer_entropy(sources, receivers, k=13, condition=conditions)

Note that the error is only triggered by the last line, which computes the transfer entropy with conditions. I am generating mock data for 4480 initial conditions (the number of initial conditions for a random boolean network in practice), but the error also occurs with just 430 initial conditions. I assume this is some kind of bug, but I also want to make sure I'm going about this correctly. For the condition argument, if I have multiple background processes and multiple initial conditions, is the third dimension of the multi-dimensional array the different initial conditions (as I have set it up in the code example)? Is the function simply unable to handle a large number of initial conditions? If my usage makes sense and this is an issue with Inform, is there a workaround I can use?

I would appreciate any help you can provide.

Douglas G. Moore · Answer 1 · Wed Sep 23 2020 04:09:38 GMT+0800 (China Standard Time)

Hi, @jonayoung2003 . Thanks for the awesome issue!

Just glancing at the code you shared, it doesn't look like you're doing anything outside what we expect (Py)Inform to be able to handle. I'll need to look a little more carefully.

I can't immediately tell you what the problem is. The access violation together with a segmentation fault usually means one of two things:

There was a problem allocating enough memory, which is possible given you're using k=13 and have 4 time series. This will depend on how much RAM you have. I'm kind of leaning away from this because failure to allocate memory is usually handled more gracefully than with a segfault.
There could be an issue with indices running over the boundaries of some array. This, I think, is most likely.
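
As a rough check on the memory hypothesis, the size of the joint state space for a plug-in conditional transfer entropy estimate can be counted directly. This is a back-of-envelope sketch, not a description of how Inform allocates memory: the function name joint_states and the 4-byte counter size are assumptions made for illustration, and a real implementation also keeps marginal histograms, so the true figure would be some small multiple of this.

```python
def joint_states(base: int, k: int, n_conditions: int) -> int:
    """Count joint states for a plug-in conditional transfer entropy estimate.

    The joint variable combines the target history (base**k), the target's
    next state (base), the source state (base) and one state per
    condition (base**n_conditions).
    """
    return base ** (k + 2 + n_conditions)


# Values from the report: binary data, k=13, two background processes.
states = joint_states(base=2, k=13, n_conditions=2)
bytes_needed = states * 4  # assumption: one 32-bit counter per joint state
print(states, "joint states,", bytes_needed // 1024, "KiB")
```

If this accounting is roughly right, the histogram is far smaller than the RAM of a typical laptop, which fits with leaning away from the allocation explanation.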

We should be able to handle this many initial conditions, so if that turns out to be the problem we'll get a remedy out as quickly as possible.

I'll dig into this this afternoon and get back to you ASAP.

Douglas G. Moore · Answer 2 · Wed Sep 23 2020 05:50:23 GMT+0800 (China Standard Time)

Hi @jonayoung2003. It looks like the issue is an indexing problem. It turns out we've already fixed it (#78). Unfortunately, we haven't rolled that change out to PyInform yet. I'll get on it, but it might take me a little while — I've got to rework our continuous integration to generate the Inform binaries used by PyInform.

If you don't want to wait, here are some instructions that ought to work for you.

Download the latest Inform binaries (https://github.com/ELIFE-ASU/Inform/releases/download/v1.0.1/inform-1.0.1_mixed.zip)
Clone PyInform
Extract the binaries into the pyinform directory of the PyInform repo. This should result in an inform-1.0.1 directory.
Reinstall PyInform

This script ought to work on Linux or macOS. Something similar would work on Windows.

git clone https://github.com/elife-asu/PyInform

cd PyInform/pyinform
wget https://github.com/ELIFE-ASU/Inform/releases/download/v1.0.1/inform-1.0.1_mixed.zip
unzip inform-1.0.1_mixed.zip
cd ..

pip3 uninstall pyinform
# --user is optional, without it you'll probably need sudo
pip3 install --user .

There are some caveats to this. You might get one of two errors: 1.) pyinform can't find the library or 2.) complaints about GLIBC (if you're on Linux or macOS). Let me know if this doesn't work and I'll walk you through a workaround for the workaround. 😄

One last thing: how many nodes are in your network, 400 or 250? From what you wrote, I'd say you have 250 nodes with 400 time steps observed for each node. The various time series functions expect rows to be initial conditions and columns to be time steps. In the case of the conditions, the array is 3-D instead of 2-D. In that case the first index is the condition, the second index is the row and the third index is the time step.
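
The layout described above can be sketched with NumPy alone. The helper series_for is a hypothetical name introduced for illustration, and the number of initial conditions is kept small so the example runs quickly; the node indices 1, 2, 3 and 4 are the ones from the original report.

```python
import numpy as np

n_init, n_steps, n_nodes = 8, 400, 250  # small n_init, for illustration only

# One (time steps x nodes) array per initial condition, as in the original report.
runs = [np.random.randint(0, 2, (n_steps, n_nodes)) for _ in range(n_init)]


def series_for(node: int) -> np.ndarray:
    """Stack one node's series: rows are initial conditions, columns are time steps."""
    return np.stack([run[:, node] for run in runs])


sources = series_for(1)      # 2-D: (initial conditions, time steps)
receivers = series_for(2)    # 2-D: (initial conditions, time steps)
# 3-D: (conditions, initial conditions, time steps)
conditions = np.stack([series_for(node) for node in (3, 4)])

print(sources.shape, receivers.shape, conditions.shape)
```

Arrays shaped this way correspond to the call in the original report, transfer_entropy(sources, receivers, k=13, condition=conditions).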

Hopefully this makes sense. If not, let me know and I'll try to clarify further.

jonayoung2003 · Answer 3 · Wed Sep 23 2020 06:44:03 GMT+0800 (China Standard Time)

Thanks so much for the speedy response! Yes, 250 nodes and 400 time steps is correct. Ah yes, I was mistaken when I said the third dimension was the initial condition; of course it is the time step. So it appears I have interpreted the argument correctly. Thanks for the clarification.

So I am trying your workaround and as you predicted I got this error when trying to run the script on the high performance computing cluster (where I will be running the full experiment):

OSError: /lib64/libm.so.6: version `GLIBC_2.29' not found (required by /data/home/acw303/.local/lib/python3.7/site-packages/pyinform/inform-1.0.1/lib/linux-x86_64/libinform.so.1.0.1)

Can you walk me through the workaround workaround?
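
A GLIBC error like the one above means the prebuilt library was linked against a newer C library than the system provides. The sketch below compares the version named in the error message with the one Python reports through platform.libc_ver(). The helper names required_glibc and system_glibc are hypothetical, and the check is only a diagnostic aid; it does not change how PyInform loads the library.

```python
import platform
import re


def required_glibc(message: str):
    """Extract the GLIBC version named in a loader error, or None if absent."""
    match = re.search(r"GLIBC_(\d+)\.(\d+)", message)
    return (int(match.group(1)), int(match.group(2))) if match else None


def system_glibc():
    """Return the glibc version Python detects as a tuple, or None if unknown."""
    name, version = platform.libc_ver()
    match = re.match(r"(\d+)\.(\d+)", version)
    if name != "glibc" or match is None:
        return None
    return (int(match.group(1)), int(match.group(2)))


error = "OSError: /lib64/libm.so.6: version `GLIBC_2.29' not found"
needed = required_glibc(error)
have = system_glibc()

if needed is not None and have is not None and have < needed:
    print(f"system glibc {have} is older than required {needed}")
else:
    print(f"required: {needed}, detected: {have}")
```

When the detected version is older than the required one, building Inform on the machine itself is the usual way out, since the result links against the local C library.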

Douglas G. Moore · Answer 4 · Wed Sep 23 2020 07:17:12 GMT+0800 (China Standard Time)

Happy to help.

So, this is pretty easy to fix, if kind of tedious. You'll need to build Inform on the compute cluster and move the generated binaries to the right place. To do this, you'll need CMake and a C compiler (gcc or clang ought to work just fine).

Steps:

Make sure you have cmake and gcc/clang installed (almost certain given you're on an HPC cluster)
Clone Inform
Inside the cloned directory run the dist/package.sh script. This will build inform, gather all the necessary files together and tar them up. You'll find the tarball at dist/inform-1.0.1_linux-x86_64.tar.gz
Move the tarball to the pyinform directory where you extracted the other zip file.
Remove the inform-1.0.1 directory
Extract from the tarball.
Since you are on Linux you'll need to...
1. cd into inform-1.0.1/lib
2. Create a directory called linux-x86_64
3. Move all of the files in inform-1.0.1/lib into this directory
Reinstall pyinform as before

This script ought to get the job done.

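#!/usr/bin/env bash
# Build Inform locally and swap its binaries into the PyInform checkout.
set -euo pipefail

PYINFORM=/path/to/PyInform/repo
INFORM=/path/to/Inform/repo

# Build Inform and package it as a tarball.
cd "$INFORM"
chmod +x dist/package.sh
./dist/package.sh
mv dist/inform-1.0.1_linux-x86_64.tar.gz "$PYINFORM/pyinform"

# Replace any previously extracted binaries with the fresh build.
cd "$PYINFORM/pyinform"
if [[ -d inform-1.0.1 ]]; then
    rm -r inform-1.0.1
fi
tar xzf inform-1.0.1_linux-x86_64.tar.gz

# PyInform looks for the Linux libraries in lib/linux-x86_64.
cd inform-1.0.1/lib
mkdir linux-x86_64
mv libinform* linux-x86_64

# Reinstall PyInform so it picks up the new binaries.
cd "$PYINFORM"
pip3 uninstall -y pyinform
pip3 install --user .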

Let me know how it goes!
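
Before reinstalling, it can help to confirm that the libraries ended up where the steps above put them. This is a minimal sketch: find_inform_libs is a hypothetical helper, the directory layout is the one described in the steps, and the path passed in at the bottom is a placeholder to replace with your own checkout.

```python
from pathlib import Path


def find_inform_libs(pyinform_dir, version="1.0.1", platform_dir="linux-x86_64"):
    """List libinform files under <pyinform_dir>/inform-<version>/lib/<platform_dir>."""
    lib_dir = Path(pyinform_dir) / f"inform-{version}" / "lib" / platform_dir
    if not lib_dir.is_dir():
        return []
    return sorted(lib_dir.glob("libinform*"))


# Placeholder path: point this at the pyinform directory of your PyInform checkout.
libs = find_inform_libs("/path/to/PyInform/repo/pyinform")
if libs:
    for lib in libs:
        print("found", lib)
else:
    print("no libinform files found in the expected directory")
```

An empty result suggests the tarball was extracted somewhere else or the files were not moved into the platform directory.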

jonayoung2003 · Answer 5 · Wed Sep 23 2020 09:34:04 GMT+0800 (China Standard Time)

Hi, this has worked with the example code I provided! I'm now in the process of running the full experiment. I'll let you know if there are any more issues. Thanks for all your help. I just want to add that pyinform is a real joy to work with. It is very easy to understand and to get running quickly compared to other libraries. Will the fix for this issue be released with the next build of pyinform?

Douglas G. Moore · Answer 6 · Wed Sep 23 2020 09:45:13 GMT+0800 (China Standard Time)

@jonayoung2003 I'm glad it's working for you, and thank you for the kind words. We don't really keep track of usage statistics, so it's hard to tell if people are actually using PyInform. I'm happy to hear someone's finding it useful and user-friendly.

This fix, and a few others, will be released in the next version of PyInform.

geofurb · Answer 7 · Wed Jul 20 2022 22:39:19 GMT+0800 (China Standard Time)

Quoting Douglas G. Moore (Answer 2): "Hi @jonayoung2003. It looks like the issue is an indexing problem. It turns out we've already fixed it (#78). Unfortunately, we haven't rolled that change out to PyInform yet. I'll get on it, but it might take me a little while; I've got to rework our continuous integration to generate the Inform binaries used by PyInform."

Any plans to release this fix?

geofurb · Answer 8 · Thu Jul 21 2022 19:59:07 GMT+0800 (China Standard Time)

I'm still segfaulting after applying this patch. Plenty of memory is free; I peak at about 12-13% memory usage.
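
When a segfault persists and memory is not the limit, Python's standard faulthandler module can at least record which Python call was active when the crash happened. The sketch below is a general debugging aid and not a fix for this issue; the log file name is an arbitrary choice made for illustration.

```python
import faulthandler
import os
import tempfile

# Keep the file object open for as long as faulthandler is enabled.
trace_path = os.path.join(tempfile.gettempdir(), "pyinform_fault.log")
trace_file = open(trace_path, "w")

# On a fatal signal such as SIGSEGV, dump the Python traceback of every thread.
faulthandler.enable(file=trace_file, all_threads=True)
print("fault tracebacks will be written to", trace_path)

# ... run the transfer entropy computation after this point ...
```

The same handler can be switched on without editing the script by running python -X faulthandler script.py, which writes the traceback to standard error.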