suny-downstate-medical-center / netpyne

A Python package to facilitate the development, parallel simulation, optimization and analysis of multiscale biological neuronal networks in NEURON.

Home Page: http://www.netpyne.org

NetPyNE with Optuna batch - mpiexec not starting nrniv [Bug report]

samnemo opened this issue

Describe the bug

When I run an Optuna batch optimization with the A1 model, mpiexec fails to run the nrniv processes for the simulation. NetPyNE does not check the return codes from its subprocess Popen calls, so it waits indefinitely for output that is never produced. The nrniv processes may get started, but they appear to be put to sleep immediately.
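To illustrate the kind of check that would surface this failure instead of waiting forever, here is a minimal sketch. The helper name `run_job` and the `timeout_s` parameter are hypothetical and are not part of NetPyNE; the sketch only uses the standard `subprocess` module and is not a description of how NetPyNE launches its jobs.

```python
import subprocess
import sys


def run_job(cmd, timeout_s=60.0):
    """Run cmd and return (returncode, stdout, stderr).

    Hypothetical helper: raises RuntimeError if the child exits with a
    non-zero code or does not finish within timeout_s seconds.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    try:
        out, err = proc.communicate(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # The child never finished: kill it and reap it before reporting.
        proc.kill()
        proc.communicate()
        raise RuntimeError(f"job timed out after {timeout_s}s: {cmd!r}")
    if proc.returncode != 0:
        raise RuntimeError(
            f"job exited with code {proc.returncode}: {err.strip()}"
        )
    return proc.returncode, out, err
```

For example, `run_job([sys.executable, "-c", "print('ok')"])` returns normally, while a command that exits non-zero or hangs past the timeout raises an error. Note that killing an mpiexec parent this way may not terminate every rank it spawned, so a real fix would likely need additional cleanup.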

This is using conda on Ubuntu with the following relevant packages:

python
Python 3.7.6 (default, Jan 8 2020, 19:59:22)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import netpyne
>>> netpyne.__version__
'1.0.5'
>>> import neuron
>>> neuron.__version__
'8.2.3'
>>> import optuna
>>> optuna.__version__
'3.4.0'

Reproducing the bug

Steps to reproduce the behavior:
Go to the A1 repo/branch here:
https://github.com/NathanKlineInstitute/A1/tree/samn

Then run
python batch.py

Expected behavior

I expected the mpiexec process to start nrniv properly, but nrniv fails to start. Running the mpiexec command directly runs the simulations properly, but when it is launched through batch.py (NetPyNE batch with Optuna), nrniv does not start properly.

System information

See above

Additional context

Check with samn or James C for more details on reproducing the bug

This is the MPI version:
mpiexec --version
mpiexec (OpenRTE) 4.0.2

In optuna_parallel.py, the nrniv jobs seemed to be going to sleep or getting suspended.
Putting a quit() in the right place seemed to allow the later mpiexec with nrniv processes to start properly:

jobString = f"""#!/bin/bash
echo '{paramLabels}'
echo '{candidate}'
nrniv -python -c 'from neuron import h;soma = h.Section(name="soma");h.psection();quit()'
echo $?
mpiexec -n 48 nrniv -python -c 'from neuron import h;h.nrnmpi_init();pc=h.ParallelContext();print(pc.id())'
echo $?

{command}    
"""
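To confirm whether the nrniv processes are actually stopped rather than just idle, their state can be read from `/proc` on Linux. The sketch below is only a diagnostic aid under that assumption; `proc_state` is a hypothetical helper, not part of NetPyNE or NEURON, and it returns `None` on systems without `/proc`.

```python
import os


def proc_state(pid):
    """Return the 'State:' field of /proc/<pid>/status, or None if unavailable.

    On Linux the value looks like 'S (sleeping)' or 'T (stopped)'.
    """
    path = f"/proc/{pid}/status"
    try:
        with open(path) as fh:
            for line in fh:
                if line.startswith("State:"):
                    return line.split(":", 1)[1].strip()
    except OSError:
        # No /proc on this system, or the process no longer exists.
        return None
    return None
```

Calling `proc_state(pid)` on each nrniv PID while the batch appears hung would distinguish a process in state `T` (stopped by a signal) from one in `S` (sleeping while it waits on something).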