OPERA 2.6 command line parallel version crashing
rvaidya opened this issue
Hi Kamel,
The command line parallel version of OPERA 2.6 is crashing for an input that works with the normal version:
INFO: Adding explicit H false
Aug 22, 2020 6:14:08 AM net.guha.apps.cdkdesc.CDKdescBatch batchDescriptor
INFO: Will evaluate 50 descriptors
Aug 22, 2020 6:14:08 AM net.guha.apps.cdkdesc.CDKdescBatch batchDescriptor
INFO: Got 50 descriptor instances
Exception in thread "main" java.lang.NullPointerException
at net.guha.apps.cdkdesc.CDKDescUtils.isSMILESFormat(CDKDescUtils.java:74)
at net.guha.apps.cdkdesc.CDKdescBatch.batchDescriptor(CDKdescBatch.java:227)
at net.guha.apps.cdkdesc.CDKdesc.main(CDKdesc.java:510)
Aug 22, 2020 6:14:08 AM net.guha.apps.cdkdesc.CDKdescBatch batchDescriptor
INFO: output: CDKtemp/CDKDesc_8_temp.csv
Aug 22, 2020 6:14:08 AM net.guha.apps.cdkdesc.CDKdescBatch batchDescriptor
INFO: type: all
Aug 22, 2020 6:14:08 AM net.guha.apps.cdkdesc.CDKdescBatch batchDescriptor
From my experience with CDK, I know it is not thread-safe. Does the CDK functionality need to be wrapped in a lock?
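Assuming, as suggested above, that the CDK objects involved are not thread-safe, the usual fix is to serialize every call into one shared instance with a single lock. The sketch below shows that pattern with plain `java.util.concurrent`. `SharedCalculator` is a hypothetical stand-in for a stateful, non-thread-safe component; it is not a CDK or OPERA class, and its "descriptor" is just the input length.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.locks.ReentrantLock;

public class LockedDescriptors {

    // Hypothetical stand-in for a non-thread-safe component: it reuses a
    // mutable scratch buffer between calls, so concurrent callers would clash.
    static class SharedCalculator {
        private final StringBuilder scratch = new StringBuilder();

        int calculate(String smiles) {
            scratch.setLength(0);
            scratch.append(smiles);
            return scratch.length(); // toy "descriptor": input length
        }
    }

    private static final SharedCalculator CALCULATOR = new SharedCalculator();
    // One lock shared by every worker: only one thread touches CALCULATOR at a time.
    private static final ReentrantLock LOCK = new ReentrantLock();

    static int calculateLocked(String smiles) {
        LOCK.lock();
        try {
            return CALCULATOR.calculate(smiles);
        } finally {
            LOCK.unlock(); // always release, even if calculate() throws
        }
    }

    static List<Integer> computeAll(List<String> inputs, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (String s : inputs) {
                futures.add(pool.submit(() -> calculateLocked(s)));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : futures) { // collect in input order
                try {
                    results.add(f.get());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause());
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(computeAll(List.of("CCO", "c1ccccc1", "O"), 4));
    }
}
```

The trade-off is that the locked section runs strictly one call at a time, so it gives up the parallel speedup for that step. When that matters, the alternative is to give each worker its own instance so that nothing is shared between threads at all.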
Actually, I sometimes get crashes in the standard version too, but during the PaDEL descriptor calculation:
PaDEL calculating 2D descriptors...
Exception in thread "Thread-113" java.lang.NullPointerException
at org.openscience.cdk1.qsar.AtomValenceTool.getValence(AtomValenceTool.java:95)
at libpadeldescriptor.ExtendedTopochemicalAtomDescriptor.calculate(Unknown Source)
at libpadeldescriptor.CDK_Descriptor.run(Unknown Source)
Exception in thread "Thread-112" java.lang.ClassCastException: org.openscience.cdk1.qsar.result.DoubleArrayResult cannot be cast to org.openscience.cdk1.qsar.result.DoubleResult
at libpadeldescriptor.EStateAtomTypeDescriptor.calculate(Unknown Source)
at libpadeldescriptor.CDK_Descriptor.run(Unknown Source)
Exception in thread "Thread-107" java.lang.NullPointerException
at org.openscience.cdk1.qsar.AtomValenceTool.getValence(AtomValenceTool.java:95)
at libpadeldescriptor.PaDELChiIndexUtils.getValenceElectronCount(Unknown Source)
at libpadeldescriptor.PaDELChiIndexUtils.evalValenceIndex(Unknown Source)
at libpadeldescriptor.PaDELChiPathDescriptor.calculate(Unknown Source)
at libpadeldescriptor.CDK_Descriptor.run(Unknown Source)
Exception in thread "Thread-104" java.lang.NullPointerException
at org.openscience.cdk1.qsar.AtomValenceTool.getValence(AtomValenceTool.java:95)
at org.openscience.cdk1.qsar.descriptors.molecular.ChiIndexUtils.getValenceElectronCount(ChiIndexUtils.java:182)
at org.openscience.cdk1.qsar.descriptors.molecular.ChiIndexUtils.evalValenceIndex(ChiIndexUtils.java:169)
at org.openscience.cdk1.qsar.descriptors.molecular.ChiChainDescriptor.calculate(ChiChainDescriptor.java:198)
at libpadeldescriptor.CDK_Descriptor.run(Unknown Source)
Descriptor calculation completed in 0.240 secs . Average speed: 0.24 s/mol.
PaDEL descriptors calculated for: 1 molecules.
It looks like the first crash might only happen when the number of inputs is smaller than the number of workers.
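One plausible mechanism for that observation, not confirmed from the OPERA or CDKdesc source: if N inputs are divided evenly among W workers, every worker gets at least one molecule only when N ≥ W. With N < W, some workers receive an empty chunk, and reading the first line of empty input yields `null`, which fails with a `NullPointerException` if it is dereferenced unchecked. The first trace is also in thread "main", which is at least consistent with a single worker hitting bad input rather than a threading race. The sketch below uses hypothetical helper names (`balancedSplit`, `safeSplit`, `firstLine`) and caps the worker count at the number of inputs.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkInputs {

    // Split inputs into `workers` contiguous chunks whose sizes differ by at
    // most one. Chunks are empty exactly when inputs.size() < workers.
    static List<List<String>> balancedSplit(List<String> inputs, int workers) {
        int n = inputs.size();
        int base = n / workers;
        int extra = n % workers; // the first `extra` chunks get one more item
        List<List<String>> chunks = new ArrayList<>();
        int from = 0;
        for (int w = 0; w < workers; w++) {
            int len = base + (w < extra ? 1 : 0);
            chunks.add(new ArrayList<>(inputs.subList(from, from + len)));
            from += len;
        }
        return chunks;
    }

    // Guarded version: never start more workers than there are inputs.
    static List<List<String>> safeSplit(List<String> inputs, int workers) {
        int effective = Math.max(1, Math.min(workers, inputs.size()));
        return balancedSplit(inputs, effective);
    }

    // First line of a chunk file's contents, or null if it is empty.
    // Callers must check for null before parsing the line.
    static String firstLine(String contents) {
        return contents.lines().findFirst().orElse(null);
    }

    static List<Integer> sizes(List<List<String>> chunks) {
        List<Integer> out = new ArrayList<>();
        for (List<String> c : chunks) {
            out.add(c.size());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> two = List.of("CCO", "O");
        System.out.println(sizes(balancedSplit(two, 4))); // two empty chunks
        System.out.println(sizes(safeSplit(two, 4)));     // no empty chunks
        System.out.println(firstLine(""));                 // null
    }
}
```

With two molecules and four workers, the unguarded split produces two empty chunks, while the guarded split starts only two workers.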
Thank you, Rahul,
The parallel version is only recommended for batches of 5,000 chemicals or more, as mentioned on the releases page. That said, it is still bounded by your computational resources: if you run 50,000 chemicals with limited RAM, it will crash.
The second example does not look like a crash to me. You ran a single molecule, and the log reports that the calculation completed. Some of the descriptors occasionally throw exceptions. If you don't want to see the full output, use verbose mode "1" (minimal) or "0" (silent). I hope this helps.