FRED-2 / OptiType

Precision HLA typing from next-generation sequencing data

KeyError - '%s not in index' % objarr[mask] (pandas)

eladc opened this issue

Hello,
My colleague is getting this error:

  File "/apps/RH7U2/gnu/OptiType/1.3.1/OptiTypePipeline.py", line 415, in <module>
    r = result_4digit[["A1", "A2", "B1", "B2", "C1", "C2", "nof_reads", "obj"]]
  File "/apps/RH7U2/gnu/python/2.7.13/lib/python2.7/site-packages/pandas/core/frame.py", line 1958, in __getitem__
    return self._getitem_array(key)
  File "/apps/RH7U2/gnu/python/2.7.13/lib/python2.7/site-packages/pandas/core/frame.py", line 2002, in _getitem_array
    indexer = self.loc._convert_to_indexer(key, axis=1)
  File "/apps/RH7U2/gnu/python/2.7.13/lib/python2.7/site-packages/pandas/core/indexing.py", line 1231, in _convert_to_indexer
    raise KeyError('%s not in index' % objarr[mask])
KeyError: "['nof_reads' 'obj'] not in index"

pandas 0.20.3

Please advise.
Thank you.

Can you paste the full output before the error?

Thank you for replying,
Here's the output in verbose mode:


 0:00:30.12 Mapping 42BL.end1.fq to GEN reference...

 0:04:35.01 Mapping 42BL.end2.fq to GEN reference...

 0:09:52.04 Generating binary hit matrix.
0:09:52.14 Loading OUTDIR2/2017_10_30_21_37_33/2017_10_30_21_37_33_1.bam started. Number of HLA reads loaded (updated every thousand):
1K...2K...3K...4K...5K...6K...7K...
 0:09:57.73 7800 reads loaded. Creating dataframe...
0:09:59.06 Dataframes created. Shape: 7800 x 11179, hits: 2900866 (2900866), sparsity: 1 in 30.06
0:09:59.71 Loading OUTDIR2/2017_10_30_21_37_33/2017_10_30_21_37_33_2.bam started. Number of HLA reads loaded (updated every thousand):
1K...2K...3K...4K...5K...6K...7K...8K...
 0:10:05.51 8017 reads loaded. Creating dataframe...
0:10:06.13 Dataframes created. Shape: 8017 x 11179, hits: 2999358 (2999358), sparsity: 1 in 29.88
0:10:07.51 Alignment pairing completed. 7191 paired, 1373 unpaired, 31 discordant

 0:10:12.77 temporary pruning of identical rows and columns

 0:10:13.46 Size of mtx with unique rows and columns: (1724, 1670)
0:10:13.46 determining minimal set of non-overshadowed alleles

 0:10:58.47 Keeping only the minimal number of required alleles (179,)

 0:10:58.48 Creating compact model...

starting ilp solver with 1 threads...

 0:10:59.52 Initializing OptiType model...

Welcome to IBM(R) ILOG(R) CPLEX(R) Interactive Optimizer Community Edition 12.7.0.0
  with Simplex, Mixed Integer & Barrier Optimizers
5725-A06 5725-A29 5724-Y48 5724-Y49 5724-Y54 5724-Y55 5655-Y21
Copyright IBM Corp. 1988, 2016.  All Rights Reserved.

Type 'help' for a list of available commands.
Type 'help' followed by a command name for more
information on commands.

CPLEX> Logfile 'cplex.log' closed.
Logfile '/tmp/tmpQRfaMD.cplex.log' open.
CPLEX> Problem '/tmp/tmps__1ZK.pyomo.lp' read.
Read time = 0.01 sec. (0.25 ticks)
CPLEX> Problem name         : /tmp/tmps__1ZK.pyomo.lp
Objective sense      : Maximize
Variables            :    1179  [Nneg: 500,  Box: 1,  Binary: 678]
Objective nonzeros   :    1132
Linear constraints   :    2010  [Less: 2003,  Greater: 6,  Equal: 1]
  Nonzeros           :   11708
  RHS nonzeros       :     513

Variables            : Min LB: 0.000000         Max UB: 6.000000      
Objective nonzeros   : Min   : 0.009000000      Max   : 399.0000      
Linear constraints   :
  Nonzeros           : Min   : 1.000000         Max   : 6.000000      
  RHS nonzeros       : Min   : 1.000000         Max   : 6.000000      
CPLEX> CPLEX Error  1016: Promotional version. Problem size limits exceeded.

Error termination, CPLEX Error  1016.
Solution time =    0.00 sec.
Deterministic time = 0.00 ticks  (0.00 ticks/sec)

CPLEX> CPLEX Error  1217: No solution exists.
No file written.
CPLEX> Optimal solution hasn't been obtained. This is a terminal problem.

 0:11:02.20 Result dataframe has been constructed...
Traceback (most recent call last):
  File "/apps/RH7U2/gnu/OptiType/1.3.1/OptiTypePipeline.py", line 415, in <module>
    r = result_4digit[["A1", "A2", "B1", "B2", "C1", "C2", "nof_reads", "obj"]]
  File "/apps/RH7U2/gnu/python/2.7.13/lib/python2.7/site-packages/pandas/core/frame.py", line 1958, in __getitem__
    return self._getitem_array(key)
  File "/apps/RH7U2/gnu/python/2.7.13/lib/python2.7/site-packages/pandas/core/frame.py", line 2002, in _getitem_array
    indexer = self.loc._convert_to_indexer(key, axis=1)
  File "/apps/RH7U2/gnu/python/2.7.13/lib/python2.7/site-packages/pandas/core/indexing.py", line 1231, in _convert_to_indexer
    raise KeyError('%s not in index' % objarr[mask])
KeyError: "['nof_reads' 'obj'] not in index"
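
The KeyError above only reports that two of the selected columns are absent. A minimal sketch of a guard that lists the missing columns before selecting them could look like the following. `missing_columns` is a hypothetical helper, not part of OptiType or pandas, and the column list after the failed solve is an assumption inferred from the error message.

```python
def missing_columns(available, required):
    """Return the required column names that are absent from `available`, in order."""
    present = set(available)
    return [name for name in required if name not in present]


# Column names taken from the traceback above.
REQUIRED = ["A1", "A2", "B1", "B2", "C1", "C2", "nof_reads", "obj"]

# Hypothetical stand-in for the dataframe's column names after a failed solve
# (assumed: only the two columns named in the KeyError are missing).
columns_after_failed_solve = ["A1", "A2", "B1", "B2", "C1", "C2"]

missing = missing_columns(columns_after_failed_solve, REQUIRED)
if missing:
    print("Solver produced no usable result; missing columns: %s" % ", ".join(missing))
```

With a real dataframe, the same helper could be called with the dataframe's `columns` attribute as the first argument, so the failure is reported before the selection raises.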

CPLEX> CPLEX Error 1016: Promotional version. Problem size limits exceeded.

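This is the problem. The CPLEX Community Edition you are using restricts the problem size, and the model OptiType builds exceeds that limit. The solver therefore returns no solution, the result dataframe ends up without the 'nof_reads' and 'obj' columns, and the column selection on line 415 raises the KeyError. I suggest switching to CBC as the solver (https://projects.coin-or.org/Cbc), which is set through the solver option in the [ilp] section of OptiType's config.ini.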
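
To check whether a model would hit the size restriction, the problem statistics in the CPLEX output can be compared against the limit. The sketch below assumes the Community Edition limit is 1000 variables and 1000 constraints (please verify against IBM's documentation). `parse_problem_size` is a hypothetical helper, and the sample text is copied from the log posted above.

```python
import re

# Assumed Community Edition limits; verify against IBM's documentation.
COMMUNITY_MAX_VARIABLES = 1000
COMMUNITY_MAX_CONSTRAINTS = 1000


def parse_problem_size(log_text):
    """Return (variables, linear_constraints) from CPLEX problem statistics, or None."""
    variables = re.search(r"^Variables[ \t]*:[ \t]*(\d+)", log_text, re.MULTILINE)
    constraints = re.search(r"^Linear constraints[ \t]*:[ \t]*(\d+)", log_text, re.MULTILINE)
    if variables is None or constraints is None:
        return None
    return int(variables.group(1)), int(constraints.group(1))


# Lines copied from the solver output in this thread.
SAMPLE = """\
Variables            :    1179  [Nneg: 500,  Box: 1,  Binary: 678]
Objective nonzeros   :    1132
Linear constraints   :    2010  [Less: 2003,  Greater: 6,  Equal: 1]
"""

size = parse_problem_size(SAMPLE)
n_variables, n_constraints = size
exceeds_limit = (n_variables > COMMUNITY_MAX_VARIABLES
                 or n_constraints > COMMUNITY_MAX_CONSTRAINTS)
print("variables=%d constraints=%d exceeds_limit=%s"
      % (n_variables, n_constraints, exceeds_limit))
```

Under that assumed limit, the model in this thread exceeds it on both counts, which matches CPLEX Error 1016.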