FRED-2 / OptiType

Precision HLA typing from next-generation sequencing data

Error when running with GLPK solver

Hi,

When running against the test data, I'm getting the below error that I think traces back to GLPK but I'm not sure. Can someone help me possibly debug this issue?

Here's the shell script I used to run it:

```
#!/bin/bash

export SAMTOOLS=/Biomarker/ngs/software/samtools/samtools-1.2/bin
export GLPK=/Biomarker/ngs/software/glpk/glpk-4.59/bin
export PATH=$SAMTOOLS:$GLPK:$PATH
export HDF5_DIR=/Biomarker/ngs/software/HD5/hdf5-1.8.16-linux-centos7-x86_64-gcc483-shared
export LD_LIBRARY_PATH=/Biomarker/ngs/software/HD5/hdf5-1.8.16-linux-centos7-x86_64-gcc483-shared/lib

/Biomarker/ngs/software/bin/python OptiType-master/OptiTypePipeline.py -i OptiType-master/test/exome/NA11995_SRR766010_1_fished.fastq OptiType-master/test/exome/NA11995_SRR766010_2_fished.fastq --dna --verbose --config OptiType-master/config.ini -o OptiType-master/test/exome/
```

The head of the .raw file looks like this:

c Problem:
c Rows: 450
c Columns: 282
c Non-zeros: 1715
c Status: INTEGER OPTIMAL
c Objective: x282 = 1135.192 (MAXimum)
c
s mip 450 282 o 1135.192
i 1 1
i 2 2
i 3 2
i 4 1
i 5 1
i 6 1
i 7 1
i 8 2
i 9 2
i 10 1

ERROR (at the bottom):

0:00:01.08 Mapping NA11995_SRR766010_1_fished.fastq to GEN reference...

0:00:31.21 Mapping NA11995_SRR766010_2_fished.fastq to GEN reference...

0:00:57.64 Generating binary hit matrix.
0:00:57.66 Loading OptiType-master/test/exome/2016_03_23_16_57_45/2016_03_23_16_57_45_1.bam started. Number of HLA reads loaded (updated every thousand):
1K...
0:01:00.97 1909 reads loaded. Creating dataframe...
0:01:01.22 Dataframes created. Shape: 1909 x 11179, hits: 688669 (1249465), sparsity: 1 in 17.08
0:01:01.60 Loading OptiType-master/test/exome/2016_03_23_16_57_45/2016_03_23_16_57_45_2.bam started. Number of HLA reads loaded (updated every thousand):
1K...
0:01:04.73 1876 reads loaded. Creating dataframe...
0:01:04.92 Dataframes created. Shape: 1876 x 11179, hits: 657359 (1192811), sparsity: 1 in 17.58
0:01:05.67 Alignment pairing completed. 1681 paired, 359 unpaired, 32 discordant

0:01:11.14 temporary pruning of identical rows and columns

0:01:11.32 Size of mtx with unique rows and columns: (496, 776)
0:01:11.32 determining minimal set of non-overshadowed alleles

0:01:13.67 Keeping only the minimal number of required alleles (62,)

0:01:13.67 Creating compact model...

0:01:13.82 Initializing OptiType model...
GLPSOL: GLPK LP/MIP Solver, v4.59
Parameter(s) specified in the command line:
--write /tmp/tmpGZXIuT.glpk.raw --wglp /tmp/tmpmXCoNz.glpk.glp --cpxlp /tmp/tmpWPTOBn.pyomo.lp
Reading problem data from '/tmp/tmpWPTOBn.pyomo.lp'...
/tmp/tmpWPTOBn.pyomo.lp:3620: warning: lower bound of variable 'x1' redefined
/tmp/tmpWPTOBn.pyomo.lp:3620: warning: upper bound of variable 'x1' redefined
450 rows, 282 columns, 1715 non-zeros
171 integer variables, all of which are binary
3791 lines were read
Writing problem data to '/tmp/tmpmXCoNz.glpk.glp'...
3276 lines were written
GLPK Integer Optimizer, v4.59
450 rows, 282 columns, 1715 non-zeros
171 integer variables, all of which are binary
Preprocessing...
2 hidden packing inequaliti(es) were detected
95 hidden covering inequaliti(es) were detected
444 rows, 280 columns, 1705 non-zeros
170 integer variables, all of which are binary
Scaling...
A: min|aij| = 1.000e+00 max|aij| = 6.000e+00 ratio = 6.000e+00
Problem data seem to be well scaled
Constructing initial basis...
Size of triangular part is 444
Solving LP relaxation...
GLPK Simplex Optimizer, v4.59
444 rows, 280 columns, 1705 non-zeros
0: obj = -0.000000000e+00 inf = 5.000e+00 (5)
5: obj = -3.000000000e-02 inf = 0.000e+00 (0)
* 241: obj = 1.135192000e+03 inf = 3.064e-14 (0)
OPTIMAL LP SOLUTION FOUND
Integer optimization begins...
+ 241: mip = not found yet <= +inf (1; 0)
+ 241: >>>>> 1.135192000e+03 <= 1.135192000e+03 0.0% (1; 0)
+ 241: mip = 1.135192000e+03 <= tree is empty 0.0% (0; 1)
INTEGER OPTIMAL SOLUTION FOUND
Time used: 0.0 secs
Memory used: 0.7 Mb (722870 bytes)
Writing MIP solution to '/tmp/tmpGZXIuT.glpk.raw'...
741 lines were written
invalid literal for int() with base 10: 'c'
WARNING: Solver does not support multi-threading. Please change the config file accordingly. Falling back to single-threading.
GLPSOL: GLPK LP/MIP Solver, v4.59
Parameter(s) specified in the command line:
--write /tmp/tmpz_UceC.glpk.raw --wglp /tmp/tmpW8xrDS.glpk.glp --cpxlp /tmp/tmphE7GB3.pyomo.lp
Reading problem data from '/tmp/tmphE7GB3.pyomo.lp'...
/tmp/tmphE7GB3.pyomo.lp:3620: warning: lower bound of variable 'x1' redefined
/tmp/tmphE7GB3.pyomo.lp:3620: warning: upper bound of variable 'x1' redefined
450 rows, 282 columns, 1715 non-zeros
171 integer variables, all of which are binary
3791 lines were read
Writing problem data to '/tmp/tmpW8xrDS.glpk.glp'...
3276 lines were written
GLPK Integer Optimizer, v4.59
450 rows, 282 columns, 1715 non-zeros
171 integer variables, all of which are binary
Preprocessing...
2 hidden packing inequaliti(es) were detected
95 hidden covering inequaliti(es) were detected
444 rows, 280 columns, 1705 non-zeros
170 integer variables, all of which are binary
Scaling...
A: min|aij| = 1.000e+00 max|aij| = 6.000e+00 ratio = 6.000e+00
Problem data seem to be well scaled
Constructing initial basis...
Size of triangular part is 444
Solving LP relaxation...
GLPK Simplex Optimizer, v4.59
444 rows, 280 columns, 1705 non-zeros
0: obj = -0.000000000e+00 inf = 5.000e+00 (5)
5: obj = -3.000000000e-02 inf = 0.000e+00 (0)
* 241: obj = 1.135192000e+03 inf = 3.064e-14 (0)
OPTIMAL LP SOLUTION FOUND
Integer optimization begins...
+ 241: mip = not found yet <= +inf (1; 0)
+ 241: >>>>> 1.135192000e+03 <= 1.135192000e+03 0.0% (1; 0)
+ 241: mip = 1.135192000e+03 <= tree is empty 0.0% (0; 1)
INTEGER OPTIMAL SOLUTION FOUND
Time used: 0.0 secs
Memory used: 0.7 Mb (722870 bytes)
Writing MIP solution to '/tmp/tmpz_UceC.glpk.raw'...
741 lines were written
invalid literal for int() with base 10: 'c'
Traceback (most recent call last):
  File "OptiType-master/OptiTypePipeline.py", line 374, in <module>
    result = op.solve(args.enumerate)
  File "/Biomarker/ngs/software/OptiType/OptiType-master/model.py", line 150, in solve
    res = self.__solver.solve(self.__instance, options={}, tee=self.__verbosity)
  File "/Biomarker/ngs/software/python/latest/lib/python2.7/site-packages/pyomo/opt/base/solvers.py", line 578, in solve
    result = self._postsolve()
  File "/Biomarker/ngs/software/python/latest/lib/python2.7/site-packages/pyomo/opt/solver/shellcmd.py", line 161, in _postsolve
    results = self.process_output(self._rc)
  File "/Biomarker/ngs/software/python/latest/lib/python2.7/site-packages/pyomo/opt/solver/shellcmd.py", line 220, in process_output
    self.process_soln_file(results)
  File "/Biomarker/ngs/software/python/latest/lib/python2.7/site-packages/pyomo/solvers/plugins/solvers/GLPK.py", line 445, in process_soln_file
    raise ValueError(msg)
ValueError: Error parsing solution data file, line 1
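
For anyone landing here with the same message: `invalid literal for int() with base 10: 'c'` on "line 1" lines up with the .raw head posted above, whose first line starts with the letter `c` rather than a number. The sketch below only illustrates that mismatch and is not Pyomo's actual parser. The helper names (`first_token_as_int`, `parse_raw`) are made up for this example, and the meaning assigned to the `s` and `i` fields is an assumption read off the excerpt, not taken from the GLPK documentation.

```python
# Head of the .raw file as posted in this issue (truncated to a few rows).
RAW_HEAD = """\
c Problem:
c Rows: 450
c Columns: 282
c Non-zeros: 1715
c Status: INTEGER OPTIMAL
c Objective: x282 = 1135.192 (MAXimum)
c
s mip 450 282 o 1135.192
i 1 1
i 2 2
i 3 2
""".splitlines()


def first_token_as_int(line):
    """Hypothetical reader that expects the file to open with an integer."""
    return int(line.split()[0])


def parse_raw(lines):
    """Hypothetical tolerant reader: skip 'c' lines, then read 's' and 'i' lines.

    Field meanings are assumptions based on the excerpt:
    's <kind> <rows> <cols> <status> <objective>' and 'i <index> <value>'.
    """
    summary = None
    values = {}
    for line in lines:
        tokens = line.split()
        if not tokens or tokens[0] == "c":
            continue  # comment line
        if tokens[0] == "s":
            summary = {
                "kind": tokens[1],
                "rows": int(tokens[2]),
                "cols": int(tokens[3]),
                "status": tokens[4],
                "objective": float(tokens[5]),
            }
        elif tokens[0] == "i":
            values[int(tokens[1])] = float(tokens[2])
    return summary, values


# A reader expecting a number on line 1 fails on the leading 'c'.
try:
    first_token_as_int(RAW_HEAD[0])
    message = None
except ValueError as err:
    message = str(err)

summary, values = parse_raw(RAW_HEAD)
```

Here `message` holds the `int()` conversion error and `summary` holds the row, column, and objective figures from the `s` line, which suggests the solve itself succeeded and only the reading of the solution file failed.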

Hi,

Have you looked at issue #28? It seems that newer versions of GLPK cause problems with Pyomo. You might also try CBC as the solver (https://projects.coin-or.org/Cbc). CBC is also free and open source, and much faster than GLPK.
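
Before downgrading or switching, it can help to confirm which solver binaries are actually on `PATH` and which GLPK version is being picked up. The following is a rough sketch under a few assumptions: it uses Python 3 (`shutil.which` is not available in Python 2.7), it assumes the executables are named `cbc` and `glpsol`, and the function names are invented for this example. The version is read from a banner line like the one in the log above.

```python
import re
import shutil


def parse_glpk_version(banner):
    """Extract (major, minor) from a banner such as
    'GLPSOL: GLPK LP/MIP Solver, v4.59'; return None if no version is found."""
    match = re.search(r"v(\d+)\.(\d+)", banner)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def solvers_on_path(names=("cbc", "glpsol")):
    """Return the subset of candidate solver executables found on PATH.

    The default names are assumptions about how the binaries are installed.
    """
    return [name for name in names if shutil.which(name) is not None]


version = parse_glpk_version("GLPSOL: GLPK LP/MIP Solver, v4.59")
found = solvers_on_path()
```

With the banner from this issue, `version` comes out as `(4, 59)`. What ends up in `found` depends on the machine.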

Ugh. I forgot to search closed issues. Thanks. I'll downgrade for now and install CBC later once I've completed testing. Sorry for not seeing that post earlier.