shahab-sarmashghi / RESPECT

Estimating repeat spectra and genome length from low-coverage genome skims

problems running RESPECT

svedwards opened this issue

Hi - your program is exactly what I need but unfortunately it's very difficult to install. I echo the request to put it directly on conda, otherwise it won't be used very widely.

I've tried to get the gurobi license and I think I installed it correctly but I am still getting many errors. If you are at the Broad perhaps we can get together in the new year to troubleshoot (I'm at Harvard). My error message is below. Any files written are empty or only have headings with no genome size estimates. - Scott

(respect) [sedwards@holy7c24101 RESPECT]$ respect -i ../../histdata/moa_bwa_to_LR_emu_kmer_jelly.hist -I ../../moa_ref/Input_read_length.txt -N 10 --debug
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/respect_functions.py:118: FutureWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
mapping_final = pandas.Series()
2022-12-23 10:31:57,963 INFO:Processing moa_bwa_to_LR_emu_kmer_jelly.hist...
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py:234: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
self._parameters_dataframe = self._parameters_dataframe.append(
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py:241: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
self._spectra_dataframe = self._spectra_dataframe.append(pd.Series([iteration] + list(self.repeat_spectra),
2022-12-23 10:31:58,559 INFO:Starting iterations to estimate parameters of moa_bwa_to_LR_emu_kmer_jelly.hist
Set parameter TokenServer to value "rclic1.rc.fas.harvard.edu"
2022-12-23 10:31:59,010 INFO:Set parameter TokenServer to value "rclic1.rc.fas.harvard.edu"
Failed to connect to token server 'rclic1.rc.fas.harvard.edu' (port 41954) - license file '/opt/gurobi/gurobi.lic'. Consult the Quick Start Guide for instructions on starting a token server.
2022-12-23 10:31:59,627 INFO:estimate_genome_skim_parameters finished in 1.5736157894134521 seconds
2022-12-23 10:31:59,627 ERROR:Error occurred when estimating parameters for /n/holylfs04/LABS/edwards_lab/Lab/sedwards/moa/histdata/moa_bwa_to_LR_emu_kmer_jelly.hist; it's skipped
Traceback (most recent call last):
File "/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/paramter_estimator.py", line 344, in __call__
return self.estimate_genomic_parameters(*args, **kwargs)
File "/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/paramter_estimator.py", line 326, in estimate_genomic_parameters
self.estimate_genome_skim_parameters(spectra_number, error_norm, iterations_number, min_r1l, temperature)
File "/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/timer.py", line 68, in wrapper_timer
return func(*args, **kwargs)
File "/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/paramter_estimator.py", line 281, in estimate_genome_skim_parameters
optimizer.run_simulated_annealing(iterations_number, min_r1l, temperature)
File "/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py", line 395, in run_simulated_annealing
repeat_spectra_next = self.estimate_repeat_spectra(o[1:], poisson_matrix_next[1:, :])
File "/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py", line 335, in estimate_repeat_spectra
spectral_residuals = [1.0 * constrained_spectra[i] / norm(constrained_spectra[i:], ord=1) for i in range(
File "/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py", line 335, in <listcomp>
spectral_residuals = [1.0 * constrained_spectra[i] / norm(constrained_spectra[i:], ord=1) for i in range(
TypeError: 'NoneType' object is not subscriptable
2022-12-23 10:31:59,628 ERROR:Error occurred while trying to get estimated parameters for a sample
2022-12-23 10:31:59,636 INFO:Writing the results to the output files...
(respect) [sedwards@holy7c24101 RESPECT]$

Sorry that you had trouble using it. We have tried to make a conda version, but the dependency on Gurobi has prevented that. New students have joined my former lab to work on this project, and hopefully they will be able to replace Gurobi with a free and open-source Python library so that RESPECT can be made available on Conda in the near future.

The error you get seems to be related to installing/using the Gurobi license on a server, which has proven to cause issues. I am at the Broad and would be happy to help you out in person in the new year. In the meantime, if you can tell us more about how you are running RESPECT on this server, we might be able to offer some workarounds. Specifically, where have you installed the license? Is it on the same node where you are running RESPECT? Do you run RESPECT directly, or does the server use a job scheduling system to assign it to a compute node?
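The TypeError at the bottom of the traceback looks like a downstream symptom rather than the root cause: the token server connection failed first, and constrained_spectra was then None when optimizer.py tried to index it. As a rough illustration only (the names below are hypothetical and are not taken from RESPECT's source), a solver step that returns None on failure leads to this kind of error unless the caller checks for it:

```python
from typing import Callable, List, Optional


def normalize_spectra(solve: Callable[[], Optional[List[float]]]) -> List[float]:
    """Run a solver step and scale its result so the absolute values sum to 1.

    `solve` is a hypothetical stand-in for an optimization call that returns
    None when the solver backend (for example, a licensed one) is unavailable.
    """
    spectra = solve()
    if spectra is None:
        # Fail early with a message that points at the likely root cause,
        # instead of a later "'NoneType' object is not subscriptable".
        raise RuntimeError("solver returned no result; check the solver license")
    total = sum(abs(x) for x in spectra)
    if total == 0:
        raise ValueError("cannot normalize an all-zero result")
    return [x / total for x in spectra]
```

With a check like this, the first error a user sees would mention the license rather than a NoneType.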

Hi Shahab -

Thanks for responding - particularly around the holidays! Yes, I have installed Gurobi on the same node as I am running respect. I am running it directly on the command line, not on submitted batch jobs that might use a different node. I see on our servers that we have an old Gurobi module (v. 9.5.2) available for public use, but when I use that I get an error saying that my licence can't use that version:

2022-12-26 12:38:32,360 INFO:Starting iterations to estimate parameters of moa_bwa_to_LR_emu_kmer_jelly.hist
Set parameter TokenServer to value "rclic2.rc.fas.harvard.edu"
2022-12-26 12:38:32,783 INFO:Set parameter TokenServer to value "rclic2.rc.fas.harvard.edu"
Request denied: license not valid for Gurobi version 10

I am not familiar with this "rclic2.rc.fas.harvard.edu" token server, but otherwise I think I have Gurobi installed in the correct folder.

Anyway, thanks very much for your help. Next week I should be able to get some help from the research computing staff here, but if you know of anything worth trying now, I would be very grateful.

Hi Scott,

After some digging in the Gurobi support pages, here is my guess about what is causing the error, based on the information you have provided: the server you are using already has an older version of Gurobi installed, which uses a "floating" license to support a cluster system. However, using conda, you have installed the latest version of Gurobi, which somehow cannot recognize the license you have obtained. I can think of two possible solutions that you can try:

  • If you want to rely on the server version of Gurobi, remove any installation of Gurobi via conda and let RESPECT use the server version. Run gurobi_cl on the command line to see whether the license for the server version is recognized. If so, you should be able to run RESPECT without any problem. Otherwise, you need to contact your server admins to find out why the server version is not working.
  • If you want to use your own installation of Gurobi, make sure the license being used is the one you obtained from the Gurobi website, not the server license. To do that, in the environment where you installed Gurobi, set the environment variable GRB_LICENSE_FILE to the license file path, e.g. by running export GRB_LICENSE_FILE=/usr/home/jones/gurobi.lic, and then run gurobi_cl. If this recognizes the license, you should be able to run RESPECT. You can add export GRB_LICENSE_FILE=path_to_license_file to your .bashrc or .bash_profile so that you don't have to set it manually each time.
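Before running gurobi_cl, it can help to confirm what GRB_LICENSE_FILE actually points to in the current shell. The following is a minimal sketch using only the Python standard library; the function name is made up for illustration, and it checks only that the file exists, not that Gurobi accepts it.

```python
import os
from typing import Mapping


def describe_license_setting(env: Mapping[str, str]) -> str:
    """Report whether GRB_LICENSE_FILE is set and points to an existing file."""
    path = env.get("GRB_LICENSE_FILE")
    if not path:
        return "GRB_LICENSE_FILE is not set"
    if not os.path.isfile(path):
        return f"GRB_LICENSE_FILE points to a missing file: {path}"
    return f"GRB_LICENSE_FILE points to an existing file: {path}"


if __name__ == "__main__":
    # Inspect the environment of the current shell session.
    print(describe_license_setting(os.environ))
```

Running it in the same conda environment as RESPECT shows whether the export took effect there.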

Please try either of these solutions and let me know what happens.

Hi Shahab -

Please see the next comment; I think I got it working!

Thanks for this help. It might make sense for us to move to email so I can share additional details about my setup. You can find my email on my Harvard website (just google me). But I certainly don't expect you to spend more time on this - I am confident the RC folks here can help me once they open up again next week.

I first tried your option 2 and updated my .bashrc, based on your comments above and the Gurobi support:

export GUROBI_HOME="gurobi1000/linux64"
export PATH="${PATH}:${GUROBI_HOME}/bin"
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:${GUROBI_HOME}/lib"
export GRB_LICENSE_FILE="/n/home06/sedwards/opt/gurobi.lic"

However, when I run gurobi_cl I get:

[sedwards@boslogin03 ~]$ gurobi_cl
Set parameter Username
Set parameter LogFile to value "gurobi.log"

Failed to set up a license

Error 10009: HostID mismatch (licensed to 6f7f6c9f, hostid is 4ca23b2e)

I could share the info in my gurobi.lic file but that's probably best done on email. Indeed the licence ID in my .lic file is different from the host ID, as the error suggests.
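Error 10009 reports two host IDs: the one the license was issued for and the one of the machine where gurobi_cl was run. A small sketch for pulling both out of the message is shown below. The pattern is written against the single message quoted above and is an assumption; other Gurobi versions may word the message differently.

```python
import re
from typing import Optional, Tuple

# Pattern based only on the one message shown in this thread.
_HOSTID_MISMATCH = re.compile(
    r"HostID mismatch \(licensed to (?P<licensed>[0-9a-f]+), "
    r"hostid is (?P<current>[0-9a-f]+)\)"
)


def parse_hostid_mismatch(message: str) -> Optional[Tuple[str, str]]:
    """Return (licensed_hostid, current_hostid), or None if the text does not match."""
    match = _HOSTID_MISMATCH.search(message)
    if match is None:
        return None
    return match.group("licensed"), match.group("current")
```

If the two IDs differ, the license was activated on a different machine than the one being used.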

For what it's worth, I also get a strange message when I run the bash script in the setup instructions:

[sedwards@boslogin03 ~]$ cd gurobi1000/linux64/bin
[sedwards@boslogin03 bin]$ ls
grbcluster grbgetkey grbprobe grb_ts grbtune gurobi_cl gurobi.sh python3.7
[sedwards@boslogin03 bin]$ ./gurobi.sh
./gurobi.sh: line 17: gurobi1000/linux64/bin/python3.7: No such file or directory

which is strange, since python3.7 is right there in the directory when I ls.
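One possible explanation, offered only as a guess: GUROBI_HOME in the .bashrc above is a relative path (gurobi1000/linux64), so anything that builds ${GUROBI_HOME}/bin/python3.7 resolves it against the current directory. From inside gurobi1000/linux64/bin that path would not exist, even though ls shows the file. The sketch below only illustrates the path arithmetic. It does not inspect gurobi.sh, and it assumes the home directory is /n/home06/sedwards, as the other paths in this thread suggest.

```python
import posixpath


def resolve_from(cwd: str, path: str) -> str:
    """Return the absolute path that `path` refers to when used from `cwd`."""
    return posixpath.normpath(posixpath.join(cwd, path))


home = "/n/home06/sedwards"         # assumed home directory
gurobi_home = "gurobi1000/linux64"  # relative value from the .bashrc above
target = posixpath.join(gurobi_home, "bin", "python3.7")

# Where the relative path lands when used from the home directory...
from_home = resolve_from(home, target)
# ...and when used from inside the bin directory.
from_bin = resolve_from(posixpath.join(home, gurobi_home, "bin"), target)

print(from_home)
print(from_bin)
```

If this is the cause, setting GUROBI_HOME to an absolute path would avoid it.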

So, then I thought I would switch to the server version, which I loaded, then when I typed gurobi_cl it said:

[sedwards@holy7c24103 ~]$ gurobi_cl
Set parameter TokenServer to value "rclic2.rc.fas.harvard.edu"
Set parameter LogFile to value "gurobi.log"
Using license file /n/sw/eb/apps/centos7/Gurobi/9.5.2/linux64/gurobi.lic

So I changed my license path in my .bashrc to that path above and then tried to set the TokenServer with:
gurobi_cl --server="rclic2.rc.fas.harvard.edu"

and got:

[sedwards@holy7c24103 ~]$ gurobi_cl --server="rclic2.rc.fas.harvard.edu"
Set parameter TokenServer to value "rclic2.rc.fas.harvard.edu"
Set parameter ComputeServer to value "rclic2.rc.fas.harvard.edu"
Set parameter LogFile to value "gurobi.log"

Error 10022: Failed to connect to rclic2.rc.fas.harvard.edu port 80 after 6 ms: No route to host (code 7, command POST http://rclic2.rc.fas.harvard.edu/api/v1/cluster/jobs)

[sedwards@holy7c24103 ~]$ gurobi_cl -t

Checking status of Gurobi token server 'rclic2.rc.fas.harvard.edu'...

Token server functioning normally.
Maximum allowed uses: 4096, current: 0

So I feel I am close but still not connecting.

Anyway, don't trouble yourself too much more on this, it's hard to diagnose from a distance.

I think I may have gotten it to work:

Two files were generated:

estimated-parameters_3.txt:
sample input_type sequence_type coverage genome_length uniqueness_ratio HCRM sequencing_error_rate average_read_length
moa_bwa_to_LR_emu_kmer_jelly.hist histogram genome-skim 6.66 15256279039 0.22 0.23 0.0163 101

estimated-spectra_3.txt:

sample r1 r2 r3 r4 r5
moa_bwa_to_LR_emu_kmer_jelly.hist 3282671707 2019841917 474941038 192376886 680166448

I had to generate a new gurobi licence and point my .bashrc to that licence.

There was a lot of text written to the screen but no errors as far as I can tell:

[sedwards@holy7c24103 ~]$ respect -i /n/holylfs04/LABS/edwards_lab/Lab/sedwards/moa/histdata/moa_bwa_to_LR_emu_kmer_jelly.hist -I /n/holylfs04/LABS/edwards_lab/Lab/sedwards/moa/moa_ref/Input_read_length.txt -N 10 --debug
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/respect_functions.py:118: FutureWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
mapping_final = pandas.Series()
2022-12-26 23:25:41,276 INFO:Processing moa_bwa_to_LR_emu_kmer_jelly.hist...
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py:234: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
self._parameters_dataframe = self._parameters_dataframe.append(
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py:241: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
self._spectra_dataframe = self._spectra_dataframe.append(pd.Series([iteration] + list(self.repeat_spectra),
2022-12-26 23:25:41,901 INFO:Starting iterations to estimate parameters of moa_bwa_to_LR_emu_kmer_jelly.hist
Set parameter Username
2022-12-26 23:25:42,333 INFO:Set parameter Username
Academic license - for non-commercial use only - expires 2023-12-22
2022-12-26 23:25:42,336 INFO:Academic license - for non-commercial use only - expires 2023-12-22
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py:234: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
self._parameters_dataframe = self._parameters_dataframe.append(
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py:241: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
self._spectra_dataframe = self._spectra_dataframe.append(pd.Series([iteration] + list(self.repeat_spectra),
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py:234: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
self._parameters_dataframe = self._parameters_dataframe.append(
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py:241: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
self._spectra_dataframe = self._spectra_dataframe.append(pd.Series([iteration] + list(self.repeat_spectra),
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py:234: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
self._parameters_dataframe = self._parameters_dataframe.append(
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py:241: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
self._spectra_dataframe = self._spectra_dataframe.append(pd.Series([iteration] + list(self.repeat_spectra),
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py:234: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
self._parameters_dataframe = self._parameters_dataframe.append(
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py:241: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
self._spectra_dataframe = self._spectra_dataframe.append(pd.Series([iteration] + list(self.repeat_spectra),
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py:234: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
self._parameters_dataframe = self._parameters_dataframe.append(
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py:241: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
self._spectra_dataframe = self._spectra_dataframe.append(pd.Series([iteration] + list(self.repeat_spectra),
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py:234: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
self._parameters_dataframe = self._parameters_dataframe.append(
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py:241: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
self._spectra_dataframe = self._spectra_dataframe.append(pd.Series([iteration] + list(self.repeat_spectra),
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py:234: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
self._parameters_dataframe = self._parameters_dataframe.append(
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py:241: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
self._spectra_dataframe = self._spectra_dataframe.append(pd.Series([iteration] + list(self.repeat_spectra),
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py:234: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
self._parameters_dataframe = self._parameters_dataframe.append(
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py:241: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
self._spectra_dataframe = self._spectra_dataframe.append(pd.Series([iteration] + list(self.repeat_spectra),
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py:234: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
self._parameters_dataframe = self._parameters_dataframe.append(
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py:241: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
self._spectra_dataframe = self._spectra_dataframe.append(pd.Series([iteration] + list(self.repeat_spectra),
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py:234: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
self._parameters_dataframe = self._parameters_dataframe.append(
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/optimizer.py:241: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
self._spectra_dataframe = self._spectra_dataframe.append(pd.Series([iteration] + list(self.repeat_spectra),
2022-12-26 23:25:49,526 INFO:estimate_genome_skim_parameters finished in 8.120830297470093 seconds
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/respect_functions.py:36: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
parameters_dataframe = parameters_dataframe.append(
/n/home06/sedwards/.conda/envs/phyloacc/lib/python3.9/site-packages/respect-1.3.0-py3.9.egg/respect/respect_functions.py:50: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
spectra_dataframe = spectra_dataframe.append(pandas.Series([parameter_estimator.output_name] +
2022-12-26 23:25:49,556 INFO:Writing the results to the output files...

Glad that you got it to work! I hope we can simplify this for our users in the future. I have two comments to bring this to a close:

  • I think the reason you first got the HostID mismatch error with the second approach (single-machine license) is that the host you initially activated the license on was different from the one you later tested on. This can happen because large clusters often have more than one head/login node, and each time you log in you may be assigned a different one. That's probably why it worked after you installed a new license. However, chances are that the next time you log in you will be assigned a different host and the same error will pop up. A workaround is to write down the exact host that you installed the latest license on and ssh to that specific host each time (the host name appears in your terminal prompt, before the $). Otherwise you will need to obtain a new license for each new host.
  • Ultimately, the server version is the preferred approach because it spares you all these headaches, and I strongly recommend it. You can still reach out to your IT staff and ask for further guidance, but I think the first time you tried, it actually worked! When you said:

So, then I thought I would switch to the server version, which I loaded, then when I typed gurobi_cl it said:

[sedwards@holy7c24103 ~]$ gurobi_cl
Set parameter TokenServer to value "rclic2.rc.fas.harvard.edu"
Set parameter LogFile to value "gurobi.log"
Using license file /n/sw/eb/apps/centos7/Gurobi/9.5.2/linux64/gurobi.lic

This probably means it worked, and you could have just gone ahead and run RESPECT! I think that for the server version you should not modify .bashrc or set any environment variables; they are probably set at the system level and should not be changed by the user. If you want to try this again, I'd suggest deactivating or removing your conda installation of Gurobi, removing any related environment variables from your .bashrc, running gurobi_cl to see whether you get the same output as above, and then running RESPECT.
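For the single-machine license workaround in the first point above, a quick check that the current login node is the one the license was activated on might look like the sketch below. The function names are hypothetical, and the licensed host must be recorded by hand at the time the license is activated.

```python
import socket


def short_hostname(name: str) -> str:
    """Drop any domain suffix, e.g. 'node01.example.org' -> 'node01'."""
    return name.split(".", 1)[0]


def on_licensed_host(licensed_host: str, current_host: str) -> bool:
    """Compare two host names, ignoring case and domain suffixes."""
    return short_hostname(licensed_host).lower() == short_hostname(current_host).lower()


if __name__ == "__main__":
    # Replace with the node on which the license was activated.
    licensed = "holy7c24103"
    current = socket.gethostname()
    if on_licensed_host(licensed, current):
        print(f"{current}: matches the licensed host")
    else:
        print(f"{current}: differs from licensed host {licensed}; ssh there first")
```

The host names in the prompts earlier in this thread (boslogin03 and holy7c24103) would be reported as different hosts.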

PS. To suppress the verbose output from RESPECT once you have made sure it runs correctly, omit the --debug option.

Thanks again Shahab. Your suggestions are very helpful. I find that when I use the server version, I still get an error saying that my license is not valid for gurobi v. 10, but that is probably because, as you pointed out earlier, I have installed gurobi v. 10 somehow. At this point I may not rock the boat and just work with new licenses, but in the medium term (like next week!) I'll work with the IT staff here to sort things out.

Thanks again, your help has been much appreciated. Once the analyses are done, I'll let you know what I find out!

Scott

I see, that totally makes sense. For the server version, you are right: you just need to make sure you load the same version of the Gurobi Python package (gurobipy) that is available on your server (v9.5.2), and not the latest one (v10.0.0). Your IT staff should be able to help you fix that easily.
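One way to see which gurobipy the active environment would load is to ask the Python packaging metadata and compare it with the server module's version. The sketch below makes two assumptions: that the package is registered under the distribution name gurobipy (a conda install may register it differently, in which case the lookup returns None), and that comparing major versions is enough, which is inferred from the "not valid for Gurobi version 10" message.

```python
from importlib import metadata
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '9.5.2' into (9, 5, 2), skipping non-numeric parts."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def installed_gurobipy_version() -> Optional[str]:
    """Return the installed gurobipy version string, or None if not found."""
    try:
        return metadata.version("gurobipy")
    except metadata.PackageNotFoundError:
        return None


def same_major(installed: str, server: str) -> bool:
    """True when both version strings share the same major version."""
    return parse_version(installed)[:1] == parse_version(server)[:1]


if __name__ == "__main__":
    found = installed_gurobipy_version()
    if found is None:
        print("gurobipy metadata not found in this environment")
    else:
        print(found, "matches server 9.5.2:", same_major(found, "9.5.2"))
```

Here 9.5.2 is the server module version mentioned earlier in the thread.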

You are very welcome, it actually helped me to better understand how the license should be managed on a server, so thank you! Looking forward to hearing about your findings!