shahab-sarmashghi / RESPECT

Estimating repeat spectra and genome length from low-coverage genome skims

some interesting warnings but looks like working?

AntonioBaeza opened this issue

I tried the example you provided.
It looks like it is working, but it prints some interesting warnings:

(RESPECT) [ant@hillary RESPECT]$ respect -d data/ -m data/name_mapping.txt -I data/hist_info.txt -N 10 --debug
2021-03-19 00:32:55,108 WARNING:data/name_mapping.txt does not have valid extension; it's skipped
2021-03-19 00:32:55,109 WARNING:data/hist_info.txt does not have valid extension; it's skipped
2021-03-19 00:32:55,116 INFO:Processing mp_fq...
2021-03-19 00:32:55,502 INFO:compute_kmer_histogram finished in 0.17826151847839355 seconds
2021-03-19 00:32:55,502 ERROR:Error occurred when processing /home/ant/anaconda3/envs/RESPECT/RESPECT/data/Micromonas_pusilla_cov_0.5_err_0.01.fq.gz; it's skipped
Traceback (most recent call last):
File "/home/ant/anaconda3/envs/RESPECT/lib/python3.9/site-packages/respect-1.2.0-py3.9.egg/respect/respect_functions.py", line 246, in run_respect
parameter_estimator.set_kmer_histogram(args.threads, args.decomp)
File "/home/ant/anaconda3/envs/RESPECT/lib/python3.9/site-packages/respect-1.2.0-py3.9.egg/respect/paramter_estimator.py", line 216, in set_kmer_histogram
self.compute_kmer_histogram(n_threads, decomp_util)
File "/home/ant/anaconda3/envs/RESPECT/lib/python3.9/site-packages/respect-1.2.0-py3.9.egg/respect/timer.py", line 68, in wrapper_timer
return func(*args, **kwargs)
File "/home/ant/anaconda3/envs/RESPECT/lib/python3.9/site-packages/respect-1.2.0-py3.9.egg/respect/paramter_estimator.py", line 173, in compute_kmer_histogram
profiler_output = kmer_profiler(self.input_file, self.sequence_type, self.output_name, self.tmp_dir,
File "/home/ant/anaconda3/envs/RESPECT/lib/python3.9/site-packages/respect-1.2.0-py3.9.egg/respect/profiling.py", line 112, in kmer_profiler
call(["jellyfish", "count", "-m", str(kmer_length), "-s", "100M", "-t", str(n_threads), "-C", "-o", mercnt,
File "/home/ant/anaconda3/envs/RESPECT/lib/python3.9/subprocess.py", line 349, in call
with Popen(*popenargs, **kwargs) as p:
File "/home/ant/anaconda3/envs/RESPECT/lib/python3.9/subprocess.py", line 951, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/home/ant/anaconda3/envs/RESPECT/lib/python3.9/subprocess.py", line 1823, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'jellyfish'
2021-03-19 00:32:55,506 INFO:Processing mp_hq...
2021-03-19 00:32:55,508 INFO:Processing mp_ha...
2021-03-19 00:32:55,510 INFO:Processing mp_fa...
2021-03-19 00:32:55,754 INFO:compute_kmer_histogram finished in 0.008790969848632812 seconds
2021-03-19 00:32:55,754 ERROR:Error occurred when processing /home/ant/anaconda3/envs/RESPECT/RESPECT/data/GCF_000151265.2_Micromonas_pusilla_CCMP1545_v2.0_genomic.fna; it's skipped
Traceback (most recent call last):
File "/home/ant/anaconda3/envs/RESPECT/lib/python3.9/site-packages/respect-1.2.0-py3.9.egg/respect/respect_functions.py", line 246, in run_respect
parameter_estimator.set_kmer_histogram(args.threads, args.decomp)
File "/home/ant/anaconda3/envs/RESPECT/lib/python3.9/site-packages/respect-1.2.0-py3.9.egg/respect/paramter_estimator.py", line 216, in set_kmer_histogram
self.compute_kmer_histogram(n_threads, decomp_util)
File "/home/ant/anaconda3/envs/RESPECT/lib/python3.9/site-packages/respect-1.2.0-py3.9.egg/respect/timer.py", line 68, in wrapper_timer
return func(*args, **kwargs)
File "/home/ant/anaconda3/envs/RESPECT/lib/python3.9/site-packages/respect-1.2.0-py3.9.egg/respect/paramter_estimator.py", line 173, in compute_kmer_histogram
profiler_output = kmer_profiler(self.input_file, self.sequence_type, self.output_name, self.tmp_dir,
File "/home/ant/anaconda3/envs/RESPECT/lib/python3.9/site-packages/respect-1.2.0-py3.9.egg/respect/profiling.py", line 109, in kmer_profiler
call(["jellyfish", "count", "-m", str(kmer_length), "-s", "100M", "-t", str(n_threads), "-o", mercnt,
File "/home/ant/anaconda3/envs/RESPECT/lib/python3.9/subprocess.py", line 349, in call
with Popen(*popenargs, **kwargs) as p:
File "/home/ant/anaconda3/envs/RESPECT/lib/python3.9/subprocess.py", line 951, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/home/ant/anaconda3/envs/RESPECT/lib/python3.9/subprocess.py", line 1823, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'jellyfish'
2021-03-19 00:32:56,231 INFO:Starting iterations to estimate parameters of mp_hq
Restricted license - for non-production use only - expires 2022-01-13
2021-03-19 00:32:56,670 INFO:Restricted license - for non-production use only - expires 2022-01-13
2021-03-19 00:33:04,609 INFO:estimate_genome_skim_parameters finished in 8.831506490707397 seconds
2021-03-19 00:33:04,642 INFO:Writing the results to the output files...
(RESPECT) [ant@hillary RESPECT]$ ll
total 56
drwxrwxr-x. 4 ant ant 4096 Mar 19 00:20 build
drwxrwxr-x. 2 ant ant 4096 Mar 19 00:32 data
drwxrwxr-x. 2 ant ant 4096 Mar 19 00:20 dist
-rw-rw-r--. 1 ant ant 210 Mar 19 00:33 estimated-parameters.txt
-rw-rw-r--. 1 ant ant 102 Mar 19 00:33 estimated-spectra.txt
-rw-rw-r--. 1 ant ant 1462 Mar 19 00:19 LICENSE
-rw-rw-r--. 1 ant ant 42 Mar 19 00:19 MANIFEST.in
-rw-rw-r--. 1 ant ant 8671 Mar 19 00:19 README.md
drwxrwxr-x. 4 ant ant 4096 Mar 19 00:20 respect
drwxrwxr-x. 2 ant ant 4096 Mar 19 00:20 respect.egg-info
-rw-rw-r--. 1 ant ant 1539 Mar 19 00:19 setup.py
drwxrwxr-x. 6 ant ant 4096 Mar 19 00:32 tmp
(RESPECT) [ant@hillary RESPECT]$

It looks like it did work: it created two new txt files.
Checking whether they contain any information:

FILE: estimated-parameters.txt

sample  input_type  sequence_type  coverage  genome_length  uniqueness_ratio  HCRM   sequencing_error_rate
mp_hq   histogram   genome-skim    0.58      18823131       1.00              70.52  0.0126
mp_ha   histogram   assembly       NA        21690409       0.94              73.60  NA

FILE: estimated-spectra.txt

sample  r1        r2      r3     r4     r5
mp_hq   18964990  201428  23901  10067  31786
mp_ha   20427099  216746  54799  19206  32470
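
One way to check whether every sample made it into the output is to compare the sample names in the "Processing ..." log lines with the first column of estimated-parameters.txt. The following is only a rough sketch: the helper names are hypothetical and not part of RESPECT, and the log lines and table are copied from the run above rather than read from disk.

```python
import re

# "Processing" lines copied from the log above.
LOG = """\
2021-03-19 00:32:55,116 INFO:Processing mp_fq...
2021-03-19 00:32:55,506 INFO:Processing mp_hq...
2021-03-19 00:32:55,508 INFO:Processing mp_ha...
2021-03-19 00:32:55,510 INFO:Processing mp_fa...
"""

# Contents of estimated-parameters.txt as shown above.
PARAMETERS = """\
sample input_type sequence_type coverage genome_length uniqueness_ratio HCRM sequencing_error_rate
mp_hq histogram genome-skim 0.58 18823131 1.00 70.52 0.0126
mp_ha histogram assembly NA 21690409 0.94 73.60 NA
"""


def processed_samples(log_text):
    """Sample names that the log says were picked up for processing."""
    return re.findall(r"INFO:Processing (\S+?)\.\.\.", log_text)


def reported_samples(table_text):
    """Sample names in the first column of the table, header row excluded."""
    rows = [line.split() for line in table_text.splitlines() if line.strip()]
    return [row[0] for row in rows[1:]]


def missing_samples(log_text, table_text):
    """Samples that were picked up but never reached the output table."""
    reported = set(reported_samples(table_text))
    return [name for name in processed_samples(log_text) if name not in reported]


print(missing_samples(LOG, PARAMETERS))
```

For this run the check lists mp_fq and mp_fa, the two samples whose processing ended in the FileNotFoundError tracebacks.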

Looks like British code.

Hi Antonio, sorry, I was very busy and missed this issue. It seems that only the histogram inputs were processed, while the sequence input files were skipped and left out of the output. The reason seems to be that jellyfish is not properly installed. You need to install jellyfish first and add its location to the system path (so that you can run, e.g., jellyfish --version in the terminal without any problem). Please let me know if you encounter any problems doing that or get other errors.
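
A quick way to confirm that jellyfish is reachable before rerunning respect is to look it up on the system path from Python. This is only a minimal sketch and not part of RESPECT: the helper names find_tool and tool_version are made up for illustration, and it uses nothing beyond the standard library and the jellyfish --version command mentioned above.

```python
import shutil
import subprocess


def find_tool(name="jellyfish"):
    """Return the absolute path of `name` if it is on PATH, else None."""
    return shutil.which(name)


def tool_version(name="jellyfish"):
    """Return the output of `<name> --version`, or None if the tool is missing."""
    path = find_tool(name)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    # Some tools print their version to stderr, so fall back to it.
    return (result.stdout or result.stderr).strip()


if __name__ == "__main__":
    version = tool_version()
    if version is None:
        print("jellyfish not found on PATH; install it or add its directory to PATH")
    else:
        print("found:", version)
```

If this prints the "not found" message inside the same environment where respect is run, the subprocess call in the traceback would be expected to fail with the same FileNotFoundError.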

I have also found some time to work on a conda version of it. If the steps above don't work for you, you will soon be able to install it easily that way. I will let you know once it is uploaded to bioconda.

I get the same warnings as @AntonioBaeza:
2021-05-25 16:29:08,106 WARNING:data/hist_info.txt does not have valid extension; it's skipped
2021-05-25 16:29:08,106 WARNING:data/name_mapping.txt does not have valid extension; it's skipped

This is not an error. It just warns the user that some of the files in the input directory (provided with the -d option) are not sequence files and so will be skipped. You can safely ignore these warnings.
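
To illustrate the kind of check that produces this warning, here is a rough sketch of filtering a directory listing by file extension. It is not RESPECT's actual code: the extension list and the function names are assumptions chosen for illustration, and the real tool may accept a different set.

```python
# Assumed, hypothetical set of accepted extensions; RESPECT's real list may differ.
VALID_EXTENSIONS = (".fq", ".fastq", ".fa", ".fna", ".fasta", ".hist")


def has_valid_extension(filename):
    """True if the name ends in an accepted extension, optionally gzip-compressed."""
    name = filename.lower()
    if name.endswith(".gz"):
        name = name[: -len(".gz")]
    return name.endswith(VALID_EXTENSIONS)


def split_inputs(filenames):
    """Split names into (kept, skipped), warning about each skipped file."""
    kept, skipped = [], []
    for filename in filenames:
        if has_valid_extension(filename):
            kept.append(filename)
        else:
            print(f"WARNING:{filename} does not have valid extension; it's skipped")
            skipped.append(filename)
    return kept, skipped


if __name__ == "__main__":
    kept, skipped = split_inputs([
        "name_mapping.txt",
        "hist_info.txt",
        "Micromonas_pusilla_cov_0.5_err_0.01.fq.gz",
        "GCF_000151265.2_Micromonas_pusilla_CCMP1545_v2.0_genomic.fna",
    ])
    print("kept:", kept)
```

Under these assumptions the two .txt files passed via -m and -I are skipped as sequence inputs simply because they sit in the same directory as the data, which matches the warnings above.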