picrust / picrust2

Code, unit tests, and tutorials for running PICRUSt2

Error of running picrust2_pipeline

Gahyeon-Baek opened this issue

Hello,

I'm new to PICRUSt2. Following the installation instructions, I installed PICRUSt2 with the command "conda install -c conda-forge -c bioconda picrust2". Then, I downloaded the data files needed to run PICRUSt2 with the command "picrust2_pipeline.py -s".

After that, I moved to my analysis folder and ran "picrust2_pipeline.py -s seqs.fna -i table.biom -o picrust2_out_pipeline". However, I got the following error:

Error: Traceback (most recent call last):
  File "/Users/ghb/anaconda3/bin/place_seqs.py", line 107, in <module>
    main()
  File "/Users/ghb/anaconda3/bin/place_seqs.py", line 73, in main
    check_files_exist([args.study_fasta, args.ref_msa, args.tree])
  File "/Users/ghb/anaconda3/lib/python3.10/site-packages/picrust2/util.py", line 328, in check_files_exist
    raise ValueError("These input files were not found: " +
ValueError: These input files were not found: /Users/ghb/anaconda3/lib/python3.10/site-packages/default_files/prokaryotic/reference.fna, /Users/ghb/anaconda3/lib/python3.10/site-packages/default_files/prokaryotic/reference.tre

To resolve this, I went to "/Users/ghb/anaconda3/lib/python3.10/site-packages/default_files/prokaryotic/" and found a folder named "pro_ref". It contained the files "pro_ref.fna", "pro_ref.model", "pro_ref.hmm", "pro_ref.raxml_info", and "pro_ref.tre". Although the names were different, I assumed these were the required files and renamed them to "reference.fna", "reference.tre", and "reference.hmm". Then I ran the pipeline again.
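The traceback shows `check_files_exist` raising a `ValueError` because the paths it was given do not exist. Before renaming anything, it can help to check which reference files are actually present under their original names. The sketch below does this. `missing_reference_files` is a hypothetical helper and is not PICRUSt2's `check_files_exist`. The `pro_ref` prefix and the suffix list are assumptions taken from the folder contents described above. The file names PICRUSt2 itself expects are defined in its own `default.py`.

```python
from pathlib import Path

# Assumption: suffixes copied from the pro_ref folder contents described above,
# not from PICRUSt2's authoritative list of expected reference files.
REF_SUFFIXES = (".fna", ".tre", ".hmm", ".model", ".raxml_info")


def missing_reference_files(ref_dir, prefix="pro_ref"):
    """Return the expected files (prefix + suffix) that are absent from ref_dir."""
    ref_dir = Path(ref_dir)
    expected = [ref_dir / f"{prefix}{suffix}" for suffix in REF_SUFFIXES]
    # Keep only the paths that are not existing regular files.
    return [str(path) for path in expected if not path.is_file()]
```

For example, calling it on the `pro_ref` folder path from the paragraph above returns an empty list when all five files are present under their original names.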

Could you confirm whether I have addressed the problem correctly?

I will wait for your response.

Best regards,
Gahyeon Baek

I then ran the pipeline with the command above and got the following error:

/Users/ghb/anaconda3/lib/python3.10/site-packages/picrust2/metagenome_pipeline.py:12: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.
from pandas.util.testing import assert_frame_equal
INFO Converting given FASTA file to BFAST format...
INFO Resulting bfast file was written to: picrust2_out_pipeline/intermediate/place_seqs/epa_out/study_seqs_hmmalign.fasta.bfast
INFO Selected: Output dir: picrust2_out_pipeline/intermediate/place_seqs/epa_out/
INFO Selected: Query file: picrust2_out_pipeline/intermediate/place_seqs/epa_out/study_seqs_hmmalign.fasta.bfast
INFO Selected: Tree file: /Users/ghb/anaconda3/lib/python3.10/site-packages/default_files/prokaryotic/reference.tre
INFO Selected: Reference MSA: picrust2_out_pipeline/intermediate/place_seqs/ref_seqs_hmmalign.fasta
INFO Selected: Automatic switching of use of per rate scalers
INFO Selected: Preserving the root of the input tree
INFO Selected: Specified model: GTR+G
INFO Selected: Reading queries in chunks of: 5000
INFO Selected: Using threads: 1
INFO [EPA-ng ASCII-art banner] (v0.3.8)
INFO Using model parameters:
INFO Rate heterogeneity: GAMMA (4 cats, mean), alpha: 1 (ML), weights&rates: (0.25,0.136954) (0.25,0.476752) (0.25,1) (0.25,2.38629)
Base frequencies (ML): 0.25 0.25 0.25 0.25
Substitution rates (ML): 0.5 0.5 0.5 0.5 0.5 1
INFO Output file: picrust2_out_pipeline/intermediate/place_seqs/epa_out/epa_result.jplace
INFO 5000 Sequences done!
INFO 10000 Sequences done!
INFO 15000 Sequences done!
INFO 20000 Sequences done!
INFO 25000 Sequences done!
INFO 30000 Sequences done!
INFO 35000 Sequences done!
INFO 40000 Sequences done!
INFO 42397 Sequences done!
INFO Time spent placing: 1909s
INFO Elapsed Time: 1912s
The following arguments were not expected: picrust2_out_pipeline/intermediate/place_seqs/epa_out --out-dir --fully-resolve picrust2_out_pipeline/intermediate/place_seqs/epa_out/epa_result.jplace --jplace-path
Run with --help for more information.
Error running this command:
gappa analyze graft --jplace-path picrust2_out_pipeline/intermediate/place_seqs/epa_out/epa_result.jplace --fully-resolve --out-dir picrust2_out_pipeline/intermediate/place_seqs/epa_out
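The failure above is reported by gappa's own argument parser ("Run with --help for more information."). This may mean that the installed gappa does not accept the flags PICRUSt2 passed to `gappa analyze graft`. The sketch below checks this by capturing the help text of that subcommand and looking for the three flags from the failing command. It assumes `gappa` is on PATH in the active environment. `subcommand_help` and `flags_not_in_help` are hypothetical helpers and are not part of PICRUSt2 or gappa. The flag check is a crude substring match.

```python
import shutil
import subprocess


def subcommand_help(tool, *subcommands):
    """Return the --help output of `tool [subcommands...]`, or None if tool is not on PATH."""
    exe = shutil.which(tool)
    if exe is None:
        return None
    result = subprocess.run(
        [exe, *subcommands, "--help"], capture_output=True, text=True
    )
    # Some tools print help on stdout and others on stderr; keep whichever is non-empty.
    return result.stdout or result.stderr


def flags_not_in_help(help_text, flags):
    """Crude substring check: return the flags that never appear in help_text."""
    return [flag for flag in flags if flag not in help_text]


# Flags copied from the failing gappa command in the log above.
GRAFT_FLAGS = ["--jplace-path", "--fully-resolve", "--out-dir"]

# Example (requires gappa on PATH):
#   text = subcommand_help("gappa", "analyze", "graft")
#   print(flags_not_in_help(text, GRAFT_FLAGS))
```

A non-empty result would suggest that the gappa found on PATH is not the one the pipeline expects. This ties in with checking which environment the tools come from.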

Could you suggest how to fix this error?

Thank you.

It looks like you installed PICRUSt2 in your base conda environment - is that correct? If so, I would re-install it in a new, separate environment. I'm not sure why the initial error said it was looking for files like 'reference.fna' - you can see the names of the expected files here: https://github.com/picrust/picrust2/blob/master/picrust2/default.py. Perhaps there is a conflict with an older version of PICRUSt2 in your base environment? Also, 'picrust2_pipeline.py -s' does not download the necessary files (-s just specifies the input FASTA of study sequences) - did you read that somewhere?

Lastly, the second error you ran into is very odd - I'm hoping that it will also be resolved by a clean install in a new environment. You could also update conda (and mamba, if you use it) to the latest version beforehand.
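One way to check for a conflict with an older install in the base environment is to see where each command resolves. The tracebacks above show `/Users/ghb/anaconda3/bin/place_seqs.py`, which is the base environment rather than a separate one. The sketch below prints the active Python prefix and the PATH match for each tool. It assumes it is run with the same PATH as the pipeline, for example after `conda activate picrust2`. The tool names are taken from the logs in this thread, and the function names are hypothetical.

```python
import shutil
import sys

# Executables named in this thread's logs and tracebacks.
TOOLS = ("place_seqs.py", "epa-ng", "gappa", "metagenome_pipeline.py")


def locate_tools(names=TOOLS):
    """Map each executable name to its first match on PATH, or None if absent."""
    return {name: shutil.which(name) for name in names}


def print_tool_report(names=TOOLS):
    """Print the active Python prefix and where each tool resolves."""
    print("Python prefix:", sys.prefix)
    for name, path in locate_tools(names).items():
        print(f"{name:<24} {path or 'NOT FOUND'}")
```

In a clean, activated environment, one would expect every path to sit under that environment's `bin` directory. A mix of base and environment paths would point to the kind of conflict described above.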

Cheers,

Gavin

Hello Gavin,

First, I tried to create and activate a new environment with PICRUSt2 installed, using this command: conda create -n picrust2 -c bioconda -c conda-forge picrust2=2.5.2
However, the installation had still not finished after 3 days.
So I tried another way to install PICRUSt2.
I downloaded the source tarball, extracted it, and moved into the directory, as described here: https://github.com/picrust/picrust2/wiki/Installation

However, at the metagenome prediction step, I got this error:
(picrust2) ghb@SSMLui-iMac picrust2_out_pipeline % metagenome_pipeline.py -i ../table.biom -m marker_predicted_and_nsti.tsv.gz -f EC_predicted.tsv.gz -o EC_metagenome_out --strat_out

Traceback (most recent call last):
  File "/Users/ghb/anaconda3/envs/picrust2/bin/metagenome_pipeline.py", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "/Users/ghb/picrust2-2.5.2/scripts/metagenome_pipeline.py", line 121, in <module>
    main()
  File "/Users/ghb/picrust2-2.5.2/scripts/metagenome_pipeline.py", line 93, in main
    strat_pred, unstrat_pred = run_metagenome_pipeline(
  File "/Users/ghb/picrust2-2.5.2/picrust2/metagenome_pipeline.py", line 37, in run_metagenome_pipeline
    study_seq_counts = read_seqabun(input_seqabun)
  File "/Users/ghb/picrust2-2.5.2/picrust2/util.py", line 331, in read_seqabun
    input_seqabun = biom.load_table(infile).to_dataframe(dense=True)
  File "/Users/ghb/anaconda3/envs/picrust2/lib/python3.8/site-packages/biom/parse.py", line 668, in load_table
    table = parse_biom_table(fp)
  File "/Users/ghb/anaconda3/envs/picrust2/lib/python3.8/site-packages/biom/parse.py", line 405, in parse_biom_table
    c = file_obj.read(1)
  File "/Users/ghb/anaconda3/envs/picrust2/lib/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 15-16: invalid continuation byte

Do you know why this error occurs?

Thank you.

Hi there @Gahyeon-Baek,

I'm not sure whether you've fixed this error in the meantime, but this kind of error usually means that one of the input files is not in the expected format. Could you attach the files you're trying to run PICRUSt2 with, please?

Thanks,
Robyn
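Robyn's reply points to the input format. The traceback shows biom reading `table.biom` as UTF-8 text (`file_obj.read(1)` inside `parse_biom_table`), so one quick check is to look at the file's leading bytes. The sketch below uses the standard HDF5 signature and gzip magic number. It assumes the usual BIOM variants: BIOM 2.x files are HDF5, BIOM 1.0 files are JSON, and classic tables are tab-separated text. `sniff_table_format` is a hypothetical helper and is not part of PICRUSt2 or biom-format.

```python
import codecs

HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # 8-byte HDF5 file signature
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def sniff_table_format(path, probe_size=4096):
    """Guess the on-disk format of a feature table from its leading bytes."""
    with open(path, "rb") as fh:
        head = fh.read(probe_size)
    if head.startswith(HDF5_SIGNATURE):
        return "hdf5"
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    # An incremental decoder with final=False tolerates a multi-byte
    # character cut off at the end of the probe, but still rejects invalid bytes.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        text = decoder.decode(head, final=False)
    except UnicodeDecodeError as err:
        # err.start is the offset of the first undecodable byte, comparable to
        # "position 15-16" in the traceback above.
        return f"not UTF-8 text (first bad byte at offset {err.start})"
    if text.lstrip().startswith("{"):
        return "json"
    return "text"
```

For example, `sniff_table_format("../table.biom")` run from the `picrust2_out_pipeline` folder would report which of these cases applies. A "not UTF-8" result would be consistent with the `UnicodeDecodeError` above.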