jdidion / atropos

An NGS read trimming tool that is specific, sensitive, and speedy. (production)

Trimming fails

parkerac opened this issue · comments

I've been running atropos on RNA-seq data, and it has worked for most of the samples, but failed for about 1/10th of them (the command and output are below). I can't seem to find any documentation about this error. Would you be able to provide some insight about what I should do from here?

atropos -a file:adapters.fasta -q 10 -o ${OUTNAME}_output.fastq -se ${FILENAME}

2018-07-20 08:11:43,963 INFO: This is Atropos 1.1.18 with Python 3.6.5
2018-07-20 08:11:44,019 ERROR: Error executing command trim
Traceback (most recent call last):
File "/homedir/lib/python3.6/site-packages/atropos/commands/base.py", line 332, in run
self.return_code = self()
File "/homedir/lib/python3.6/site-packages/atropos/commands/trim/__init__.py", line 295, in __call__
adapter_cache = super().load_known_adapters()
File "/homedir/lib/python3.6/site-packages/atropos/commands/base.py", line 370, in load_known_adapters
adapter_cache = AdapterCache(cache_file)
File "/homedir/lib/python3.6/site-packages/atropos/adapters/__init__.py", line 760, in __init__
self.seq_to_name, self.name_to_seq = pickle.load(cache)
_pickle.UnpicklingError: pickle data was truncated

Thanks for reporting this. Have you determined if this error is deterministic (i.e. always happens for the same samples)? If so, could you please provide a minimal dataset that reproduces the error?

Previously, I was running the files through atropos individually, but I realized that I should be using the paired-end mode. However, I am still getting the same errors. One error is "_pickle.UnpicklingError: pickle data was truncated," and the other error is "EOFError: Ran out of input."

This pair of files created the unpickling error:
https://byu.box.com/s/l4msvlptpenngkb3h6k2m7ednr4td64o
https://byu.box.com/s/l7iovhax652wvv7jbqlc6nkcbk4xs67a

This pair of files created the EOF error:
https://byu.box.com/s/6ibjhuzt361vtcxjdr601bx86h528sra
https://byu.box.com/s/08ldgnfigzwbrqccgbp4ppqpn2v2lyis

This was the command I used:
atropos -T 4 -a file:adapters.fasta -q 10 -o ${OUTNAME}_output.fastq -p ${OUTNAME}_output.fastq -pe1 ${FILENAME} -pe2 ${FILENAME2}

Thank you for looking into this!

Hi @parkerac, apologies but I have not had a chance to look at this until now, and it seems those files are no longer available. Could you please share them again? Thanks!

@parkerac you can also try the new 2.x release of Atropos. I believe I've fixed the issue.

@jdidion - can you please provide details on how to install this new version? When trying pip I'm only able to install 1.1.24. I'm receiving a similar error with this version:

2020-01-10 09:39:58,427 INFO: This is Atropos 1.1.24 with Python 3.6.7
2020-01-10 09:39:58,433 ERROR: Error executing command trim
Traceback (most recent call last):
File "/home/jpshaffer/software/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/atropos/commands/base.py", line 332, in run
self.return_code = self()
File "/home/jpshaffer/software/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/atropos/commands/trim/__init__.py", line 295, in __call__
adapter_cache = super().load_known_adapters()
File "/home/jpshaffer/software/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/atropos/commands/base.py", line 370, in load_known_adapters
adapter_cache = AdapterCache(cache_file)
File "/home/jpshaffer/software/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/atropos/adapters/__init__.py", line 760, in __init__
self.seq_to_name, self.name_to_seq = pickle.load(cache)
EOFError: Ran out of input

@justinshaffer right now 2.x is in pre-release, so you have to use the --pre option with pip.

Thanks @jdidion!

I ran into the following error when trying your suggestion:

$ pip install --pre atropos
Collecting atropos
Using cached https://files.pythonhosted.org/packages/82/a2/9f1cd425174848cd85a9fbf58b5f35d98e0db0f8868c6516c567d8befc6e/atropos-2.0.0a2.tar.gz
ERROR: Command errored out with exit status 1:
command: /home/jpshaffer/software/miniconda3/envs/shotgun_processing/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-87rfjh42/atropos/setup.py'"'"'; __file__='"'"'/tmp/pip-install-87rfjh42/atropos/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-install-87rfjh42/atropos/pip-egg-info
cwd: /tmp/pip-install-87rfjh42/atropos/
Complete output (5 lines):
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-install-87rfjh42/atropos/setup.py", line 122, in <module>
Path(__file__).parent.absolute() / "README.md", encoding="utf-8"
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pip-install-87rfjh42/atropos/README.md'
----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

Sorry about that. It looks like I missed some updates to the MANIFEST. It's fixed now, and I've pushed a new build to PyPI (2.0.0-alpha.3).

Thanks! I was able to install it, but I am now running into errors with my script. I see that the --nextseq-trim parameter is not supported, so I removed it. Here is what I'm running now:

atropos \
  -a GGGGGGGGGG \
  -A GGGGGGGGGG \
  -pe1 /projects/emp500/02-shotgun/analysis_justin/data/01_atropos_adapters/Berry2_sediment_Pi31_S415_L002_R1_001_atropos_adapters.fastq.gz \
  -pe2 /projects/emp500/02-shotgun/analysis_justin/data/01_atropos_adapters/Berry2_sediment_Pi31_S415_L002_R2_001_atropos_adapters.fastq.gz \
  -o /projects/emp500/02-shotgun/analysis_justin/data/02_atropos_polyg/Berry2_sediment_Pi31_S415_L002_R1_001_atropos_adapters_polyg.fastq.gz \
  -p /projects/emp500/02-shotgun/analysis_justin/data/02_atropos_polyg/Berry2_sediment_Pi31_S415_L002_R2_001_atropos_adapters_polyg.fastq.gz \
  -e 0.1 \
  -q 15 \
  --insert-match-error-rate 0.2 \
  --minimum-length 100 \
  --pair-filter any \
  --report-file /projects/emp500/02-shotgun/analysis_justin/data/02_atropos_polyg/00_atropos_logs/atropos_log_Berry2_sediment_Pi31_S415_L002.txt \
  --report-formats txt \
  -T 16

And here is the error I received:

/home/jpshaffer/software/miniconda3/envs/shotgun_processing/lib/python3.6/site-packages/xphyle/paths.py:149: DeprecationWarning: Use of resolve_path with string path arguments is deprected (lineno /home/jpshaffer/software/miniconda3/envs/shotgun_processing/lib/python3.6/argparse.py:2265)
f"Use of {func.__name__} with string path arguments is "
/home/jpshaffer/software/miniconda3/envs/shotgun_processing/lib/python3.6/site-packages/xphyle/paths.py:149: DeprecationWarning: Use of resolve_path with string path arguments is deprected (lineno /home/jpshaffer/software/miniconda3/envs/shotgun_processing/lib/python3.6/argparse.py:2265)
f"Use of {func.__name__} with string path arguments is "
2020-01-10 12:53:25.748 | INFO | atropos.commands.console:_setup_logging:247 - This is Atropos 2.0.0a3 with Python 3.6.7
%(asctime)s %(levelname)s: %(message)s
2020-01-10 12:53:25.791 | ERROR | atropos.console:execute_cli:151 - Error executing command: trim
Traceback (most recent call last):
File "/home/jpshaffer/software/miniconda3/envs/shotgun_processing/bin/atropos", line 8, in <module>
sys.exit(main())
│ │ └ <function main at 0x7f8a78742ae8>
│ └
└ <module 'sys' (built-in)>
File "/home/jpshaffer/software/miniconda3/envs/shotgun_processing/lib/python3.6/site-packages/atropos/main.py", line 17, in main
sys.exit(run_atropos(args))
│ │ │ └ None
│ │ └ <function run_atropos at 0x7f8a787341e0>
│ └
└ <module 'sys' (built-in)>
File "/home/jpshaffer/software/miniconda3/envs/shotgun_processing/lib/python3.6/site-packages/atropos/console.py", line 56, in run_atropos
return execute_cli(args)
│ └ ['-a', 'GGGGGGGGGG', '-A', 'GGGGGGGGGG', '-pe1', '/projects/emp500/02-shotgun/analysis_justin/data/01_atropos_adapters/Berry2...
└ <function execute_cli at 0x7f8a6ffa9400>

File "/home/jpshaffer/software/miniconda3/envs/shotgun_processing/lib/python3.6/site-packages/atropos/console.py", line 144, in execute_cli
retcode, summary = command.execute(args)
│ │ └ ['-a', 'GGGGGGGGGG', '-A', 'GGGGGGGGGG', '-pe1', '/projects/emp500/02-shotgun/analysis_justin/data/01_atropos_adapters/Berry2...
│ └ <classmethod object at 0x7f8a6ffa3898>
└ atropos.commands.trim.console.TrimCommandConsole
File "/home/jpshaffer/software/miniconda3/envs/shotgun_processing/lib/python3.6/site-packages/atropos/commands/console.py", line 142, in execute
options = cls._parse_args(args)
│ │ └ ['-a', 'GGGGGGGGGG', '-A', 'GGGGGGGGGG', '-pe1', '/projects/emp500/02-shotgun/analysis_justin/data/01_atropos_adapters/Berry2...
│ └ <classmethod object at 0x7f8a6ffa38d0>
└ atropos.commands.trim.console.TrimCommandConsole
File "/home/jpshaffer/software/miniconda3/envs/shotgun_processing/lib/python3.6/site-packages/atropos/commands/console.py", line 170, in _parse_args
cls._validate_options(options, parser)
│ │ │ └ AtroposArgumentParser(prog='atropos trim', usage='\n atropos trim -a ADAPTER [options] [-o output.fastq] -se input.fas...
│ │ └ Namespace(accession=None, action=<TrimAction.TRIM: 'trim'>, adapter_cache_file=PosixPath('/home/jpshaffer/.adapters'), adapte...
│ └ <classmethod object at 0x7f8a6fe7b278>
└ atropos.commands.trim.console.TrimCommandConsole
File "/home/jpshaffer/software/miniconda3/envs/shotgun_processing/lib/python3.6/site-packages/atropos/commands/trim/console.py", line 111, in _validate_options
cls._validate_trim_options(options, parser)
│ │ │ └ AtroposArgumentParser(prog='atropos trim', usage='\n atropos trim -a ADAPTER [options] [-o output.fastq] -se input.fas...
│ │ └ Namespace(accession=None, action=<TrimAction.TRIM: 'trim'>, adapter_cache_file=PosixPath('/home/jpshaffer/.adapters'), adapte...
│ └ <staticmethod object at 0x7f8a6fe7b2e8>
└ atropos.commands.trim.console.TrimCommandConsole
File "/home/jpshaffer/software/miniconda3/envs/shotgun_processing/lib/python3.6/site-packages/atropos/commands/trim/console.py", line 1219, in _validate_trim_options
options.can_use_system_compression = fmt.can_use_system_compression()
│ │ │ └ <property object at 0x7f8a7018e7c8>
│ │ └ <xphyle.formats.Gzip object at 0x7f8a7018ad68>
│ └ False
└ Namespace(accession=None, action=<TrimAction.TRIM: 'trim'>, adapter_cache_file=PosixPath('/home/jpshaffer/.adapters'), adapte...

TypeError: 'bool' object is not callable

Any input regarding the error would be super helpful. Thanks in advance

It looks like you've discovered a bug - I'll work on debugging it.
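
One possible reading of the traceback above: it annotates can_use_system_compression as a property object whose value is False, yet the failing line calls it with parentheses. The sketch below reproduces that kind of error in isolation. The Gzip class here is a hypothetical stand-in written for illustration, not the real xphyle class, and this is an interpretation of the traceback rather than a description of the actual fix.

```python
class Gzip:
    """Hypothetical stand-in for a format class; not the real xphyle code."""

    @property
    def can_use_system_compression(self):
        # A property is read like an attribute and returns a plain bool here.
        return False


fmt = Gzip()

try:
    # The attribute access already yields False; the trailing () then
    # tries to call that bool, which raises TypeError.
    fmt.can_use_system_compression()
except TypeError as err:
    message = str(err)

# Reading the property without parentheses works as intended.
value = fmt.can_use_system_compression
print(message, value)
```

An error of this shape can appear when a method on a dependency is changed into a property and a caller still uses the old call syntax.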

Also, please check out the change list: https://github.com/jdidion/atropos/blob/develop/CHANGES.md

The --nextseq-trim option has changed to --twocolor-trim.

I fixed that issue and released a new build (2.0.0-alpha.4). Please try again.

If you run into more issues, it might be faster to give me a minimal dataset. That way I can run the same command that you're running and work through any issues without having to go back-and-forth each time.

Thanks @jdidion!

I want to take a step back and try to address my first error when using version 1.1.24, as I feel you may be able to address my problem.

I'm most puzzled by the error, because I was able to successfully process the same files using the script previously. I ran into space limitations on our server, which caused the jobs to fail. After obtaining more disk space, I attempted to re-run, which is when I ran into the error that I've copied again below.

I wonder if there are some temporary files, or things being written to a location other than the output location I specified in my script, that are causing the error? It seems I only get the error when attempting to process files that I have already processed previously, but I have not tested this yet.

Do you have any ideas or thoughts along these lines? Thanks in advance.

What I ran:

atropos \
  -a GGGGGGGGGG \
  -A GGGGGGGGGG \
  -pe1 /sequencing/ucsd/complete_runs/191119_A00953_0026_BHW77GDSXX/Extraction_Test_Nextera_XT_Flex/stool_human_A_1_standard_H_XT_S97_L001_R1_001.fastq.gz \
  -pe2 /sequencing/ucsd/complete_runs/191119_A00953_0026_BHW77GDSXX/Extraction_Test_Nextera_XT_Flex/stool_human_A_1_standard_H_XT_S97_L001_R2_001.fastq.gz \
  -o /home/jpshaffer/illumina/xt_flex_khp_round02/data/atropos_polyg/nextera_xt/stool_human_A_1_standard_H_XT_S97_L001_R1_001_atropos_polyg.fastq.gz \
  -p /home/jpshaffer/illumina/xt_flex_khp_round02/data/atropos_polyg/nextera_xt/stool_human_A_1_standard_H_XT_S97_L001_R2_001_atropos_polyg.fastq.gz \
  --nextseq-trim 1 \
  -e 0.1 \
  -q 15 \
  --insert-match-error-rate 0.2 \
  --minimum-length 100 \
  --pair-filter any \
  --report-file /home/jpshaffer/illumina/xt_flex_khp_round02/data/atropos_polyg/nextera_xt/atropos_log_stool_human_A_1_standard_H_XT_S97_L001.txt \
  --report-formats txt \
  -T 16

The error:

2020-01-10 13:56:10,600 INFO: This is Atropos 1.1.24 with Python 3.6.8
2020-01-10 13:56:10,609 ERROR: Error executing command trim
Traceback (most recent call last):
File "/home/jpshaffer/software/miniconda3/envs/multiqc/lib/python3.6/site-packages/atropos/commands/base.py", line 332, in run
self.return_code = self()
File "/home/jpshaffer/software/miniconda3/envs/multiqc/lib/python3.6/site-packages/atropos/commands/trim/__init__.py", line 295, in __call__
adapter_cache = super().load_known_adapters()
File "/home/jpshaffer/software/miniconda3/envs/multiqc/lib/python3.6/site-packages/atropos/commands/base.py", line 370, in load_known_adapters
adapter_cache = AdapterCache(cache_file)
File "/home/jpshaffer/software/miniconda3/envs/multiqc/lib/python3.6/site-packages/atropos/adapters/__init__.py", line 760, in __init__
self.seq_to_name, self.name_to_seq = pickle.load(cache)
EOFError: Ran out of input

The error is due to trying to load a corrupted adapter cache file. I added code to handle this in the develop branch, which is why I suggested trying out the 2.0.0* build. But I've also just back-ported the fix to the 1.1.x branch and released a new version (1.1.25). Please try it out.
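
As a rough illustration of what tolerating a corrupted cache can look like, here is a minimal sketch. It is not the code that went into Atropos: load_cache is a hypothetical helper, the file names are made up, and the only things taken from the tracebacks above are the pickle.load call and the two-element (seq_to_name, name_to_seq) layout. It treats an empty or truncated pickle as if no cache existed.

```python
import pickle
import tempfile
from pathlib import Path


def load_cache(path):
    """Return (seq_to_name, name_to_seq), or two empty dicts if unusable.

    Hypothetical helper: a missing, empty, or truncated cache file is
    treated as "no cache" instead of aborting the run.
    """
    try:
        with open(path, "rb") as cache:
            seq_to_name, name_to_seq = pickle.load(cache)
        return seq_to_name, name_to_seq
    except FileNotFoundError:
        return {}, {}
    except (EOFError, pickle.UnpicklingError):
        # These are the two exception types reported in this thread.
        return {}, {}


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.pkl"
    good.write_bytes(pickle.dumps(({"ACGT": "adapter1"}, {"adapter1": "ACGT"})))

    empty = Path(tmp) / "empty.pkl"
    empty.write_bytes(b"")  # e.g. the file was created but never written

    truncated = Path(tmp) / "truncated.pkl"
    truncated.write_bytes(good.read_bytes()[:10])  # write was cut short

    results = {p.name: load_cache(p) for p in (good, empty, truncated)}

print(results)
```

On an installation without such handling, deleting or renaming the cache file so that it is rebuilt on the next run is a plausible workaround, assuming the tool recreates the file when it is absent.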

Thanks, @jdidion. I sincerely appreciate that.

Just curious - any idea how that file gets corrupted? Does it have to do with re-processing, or perhaps with jobs being killed midway?

Thanks in advance

There are a couple of possible causes: the job was killed while writing the file, or the file was written by a newer version of Python than the one used to read it (e.g., you changed Python versions between runs of the application). I don't think it's a multi-threading issue, but I will review the code to make sure.
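
For the first cause (a job killed or a disk filling up mid-write), a common general-purpose mitigation is to write to a temporary file in the same directory and rename it into place only after the write has completed. The sketch below shows that pattern. It is not taken from Atropos: save_cache_atomically and the file name adapters.cache are hypothetical names chosen for illustration.

```python
import contextlib
import os
import pickle
import tempfile


def save_cache_atomically(path, seq_to_name, name_to_seq):
    """Write the cache to a temp file, then rename it over the target.

    Hypothetical helper: if the process dies mid-write, only the temp
    file is incomplete and any existing cache at `path` is untouched.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            pickle.dump((seq_to_name, name_to_seq), tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)  # atomic when both paths share a filesystem
    except BaseException:
        # Remove the partial temp file, then re-raise the original error.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_path)
        raise


with tempfile.TemporaryDirectory() as workdir:
    cache_path = os.path.join(workdir, "adapters.cache")
    save_cache_atomically(cache_path, {"ACGT": "adapter1"}, {"adapter1": "ACGT"})
    with open(cache_path, "rb") as cache:
        loaded = pickle.load(cache)
    leftovers = [name for name in os.listdir(workdir) if name.endswith(".tmp")]

print(loaded, leftovers)
```

The temporary file is created next to the target so that os.replace stays on one filesystem, which is what makes the final rename atomic.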

I am closing this issue. Please re-open if you still experience the problem using the new version.