Clinical-Genomics / cg

Glue between Clinical Genomics apps

“non-hiseq-clinical” users cannot use the decompress command.

Lucpen opened this issue

Description

The cg decompress command does not work for non-hiseq-clinical users. The issue is likely caused by the #SBATCH --account=production line in the generated sbatch script. When “non-hiseq-clinical” users run cg decompress, a script is created to decompress the FASTQ files for the first lane, but not for the subsequent lanes. Manually changing the account to development in that script is therefore useless, because only the first lane would be decompressed.
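A minimal Python sketch of the failure mode described above, assuming the command writes and submits one script per lane inside a loop and stops at the first rejected submission. The names render_script, decompress_lanes, reject_production and SubmissionError, and the lane labels, are hypothetical and do not come from cg; the sbatch call is simulated, and the actual decompression step is left out.

```python
from typing import Callable, Dict, List


class SubmissionError(RuntimeError):
    """Hypothetical error standing in for a rejected sbatch submission."""


def render_script(lane: str, account: str) -> str:
    """Build a minimal sbatch script for one lane (decompression step omitted)."""
    return (
        "#!/bin/bash\n"
        f"#SBATCH --account={account}\n"
        f"# ... decompress {lane} here ...\n"
    )


def decompress_lanes(
    lanes: List[str],
    submit_script: Callable[[str], int],
    written: Dict[str, str],
) -> List[int]:
    """Write and submit one script per lane, with the account hard-coded."""
    job_ids: List[int] = []
    for lane in lanes:
        script = render_script(lane, account="production")  # hard-coded, as in the issue
        written[lane] = script
        # If submission raises, the loop ends here and later lanes get no script.
        job_ids.append(submit_script(script))
    return job_ids


def reject_production(script: str) -> int:
    """Simulate a user who is not allowed to use the production account."""
    if "--account=production" in script:
        raise SubmissionError("simulated: account not permitted for this user")
    return 1


written: Dict[str, str] = {}
try:
    decompress_lanes(["L001", "L002", "L003"], reject_production, written)
except SubmissionError:
    pass
print(sorted(written))  # ['L001']
```

Only the first lane's script ever exists, which is why editing its account line by hand cannot recover the other lanes.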

Suggested solution

Possible solutions:

  1. Change #SBATCH --account=production to #SBATCH --account=development
  2. Make cg decompress write all the scripts, so that “non-hiseq-clinical” users can change the account and run them independently
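Both suggestions can be sketched together in Python, assuming a script generator that takes the account as a parameter (option 1) and writes every lane's script to disk before anything is submitted (option 2). write_all_scripts, render_script and the lane labels are hypothetical; the _decompress_spring.sh file suffix is borrowed from the log later in this thread, and the decompression command itself is omitted.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def render_script(lane: str, account: str) -> str:
    """Build a minimal sbatch script for one lane (decompression step omitted)."""
    return (
        "#!/bin/bash\n"
        f"#SBATCH --account={account}\n"
        f"# ... decompress {lane} here ...\n"
    )


def write_all_scripts(lanes: List[str], account: str, out_dir: Path) -> Dict[str, Path]:
    """Write one script per lane before submitting any of them.

    The account is a parameter instead of a hard-coded value, and every
    script exists on disk even if a later submission fails.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    paths: Dict[str, Path] = {}
    for lane in lanes:
        path = out_dir / f"{lane}_decompress_spring.sh"
        path.write_text(render_script(lane, account))
        paths[lane] = path
    return paths


with tempfile.TemporaryDirectory() as tmp:
    scripts = write_all_scripts(["L001", "L002"], "development", Path(tmp))
    for lane, path in scripts.items():
        # Second line of each script is the account directive.
        print(lane, path.read_text().splitlines()[1])
```

Separating "write" from "submit" means a failed submission leaves a complete set of scripts that a user can edit and run with sbatch themselves.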

This can be closed when

“non-hiseq-clinical” users can decompress files

Blocked by

If there are any blocking issues, PRs or other items in this or other repos, please link to them.

You can switch accounts with -a:

cg downsample samples --help
Usage: cg downsample samples [OPTIONS]

  Downsample reads in one or multiple samples in a case. Usage:  For a single
  sample: cg downsample samples -c supersonicturtle -cn new_case_name -i
  ACC1234 0.1 For multiple samples:cg downsample samples -c supersonicturtle
  -cn new_case_name -i ACC1234 0.1 -i ACC12324 10

Options:
  -c, --case-id TEXT        Case identifier used in statusdb, e.g.
                            supersonicturtle. The case information wil be
                            transferred to the downsampled case.  [required]
  -cn, --case-name TEXT     Case name that is used as name for the downsampled
                            case.  [required]
  -a, --account TEXT        Please specify the account to use for the
                            downsampling. Defaults to production (production)
                            or development (stage) account if not specified.
  -i, --input-data TEXT...  Identifier used in statusdb, e.g. ACC1234567 and
                            the number of reads to down sample to in millions
                            separated by a space e.g. ACC1234567 30.0.
                            Multiple inputs can be provided.  [required]
  --dry-run                 Runs the command without making any changes
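The defaulting rule in the --account help text above can be read as a small lookup: an explicit account wins, otherwise the environment decides (production environment uses the production account, stage uses development). A minimal Python sketch of that reading; resolve_account and DEFAULT_ACCOUNTS are hypothetical names, not cg's implementation.

```python
from typing import Optional

# Hypothetical mapping taken from the help text:
# "Defaults to production (production) or development (stage) account".
DEFAULT_ACCOUNTS = {"production": "production", "stage": "development"}


def resolve_account(account: Optional[str], environment: str) -> str:
    """Return the explicitly given account, else the environment's default."""
    if account:
        return account
    try:
        return DEFAULT_ACCOUNTS[environment]
    except KeyError:
        raise ValueError(f"no default account for environment {environment!r}") from None


print(resolve_account(None, "stage"))                # development
print(resolve_account("development", "production"))  # development
```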

Are the other lanes not reached because you get a SLURM error using the wrong account and the process exits?

@henrikstranneheim I think the cg decompress (note, not downsample) does not have the account option.

Added to refinement 19-06-2024

Yes, exactly: it exits, so it only reaches the first lane.

My bad, I read downsample for some reason. 👍

I had a similar issue with downsample, so it's good to know how to solve that one anyway :)

Technical refinement

Q for IO: Can we copy the SPRING files from prod to stage, and then decompress them in the stage environment with the cg command?

If so, we suggest that solution.

This has been tested. The sample was copied to stage and the command ran successfully.

[isak.ohlsson@hasta:cg] [S_base] master ± cg decompress sample <sample_id>
Called undefined __fields__ on HousekeeperAPI, please wrap
Running decompress spring
Updating compress api
Set dry run to False
Found file <sample_id>/2024-03-24/****.spring
Check if pending compression file exists
/home/proj/stage/housekeeper-bundles/<sample_id>/2024-03-24/****.crunchy.pending.txt does not exist
Check if SPRING archive file exists
Check if FASTQ pair exists
/home/proj/stage/housekeeper-bundles/<sample_id>/2024-03-24/****.fastq.gz does not exist
Decompression is possible
Decompressing /home/proj/stage/housekeeper-bundles/<sample_id>/2024-03-24/****.spring to FASTQ format for sample <sample_id>
Creating pending flag /home/proj/stage/housekeeper-bundles/<sample_id>/2024-03-24/****.crunchy.pending.txt
Fetch SPRING metadata from /home/proj/stage/housekeeper-bundles/<sample_id>/2024-03-24/****.json
Created temporary dir /home/proj/stage/housekeeper-bundles/<sample_id>/2024-03-24/spring_56bbnl0v_decompress
Submit sbatch /home/proj/stage/housekeeper-bundles/<sample_id>/2024-03-24/****_decompress_spring.sh
Running command sbatch /home/proj/stage/housekeeper-bundles/<sample_id>/2024-03-24/****_decompress_spring.sh
Submitted batch job 6831422
Spring decompression running as job 6831422
Trying to parse date string None
Fetch SPRING metadata from /home/proj/stage/housekeeper-bundles/<sample_id>/2024-03-24/****.json
Adding today date to SPRING metadata file
Decompressed sample <sample_id>
[isak.ohlsson@hasta:cg] [S_base] master ± 
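The log lines before "Decompression is possible" suggest three checks, in this order: no pending flag, the SPRING archive exists, and the FASTQ pair is not already present. A minimal Python sketch of that reading of the log, not cg's actual code; is_decompression_possible is hypothetical, and because the real FASTQ file names are masked in the log, the example names below are made up.

```python
import tempfile
from pathlib import Path
from typing import Sequence


def is_decompression_possible(
    spring_path: Path, pending_path: Path, fastq_paths: Sequence[Path]
) -> bool:
    """Mirror the checks logged before 'Decompression is possible'."""
    # A pending flag is assumed to mean a crunchy job is already running.
    if pending_path.exists():
        return False
    # There must be a SPRING archive to decompress.
    if not spring_path.exists():
        return False
    # If the whole FASTQ pair is already on disk, there is nothing to do.
    if all(p.exists() for p in fastq_paths):
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    d = Path(tmp)
    spring = d / "sample.spring"
    spring.touch()
    pending = d / "sample.crunchy.pending.txt"
    fastqs = [d / "sample_R1.fastq.gz", d / "sample_R2.fastq.gz"]  # hypothetical names
    print(is_decompression_possible(spring, pending, fastqs))  # True
```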

The FASTQ pair was added to the stage Housekeeper bundle.

Our suggested solution is to copy the housekeeper bundle to stage and do the decompression in stage. If this causes issues, please reopen.
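For the file-copy part of this workaround, a minimal Python sketch, assuming a bundle version directory laid out as <root>/<sample_id>/<version>/ as in the stage paths in the log above. Only the stage root (/home/proj/stage/housekeeper-bundles) appears in this thread, so both roots are parameters; copy_spring_files is hypothetical. It copies the .spring archive and its .json SPRING metadata on disk only. Adding those files to the stage Housekeeper bundle, which cg decompress relies on to find them, is a separate step that is not shown here.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List

# Archive plus its SPRING metadata, as read by cg decompress in the log above.
SPRING_SUFFIXES = (".spring", ".json")


def copy_spring_files(prod_root: Path, stage_root: Path, sample_id: str, version: str) -> List[Path]:
    """Copy SPRING archives and metadata for one bundle version from prod to stage."""
    src = prod_root / sample_id / version
    dst = stage_root / sample_id / version
    dst.mkdir(parents=True, exist_ok=True)
    copied: List[Path] = []
    for path in sorted(src.iterdir()):
        if path.is_file() and path.suffix in SPRING_SUFFIXES:
            target = dst / path.name
            shutil.copy2(path, target)  # keeps timestamps and permissions
            copied.append(target)
    return copied


with tempfile.TemporaryDirectory() as tmp:
    prod, stage = Path(tmp, "prod"), Path(tmp, "stage")
    bundle = prod / "sample_x" / "2024-03-24"
    bundle.mkdir(parents=True)
    for name in ("reads.spring", "reads.json", "notes.txt"):
        (bundle / name).write_text("data")
    copied = copy_spring_files(prod, stage, "sample_x", "2024-03-24")
    print([p.name for p in copied])  # ['reads.json', 'reads.spring']
```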