Bug sumstat_to_vcf

m-mews opened this issue a year ago

When I am trying to run the final sumstat_to_vcf_2 step, I receive the following error message. I checked the containers and htslib-1.16 is specified so I'm not sure why it isn't working. Thank you in advance for your help!

Bioinfo.def
cd /tmp && wget https://github.com/samtools/htslib/releases/download/1.16/htslib-1.16.tar.bz2 -O htslib-1.16.tar.bz2 &&
tar -xjvf htslib-1.16.tar.bz2 &&
cd htslib-1.16 &&
./configure --prefix=/usr/local/bin &&
make &&
make install &&
cp tabix bgzip htsfile /usr/local/bin && rm -rf /tmp/htslib*

Error >>>

bgzip: invalid option -- 'k'

Version: 1.12
Usage: bgzip [OPTIONS] [FILE] ...
Options:
-b, --offset INT decompress at virtual file pointer (0-based uncompressed offset)
-c, --stdout write on standard output, keep original files unchanged
-d, --decompress decompress
-f, --force overwrite files without asking
-h, --help give this help
-i, --index compress and create BGZF index
-I, --index-name FILE name of BGZF index file [file.gz.gzi]
-l, --compress-level INT Compression level to use when compressing; 0 to 9, or -1 for default [-1]
-r, --reindex (re)index compressed file
-g, --rebgzip use an index file to bgzip a file
-s, --size INT decompress INT bytes (uncompressed size)
-@, --threads INT number of compression threads to use [1]
-t, --test test integrity of compressed file

About: Merge multiple VCF/BCF files from non-overlapping sample sets to create one multi-sample file.
Note that only records from different files can be merged, never from the same file. For
"vertical" merge take a look at "bcftools norm" instead.
Usage: bcftools merge [options] <A.vcf.gz> <B.vcf.gz> [...]

Options:
--force-samples resolve duplicate sample names
--print-header print only the merged header and exit
--use-header use the provided header
-0 --missing-to-ref assume genotypes at missing sites are 0/0
-f, --apply-filters require at least one of the listed FILTER strings (e.g. "PASS,.")
-F, --filter-logic <x|+> remove filters if some input is PASS ("x"), or apply all filters ("+") [+]
-g, --gvcf <-|ref.fa> merge gVCF blocks, INFO/END tag is expected. Implies -i QS:sum,MinDP:min,I16:sum,IDV:max,IMF:max
-i, --info-rules tag:method,.. rules for merging INFO fields (method is one of sum,avg,min,max,join) or "-" to turn off the default [DP:sum,DP4:sum]
-l, --file-list read file names from the file
-L, --local-alleles <int> EXPERIMENTAL: if more than <int> ALT alleles are encountered, drop FMT/PL and output LAA+LPL instead; 0=unlimited [0]
-m, --merge allow multiallelic records for <snps|indels|both|all|none|id>, see man page for details [both]
--no-index merge unindexed files, the same chromosomal order is required and -r/-R are not allowed
--no-version do not append version and command line to the header
-o, --output write output to a file [standard output]
-O, --output-type <b|u|z|v> 'b' compressed BCF; 'u' uncompressed BCF; 'z' compressed VCF; 'v' uncompressed VCF [v]
-r, --regions restrict to comma-separated list of regions
-R, --regions-file restrict to regions listed in a file
--threads <int> use multithreading with <int> worker threads [0]

hsun3163 · Answer 1 · Tue Mar 28 2023 06:00:23 GMT+0800 (China Standard Time)

Hi @m-mews, can you check whether container = container is in the sumstat_to_vcf_2 section of your notebook, as shown below?

[sumstat_to_vcf_2]
output: f'{cwd}/{_input[0]:bn}.merged.vcf.gz'.replace(name[0],"_".join(name))
task: trunk_workers = 1, trunk_size = job_size, walltime = walltime, mem = mem, cores = numThreads, tags = f'{step_name}_{_output:bn}'
bash: expand = '${ }', stderr = f'{cwd:a}/{_output:bn}.stderr', stdout = f'{cwd:a}/{_output:bn}.stdout',container = container

grennfp · Answer 2 · Fri Apr 14 2023 03:41:37 GMT+0800 (China Standard Time)

I'm having the same issue with this part. I have container = container in the notebook as well. Is there another potential solution to this?

[sumstat_to_vcf_2]
output: f'{cwd}/{_input[0]:bn}.merged.vcf.gz'.replace(name[0],"_".join(name))
task: trunk_workers = 1, trunk_size = job_size, walltime = walltime, mem = mem, cores = numThreads, tags = f'{step_name}_{_output:bn}'
bash: expand = '${ }', stderr = f'{cwd:a}/{_output:bn}.stderr', stdout = f'{cwd:a}/{_output:bn}.stdout',container = container
    for i in ${_input:r}; do
    bgzip -k -f $i 
    tabix -p vcf -f  $i.gz; done
    bcftools merge ${" ".join([f'{str(x)}.gz' for x in _input])} --force-samples -m id  -Oz -o ${_output:a}

hsun3163 · Answer 3 · Fri Apr 14 2023 03:43:57 GMT+0800 (China Standard Time)

Hmm, can you post the command you used? Also, can you check whether bgzip -k works inside the container by running singularity shell bioinfo.sif?
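
A rough sketch of that check, assuming the image file is named bioinfo.sif. `supports_keep` is a hypothetical helper, not part of the pipeline. It also assumes that a bgzip build with the flag lists `--keep` in its help text, which the 1.12 usage in the error above does not:

```shell
# Hypothetical helper: reads bgzip help text on stdin and succeeds
# only if a --keep option is listed there.
supports_keep() {
    grep -q -e '--keep'
}

# Intended use (assumes the image is bioinfo.sif; adjust the path as needed):
#   singularity exec bioinfo.sif bgzip --help 2>&1 | supports_keep \
#       && echo "bgzip -k available" || echo "bgzip -k missing"
#
# To see which bgzip binary the container actually resolves first:
#   singularity exec bioinfo.sif sh -c 'command -v bgzip'
```

If the help text printed from inside the container still says Version: 1.12, the step is probably picking up a different bgzip than the htslib-1.16 build from Bioinfo.def.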

On the other hand, the newest version of the pipeline should skip sumstat_to_vcf_2 when it is unnecessary (most of the time). Can you pull the repo again and give it a try?
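
If the step does still have to run against a bgzip without -k, one possible workaround is sketched below. It assumes -c behaves as the usage text in the error describes (write on standard output, keep original files unchanged). `compress_and_index` is a hypothetical name, and this is plain bash for illustration, not the pipeline's own code:

```shell
# Hypothetical helper: bgzip each VCF to FILE.gz without using -k,
# then build a tabix index for the compressed copy.
compress_and_index() {
    for i in "$@"; do
        # -c writes the compressed stream to stdout, so the input file stays in place
        bgzip -c "$i" > "$i.gz" || return 1
        # -f overwrites an existing index, matching the flags used in the step
        tabix -p vcf -f "$i.gz" || return 1
    done
}

# Example call (file names are placeholders):
#   compress_and_index first.vcf second.vcf
```

The loop refers to the variable as `$i`, the same form the step itself uses.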

grennfp · Answer 4 · Fri Apr 28 2023 21:56:55 GMT+0800 (China Standard Time)

Yes, it seems pulling the repo worked, so now it skips that part as expected. Thanks!