Using phased SNPs with chisel_nonormal
jberg1999 opened this issue · comments
Hi Chisel developers!
I am running chisel_nonormal with phased SNPs from matching germline WES data. However, the chisel_nonormal pipeline does not seem to use them for BAF estimation, and I am not sure whether this is the intended behavior.
Line 198 of chisel_nonormal.py calls BAFEstimator.py as follows, without the simulated normal:
cmd = cmd.format(get_comp('BAFEstimator.py'), args['tumor'], args['reference'], args['jobs'], lcel, args['listphased'])
In BAFEstimator.py not having a normal sample triggers the following code block.
elif args.normal is None:
log('A matched-normal sample has not been provided and the presence of the provided heterozygous germline SNPs will not be correspondigly assessed.', level='WARN')
This also seems to cause the following set of errors in my case. I have attached a log file for the run, but I include the main parts below. Also, the resulting baf.tsv is empty.
Traceback (most recent call last):
File "/michorlab/jacobg/miniconda3/envs/chisel/lib/python2.7/site-packages/chisel/BAFEstimator.py", line 326, in <module>
main()
File "/michorlab/jacobg/miniconda3/envs/chisel/lib/python2.7/site-packages/chisel/BAFEstimator.py", line 104, in main
snps = selecting(args, phased)
File "/michorlab/jacobg/miniconda3/envs/chisel/lib/python2.7/site-packages/chisel/BAFEstimator.py", line 170, in selecting
pool = Pool(processes=min(args['J'], len(phased)), initializer=init_selecting, initargs=initargs)
File "/michorlab/jacobg/miniconda3/envs/chisel/lib/python2.7/multiprocessing/__init__.py", line 232, in Pool
return Pool(processes, initializer, initargs, maxtasksperchild)
File "/michorlab/jacobg/miniconda3/envs/chisel/lib/python2.7/multiprocessing/pool.py", line 154, in __init__
raise ValueError("Number of processes must be at least 1")
ValueError: Number of processes must be at least 1
[2023-Mar-23 22:21:09]Combining RDRs and BAFs
[2023-Mar-23 22:21:23]Parsing and checking arguments
[2023-Mar-23 22:21:23]Arguments:
maxerror : None
restarts : 100
bootstrap : 100
gccorr : /michorlab/jacobg/wes/ref_files/hg38/bwa_gdc/GRCh38.d1.vd1.CIDC.fa
minerror : 0.001
rdr : /michorlab/jacobg/scDNA/chisel_results/TN1/nonormal/rdr/rdr.tsv
blocksize : 50000
j : 32
baf : /michorlab/jacobg/scDNA/chisel_results/TN1/nonormal/baf/baf.tsv
phasecorr : True
seed : 12
listofcells : /michorlab/jacobg/scDNA/chisel_results/TN1/nonormal/rdr/total.tsv
significance : 0.05
missingsnps : (10, 0)
minimumsnps : 0.08
alphagc : 0.05
[2023-Mar-23 22:21:23]Read list of cells
[2023-Mar-23 22:21:23]Reading RDR
[2023-Mar-23 22:21:27]Reading BAF
[2023-Mar-23 22:21:27]Combining
Process PoolWorker-1:
Traceback (most recent call last):
File "/michorlab/jacobg/miniconda3/envs/chisel/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
Process PoolWorker-2:
Traceback (most recent call last):
File "/michorlab/jacobg/miniconda3/envs/chisel/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
Process PoolWorker-3:
Traceback (most recent call last):
File "/michorlab/jacobg/miniconda3/envs/chisel/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
Process PoolWorker-4:
Traceback (most recent call last):
File "/michorlab/jacobg/miniconda3/envs/chisel/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
Process PoolWorker-5:
Traceback (most recent call last):
File "/michorlab/jacobg/miniconda3/envs/chisel/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
self.run()
self.run()
File "/michorlab/jacobg/miniconda3/envs/chisel/lib/python2.7/multiprocessing/process.py", line 114, in run
File "/michorlab/jacobg/miniconda3/envs/chisel/lib/python2.7/multiprocessing/process.py", line 114, in run
self.run()
File "/michorlab/jacobg/miniconda3/envs/chisel/lib/python2.7/multiprocessing/process.py", line 114, in run
self.run()
File "/michorlab/jacobg/miniconda3/envs/chisel/lib/python2.7/multiprocessing/process.py", line 114, in run
self.run()
File "/michorlab/jacobg/miniconda3/envs/chisel/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "/michorlab/jacobg/miniconda3/envs/chisel/lib/python2.7/multiprocessing/pool.py", line 97, in worker
self._target(*self._args, **self._kwargs)
File "/michorlab/jacobg/miniconda3/envs/chisel/lib/python2.7/multiprocessing/pool.py", line 97, in worker
self._target(*self._args, **self._kwargs)
File "/michorlab/jacobg/miniconda3/envs/chisel/lib/python2.7/multiprocessing/pool.py", line 97, in worker
self._target(*self._args, **self._kwargs)
File "/michorlab/jacobg/miniconda3/envs/chisel/lib/python2.7/multiprocessing/pool.py", line 97, in worker
self._target(*self._args, **self._kwargs)
File "/michorlab/jacobg/miniconda3/envs/chisel/lib/python2.7/multiprocessing/pool.py", line 97, in worker
initializer(*initargs)
File "/michorlab/jacobg/miniconda3/envs/chisel/lib/python2.7/site-packages/chisel/Combiner.py", line 234, in init
initializer(*initargs)
File "/michorlab/jacobg/miniconda3/envs/chisel/lib/python2.7/site-packages/chisel/Combiner.py", line 234, in init
alpha_correct = args['significance'] / (20 * float(sum(len(snps[c]) for c in snps)))
alpha_correct = args['significance'] / (20 * float(sum(len(snps[c]) for c in snps)))
ZeroDivisionError: float division by zero
ZeroDivisionError: float division by zero
initializer(*initargs)
File "/michorlab/jacobg/miniconda3/envs/chisel/lib/python2.7/site-packages/chisel/Combiner.py", line 234, in init
initializer(*initargs)
initializer(*initargs)
File "/michorlab/jacobg/miniconda3/envs/chisel/lib/python2.7/site-packages/chisel/Combiner.py", line 234, in init
File "/michorlab/jacobg/miniconda3/envs/chisel/lib/python2.7/site-packages/chisel/Combiner.py", line 234, in init
alpha_correct = args['significance'] / (20 * float(sum(len(snps[c]) for c in snps)))
ZeroDivisionError: float division by zero
alpha_correct = args['significance'] / (20 * float(sum(len(snps[c]) for c in snps)))
alpha_correct = args['significance'] / (20 * float(sum(len(snps[c]) for c in snps)))
ZeroDivisionError: float division by zero
ZeroDivisionError: float division by zero
Thank you for your help!
- Jacob
Thank you for opening this issue.
The message that you see does not mean that the phased SNPs are not passed along; it is just a warning reminding the user that the nonormal version of CHISEL cannot assess the presence of germline SNPs using a matched-normal sample, because there is no matched-normal sample. Therefore, the user should be quite confident that all the provided SNPs are true heterozygous germline SNPs in the corresponding patient.
I am afraid that your errors are due to a different problem: an insufficient number of phased SNPs. In fact, whole-exome sequencing (WES) does not allow us to genotype the minimum number of SNPs required to accurately estimate BAF in individual cells (for related power calculations, please refer to the CHISEL manuscript). Approximately 1M-1.6M heterozygous SNPs (and roughly no fewer than 800k) are required to guarantee accurate BAF estimates in 3-5Mb genomic bins. Unfortunately, WES will only provide you with 1-3% of those in expectation.
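To make the numbers concrete, here is a rough sketch of the arithmetic, followed by the two expressions from the tracebacks evaluated on empty inputs. It reads "1-3% of those" as 1-3% of the 1M-1.6M requirement, which is an interpretation. The helper names `pool_size` and `alpha_correct`, the use of empty dicts for the SNP containers, and the job count of 32 are assumptions made for illustration; only the two expressions themselves are copied from the tracebacks.

```python
# Approximate figures quoted in the reply above.
REQUIRED_LOW, REQUIRED_HIGH = 1_000_000, 1_600_000  # heterozygous SNPs needed
MINIMUM = 800_000                                   # rough lower bound
WES_PERCENT_LOW, WES_PERCENT_HIGH = 1, 3            # share expected from WES

# Expected WES yield, using integer arithmetic to avoid float rounding.
wes_low = REQUIRED_LOW * WES_PERCENT_LOW // 100
wes_high = REQUIRED_HIGH * WES_PERCENT_HIGH // 100
print("Expected WES yield: {}-{} SNPs (minimum ~{})".format(wes_low, wes_high, MINIMUM))


def pool_size(jobs, phased):
    # Same expression as BAFEstimator.py line 170 in the traceback.
    return min(jobs, len(phased))


def alpha_correct(significance, snps):
    # Same expression as Combiner.py line 234 in the traceback.
    return significance / (20 * float(sum(len(snps[c]) for c in snps)))


# With no usable SNPs, the pool size is 0, which multiprocessing.Pool rejects
# with "Number of processes must be at least 1".
print("Pool size with no phased SNPs:", pool_size(32, {}))

# An empty set of SNPs then makes the denominator zero in Combiner.py.
try:
    alpha_correct(0.05, {})
except ZeroDivisionError as error:
    print("Combiner would fail with:", error)
```

Even the optimistic end of the WES estimate is well below the rough 800k minimum, and a container with no SNPs at all is enough to trigger both errors.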
The recommended pipeline for the nonormal version of CHISEL is actually to genotype SNPs directly from the tumour cells, i.e., from the input barcoded BAM file. This approach will not allow you to identify SNPs in genomic regions affected by clonal LOH, but the nonormal version of CHISEL specifically implements a correction test that identifies these SNP-depleted regions as affected by LOH. Therefore, the recommended approach is to run a simple BCFtools mpileup + call pipeline directly on the input barcoded BAM file, with the “--ignore-RG” flag in mpileup (to make sure that cells are not treated independently), to call any possible single-nucleotide variant. Since you will use an imputation server, the variants that are likely somatic will be excluded, and only those that are likely germline will be kept, because they will be found in the reference panel. You can use a command of this form (ref: http://samtools.github.io/bcftools/howtos/variant-calling.html):
bcftools mpileup -Ou -f reference.fa barcodedcells.bam --ignore-RG | bcftools call -mv -Ov -o calls.vcf
You can also speed up this command by splitting it by chromosome with -r and running the pieces in parallel. You could also provide a list of known SNPs or, even better, the list of all the SNPs included in the reference panel that you are going to use next. Also be careful with the servers' reference genome requirements and the requested chr notation.
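As a minimal sketch of the per-chromosome split, the snippet below only builds the command strings; it does not run BCFtools. The function name `mpileup_commands`, the output naming scheme, and the chromosome list chr1-chr22 are assumptions for illustration, and the chr notation has to match your reference and BAM.

```python
import shlex


def mpileup_commands(reference, bam, chromosomes, out_prefix="calls"):
    """Build one `bcftools mpileup | bcftools call` command per chromosome."""
    commands = []
    for chrom in chromosomes:
        out_vcf = "{}.{}.vcf".format(out_prefix, chrom)  # hypothetical naming
        commands.append(
            "bcftools mpileup -Ou -f {ref} -r {chrom} --ignore-RG {bam}"
            " | bcftools call -mv -Ov -o {out}".format(
                ref=shlex.quote(reference),
                chrom=shlex.quote(chrom),
                bam=shlex.quote(bam),
                out=shlex.quote(out_vcf),
            )
        )
    return commands


# Assumed chromosome names; adjust to the notation used by your files.
chromosomes = ["chr{}".format(i) for i in range(1, 23)]
commands = mpileup_commands("reference.fa", "barcodedcells.bam", chromosomes)
for command in commands:
    print(command)
```

Each printed line can be submitted as a separate job or passed to a parallel runner of your choice. The per-chromosome VCFs can stay separate or be merged afterwards, for example with bcftools concat.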
You can then phase these VCF files through the imputation servers and provide the phased result as input to the nonormal version of CHISEL, proceeding with the standard CHISEL pipeline.
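Before running CHISEL, it may be worth checking how many phased heterozygous SNPs you actually have. The sketch below assumes a single-sample, uncompressed VCF with a GT field, as a phasing server might return; the function name `count_phased_het_snps` and the example records are made up for illustration, and this is not part of CHISEL.

```python
BASES = {"A", "C", "G", "T"}


def count_phased_het_snps(vcf_lines):
    """Count biallelic SNP records whose first sample has GT 0|1 or 1|0."""
    count = 0
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF headers
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no sample column
        ref, alt, fmt, sample = fields[3], fields[4], fields[8], fields[9]
        if ref not in BASES or alt not in BASES:
            continue  # keep single-nucleotide, biallelic records only
        keys = fmt.split(":")
        if "GT" not in keys:
            continue
        genotype = sample.split(":")[keys.index("GT")]
        if genotype in ("0|1", "1|0"):
            count += 1
    return count


# Made-up records, for illustration only.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE",
    "chr1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0|1",       # phased het SNP
    "chr1\t200\t.\tC\tT\t.\tPASS\t.\tGT\t1|1",       # homozygous
    "chr1\t300\t.\tG\tA\t.\tPASS\t.\tGT\t0/1",       # unphased
    "chr2\t400\t.\tT\tTA\t.\tPASS\t.\tGT\t1|0",      # indel
    "chr2\t500\t.\tT\tC\t.\tPASS\t.\tGT:DS\t1|0:1",  # phased het SNP
]
n_snps = count_phased_het_snps(example)
print("Phased heterozygous SNPs:", n_snps)
```

For a real file you would pass an open file handle instead of the list, and compare the count against the rough 800k lower bound mentioned above.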
Please feel free to re-open the issue in case of further problems.