raphael-group / chisel

CHISEL -- Copy-number Haplotype Inference in Single-cell by Evolutionary Links

Using phased SNPs with chisel_nonormal

jberg1999 opened this issue

Hi Chisel developers!

I am running chisel_nonormal with phased SNPs from matching germline WES data. However, the chisel_nonormal pipeline does not seem to use them for BAF estimation, and I am not sure whether this is the intended behavior.

Line 198 of chisel_nonormal.py calls BAFEstimator.py as follows, without passing the simulated normal:

cmd = cmd.format(get_comp('BAFEstimator.py'), args['tumor'], args['reference'], args['jobs'], lcel, args['listphased'])

In BAFEstimator.py, the absence of a normal sample triggers the following code block:

elif args.normal is None:
        log('A matched-normal sample has not been provided and the presence of the provided heterozygous germline SNPs will not be correspondigly assessed.', level='WARN')

This also seems to cause the following set of errors in my case. I have attached a log file for the run, but I include the main parts below. Also, the resulting baf.tsv is empty.

Traceback (most recent call last):
  File "/michorlab/jacobg/miniconda3/envs/chisel/lib/python2.7/site-packages/chisel/BAFEstimator.py", line 326, in <module>
    main()
  File "/michorlab/jacobg/miniconda3/envs/chisel/lib/python2.7/site-packages/chisel/BAFEstimator.py", line 104, in main
    snps = selecting(args, phased)
  File "/michorlab/jacobg/miniconda3/envs/chisel/lib/python2.7/site-packages/chisel/BAFEstimator.py", line 170, in selecting
    pool = Pool(processes=min(args['J'], len(phased)), initializer=init_selecting, initargs=initargs)
  File "/michorlab/jacobg/miniconda3/envs/chisel/lib/python2.7/multiprocessing/__init__.py", line 232, in Pool
    return Pool(processes, initializer, initargs, maxtasksperchild)
  File "/michorlab/jacobg/miniconda3/envs/chisel/lib/python2.7/multiprocessing/pool.py", line 154, in __init__
    raise ValueError("Number of processes must be at least 1")
ValueError: Number of processes must be at least 1
[2023-Mar-23 22:21:09]Combining RDRs and BAFs
[2023-Mar-23 22:21:23]Parsing and checking arguments
[2023-Mar-23 22:21:23]Arguments:
	maxerror : None
	restarts : 100
	bootstrap : 100
	gccorr : /michorlab/jacobg/wes/ref_files/hg38/bwa_gdc/GRCh38.d1.vd1.CIDC.fa
	minerror : 0.001
	rdr : /michorlab/jacobg/scDNA/chisel_results/TN1/nonormal/rdr/rdr.tsv
	blocksize : 50000
	j : 32
	baf : /michorlab/jacobg/scDNA/chisel_results/TN1/nonormal/baf/baf.tsv
	phasecorr : True
	seed : 12
	listofcells : /michorlab/jacobg/scDNA/chisel_results/TN1/nonormal/rdr/total.tsv
	significance : 0.05
	missingsnps : (10, 0)
	minimumsnps : 0.08
	alphagc : 0.05
[2023-Mar-23 22:21:23]Read list of cells
[2023-Mar-23 22:21:23]Reading RDR
[2023-Mar-23 22:21:27]Reading BAF
[2023-Mar-23 22:21:27]Combining
Process PoolWorker-1:
Traceback (most recent call last):
  File "/michorlab/jacobg/miniconda3/envs/chisel/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
Process PoolWorker-2:
Traceback (most recent call last):
  File "/michorlab/jacobg/miniconda3/envs/chisel/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
Process PoolWorker-3:
Traceback (most recent call last):
  File "/michorlab/jacobg/miniconda3/envs/chisel/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
Process PoolWorker-4:
Traceback (most recent call last):
  File "/michorlab/jacobg/miniconda3/envs/chisel/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
Process PoolWorker-5:
Traceback (most recent call last):
  File "/michorlab/jacobg/miniconda3/envs/chisel/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
    self.run()
  File "/michorlab/jacobg/miniconda3/envs/chisel/lib/python2.7/multiprocessing/process.py", line 114, in run
  File "/michorlab/jacobg/miniconda3/envs/chisel/lib/python2.7/multiprocessing/process.py", line 114, in run
    self.run()
  File "/michorlab/jacobg/miniconda3/envs/chisel/lib/python2.7/multiprocessing/process.py", line 114, in run
    self.run()
  File "/michorlab/jacobg/miniconda3/envs/chisel/lib/python2.7/multiprocessing/process.py", line 114, in run
    self.run()
  File "/michorlab/jacobg/miniconda3/envs/chisel/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/michorlab/jacobg/miniconda3/envs/chisel/lib/python2.7/multiprocessing/pool.py", line 97, in worker
    self._target(*self._args, **self._kwargs)
  File "/michorlab/jacobg/miniconda3/envs/chisel/lib/python2.7/multiprocessing/pool.py", line 97, in worker
    self._target(*self._args, **self._kwargs)
  File "/michorlab/jacobg/miniconda3/envs/chisel/lib/python2.7/multiprocessing/pool.py", line 97, in worker
    self._target(*self._args, **self._kwargs)
  File "/michorlab/jacobg/miniconda3/envs/chisel/lib/python2.7/multiprocessing/pool.py", line 97, in worker
    self._target(*self._args, **self._kwargs)
  File "/michorlab/jacobg/miniconda3/envs/chisel/lib/python2.7/multiprocessing/pool.py", line 97, in worker
    initializer(*initargs)
  File "/michorlab/jacobg/miniconda3/envs/chisel/lib/python2.7/site-packages/chisel/Combiner.py", line 234, in init
    initializer(*initargs)
  File "/michorlab/jacobg/miniconda3/envs/chisel/lib/python2.7/site-packages/chisel/Combiner.py", line 234, in init
    alpha_correct = args['significance'] / (20 * float(sum(len(snps[c]) for c in snps)))
    alpha_correct = args['significance'] / (20 * float(sum(len(snps[c]) for c in snps)))
ZeroDivisionError: float division by zero
ZeroDivisionError: float division by zero
    initializer(*initargs)
  File "/michorlab/jacobg/miniconda3/envs/chisel/lib/python2.7/site-packages/chisel/Combiner.py", line 234, in init
    initializer(*initargs)
    initializer(*initargs)
  File "/michorlab/jacobg/miniconda3/envs/chisel/lib/python2.7/site-packages/chisel/Combiner.py", line 234, in init
  File "/michorlab/jacobg/miniconda3/envs/chisel/lib/python2.7/site-packages/chisel/Combiner.py", line 234, in init
    alpha_correct = args['significance'] / (20 * float(sum(len(snps[c]) for c in snps)))
ZeroDivisionError: float division by zero
    alpha_correct = args['significance'] / (20 * float(sum(len(snps[c]) for c in snps)))
    alpha_correct = args['significance'] / (20 * float(sum(len(snps[c]) for c in snps)))
ZeroDivisionError: float division by zero
ZeroDivisionError: float division by zero

error.txt

Thank you for your help!

  • Jacob

Thank you for opening this issue.

The message that you see does not mean that the phased SNPs are not passed along. It is only a warning reminding the user that, in the nonormal version, CHISEL cannot verify the presence of germline SNPs against a matched-normal sample, because there is no matched-normal sample. Therefore, the user should be quite confident that all the provided SNPs are true heterozygous germline SNPs in the corresponding patient.

I am afraid that your errors are due to a different problem: an insufficient number of phased SNPs. Whole-exome sequencing (WES) does not allow us to genotype the minimum number of SNPs required to accurately estimate BAF in individual cells (for related power calculations, please refer to the CHISEL manuscript). Approximately 1M-1.6M heterozygous SNPs (and roughly no fewer than 800k) are required to guarantee accurate estimates of BAF in 3-5Mb genomic bins. Unfortunately, WES will only provide you with 1-3% of those in expectation.
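One way to check this before running the pipeline is to count the phased heterozygous SNPs in the input. The following is a minimal Python 3 sketch and not part of CHISEL: it assumes a single-sample VCF with phased genotypes in the first sample column, and the function names and the `MIN_PHASED_HET_SNPS` constant (copied from the rough lower bound above) are purely illustrative.

```python
import gzip

# Rough lower bound quoted above; an illustrative constant, not a value read from CHISEL.
MIN_PHASED_HET_SNPS = 800000


def count_phased_het_snps(lines):
    """Count VCF records whose first sample has a phased heterozygous genotype."""
    count = 0
    for line in lines:
        if line.startswith('#'):
            continue  # skip meta-information and header lines
        fields = line.rstrip('\n').split('\t')
        if len(fields) < 10:
            continue  # record has no sample column
        format_keys = fields[8].split(':')
        if 'GT' not in format_keys:
            continue
        genotype = fields[9].split(':')[format_keys.index('GT')]
        if genotype in ('0|1', '1|0'):
            count += 1
    return count


def count_in_file(path):
    """Same count for a plain or gzip-compressed VCF file on disk."""
    opener = gzip.open if path.endswith('.gz') else open
    with opener(path, 'rt') as handle:
        return count_phased_het_snps(handle)
```

A count far below `MIN_PHASED_HET_SNPS` would be consistent with the explanation above. Taking the quoted figures at face value, 1-3% of 1M-1.6M is roughly 10,000-48,000 SNPs. A count of zero would match both tracebacks in the log, since `min(args['J'], len(phased))` and the SNP sum in `alpha_correct` would both evaluate to 0.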

The recommended pipeline for the nonormal version of CHISEL is to genotype SNPs directly from the tumour cells, i.e., from the input barcoded BAM file. This approach will not allow you to identify the SNPs in genomic regions affected by clonal LOH, but the nonormal version of CHISEL specifically implements a correction test to identify these SNP-depleted regions as affected by LOH. Concretely, use a simple BCFtools mpileup + call pipeline to call any possible single-nucleotide variant directly on the input barcoded BAM file, with the “--ignore-RG” flag in mpileup to make sure that cells are not treated independently. Since you will use an imputation server, the variants that are likely somatic will be excluded, and only those that are likely germline will be kept because they will be found in the reference panel. You can use a command of this form (ref: http://samtools.github.io/bcftools/howtos/variant-calling.html):

bcftools mpileup -Ou -f reference.fa barcodedcells.bam --ignore-RG | bcftools call -mv -Ov -o calls.vcf

You can also speed up this command by splitting it by chromosome and running the pieces in parallel using -r. You could also provide a list of known SNPs or, even better, the list of all the SNPs included in the reference panel that you are going to use next. Also be careful with the reference genome requirements of the servers and the chr notation they request.
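The per-chromosome split could be scripted roughly as follows. This is only a sketch in Python 3: the helper names, the output file naming, and the `jobs` default are made up for illustration, and it assumes `bcftools` and `bash` are on the PATH. The bcftools options are the ones from the command above plus `-r`.

```python
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor


def mpileup_call_commands(reference, bam, chromosomes, out_prefix='calls'):
    """Build one 'mpileup | call' shell pipeline per chromosome."""
    commands = []
    for chrom in chromosomes:
        out_vcf = '{}.{}.vcf'.format(out_prefix, chrom)  # hypothetical naming scheme
        commands.append(
            'bcftools mpileup -Ou -f {ref} -r {chrom} --ignore-RG {bam} '
            '| bcftools call -mv -Ov -o {out}'.format(
                ref=shlex.quote(reference),
                chrom=shlex.quote(chrom),
                bam=shlex.quote(bam),
                out=shlex.quote(out_vcf),
            )
        )
    return commands


def run_all(commands, jobs=4):
    """Run the pipelines concurrently and return their exit codes."""
    def run(cmd):
        # pipefail makes a failure in mpileup visible in the exit code
        return subprocess.call(['bash', '-o', 'pipefail', '-c', cmd])

    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(run, commands))
```

For example, `run_all(mpileup_call_commands('reference.fa', 'barcodedcells.bam', ['chr{}'.format(i) for i in range(1, 23)]))` would process the autosomes, assuming the reference uses the `chr` prefix. The per-chromosome VCF files can then be merged, for instance with `bcftools concat`.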

You can then phase the resulting VCF files through the imputation servers and provide the phased output as input to the CHISEL nonormal version, proceeding with the standard CHISEL pipeline.
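Regarding the chr notation caveat mentioned earlier: if a server expects chromosome names without the `chr` prefix, the names in the VCF need to be rewritten before upload. Below is a minimal Python sketch of that rewrite, assuming an uncompressed VCF read line by line; the function name is illustrative, and it does not handle special cases such as `chrM` versus `MT`. `bcftools annotate --rename-chrs` is an existing alternative.

```python
def strip_chr_prefix(lines):
    """Yield VCF lines with the 'chr' prefix removed from contig names."""
    for line in lines:
        if line.startswith('##contig=<ID=chr'):
            # rename the contig declared in the header
            yield line.replace('##contig=<ID=chr', '##contig=<ID=', 1)
        elif line.startswith('#'):
            yield line  # other header lines are left untouched
        elif line.startswith('chr'):
            yield line[3:]  # CHROM is the first column of each record
        else:
            yield line
```

The names would presumably have to match the notation of the barcoded BAM again before the phased calls are used downstream, so the reverse rewrite may be needed after phasing. This is an assumption worth checking against your own files.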

Please feel free to re-open the issue in case of further problems.