raphael-group / decifer

DeCiFer is an algorithm that simultaneously selects mutation multiplicities and clusters SNVs by their corresponding descendant cell fractions (DCF).

Skipping mutations warning and list index out of range error

firatuyulur opened this issue

Hi,
I'm trying to run decifer on some TRACERx 100 data, but I either lose some mutations or get a "list index out of range" error.

The sample for which I lose mutations but still get a result is CRUK0036. Here are the first few lines of the stdout:

/home/f.uyulur/miniconda3/envs/for_decifer_v1-1-4/lib/python2.7/site-packages/decifer/mutation.py:158: UserWarning: Skipping mutation 4: State tree file does not contain state trees for the set of copy-number states that affect mutation 4.
 To generate state trees, see documentation for `generatestatetrees`, included in the C++ component of DeCiFer.
  warnings.warn(msg)
/home/f.uyulur/miniconda3/envs/for_decifer_v1-1-4/lib/python2.7/site-packages/decifer/mutation.py:158: UserWarning: Skipping mutation 5: State tree file does not contain state trees for the set of copy-number states that affect mutation 5.
 To generate state trees, see documentation for `generatestatetrees`, included in the C++ component of DeCiFer.
  warnings.warn(msg)
/home/f.uyulur/miniconda3/envs/for_decifer_v1-1-4/lib/python2.7/site-packages/decifer/mutation.py:158: UserWarning: Skipping mutation 6: State tree file does not contain state trees for the set of copy-number states that affect mutation 6.
 To generate state trees, see documentation for `generatestatetrees`, included in the C++ component of DeCiFer.
  warnings.warn(msg)
/home/f.uyulur/miniconda3/envs/for_decifer_v1-1-4/lib/python2.7/site-packages/decifer/mutation.py:158: UserWarning: Skipping mutation 7: State tree file does not contain state trees for the set of copy-number states that affect mutation 7.
 To generate state trees, see documentation for `generatestatetrees`, included in the C++ component of DeCiFer.
  warnings.warn(msg)
/home/f.uyulur/miniconda3/envs/for_decifer_v1-1-4/lib/python2.7/site-packages/decifer/mutation.py:158: UserWarning: Skipping mutation 8: State tree file does not contain state trees for the set of copy-number states that affect mutation 8.
 To generate state trees, see documentation for `generatestatetrees`, included in the C++ component of DeCiFer.
  warnings.warn(msg)
/home/f.uyulur/miniconda3/envs/for_decifer_v1-1-4/lib/python2.7/site-packages/decifer/mutation.py:158: UserWarning: Skipping mutation 9: State tree file does not contain state trees for the set of copy-number states that affect mutation 9.
 To generate state trees, see documentation for `generatestatetrees`, included in the C++ component of DeCiFer.
  warnings.warn(msg)
/home/f.uyulur/miniconda3/envs/for_decifer_v1-1-4/lib/python2.7/site-packages/decifer/mutation.py:158: UserWarning: Skipping mutation 10: State tree file does not contain state trees for the set of copy-number states that affect mutation 10.
 To generate state trees, see documentation for `generatestatetrees`, included in the C++ component of DeCiFer.
  warnings.warn(msg)
/home/f.uyulur/miniconda3/envs/for_decifer_v1-1-4/lib/python2.7/site-packages/decifer/mutation.py:158: UserWarning: Skipping mutation 11: State tree file does not contain state trees for the set of copy-number states that affect mutation 11.
 To generate state trees, see documentation for `generatestatetrees`, included in the C++ component of DeCiFer.
  warnings.warn(msg)
/home/f.uyulur/miniconda3/envs/for_decifer_v1-1-4/lib/python2.7/site-packages/decifer/mutation.py:158: UserWarning: Skipping mutation 12: State tree file does not contain state trees for the set of copy-number states that affect mutation 12.
 To generate state trees, see documentation for `generatestatetrees`, included in the C++ component of DeCiFer.
  warnings.warn(msg)

I have around 250 input SNVs for CRUK0036, yet the decifer results contain only about 140 SNVs. Do you know why this happens?

For CRUK0039, I again get many warning lines just like those for CRUK0036, and the run fails to produce a result with the following error:

Progress: |------------------------------| 0.0% Complete [[2021-Sep-08 15:11:03]Started]Traceback (most recent call last):
  File "/home/f.uyulur/miniconda3/envs/for_decifer_v1-1-4/bin/decifer", line 11, in <module>
    sys.exit(main())
  File "/home/f.uyulur/miniconda3/envs/for_decifer_v1-1-4/lib/python2.7/site-packages/decifer/decifer.py", line 52, in main
    run_coordinator_iterative(mutations, num_samples, purity, args, record if args['record'] else None)
  File "/home/f.uyulur/miniconda3/envs/for_decifer_v1-1-4/lib/python2.7/site-packages/decifer/decifer.py", line 138, in run_coordinator_iterative
    map(report, pool.imap_unordered(run_descent, jobs))
  File "/home/f.uyulur/miniconda3/envs/for_decifer_v1-1-4/lib/python2.7/multiprocessing/pool.py", line 673, in next
    raise value
Exception: Traceback (most recent call last):
  File "/home/f.uyulur/miniconda3/envs/for_decifer_v1-1-4/lib/python2.7/site-packages/decifer/decifer.py", line 241, in run_descent
    C, best_mutations, mut_cluster_assignments, mut_config_assignments, obj, it = coordinate_descent(x, seed, mutations, num_samples, k, maxit, record, betabinomial, purity)
  File "/home/f.uyulur/miniconda3/envs/for_decifer_v1-1-4/lib/python2.7/site-packages/decifer/new_coordinate_ascent.py", line 41, in coordinate_descent
    mutations, obj = optimize_assignments(mutations, C, num_samples, num_clusters, bb)
  File "/home/f.uyulur/miniconda3/envs/for_decifer_v1-1-4/lib/python2.7/site-packages/decifer/new_coordinate_ascent.py", line 86, in optimize_assignments
    return mutations, sum(map(lambda m : update(m, select(m)), mutations))
  File "/home/f.uyulur/miniconda3/envs/for_decifer_v1-1-4/lib/python2.7/site-packages/decifer/new_coordinate_ascent.py", line 86, in <lambda>
    return mutations, sum(map(lambda m : update(m, select(m)), mutations))
  File "/home/f.uyulur/miniconda3/envs/for_decifer_v1-1-4/lib/python2.7/site-packages/decifer/new_coordinate_ascent.py", line 71, in select
    objs = -np.sum(np.array([grow(sample) for sample in xrange(num_samples)]), axis=0)
  File "/home/f.uyulur/miniconda3/envs/for_decifer_v1-1-4/lib/python2.7/site-packages/decifer/new_coordinate_ascent.py", line 70, in <lambda>
    grow = (lambda sample : compute_pdfs(*zip(*[form(cb[0], cb[1], sample) for cb in combs])))
  File "/home/f.uyulur/miniconda3/envs/for_decifer_v1-1-4/lib/python2.7/site-packages/decifer/new_coordinate_ascent.py", line 67, in <lambda>
    form = (lambda cf, cl, sam : (cf.cf_to_v(C[sam][cl], sam), m.a[sam]+1, (m.d[sam] - m.a[sam]) + 1))
  File "/home/f.uyulur/miniconda3/envs/for_decifer_v1-1-4/lib/python2.7/site-packages/decifer/config.py", line 90, in cf_to_v
    if not (self.cf_bounds(sample)[0] - THRESHOLD <= c <= self.cf_bounds(sample)[1] + THRESHOLD):
  File "/home/f.uyulur/miniconda3/envs/for_decifer_v1-1-4/lib/python2.7/site-packages/decifer/config.py", line 38, in cf_bounds
    return self.d_bounds(sample)
  File "/home/f.uyulur/miniconda3/envs/for_decifer_v1-1-4/lib/python2.7/site-packages/decifer/config.py", line 48, in d_bounds
    return M, M + self.cn_props[self.mut_state[:2]][sample]
IndexError: list index out of range

I've attached the input files for each sample as well as the missing SNVs for CRUK0036 in case it is helpful.

Thanks
Firat

CRUK0036_missing_snvs.txt
CRUK0036.zip
CRUK0039.zip

The warnings indicate mutations that are excluded because they are harbored in genomic regions whose copy-number states have no state trees in the default "state tree" file. To fix this, you should generate new state trees for the copy-number states present in your input. Please follow the new instructions and scripts from @brian-arnold to achieve this at:

https://github.com/raphael-group/decifer#optionaldata

Please let us know if you have any further issues.
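To see which sets of copy-number states are behind the skipped mutations, a check along the following lines may help. This is only a minimal sketch: `mutation_states` and `covered_state_sets` are hypothetical in-memory structures (not DeCiFer's file formats or API) that you would need to fill from your own input file and state tree file.

```python
def find_uncovered(mutation_states, covered_state_sets):
    """Group mutations by the copy-number state set that has no state trees.

    mutation_states: dict mapping a mutation id to an iterable of
        (allele_A, allele_B) copy-number states affecting that mutation.
        (Hypothetical structure, assumed for this illustration.)
    covered_state_sets: set of frozensets of states for which state trees
        are available. (Also hypothetical.)
    """
    uncovered = {}
    for mut_id in sorted(mutation_states):
        key = frozenset(mutation_states[mut_id])
        if key not in covered_state_sets:
            # Such a mutation would be skipped with the warning shown above.
            uncovered.setdefault(key, []).append(mut_id)
    return uncovered


# Made-up example data, not taken from CRUK0036 or CRUK0039.
example_mutations = {
    1: [(1, 1)],
    4: [(1, 1), (3, 2)],
    5: [(1, 1), (3, 2)],
}
example_covered = {frozenset([(1, 1)])}

example_uncovered = find_uncovered(example_mutations, example_covered)
for states, muts in example_uncovered.items():
    print("no state trees for %s; affected mutations: %s" % (sorted(states), muts))
```

Each key of the returned dictionary is one set of copy-number states for which state trees would need to be generated.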

In the 'scripts' subdirectory, we've added a new script to generate the input file for decifer using SNV data (in standard VCF format) and CNA data (e.g. from HATCHet). As we describe, this script also generates a file called cn_states.txt that may be given directly to generatestatetrees on the command line to generate state trees for the copy-number states observed in your specific data.

Please note that generatestatetrees may take a very long time to complete if the total copy number is high for a particular clone (e.g. total copy number > 6). Moreover, even if you are able to generate state trees for such regions, estimates of the CCF/DCF for SNVs in a region with very high total copy number may be much less accurate, since there is more uncertainty in the VAF -> CCF/DCF transformation for these sites (as described in the manuscript).
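As a rough illustration of why high total copy number adds uncertainty, the sketch below enumerates the CCF values compatible with a single observed VAF. It assumes the simplified textbook relation VAF = purity * m * CCF / (purity * N + 2 * (1 - purity)) for one tumor copy-number state with total copy number N and mutation multiplicity m; this is not DeCiFer's model, which works with DCFs and state trees, and the function name is made up for this example.

```python
def candidate_ccfs(vaf, purity, total_cn, normal_cn=2):
    """Return {multiplicity: CCF} for every multiplicity giving a CCF <= 1.

    Simplified single-state model, assumed for illustration only.
    """
    # Average number of copies of the locus per cell in the mixed sample.
    avg_copies = purity * total_cn + (1.0 - purity) * normal_cn
    candidates = {}
    for m in range(1, total_cn + 1):
        ccf = vaf * avg_copies / (purity * m)
        if ccf <= 1.0 + 1e-9:  # keep only biologically possible values
            candidates[m] = ccf
    return candidates


low_cn = candidate_ccfs(vaf=0.2, purity=0.8, total_cn=2)
high_cn = candidate_ccfs(vaf=0.2, purity=0.8, total_cn=8)
print("total copy number 2: %d candidate CCFs" % len(low_cn))
print("total copy number 8: %d candidate CCFs" % len(high_cn))
```

Under this simplified model, the same VAF is compatible with only two multiplicities at total copy number 2 but with seven at total copy number 8, so the inferred CCF depends much more strongly on which multiplicity is chosen.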