theislab / scgen

Single cell perturbation prediction

Home Page: https://scgen.readthedocs.io

ValueError: could not convert integer scalar

acmullen-med opened this issue

I am trying to replicate the results of the perturbation experiment with some of my own data, but I am running into an error:

>>> pred, delta = model.predict(
    ctrl_key='18h',
    stim_key='24h',
    celltype_to_predict='tbx16'
)
Observation names are not unique. To make them unique, call `.obs_names_make_unique`.
Observation names are not unique. To make them unique, call `.obs_names_make_unique`.
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "/net/vol1/home/.local/lib/python3.7/site-packages/scgen/_scgen.py", line 155, in predict
    ctrl_adata.X = ctrl_adata.X.A
  File "/net/gs/vol3/software/modules-sw-python/3.7.7/scvi-tools/0.10.1/Linux/CentOS7/x86_64/lib/python3.7/site-packages/anndata/_core/anndata.py", line 684, in X
    self._adata_ref._X[oidx, vidx] = value
  File "/net/gs/vol3/software/modules-sw-python/3.7.7/scvi-tools/0.10.1/Linux/CentOS7/x86_64/lib/python3.7/site-packages/scipy/sparse/_index.py", line 116, in __setitem__
    self._set_arrayXarray_sparse(i, j, x)
  File "/net/gs/vol3/software/modules-sw-python/3.7.7/scvi-tools/0.10.1/Linux/CentOS7/x86_64/lib/python3.7/site-packages/scipy/sparse/compressed.py", line 808, in _set_arrayXarray_sparse
    self._zero_many(*self._swap((row, col)))
  File "/net/gs/vol3/software/modules-sw-python/3.7.7/scvi-tools/0.10.1/Linux/CentOS7/x86_64/lib/python3.7/site-packages/scipy/sparse/compressed.py", line 929, in _zero_many
    i, j, offsets)
ValueError: could not convert integer scalar

I found scverse/anndata#339 and tried

>>> model.adata = model.adata.copy()
>>> pred, delta = model.predict(
    ctrl_key='18h',
    stim_key='24h',
    celltype_to_predict='tbx16'
)

but got the same result.

I had no issues with training. Strangely, I don't seem to have this issue if I subset the data and only predict on 5% of it, or if I use the sample data provided. Is there a limit on the number of cells that this package will work with?

I am not using conda.

Relevant packages:
sc.version
'1.7.2'
scipy.version
'1.6.3'
np.version
'1.20.3'
anndata.version
'0.7.6'
scgen.version
'2.0.0'

I have tried to update my libraries to the most current versions. Any guidance you could provide would be great.

Hi @acmullen-med,

There is no size limit in the model. It seems that some cells are causing the problem at the line ctrl_adata.X = ctrl_adata.X.A. Have they been normalized together? Is ctrl_adata.X sparse or dense? If it is dense, please pass it as sparse. This looks like a float conversion problem in scipy and nothing related to the model.

It seems you have integers in there, so maybe try casting ctrl_adata.X to float beforehand.

Also, maybe try upgrading scipy.
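For illustration, the cast suggested above could look roughly like the following. This is a minimal sketch, not code from the thread: the small `counts` matrix is a made-up stand-in for ctrl_adata.X, and float32 is an assumed target dtype.

```python
import numpy as np
from scipy import sparse

# Hypothetical integer count matrix standing in for ctrl_adata.X.
counts = sparse.csr_matrix(np.array([[0, 2, 0], [1, 0, 3]], dtype=np.int64))

# astype returns a new matrix with the requested dtype; the result is
# still sparse and keeps the same non-zero entries.
as_float = counts.astype(np.float32)

print(counts.dtype, "->", as_float.dtype)
print(sparse.issparse(as_float))
```

On a real AnnData object the equivalent step would be assigning the cast matrix back to adata.X before setting up the model.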

Hey @M0hammadL, thanks for the response, but I'm still having trouble getting the package to work on larger datasets.

Looking through the package code, I believe ctrl_adata.X comes from the adata that was passed in when the model was initialized, and that it is sparse, but I can't be sure without altering the package code. The adata.X that goes into the model is a sparse CSR matrix, and adata.X.A is an ndarray of floats. I don't quite understand what you mean by normalizing ctrl_adata.X and ctrl_adata.X.A together. Line 155 of _scgen.py seems to be trying to set them equal? What should be normalized?

Could you clarify the difference between ctrl_adata.X and ctrl_adata.X.A?
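As a point of reference for the question above, here is a small sketch of how the two objects relate in SciPy. It assumes X is a scipy CSR matrix, as reported in this thread; in the SciPy version listed above, X.A is shorthand for X.toarray(), which is the call used here. The tiny matrix is invented for the example.

```python
import numpy as np
from scipy import sparse

dense = np.array([[0.0, 0.0, 1.0],
                  [0.0, 2.0, 0.0]], dtype=np.float32)

# The sparse matrix stores only the non-zero entries and their positions.
X = sparse.csr_matrix(dense)

# toarray() materializes every entry, zeros included, as a NumPy ndarray.
X_dense = X.toarray()

print(type(X).__name__, "stored entries:", X.nnz)
print(type(X_dense).__name__, "total entries:", X_dense.size)
```

So the two hold the same values in different storage formats, and the statement ctrl_adata.X = ctrl_adata.X.A replaces the sparse storage with the dense one.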

Even when I try to cast both adata.X and adata.X.A as sparse matrices, I get the same error.

>>> train = sc.read('/net/trapnell/vol1/home/acmullen/VAEs/data/fishVAEData.h5ad')
>>> adata = sc.AnnData(train)
>>> adata.X = adata.X.tocsr()
>>> adata.X.A = scipy.sparse.csr_matrix(adata.X.A)
>>> train_new = scgen.setup_anndata(adata,copy=True,batch_key="timepoint", labels_key="gene_target")
>>> model = scgen.SCGEN(train_new)
>>> scipy.sparse.issparse(model.adata.X)
True
>>> type(model.adata.X)
<class 'scipy.sparse.csr.csr_matrix'>
>>> type(model.adata.X.A)
<class 'numpy.ndarray'>

>>> model.train(
    max_epochs=100,
    batch_size=32,
    early_stopping=True,
    early_stopping_patience=25
)

>>> pred, delta = model.predict(
...     ctrl_key='18h',
...     stim_key='24h',
...     celltype_to_predict='tbx16'
... )
Observation names are not unique. To make them unique, call `.obs_names_make_unique`.
Observation names are not unique. To make them unique, call `.obs_names_make_unique`.
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "/net/trapnell/vol1/home/acmullen/.local/lib/python3.7/site-packages/scgen/_scgen.py", line 155, in predict
    ctrl_adata.X = ctrl_adata.X.A
  File "/net/gs/vol3/software/modules-sw-python/3.7.7/scvi-tools/0.10.1/Linux/CentOS7/x86_64/lib/python3.7/site-packages/anndata/_core/anndata.py", line 684, in X
    self._adata_ref._X[oidx, vidx] = value
  File "/net/gs/vol3/software/modules-sw-python/3.7.7/scvi-tools/0.10.1/Linux/CentOS7/x86_64/lib/python3.7/site-packages/scipy/sparse/_index.py", line 116, in __setitem__
    self._set_arrayXarray_sparse(i, j, x)
  File "/net/gs/vol3/software/modules-sw-python/3.7.7/scvi-tools/0.10.1/Linux/CentOS7/x86_64/lib/python3.7/site-packages/scipy/sparse/compressed.py", line 808, in _set_arrayXarray_sparse
    self._zero_many(*self._swap((row, col)))
  File "/net/gs/vol3/software/modules-sw-python/3.7.7/scvi-tools/0.10.1/Linux/CentOS7/x86_64/lib/python3.7/site-packages/scipy/sparse/compressed.py", line 929, in _zero_many
    i, j, offsets)
ValueError: could not convert integer scalar

If I try to recreate the ctrl_adata.X bug using my own code, I get the following, and it does not raise an error.

>>> model.adata.X = model.adata.X.A
>>> model.adata.X
array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 1., 0.]], dtype=float32)
>>>

Additionally, I found the peculiar behavior below when generating the model. Why would setting up the model recast model.adata.X.A so that it is no longer sparse? Could this be related?

>>> adata = sc.AnnData(train)
>>> adata.X = adata.X.tocsr()
>>> adata.X.A = scipy.sparse.csr_matrix(adata.X.A)
>>>
>>> train_new = scgen.setup_anndata(adata,copy=True,batch_key="timepoint", labels_key="gene_target")
INFO     Using batches from adata.obs["timepoint"]
INFO     Using labels from adata.obs["gene_target"]
INFO     Using data from adata.X
INFO     Computing library size prior per batch
INFO     Successfully registered anndata object containing 186289 cells, 32031 vars, 2 batches, 3 labels, and 0 proteins.
         Also registered 0 extra categorical covariates and 0 extra continuous covariates.
INFO     Please do not further modify adata until model is trained.
>>> model = scgen.SCGEN(train_new)
>>>
>>> #Why are these both not sparse???
>>> scipy.sparse.issparse(adata.X.A)
True
>>> scipy.sparse.issparse(model.adata.X.A)
False

When I try to recast model.adata.X.A after initializing and training the model, the original error persists. So maybe that is not the problem?

Let me know if you have any other suggestions or need more information from me.

I have a resolution.

I added the two lines marked with asterisks to _scgen.py.

        eq = min(ctrl_x.X.shape[0], stim_x.X.shape[0])
        cd_ind = np.random.choice(range(ctrl_x.shape[0]), size=eq, replace=False)
        stim_ind = np.random.choice(range(stim_x.shape[0]), size=eq, replace=False)

        ctrl_adata = ctrl_x[cd_ind, :]
        stim_adata = stim_x[stim_ind, :]

        **ctrl_adata = ctrl_adata.copy()
        stim_adata = stim_adata.copy()**

        if sparse.issparse(ctrl_adata.X) and sparse.issparse(stim_adata.X):
            ctrl_adata.X = ctrl_adata.X.A
            stim_adata.X = stim_adata.X.A

When an AnnData object is subset, a view is created instead of a new AnnData object. This is done to save memory; see https://anndata.readthedocs.io/en/latest/anndata.AnnData.html.

Calling adata.copy() forces the creation of a new AnnData object instead of a view.

I don't know why the downstream bug only emerges once ctrl_adata reaches a large enough size. That could be an issue with scvi-tools or scipy. But altering the package code fixed the issue for me. I would encourage you to add these two lines to the package or find a better solution.
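One possible explanation for the size threshold, which nobody in this thread confirms, is a 32-bit limit. The traceback ends in scipy's _zero_many, which is called with one index per element being assigned. If the number of assigned elements has to fit in a signed 32-bit integer, a large enough subset would overflow it. The sketch below only does that arithmetic, using the cell and gene counts from the setup_anndata log above. The helper name max_rows_for_int32 is made up for the example, and the int32 limit is an assumption.

```python
# Largest value a signed 32-bit integer can hold.
INT32_MAX = 2**31 - 1

def max_rows_for_int32(n_vars: int) -> int:
    """Largest row count whose rows * n_vars still fits in int32."""
    return INT32_MAX // n_vars

# Values taken from the setup_anndata log in this thread.
n_cells, n_vars = 186289, 32031

# Does the full matrix have more elements than int32 can count?
print(n_cells * n_vars > INT32_MAX)

# Row count at which a dense assignment would cross the limit
# (assumption: the element count is what overflows).
print(max_rows_for_int32(n_vars))
```

The thread does not state how many cells end up in ctrl_adata inside predict, so this only shows that the dataset is in a range where such a limit could come into play. It would also be consistent with the 5% subset working.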

Thanks,

Hi @acmullen-med, thanks for pointing this out! I have pushed a fix here

It should work now!