lilab-bcb / pegasus

A tool for analyzing transcriptomes of millions of single cells.

Home Page: https://pegasus.readthedocs.io

KeyError: 'n_genes'

bschilder opened this issue

Hello, is there another step I'm missing? I was expecting qc_metrics to create an n_genes column in .obs.

# Convert anndata and downsample cells
data = io.MultimodalData(adat[sample(range(adat.shape[0]), k=10000), :]) 
data.obs["Channel"] = data.obs.study

# Run QC
pg.qc_metrics(data, percent_mito=10)

Error message

Trying to set attribute `.obs` of view, copying.
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/anaconda3/envs/pegasus/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3079             try:
-> 3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'n_genes'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-40-d6ddbdb7cebc> in <module>
----> 1 pg.qc_metrics(data, percent_mito=10)

~/anaconda3/envs/pegasus/lib/python3.8/site-packages/pegasus/tools/preprocessing.py in qc_metrics(data, select_singlets, remap_string, subset_string, min_genes, max_genes, min_umis, max_umis, mito_prefix, percent_mito)
     80         min_umis = 1
     81 
---> 82     calc_qc_filters(data, select_singlets = select_singlets, remap_string = remap_string, subset_string = subset_string, min_genes = min_genes, max_genes = max_genes, min_umis = min_umis, max_umis = max_umis, mito_prefix = mito_prefix, percent_mito = percent_mito)
     83 
     84 

~/anaconda3/envs/pegasus/lib/python3.8/site-packages/pegasusio/qc_utils.py in calc_qc_filters(unidata, select_singlets, remap_string, subset_string, min_genes, max_genes, min_umis, max_umis, mito_prefix, percent_mito)
    121             unidata.obs["n_genes"] = unidata.X.getnnz(axis=1)
    122         if min_cond:
--> 123             filters.append(unidata.obs["n_genes"] >= min_genes)
    124         if max_cond:
    125             filters.append(unidata.obs["n_genes"] < max_genes)

~/anaconda3/envs/pegasus/lib/python3.8/site-packages/pandas/core/frame.py in __getitem__(self, key)
   3022             if self.columns.nlevels > 1:
   3023                 return self._getitem_multilevel(key)
-> 3024             indexer = self.columns.get_loc(key)
   3025             if is_integer(indexer):
   3026                 indexer = [indexer]

~/anaconda3/envs/pegasus/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:
-> 3082                 raise KeyError(key) from err
   3083 
   3084         if tolerance is not None:

KeyError: 'n_genes'

Versions

print(pg.__version__)
print(ad.__version__)
1.2.0
0.7.6

If I manually insert an n_genes column, I get the same error message as before, but for n_counts. Once I add n_counts, I get the same error for percent_mito, and then for n_cells.

I can confirm that none of the pre-existing columns in my .obs or .var had those same names (in case it was a duplicate col name issue).

Hello @bschilder, I was able to reproduce your error. It happens because your MultimodalData object was created from a view of an AnnData object (note that slicing an AnnData object returns a view) rather than from an AnnData object itself. In this case, even manually adding attributes to the obs field fails. (You can check that your modification of the Channel attribute failed.)

To fix it, please add .copy() after slicing to force the creation of a new AnnData object:

data = io.MultimodalData(adat[sample(range(adat.shape[0]), k=10000), :].copy()) 
data.obs["Channel"] = data.obs.study

# Run QC
pg.qc_metrics(data, percent_mito=10)

omg, yes that's absolutely it. That's such a simple thing, apologies for not realizing it sooner. Though I'm kind of surprised io.MultimodalData() accepted the anndata View. Perhaps you could add a check to see if the anndata is a view when reading it in via this function, and if so, warn users it won't work (in cases where they try to do silly things like I did)?

Many thanks!

Yeah, warning users about this kind of situation is a good idea. Thank you for your suggestion!
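
For anyone landing here later, the check suggested in this thread could look roughly like the sketch below. It is a user-side workaround, not the pegasus or pegasusio implementation: materialize_if_view is a hypothetical helper name, and it assumes only that the object exposes the is_view attribute and copy() method that AnnData objects provide.

```python
import warnings


def materialize_if_view(adata):
    """Return `adata` unchanged, or a real copy if it is a view.

    Hypothetical helper, not part of pegasus or pegasusio. It relies only on
    the `is_view` attribute and `copy()` method that AnnData objects expose.
    """
    # Objects without an `is_view` attribute are treated as non-views.
    if getattr(adata, "is_view", False):
        warnings.warn(
            "Received a view of an AnnData object; copying it so that "
            "writes to .obs and .var persist.",
            UserWarning,
            stacklevel=2,
        )
        return adata.copy()
    return adata
```

With this in place, the call from the original post would become io.MultimodalData(materialize_if_view(adat[sample(range(adat.shape[0]), k=10000), :])). Copying only when is_view is true avoids duplicating data that is already a standalone object.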