lilab-bcb / pegasus

A tool for analyzing transcriptomes of millions of single cells.

Home Page: https://pegasus.readthedocs.io

ValueError: fill value must be in categories

swemeshy opened this issue

When I use pg.aggregate_matrices to aggregate two H5AD files that have been preprocessed with Pegasus, I get the following error:

>>> pg.aggregate_matrices('aggregate_matrices.csv')
2021-02-03 01:54:28,996 - pegasusio.readwrite - INFO - h5ad file '/projects/ascites/2020-11-06_analysis_Pilot1/data/preprocessed_filtered_singlets.h5ad' is loaded.
2021-02-03 01:54:28,996 - pegasusio.readwrite - INFO - Function 'read_input' finished in 1.10s.
2021-02-03 01:54:30,306 - pegasusio.readwrite - INFO - h5ad file '/projects/ascites/2020-11-25_analysis_Pilot2/data/preprocessed_filtered_singlets.h5ad' is loaded.
2021-02-03 01:54:30,306 - pegasusio.readwrite - INFO - Function 'read_input' finished in 1.31s.
2021-02-03 01:54:30,466 - pegasusio.qc_utils - INFO - After filtration, 10302 out of 10302 cell barcodes are kept in UnimodalData object GRCh38-rna.
2021-02-03 01:54:30,559 - pegasusio.qc_utils - INFO - After filtration, 14836 out of 14836 cell barcodes are kept in UnimodalData object GRCh38-rna.
Traceback (most recent call last):
  File "/home/sramesh/software/miniconda3/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3417, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-19-83d8f3ba4acc>", line 1, in <module>
    pg.aggregate_matrices('aggregate_matrices.csv')
  File "/home/sramesh/software/miniconda3/lib/python3.8/site-packages/pegasusio/decorators.py", line 12, in wrapper_timer
    result = func(*args, **kwargs)
  File "/home/sramesh/software/miniconda3/lib/python3.8/site-packages/pegasusio/data_aggregation.py", line 216, in aggregate_matrices
    aggregated_data = aggrData.aggregate()
  File "/home/sramesh/software/miniconda3/lib/python3.8/site-packages/pegasusio/decorators.py", line 12, in wrapper_timer
    result = func(*args, **kwargs)
  File "/home/sramesh/software/miniconda3/lib/python3.8/site-packages/pegasusio/aggr_data.py", line 200, in aggregate
    unidata = self._aggregate_unidata(self.aggr.pop(key))
  File "/home/sramesh/software/miniconda3/lib/python3.8/site-packages/pegasusio/decorators.py", line 30, in wrapper_run_gc
    result = func(*args, **kwargs)
  File "/home/sramesh/software/miniconda3/lib/python3.8/site-packages/pegasusio/aggr_data.py", line 138, in _aggregate_unidata
    barcode_metadata.fillna(value=fillna_dict, inplace=True)
  File "/home/sramesh/software/miniconda3/lib/python3.8/site-packages/pandas/core/frame.py", line 4321, in fillna
    return super().fillna(
  File "/home/sramesh/software/miniconda3/lib/python3.8/site-packages/pandas/core/generic.py", line 6078, in fillna
    obj.fillna(v, limit=limit, inplace=True, downcast=downcast)
  File "/home/sramesh/software/miniconda3/lib/python3.8/site-packages/pandas/core/series.py", line 4530, in fillna
    return super().fillna(
  File "/home/sramesh/software/miniconda3/lib/python3.8/site-packages/pandas/core/generic.py", line 6061, in fillna
    new_data = self._mgr.fillna(
  File "/home/sramesh/software/miniconda3/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 594, in fillna
    return self.apply(
  File "/home/sramesh/software/miniconda3/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 409, in apply
    applied = getattr(b, f)(**kwargs)
  File "/home/sramesh/software/miniconda3/lib/python3.8/site-packages/pandas/core/internals/blocks.py", line 1779, in fillna
    values = values.fillna(value=value, limit=limit)
  File "/home/sramesh/software/miniconda3/lib/python3.8/site-packages/pandas/core/arrays/categorical.py", line 1721, in fillna
    raise ValueError("fill value must be in categories")
ValueError: fill value must be in categories

I'm not sure how to fix this error. Can you please help?

I'm using pegasuspy version 1.2.0 and pegasusio version 0.2.9.

Thanks!

I see that you were aggregating files in h5ad format, which stores string-type attributes as categorical type. This is a possible cause of the error: the default NA fill value for a categorical variable is "", which probably isn't among that attribute's categories unless it has been added as a new level.
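For reference, the behavior can be reproduced with pandas alone. This is a minimal sketch, not the PegasusIO code: the `Channel` column and its values are made up for illustration, and the workaround shown (adding "" as a category before filling) is only one possible way to handle it, not necessarily the fix that will go into PegasusIO. Depending on the pandas version, the failing call raises either ValueError or TypeError, so the sketch catches both.

```python
import pandas as pd

# Hypothetical barcode metadata: a categorical column with one missing value.
df = pd.DataFrame({"Channel": pd.Categorical(["s1", None, "s2"])})

# Filling with "" fails because "" is not one of the categories ("s1", "s2").
try:
    df["Channel"].fillna("")
except (ValueError, TypeError) as err:
    print("fillna failed:", err)

# Possible workaround: register "" as a category first, then fill.
col = df["Channel"]
if "" not in col.cat.categories:
    col = col.cat.add_categories([""])
fixed = col.fillna("")

print(fixed.tolist())
```

After the extra category is added, `fillna("")` succeeds and the column stays categorical.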

I'll look into the best way to handle this situation, fix it, and let you know in this thread.

Hi @swemeshy,

This issue should now be fixed. Can you try again using Pegasus v1.4.4 and PegasusIO v0.4.1?

Best,
Bo