lilab-bcb / pegasus

A tool for analyzing transcriptomes of millions of single cells.

Home Page: https://pegasus.readthedocs.io

Inconsistent z-score calculation for CSC and CSR sparse matrices

jkanche opened this issue

The z-score calculation in Pegasus produces inconsistent results based on the type of input matrix (CSC or CSR). The following code snippet demonstrates the issue:

import numpy as np
import anndata as ad
from scipy.sparse import csr_matrix

counts = csr_matrix(np.random.poisson(1, size=(100, 2000)), dtype=np.float32)

adata_csr = ad.AnnData(counts)
adata_csc = ad.AnnData(counts.tocsc())

import pegasus

print(pegasus.__version__)
# 1.9.0

np.allclose(pegasus.calculate_z_score(adata_csc), pegasus.calculate_z_score(adata_csr))
# 2024-02-12 14:41:34,223 - pegasus.tools.signature_score - WARNING - Detected and dropped duplicate bin edges!
# 2024-02-12 14:41:34,245 - pegasus.tools.signature_score - WARNING - Detected and dropped duplicate bin edges!
# False

Upon investigation, the root cause seems to be in this line. The code should check the orientation of the sparse matrix before computing the mean and standard deviation. Coercing the matrix to CSR format might be necessary, if I understand the sparse functions here correctly.
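To illustrate why orientation matters, here is a minimal sketch that does not use Pegasus code. The helper `column_means_from_csr_arrays` is hypothetical: it reads the raw `data` and `indices` arrays and assumes `indices` holds column positions, which is true only for CSR. Given the arrays of a CSC matrix, it silently returns wrong values instead of raising an error.

```python
import numpy as np
from scipy.sparse import csr_matrix


def column_means_from_csr_arrays(data, indices, n_rows, n_cols):
    """Hypothetical kernel: per-column mean from raw sparse arrays.

    Assumes CSR layout, i.e. `indices` stores the column of each value.
    """
    sums = np.zeros(n_cols, dtype=np.float64)
    # Accumulate every stored value into the slot named by its index.
    np.add.at(sums, indices, data)
    return sums / n_rows


dense = np.array([[1.0, 0.0, 2.0],
                  [0.0, 3.0, 0.0]])  # 2 rows, 3 columns
csr = csr_matrix(dense)
csc = csr.tocsc()
n_rows, n_cols = dense.shape

# CSR arrays: indices are column positions, so the result is correct.
right = column_means_from_csr_arrays(csr.data, csr.indices, n_rows, n_cols)
# equals [0.5, 1.5, 1.0]

# CSC arrays: indices are row positions, so values land in the wrong slots.
wrong = column_means_from_csr_arrays(csc.data, csc.indices, n_rows, n_cols)
# equals [1.5, 1.5, 0.0]

print(right, wrong)
```

Both matrices hold the same values, so a mismatch like this is consistent with the `False` returned by `np.allclose` above.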

Environment:

  • Pegasus version: 1.9.0
  • Python version: 3.11

Hi @jkanche. PR #290 should fix this issue.

In brief, both calc_mean and calc_sig_background, which are called by calculate_z_score, need to convert the count matrix to csr_matrix if it is not already in that format.
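A guard of this kind could look roughly like the sketch below. This is not the code from PR #290; `ensure_csr` and `gene_mean_std` are hypothetical names, and the statistics are computed with public SciPy methods rather than Pegasus internals. The point is only that coercing to CSR first makes the result independent of the input format.

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse


def ensure_csr(X):
    """Hypothetical guard: return X unchanged if already CSR, else convert."""
    if issparse(X) and X.format == "csr":
        return X
    return csr_matrix(X)


def gene_mean_std(X):
    """Per-column (gene) mean and population standard deviation."""
    X = ensure_csr(X)
    mean = np.asarray(X.mean(axis=0)).ravel()
    # E[x^2] from the element-wise square of the sparse matrix.
    mean_sq = np.asarray(X.multiply(X).mean(axis=0)).ravel()
    # Clip tiny negative values caused by floating-point cancellation.
    var = np.maximum(mean_sq - mean ** 2, 0.0)
    return mean, np.sqrt(var)


rng = np.random.default_rng(0)
counts = csr_matrix(rng.poisson(1, size=(100, 200)).astype(np.float64))

mean_csr, std_csr = gene_mean_std(counts)
mean_csc, std_csc = gene_mean_std(counts.tocsc())
print(np.allclose(mean_csr, mean_csc) and np.allclose(std_csr, std_csc))
```

Converting at the entry point costs one copy for CSC input, but it keeps every downstream routine that reads `indices`, `indptr` and `data` working on a single, known layout.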

I believe this affects most of your other functions like calc_mean, especially those that accept indices, indptr and data directly, as well as all upstream methods that call them.

Yes, I also see that. I may add this guard to them as well.

Specifically for your case, please let me know if the issue still persists.

The fix for this issue is released in version 1.9.1 (https://pegasus.readthedocs.io/en/stable/#march-16-2024). I'll close this issue, but feel free to reopen it if it persists.