theislab / scgen

Single cell perturbation prediction

Home Page:https://scgen.readthedocs.io

Questions on batch removal tutorial

Pazuzzilla opened this issue · comments

Hi,
I'm running the batch removal tutorial provided in:
https://scgen.readthedocs.io/en/latest/tutorials/scgen_batch_removal.html

I'm interested in your software's ability to produce a corrected expression matrix after batch removal.
I need some clarification before moving on to my own dataset. In the dataset you provide, pancreas.h5ad, which I loaded in Python as train, I find:

>>> train
AnnData object with n_obs × n_vars = 2448 × 14693
    obs: 'n_cells-0', 'n_cells-1', 'n_cells-2', 'n_cells-3'
    var: 'celltype', 'sample', 'n_genes', 'batch', 'n_counts', 'louvain'
    uns: 'celltype_colors', 'louvain', 'neighbors', 'pca', 'sample_colors'
    obsm: 'PCs'
    varm: 'X_pca', 'X_umap'
    varp: 'distances', 'connectivities'

I don't usually work with AnnData objects, but if I understand correctly we have 2448 gene expression values over 14693 cells.
In the same object I have:

>>> train.raw.X
<14693x24516 sparse matrix of type '<class 'numpy.float32'>'
    with 55503411 stored elements in Compressed Sparse Row format>

Here we have expression values for 24516 genes over the 14693 cells.
After the step

corrected_adata = model.batch_removal()

the situation is the same, but the values in corrected_adata.X differ from those in train.X.
So I assume the starting dataset was subset to a smaller number of genes, and that the corrected expression matrix is the one I find in corrected_adata.X. I wonder whether this filtering, which retains a subset of significant genes, was done only to reduce the computational load in the tutorial, or whether a preprocessing step of this kind is mandatory.
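One way to check which genes were filtered is to compare the gene names of the filtered matrix with those of the raw matrix. The sketch below is only an illustration: `dropped_genes` is a hypothetical helper, and the gene names are toy stand-ins. With a real AnnData object one would presumably pass `train.raw.var_names` and `train.var_names`, assuming the raw slot is populated as in the output above.

```python
def dropped_genes(raw_names, kept_names):
    """Return genes present in the raw matrix but absent from the
    filtered one, preserving the order of the raw list."""
    kept = set(kept_names)
    return [gene for gene in raw_names if gene not in kept]


# Toy stand-ins (hypothetical data). With a real AnnData object these
# would be train.raw.var_names and train.var_names.
raw_names = ["INS", "GCG", "SST", "PPY", "MALAT1"]
kept_names = ["INS", "GCG", "SST"]

missing = dropped_genes(raw_names, kept_names)
print(f"{len(kept_names)} of {len(raw_names)} genes kept; dropped: {missing}")
```

This shows which genes are missing from the filtered matrix, not the criterion used to select them.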

Sorry if this is trivial, but it was not clear to me.
As an additional comment, I want to mention a problem with the code in the preprocessing step. When I run

train = scgen.setup_anndata(train, batch_key="batch", labels_key="cell_type", copy=True)
I get the error

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'scgen' has no attribute 'setup_anndata'

If I instead use:
train = scgen.SCGEN.setup_anndata(train, batch_key="batch", labels_key="cell_type", copy=True)
I get no error.

Is this the correct way to do it?

Hi Pazuzzilla,
I have the same interest as you! But I couldn't get scGEN to install smoothly.
I have tried many methods, and the corrected expression matrices I obtained were gene-filtered. Can you tell me whether the genes in the expression matrix produced by this tool are also filtered?
Thanks a lot for your help.

Since I didn't get a reply, I can only give my impression: the results are filtered, but I still don't understand by which criteria, since I didn't specify, for example, a number of highly variable genes to retain. It is not very clear from the example. I also didn't proceed with this tool at the time, but I'm probably going to use it soon; if I learn more while using it, I will update this issue.