poseidonchan / TAPE

Deep learning-based tissue compositions and cell-type-specific gene expression analysis with tissue-adaptive autoencoder (TAPE)

Home Page: https://sctape.readthedocs.io/

data normalization

doulijun777 opened this issue

Thanks for this wonderful tool. I have one question. When simulating pseudobulk data from scRNA-seq data, I checked the function "generate_simulated_data", which contains:

print('Normalizing raw single cell data with scanpy.pp.normalize_total')
sc_data = anndata.AnnData(sc_data)

sc.pp.normalize_total(sc_data, target_sum=1e4)

So, do we need to normalize here or not? I am a little confused.

Another question: for bulk data, do we need to convert to TPM or FPKM, or should we use only the count data?

Thank you.

In the published code, this line is actually commented out, which is why I am a little confused.

Hi doulijun777:

Thanks for trying TAPE. Actually, I am not very sure about the normalization problem right now. I probably should not have commented it out. For the bulk data, whatever the normalization is, please use the "count" argument in the function to ensure proper deconvolution performance.

Regards,
Yanshuo
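
For readers wondering what the normalization line quoted in the question does to the data, here is a minimal NumPy sketch of the arithmetic behind sc.pp.normalize_total(sc_data, target_sum=1e4) with its default settings: each cell (row) is rescaled so that its counts sum to the target. The function name normalize_total_sketch and the toy matrix are hypothetical and are not part of TAPE or scanpy. This only illustrates the scaling step; it does not settle whether TAPE's simulation should apply it.

```python
import numpy as np

def normalize_total_sketch(counts, target_sum=1e4):
    """Rescale each cell (row) so that its counts sum to target_sum.

    Hypothetical helper for illustration; it mirrors the per-cell
    scaling idea only, not scanpy's actual implementation.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    # Leave all-zero cells unchanged instead of dividing by zero.
    totals[totals == 0] = 1.0
    return counts / totals * target_sum

# Toy matrix: 3 cells (rows) x 3 genes (columns), made up for this example.
raw = np.array([[10, 30, 60],
                [ 1,  1,  2],
                [ 0,  0,  0]])

normalized = normalize_total_sketch(raw)
print(normalized.sum(axis=1))
```

After scaling, every non-empty cell has the same library size, so cells sequenced at different depths contribute comparable totals when they are summed into a pseudobulk sample. Within each cell, the relative proportions between genes are unchanged.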