data normalization

Question


doulijun777 opened this issue a year ago

Thanks for this wonderful tool. I have one question: when simulating pseudobulk data from scRNA-seq data, I checked the function "generate_simulated_data", which contains:

print('Normalizing raw single cell data with scanpy.pp.normalize_total')
sc_data = anndata.AnnData(sc_data)

sc.pp.normalize_total(sc_data, target_sum=1e4)

So, do we need to normalize here or not? I am a little confused.

Another question: for the bulk data, do we need to convert to TPM or FPKM, or should we only use the count data?

Thank you.

doulijun777 · Answer 1 · Wed Aug 16 2023 21:50:46 GMT+0800 (China Standard Time)

In the published code, this line is actually commented out, so I am a little confused.

Yanshuo Chen · Answer 2 · Sat Aug 26 2023 08:12:44 GMT+0800 (China Standard Time)

Hi doulijun777:

Thanks for trying TAPE. Actually, I am not sure about the normalization question right now. I probably should not have commented that line out. For the bulk data, whatever normalization it has, please use the "count" argument in the function to ensure proper deconvolution performance.

Regards,
Yanshuo
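
For readers following this thread, here is a minimal sketch of what the normalization step quoted in the question does to a count matrix: each cell's counts are rescaled so they sum to target_sum (1e4 in the snippet above). This is a plain NumPy illustration, not TAPE's or scanpy's actual code. The helper name normalize_total_sketch and the toy matrix are made up for the example, and leaving empty cells as all zeros is an assumption of this sketch.

```python
import numpy as np


def normalize_total_sketch(counts, target_sum=1e4):
    """Rescale each cell (row) so that its counts sum to target_sum.

    Hypothetical helper for illustration only; it mimics the basic effect
    of per-cell total-count normalization on a dense cells x genes matrix.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    # Assumption of this sketch: cells with no counts are left as all zeros
    # instead of triggering a division by zero.
    safe_totals = np.where(totals == 0, 1.0, totals)
    return counts / safe_totals * target_sum


# Toy data: 3 cells x 3 genes with very different sequencing depths.
raw = np.array([
    [10, 30, 60],
    [1, 1, 2],
    [0, 0, 0],
])

normalized = normalize_total_sketch(raw, target_sum=1e4)
print(normalized.sum(axis=1))
```

After this step every non-empty cell has the same total, so differences in sequencing depth between cells no longer contribute to the simulated mixture. Whether that is desirable for the simulation is exactly the open question in this thread.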