scanorama bbknn

Question

wangjiawen2013 opened this issue 6 years ago

Dear,
Scanorama handles mutual nearest neighbors-based matching, batch correction, and panorama assembly. I could not find an assembly function in pancreas-4-Scanorama.ipynb. Which function in bbknn (or scanpy) corresponds to scanorama's assembly function?

Krzysztof Polanski · Answer 1 · Mon Oct 01 2018 23:54:49 GMT+0800 (China Standard Time)

I have no idea what you're asking. BBKNN/scanpy don't assemble panoramas. BBKNN's output is a batch-balanced graph (which can be used for UMAP, clustering and so on); it does not currently correct the expression space in any way.
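
To make "batch-balanced graph" concrete, here is a toy sketch in plain Python of the general idea: each cell picks its nearest neighbours separately within every batch, so no single batch can dominate a neighbourhood. This is not BBKNN's implementation. The function name batch_balanced_neighbors and the parameter k_per_batch are hypothetical, and Euclidean distance on raw coordinates is assumed for simplicity.

```python
import math
from collections import defaultdict

def batch_balanced_neighbors(points, batches, k_per_batch=2):
    """Toy batch-balanced kNN: for every cell, keep the k nearest cells
    from each batch separately (hypothetical helper, not BBKNN's API)."""
    # Group cell indices by batch label.
    by_batch = defaultdict(list)
    for idx, label in enumerate(batches):
        by_batch[label].append(idx)

    graph = {}
    for i, p in enumerate(points):
        neighbors = []
        for label in sorted(by_batch):
            # Rank the other cells of this batch by Euclidean distance to cell i.
            candidates = [j for j in by_batch[label] if j != i]
            candidates.sort(key=lambda j: math.dist(p, points[j]))
            neighbors.extend(candidates[:k_per_batch])
        graph[i] = neighbors
    return graph

# Two batches of three cells each; batch "b" sits far away, like a batch effect.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
batches = ["a", "a", "a", "b", "b", "b"]
graph = batch_balanced_neighbors(points, batches, k_per_batch=1)
```

With a plain 2-nearest-neighbour search on these points, every cell would connect only to its own batch. Here each cell gets one neighbour from each batch, and the coordinates themselves are never modified, which matches the point that only the graph is balanced and the expression space is left alone.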

Assuming you're actually asking about how scanorama works, scanorama's data correction is performed by this little timed chunk of code that needs to be written out to bin/4panc.py as per notebook instructions:

t1 = time.time()
datasets, genes = correct(datasets, genes_list)
datasets = [ normalize(ds, axis=1) for ds in datasets ]
t2 = time.time()

This creates a corrected expression space, which is dumped out by save_datasets(datasets, genes, data_names). This is subsequently imported into the notebook and processed in a manner consistent with the other analyses.
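
For readers unsure what the normalize(ds, axis=1) line does: assuming normalize here is scikit-learn's sklearn.preprocessing.normalize with its default L2 norm (an assumption, since the chunk above does not show the import), it rescales each cell (row) to unit length. The plain-Python sketch below mimics that behaviour with a hypothetical helper, l2_normalize_rows, and uses a made-up constant offset as a stand-in for a correction step. It does not call scanorama.

```python
import math

def l2_normalize_rows(matrix):
    """Scale each row to unit Euclidean length (all-zero rows are left as-is)."""
    out = []
    for row in matrix:
        norm = math.sqrt(sum(x * x for x in row))
        out.append([x / norm for x in row] if norm > 0 else list(row))
    return out

def row_norms(matrix):
    """Euclidean length of every row."""
    return [math.sqrt(sum(x * x for x in row)) for row in matrix]

# Two toy "cells" with two "genes" each.
cells = [[3.0, 4.0], [1.0, 0.0]]
normalized = l2_normalize_rows(cells)

# Made-up stand-in for a correction step: shifting values changes row lengths.
shifted = [[x + 0.5 for x in row] for row in normalized]

# Normalizing again brings every row back to unit length.
renormalized = l2_normalize_rows(shifted)
```

Any step that alters the values after a first normalization leaves the rows off unit length, which is one plausible reason to normalize again after correct(). Nothing in this thread confirms that this is the scanorama authors' intent.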

jiawen wang · Answer 2 · Mon Oct 08 2018 16:44:23 GMT+0800 (China Standard Time)

According to scanorama's documentation (scanorama.py), the datasets have already been normalized by the time "datasets, genes = correct(datasets, genes_list)" returns. The normalize() function is called inside correct(), so why is it executed again here? Is there a particular purpose?

In pancreas-4-Scanorama.ipynb, the corrected datasets have not been processed with sc.pp.log1p() and sc.pp.normalize_per_cell(), and highly variable genes have not been identified with sc.pp.filter_genes_dispersion(). All of these are necessary in a routine scanpy pipeline. Can these steps be skipped when working with scanorama-corrected datasets?

Krzysztof Polanski · Answer 3 · Mon Oct 08 2018 17:20:26 GMT+0800 (China Standard Time)

Notice how scanorama outputs a filtered gene space and altered expression. At the time I cloned scanorama (31.07), a number of scripts in bin/ started like this:

if __name__ == '__main__':
	datasets, genes_list, n_cells = load_names(data_names)
	datasets, genes = correct(datasets, genes_list)
	datasets = [ normalize(ds, axis=1) for ds in datasets ]
	datasets_dimred = dimensionality_reduce(datasets)

I am quite busy at the moment and cannot promise any further assistance in a timely manner.