merging independent SuperCell runs

Question

daskelly opened this issue 2 years ago · comments

Hi, thanks for developing this very nice package!

I have a question about merging SuperCell objects. Let's say I have three samples and I wish to run SuperCell separately on each sample -- e.g. to ensure that metacells are only composed of cells from the same biological specimen. Is there a way to merge these multiple SuperCell objects so that I can then run the weighted PCA and clustering on the combined data? Thanks for any tips!

Mariia Bilous · Answer 1 · Wed Apr 06 2022 20:23:58 GMT+0800 (China Standard Time)

Hi @daskelly,

Thanks a lot for your interest in our method!

There are two options available to address your question:

1. Run SuperCell on your 3 samples together, specifying which sample each cell belongs to so that metacells (super-cells) never contain single cells from different samples. This can be done with the cell.split.condition parameter of SCimplify().
2. Run SuperCell on each sample separately and merge the results using the supercell_merge() function, which I just added thanks to your question. Please see the example in the function description.

Which approach to use depends largely on what you would do at the single-cell level. For instance, if you would analyze your samples together, you can go with the first approach.
If your samples are very big, you can use the second approach, which will save you some time and memory because each sample is processed individually. Please note that in the first approach the set of features used to build metacells is the same for all samples, whereas in the second one, if you don't provide your own set of features (the genes.use parameter of SCimplify()), each sample is processed (i.e., its metacells are built) with its own set of highly variable genes.
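
The feature-set point can be illustrated with a toy sketch in plain Python. This is not SuperCell code: the gene names, the values, and the `top_variable_genes` helper are made up for illustration, and simple variance ranking stands in for whatever highly-variable-gene selection is actually used.

```python
from statistics import pvariance

def top_variable_genes(expr, n):
    """Return the n gene names with the highest variance across cells."""
    ranked = sorted(expr, key=lambda g: pvariance(expr[g]), reverse=True)
    return set(ranked[:n])

# Hypothetical toy data: gene -> expression values across the cells of a sample
sample1 = {"geneA": [0, 10, 0, 10], "geneB": [5, 5, 5, 5], "geneC": [1, 2, 1, 2]}
sample2 = {"geneA": [5, 5, 5, 5], "geneB": [0, 10, 0, 10], "geneC": [1, 2, 1, 2]}
pooled = {g: sample1[g] + sample2[g] for g in sample1}

# Per-sample selection: each sample picks its own most variable gene
hvg_s1 = top_variable_genes(sample1, 1)
hvg_s2 = top_variable_genes(sample2, 1)

# Selection on the pooled cells gives one gene set shared by all samples
hvg_pooled = top_variable_genes(pooled, 2)
```

Here the two samples select different genes when processed alone. Choosing one gene list up front and passing it to every run (via genes.use, as mentioned above) keeps the features consistent across independently processed samples.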

Please, let me know if this answers your question and don't hesitate to contact me if you have any other questions or suggestions!

Best,
Mariia

Dan Skelly · Answer 2 · Thu Apr 07 2022 07:21:59 GMT+0800 (China Standard Time)

Hi @mariiabilous, thank you! This is really helpful and it does answer my question.

Can I ask two follow-up questions?

1. Suppose I build the metacells separately on each sample using the same granularity and same highly variable genes, then merge them with supercell_merge(). Is this result going to be equivalent to running on all samples simultaneously and using the parameter cell.split.condition as you suggest (with the same granularity and HVGs)?
2. Would there be anything strange about using different granularities on different samples, or is this a normal thing to do? I am thinking it might be natural to use a lower granularity on a sample with fewer cells, and a higher granularity on a sample with more cells (assuming the same tissue).

Thanks for your responsiveness!

Mariia Bilous · Answer 3 · Thu Apr 07 2022 21:11:10 GMT+0800 (China Standard Time)

Sure!

> Suppose I build the metacells separately on each sample using the same granularity and same highly variable genes, then merge them with supercell_merge(). Is this result going to be equivalent to running on all samples simultaneously and using the parameter cell.split.condition as you suggest (with the same granularity and HVGs)?

The short answer is "No".
Processing samples separately based on the same set of HVGs would still result in a different dimensionality-reduction embedding. Namely, the PCA of each separate sample differs from the PCA of all samples merged together: the former is driven by the heterogeneity of that particular sample, the latter by the overall heterogeneity (and possibly some technical variability among samples). Since SuperCell performs dimensionality reduction to build metacells, this results in different metacell partitions.
Please see the brief example showing that the metacell partition differs between an independent construction (supercell_merge()) and a combined approach (SCimplify() on all samples together with the cell.split.condition parameter).
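
Separately from the example referenced above, the PCA argument can be illustrated with a toy 2-D sketch in plain Python. It is not SuperCell code: the points and the `first_pc_angle` helper are hypothetical, and the offset between the two samples plays the role of technical variability.

```python
import math

def first_pc_angle(points):
    """Angle in degrees, in [0, 180), of the first principal axis of 2-D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Closed-form orientation of the leading eigenvector of a 2x2 covariance matrix
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return math.degrees(theta) % 180

# Both toy samples vary along the x axis, but sample 2 is shifted along y
sample1 = [(-2, 0), (-1, 0), (1, 0), (2, 0)]
sample2 = [(-2, 10), (-1, 10), (1, 10), (2, 10)]

angle_s1 = first_pc_angle(sample1)                  # driven by within-sample spread
angle_s2 = first_pc_angle(sample2)                  # same direction as sample 1
angle_pooled = first_pc_angle(sample1 + sample2)    # driven by the between-sample shift
```

Each sample on its own has its first principal axis along x, while the pooled data has it along y, because the between-sample offset dominates the variance. Any partition built on these embeddings would therefore see different neighborhoods in the two settings.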

I expect that metacells built with the independent approach would be more 'stable', as they are based on intra-sample heterogeneity.

Note that in the combined approach (all samples together, specifying the cell.split.condition parameter), the actual graining level represents the overall granularity and might be different for each sample. For instance, if your actual graining level is 20, it might be that one sample s1 has granularity 15 (i.e., its average metacell size is 15), another sample s2 has granularity 22, etc.
The independent construction (using supercell_merge()), in contrast, guarantees the requested graining level for every sample.
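
As a rough way to check this on your own results, the per-sample granularity can be computed as the number of cells divided by the number of metacells in each sample. The sketch below is plain Python with a hypothetical data layout (one sample label and one metacell id per cell); it does not use SuperCell's object structure, and it assumes that no metacell spans two samples.

```python
from collections import defaultdict

def granularity_by_sample(cell_sample, cell_metacell):
    """Return (average metacell size per sample, overall average metacell size)."""
    n_cells = defaultdict(int)
    metacells = defaultdict(set)
    for sample, metacell in zip(cell_sample, cell_metacell):
        n_cells[sample] += 1
        metacells[sample].add(metacell)
    per_sample = {s: n_cells[s] / len(metacells[s]) for s in n_cells}
    # Valid as an overall value because metacell ids are unique across samples
    overall = len(cell_sample) / len(set(cell_metacell))
    return per_sample, overall

# Toy example: s1 has 6 cells in 3 metacells, s2 has 12 cells in 3 metacells
cell_sample = ["s1"] * 6 + ["s2"] * 12
cell_metacell = [1, 1, 2, 2, 3, 3] + [4] * 4 + [5] * 4 + [6] * 4

per_sample, overall = granularity_by_sample(cell_sample, cell_metacell)
```

In this toy case the overall granularity is 3, while the two samples sit at 2 and 4, which is the kind of per-sample spread described above for the combined approach.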

> Would there be anything strange about using different granularities on different samples, or is this a normal thing to do? I am thinking it might be natural to use a lower granularity on a sample with fewer cells, and a higher granularity on a sample with more cells (assuming the same tissue).

Even in the combined analysis, different samples can end up with different granularities because they differ in complexity and heterogeneity. I think it is acceptable to process samples of different sizes at different graining levels, as long as the size distribution of the metacells you are going to combine in the same analysis is acceptable. I wouldn't go with gamma = 10 and gamma = 100 in the same analysis.

Thanks a lot for your interesting questions! If you try different approaches, I would be happy to hear about your experience and your thoughts on which approach was more appropriate for the analyses you performed.

Dan Skelly · Answer 4 · Thu Apr 07 2022 22:43:26 GMT+0800 (China Standard Time)

Thank you @mariiabilous, this is helpful and makes a lot of sense!