Oshlack / splatter

Simple simulation of single-cell RNA sequencing data

Simulating batch effects

HelloWorldLTY opened this issue

Hi, I have a specific requirement for using splatter.

I have a dataset with m batches and n cell types, and I would like to simulate a matching dataset with the same m batches and n cell types, as scDesign3 (https://www.nature.com/articles/s41587-023-01772-1) does.

How should I set the parameters for this case? I cannot find a tutorial that covers it. Using the estimation function directly gives me results for only one batch and one group. Can I tell the estimation process the names of my batch column and cell-type column? I want to keep the original cell-type labels as well as the mapping between cells, batches, and cell types.

Thanks a lot.

I think one approach is to split the whole dataset into separate batch-by-cell-type subsets, simulate each subset, and then combine all the simulation results. Do you think this is a sensible plan? Thanks.

Batch effect parameters for the Splat simulation are not estimated from the data and need to be provided by the user. See the "Batch effects" section of the parameters vignette for more details.

Splat simulations are completely independent of each other, so your suggested approach is definitely NOT recommended.
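To make the independence point concrete, here is a toy sketch in plain Python. It is not splatter's code or API. It assumes only the general idea that each simulation draws its own base gene means, and that batches within one simulation share those means up to a multiplicative log-normal factor. The function names, distribution settings, and seeds are all illustrative assumptions.

```python
import math
import random


def base_means(rng, n_genes):
    # Toy base expression mean per gene (illustrative gamma settings).
    return [rng.gammavariate(0.6, 1 / 0.3) for _ in range(n_genes)]


def batch_means(rng, base, fac_loc, fac_scale):
    # One multiplicative log-normal batch factor per gene.
    return [m * rng.lognormvariate(fac_loc, fac_scale) for m in base]


def pearson(x, y):
    # Plain Pearson correlation, to avoid depending on the Python version.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def log_corr(a, b):
    # Correlation of log gene means between two sets of genes.
    return pearson([math.log(v) for v in a], [math.log(v) for v in b])


n_genes = 2000

# Case 1: two batches inside ONE simulation share the same base means.
rng = random.Random(1)
base = base_means(rng, n_genes)
batch1 = batch_means(rng, base, 0.1, 0.1)
batch2 = batch_means(rng, base, 0.1, 0.1)
corr_joint = log_corr(batch1, batch2)


# Case 2: two SEPARATE simulations each draw their own base means.
def one_batch_run(seed):
    run_rng = random.Random(seed)
    return batch_means(run_rng, base_means(run_rng, n_genes), 0.1, 0.1)


corr_split = log_corr(one_batch_run(2), one_batch_run(3))

print(f"same simulation, two batches: r = {corr_joint:.3f}")
print(f"two independent simulations:  r = {corr_split:.3f}")
```

In this toy setup the two batches from one simulation have strongly correlated gene means, while gene i in one independent run has no relationship to gene i in the other. That is why stitching separately simulated subsets together does not give a dataset with a shared set of genes.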

Thanks. So for my use case (simulating multi-batch data with multiple cell-type labels using splatter, based on a real dataset), do you have any other suggestions? Thanks.

I have thought about a few designs, all based on parameters from splatEstimate.

My first approach is to split the dataset into separate batches and use library_{subbatch}.loc / library_{total}.loc as df.facBatchloc, etc., and then set the proportion of cell types based on the cell-type distribution of the whole dataset. But this way I still cannot simulate the similarity between cell types, since I cannot obtain de.fac and related parameters from the raw dataset. Moreover, each batch may have its own specific cell types.

Another plan is to split the dataset by cell type, but since you mentioned that the simulations are independent, I don't think I can obtain meaningful parameter settings from this design.

While the Splat simulation estimates several parameters from a dataset, it is not designed to completely reproduce a real dataset (unlike some other models). Instead, it allows you to flexibly design a scenario for testing whatever you are trying to evaluate. It may be possible to get something close to what you want using a combination of the batch effect and DE parameters, but it will require manually specifying those.
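As a starting point for specifying the design by hand, the sketch below summarises per-cell batch and cell-type labels into cells per batch and overall cell-type proportions. The helper name and the example labels are hypothetical. Mapping the two results onto splatter's `batchCells` and `group.prob` parameters is an assumption on my part and was not stated in this thread. The batch effect and DE factor settings still have to be chosen manually.

```python
from collections import Counter


def design_from_labels(batches, cell_types):
    """Summarise per-cell labels into (cells per batch, cell-type proportions)."""
    if len(batches) != len(cell_types):
        raise ValueError("batches and cell_types need one entry per cell")
    if not batches:
        raise ValueError("need at least one cell")
    n_cells = len(batches)
    batch_counts = Counter(batches)
    type_counts = Counter(cell_types)
    # Sort the keys so the order is reproducible when passed on to a simulator.
    batch_cells = {b: batch_counts[b] for b in sorted(batch_counts)}
    group_prob = {t: type_counts[t] / n_cells for t in sorted(type_counts)}
    return batch_cells, group_prob


# Hypothetical metadata for ten cells: one batch label and one type label each.
batches = ["b1"] * 4 + ["b2"] * 6
cell_types = ["T", "T", "B", "B", "T", "T", "T", "B", "NK", "NK"]

batch_cells, group_prob = design_from_labels(batches, cell_types)
print(batch_cells)
print(group_prob)
```

Note that this yields a single set of proportions for the whole dataset. As far as I can tell, that does not capture batches containing their own specific cell types, which is the limitation raised earlier in the thread.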

Closing this issue but please comment if you have further questions.