AlexsLemonade / refinebio-examples

Example workflows for refine.bio data

Home Page: https://www.refine.bio

Question about batch effects in refine.bio datasets

kengcher opened this issue

Hi!

I am trying to understand whether batch effects are corrected for in the refine.bio pipeline.

I downloaded the dataset GSE99039 (microarray) from refine.bio and then looked at it using PCA. I noticed that the dataset from refine.bio seems to show a clear separation that does not match any of the metadata.

[Figure: refine.bio PCA]

Hence I would like to ask:
i. where in the pipeline the (quantile?) normalization is done
ii. whether any batch correction is performed in the normalized data pipeline.

Thank you.

Hi @kengcher,

Thanks for your questions and for using refine.bio. The dataset you mention (GSE99039) is submitter-processed, which means we were unable to process the data from raw files and instead use the values the authors submitted to GEO (in this case, reported to be RMA-normalized values). We do quantile normalize submitter-processed data for delivery, but we have less control over what happens prior to that step. We do not perform any batch correction (e.g., ComBat).
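In case it helps to see what quantile normalization does in general, here is a minimal sketch in plain Python. It is a generic illustration and not the refine.bio implementation: the function name `quantile_normalize`, the toy `samples` values, and the choice of the per-rank mean across the supplied samples as the reference distribution are all assumptions made for this example. Ties are broken arbitrarily rather than averaged.

```python
def quantile_normalize(columns):
    """Force every sample (column) to share the same value distribution.

    `columns` is a list of samples, each a list of expression values of
    equal length. Hypothetical helper for illustration only.
    """
    n_values = len(columns[0])
    n_samples = len(columns)

    # Sort each sample independently so that index r holds its r-th smallest value.
    sorted_cols = [sorted(col) for col in columns]

    # Assumed reference distribution: the mean across samples at each rank.
    reference = [
        sum(col[rank] for col in sorted_cols) / n_samples
        for rank in range(n_values)
    ]

    normalized = []
    for col in columns:
        # Indices of this sample's values, ordered from smallest to largest.
        order = sorted(range(n_values), key=lambda i: col[i])
        out = [0.0] * n_values
        for rank, idx in enumerate(order):
            # The value with rank `rank` is replaced by the reference value for that rank.
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


# Toy data: three samples, four genes each (made-up numbers).
samples = [
    [5, 2, 3, 4],
    [4, 1, 4, 2],
    [3, 4, 6, 8],
]
normalized = quantile_normalize(samples)
```

After this step every sample has an identical set of values, only assigned to different genes according to each gene's rank within the sample. That aligns the distributions across samples but does not remove batch-driven differences in which genes rank high or low, which is why a separation can still appear in a PCA.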

Looking at the description for this particular experiment, I would want to know if that separation corresponds to idiopathic PD vs. controls, but you do mention that the separation does not match any of the metadata in your post.

Hope this helps! Let me know if you have additional questions.

We've looked into why this particular experiment was not processed from raw and believe we may have identified a fix, which we now need to test. If the fix works, we expect to make a version of this experiment processed from raw available within the next few weeks. We're in the middle of some infrastructure changes for the project, so we appreciate your patience!