LTLA / scRNAseq

Clone of the Bioconductor repository for the scRNAseq package.

Home Page: http://bioconductor.org/packages/devel/data/experiment/html/scRNAseq.html



Adding data set using data from secondary source

twillis209 opened this issue.

First, thanks for creating and maintaining this package; it's a great help.

I would like to add a data set from Patel et al. 2014 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4123637/). This is a rather interesting one, comprising:

  • Six 96-cell data sets from five different glioblastomas, with one tumour sequenced twice in distinct batches
  • One 96-cell data set from one of those tumours, sequenced using 100 bp paired-end (PE) long reads
  • Two 96-cell data sets from distinct gliomasphere cultures
  • Five population controls (one for each tumour)
  • Six population libraries from cell lines derived from the tumours

So far so good, but the problem is that the authors of the original publication only deposited log2(TPM+1) values on GEO, not raw counts. For my own purposes, I have been using counts generated by Risso et al. for their ZINB-WaVE publication (https://www.nature.com/articles/s41467-017-02554-5). These counts are currently hosted in a GitHub repo that Risso published so that their ZINB-WaVE analyses could be reproduced. I call them 'secondary' in the sense that they do not originate from the original publication by Patel et al.
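To illustrate why raw counts cannot simply be recovered from the GEO log2(TPM+1) values, here is a minimal Python sketch. It assumes the standard TPM definition (reads per kilobase, rescaled so each sample sums to one million). The gene lengths and counts are made-up toy values, not data from Patel et al. Back-transforming recovers TPM, but two libraries with very different sequencing depths can produce identical TPM values, so the counts themselves are lost.

```python
import math

def tpm_from_counts(counts, lengths_kb):
    """Standard TPM: reads per kilobase, rescaled so the sample sums to 1e6."""
    rpk = [c / l for c, l in zip(counts, lengths_kb)]
    scale = sum(rpk) / 1e6
    return [r / scale for r in rpk]

def log2_tpm1(tpm):
    """The log2(TPM+1) transform, as applied to the values deposited on GEO."""
    return [math.log2(t + 1) for t in tpm]

def invert_log2_tpm1(values):
    """Undo log2(TPM+1); this recovers TPM, not raw counts."""
    return [2 ** v - 1 for v in values]

# Hypothetical toy data: three genes, two libraries differing 100x in depth.
lengths_kb = [1.0, 2.0, 4.0]
shallow = [10, 20, 40]
deep = [1000, 2000, 4000]

a = log2_tpm1(tpm_from_counts(shallow, lengths_kb))
b = log2_tpm1(tpm_from_counts(deep, lengths_kb))

# Same log2(TPM+1) profile despite very different counts.
print(all(math.isclose(x, y) for x, y in zip(a, b)))
# Back-transformed values sum to ~1e6 regardless of sequencing depth.
print(sum(invert_log2_tpm1(a)))
```

Recovering counts would additionally need each cell's library size and the effective gene lengths used in the original quantification, which is why reprocessed counts such as Risso's are the practical option.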

Would you accept a pull request adding these count data to the package?

Funny you say that, because the original purpose of this package was... to serve up count matrices generated by @drisso! Note ReprocessedAllenData() and friends; I'm sure another data set of the same nature would fit in.

Excellent. Davide was kind enough to provide these counts on request in the summer, so I will endeavour to pay that forward.