Adding data set using data from secondary source
twillis209 opened this issue
First: thanks for creating and maintaining this package, it's a great help.
I would like to add a data set from Patel et al. 2014 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4123637/). This is a rather interesting one, comprising:
- Six 96-cell data sets from five different glioblastomas, with one tumour sequenced twice in distinct batches
- One 96-cell data set from one of those tumours sequenced using 100bp PE long reads
- Two 96-cell data sets from distinct gliomasphere cultures
- Five population controls (one for each tumour)
- Six population libraries from cell lines derived from the tumours
So far so good, but the problem is that the authors of the original publication made only log2(TPM+1) values available on GEO, not raw counts. For my own purposes, I have been using counts generated by Risso et al. for their publication on ZINB-WaVE (https://www.nature.com/articles/s41467-017-02554-5). These data are currently hosted on a GitHub repo that Risso published so that their ZINB-WaVE work can be reproduced. I call them 'secondary' in the sense that they do not originate from the original publication by Patel et al.
Would you accept a pull request adding these count data to the package?
Funny you say that, because the original purpose of this package was... to serve up count matrices generated by @drisso! Note the ReprocessedAllenData() and friends - I'm sure one could fit in another one of the same nature.
Excellent. Davide was kind enough to provide these counts on request in the summer, so I will endeavour to pay that forward.