greenelab / deep-review

A collaboratively written review paper on deep learning, genomics, and precision medicine

Home Page: https://greenelab.github.io/deep-review/

Who is ready to start writing?

agitter opened this issue · comments

Our new goal is to have the review ready to submit by December 1, so we need to start writing now. There are many people watching this repository. If any of you have not contributed yet but want to start, there is still time to join in.

I'd like to estimate how many contributors intend to actively participate in the writing phase and their areas of interest. Below I've enumerated everyone who has been involved in the issues so far. Please comment in this issue to let us know 1) if you intend to help write the manuscript during the next few weeks and 2) what sub-sections you want to work on.

Others who have joined the discussion on specific issues but did not seem interested in writing:

  • cangermueller
  • trangptm
  • in4matx
  • yil8
  • ueser

Edited to reflect responses

@agitter I'd love to help contribute to the writing of the review. I see that you have a rough outline of the different parts in the "sections" folder. Can you describe in a bit more detail what you envision each of these sections containing? (Apologies if this was explained in more detail elsewhere; I could not find it.)

Hi Anthony, as I mentioned in #95 I would like to contribute to the discussion about the current limitations of Deep Learning. Also, I would like to contribute to the section about the current DL efforts in the single-cell technologies.

I don't have the bandwidth to contribute writing but I'd be happy to read a draft and help edit.

Hi,

  1. I'd like to contribute to the manuscript write-up.
  2. Subsections may include: drugs/cheminformatics, protein structure, and studies of the binding of one molecule with another (e.g. protein/lncRNA/DNA/compound etc.)

I'd like to contribute as well, but as @traversc said it would be nice to first discuss how sections and subsections should look, or maybe which paper to include where.

@traversc @gokceneraslan You're right, we haven't finished defining the sections and sub-sections yet. @cgreene and I will be working on adding an outline and prompts to the placeholder .md files. See pull request #108 for an example of what these will look like after more details are added.

In the meantime, there are several things we could use help with immediately:

  • Discussing what the sections and sub-sections should be. We can initially use #88 for this and then move to pull requests on the .md files as we start to refine them.
  • Making sure at least one of us has looked at each paper listed in the issues to see if they need a full summary or are out of scope. We want to focus on the guiding question #88 as opposed to every cool neural network paper in biology, though the study topic provides a lot of freedom.

@kumardeep27 I'll probably work on the cheminformatics section with you, but I don't know some of the other molecule-molecule binding papers as well. Can you please scan the open issues to see if there are any in that area (e.g. #30) that should be summarized or closed?

@w9 Are there any single-cell papers we're missing? So far we only have #39 and #79, plus the imaging papers.

@laserson We'll be writing on GitHub so feel free to edit and comment whenever time permits.

@agitter Yes, I would like to help with this review. I am particularly interested in the EHR work, and fleshing out the challenges that @cgreene mentioned in 8c59ae7 (e.g., scaling, hardware limitations, privacy concerns).

I can also contribute to a general discussion of deep learning (e.g., definition/introduction, some state of the art methods).

@evancofer Thanks, that's a good reminder that when we add stubs to the intro section we should allocate space for deep learning background information.

@agitter - I am back after facing several deadlines and am ready to continue contributing again. I am most interested in writing subsections under the "How we study" label. I focused my initial literature review on papers like this (dealing mainly with sequence or expression based deep learning models). I think attacking this subsection with a data-centric point of view would be a good way to start.

I also feel that I have a good sense of existing reviews (see review label) so I could help with setting this review into the broader "deep learning" review context.

I'm ready to start now that github is back! I'll try to get the stubs in #108 italicized today so that we can get that PR merged.

@gwaygenomics Perhaps you and I can work on stubs for the "study" sub-sections over the next few days? If you have ideas, feel free to get started with a pull request. I'm glad that you are familiar with the existing reviews. We should attempt to not restate things that have already been well-covered.

Hi all,

I would like to contribute to the write-up, and I am specifically interested in writing about "ML methods dealing with multi-omics (or heterogeneous) data". One issue is that I don't see any published DL articles along this line, including my recent paper (shallow RNN, #112), so this subsection might go under "shallow learning" if we wish to have one. What do you think?

@minseven Welcome. #88 gives some ideas about how we might divide papers into those used to study, categorize, and treat disease. I haven't read your #112 yet, but it looked very interesting upon skimming the figures. This might fit best in the Study section.

Can you please add a summary in #112 so that we can discuss your paper and how it fits in with the others? Some example summaries are #39, #46, #55, and #81.

I'm excited to deal with automatic bibliography construction as discussed in #2 (comment).

Regarding the potential section on LINCS L1000 imputation quality, it looks like we won't be able to proceed with the comparisons we wanted -- #24 (comment). So happy to help out with anything on LINCS L1000 imputation, but there probably won't be much.

Just please tag me when my contributions are needed.

@agitter I'd really like to help out, particularly with privacy implications and phenotyping. I can help with the stubs @casey listed - (Standardization/integration, Pattern Recognition (static + dynamic), and Data sharing and privacy)

One of the things I've been thinking about a lot is the way that we have to rephrase biological problems to work well with deep learning. The small sample size, wide data problem naturally led to applications in regulatory genetics etc., but works less effectively for problems like genotype-phenotype association. A lot of the biological examples are akin to working with a few images and learning across them as opposed to using 1.2 million images (ImageNet). I think this distinction is important.

Hi,

I can write about deep learning applied to metagenomics.

best,
Gail

In reply to Anthony Gitter's opening post (Thu, Oct 20, 2016).

Gail L. Rosen, Associate Professor
Electrical and Computer Engineering
Drexel University
Webpage/Contact info: http://www.ece.drexel.edu/gailr

I would be happy to help with the EHR section.

In reply to Gökçen Eraslan's comment (Fri, Oct 21, 2016).

Hi,

Like Uri, I don’t think I will have time to write, but would be happy to read and edit drafts.

Mikael

In reply to KD's comment (21 Oct 2016).

I'd be happy to contribute on evaluation and interpretation.

Anshul

In reply to Anthony Gitter's opening post (Oct 20, 2016).

A lot of the biological examples are akin to working with a few images and learning across them as opposed to using 1.2 million images (Imagenet). I think this distinction is important.

@brettbj I would also like to discuss this. I'm planning to include this in the tentatively-named Data limitations subsection of the Discussion. Feel free to start outlining your thoughts there.

I think we're ready to start outlining specific sections. @cgreene @gwaygenomics and I have started this with Categorize and Discussion, and everyone should feel free to make pull requests with proposed outline content. We can then use the pull request comments to organize the discussion of specific topics (unless @cgreene has a more coherent plan).

I approve this plan! Looking forward to it!

In reply to Anthony Gitter's comment (Wed, Oct 26, 2016).

@agitter I'd like to contribute and get it done in the next few weeks. (I have been following progress on this project, but forgot to follow up with the team earlier.) Sections I can write on:

  1. Data sharing and privacy;
  2. EHR data, and perhaps phenotyping;
  3. Genetic association studies, and other genetics topics.

I can write about data sharing and privacy, and collaborative/distributed deep learning (multi-center consortia). I'm quite familiar with these topics, and also know the latest and classical solutions, particularly for deep learning, around such issues. It is also important to sift through the mixed literature, as many publications in this direction over-claim.

I can also write about sections on EHR data and genetic association studies. There seem not to be many publications on the latter though.

One clarification about "wide-matrix" mentioned by @brettbj and earlier ( #95 ):
This is not a challenge unique to deep learning, but one for machine learning in general. It is a classic and hot research area in statistics (dimensionality reduction, feature selection). Regardless, it makes directly applying deep learning methods challenging in our targeted domains.

  • Wei Xie

My situation is similar to @laserson's: no spare bandwidth at the moment. I'll put this on my list, and if I finish the papers I'm currently working on in time I'll pop back in.

@XieConnect - Definitely a fair point, think it's just exacerbated by deep learning

All: I don't plan to have much top-down assignment of authors to sections. Now that we know who is participating, please start to outline the sections you are interested in like @traversc did with #121 and watch for pull requests so that you don't miss discussions of your sections of interest. Several of us are interested in the Study topics and EHR applications so that could be a good place to start.

@XieConnect Thanks for joining. I think what @brettbj is getting at goes beyond learning in the high-dimensional regime. This review isn't the right place to get into the VC dimension of neural networks, but we may be able to discuss the number of parameters in neural networks versus other common approaches for high-dimensional data and effective regularization strategies. That also depends on whether we have something to say beyond what other reviews like #47 have covered. The related point is that in some biomedical applications there is a "wide matrix" that is decomposed into separate instances to inflate the number of samples.
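To make the two points above concrete, here is a rough Python sketch. It is only one possible reading of the comment, not something taken from the review: the helper names `dense_param_count` and `split_into_windows` are hypothetical, the layer widths are made-up numbers, and the fixed-width windowing is an assumed way of decomposing one wide row into several instances.

```python
from typing import List, Sequence


def dense_param_count(layer_sizes: Sequence[int]) -> int:
    """Count weights plus biases of a fully connected network.

    layer_sizes lists the width of every layer, input first, output last.
    A single-layer entry such as [p, 1] corresponds to a linear model.
    """
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


def split_into_windows(row: Sequence[float], width: int, stride: int) -> List[List[float]]:
    """Cut one wide row into fixed-width windows, each treated as its own instance."""
    return [
        list(row[start:start + width])
        for start in range(0, len(row) - width + 1, stride)
    ]


# Illustrative numbers only: 20,000 input features and one output.
linear_params = dense_param_count([20000, 1])
one_hidden_layer_params = dense_param_count([20000, 100, 1])

# One row of 10 features becomes several shorter training instances.
windows = split_into_windows(list(range(10)), width=4, stride=2)
```

With these assumed sizes, adding a single hidden layer of 100 units multiplies the parameter count roughly a hundredfold, while windowing raises the number of instances without adding any independent samples.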

Also, what do you mean by collaborative deep learning at multi-center consortia?

@agitter I agree with you on "dimension-VS-samples" issue. It is an important distinction to emphasize regarding Biomedical/Genetic applications (of DNN) VS. standard CS literature.

Regarding my last point on "collaborative multi-center DNN", it is mainly related to "Data sharing and integration" section of the survey. And it is also one solution towards addressing the "large-dimension-small-sample-size" issue earlier, by increasing sample size via federation of datasets owned by many distributed institutions/research groups. Think of it as a distributed version of DNN training (in computer science terms) or summary-statistics method (in genetics), that allows you to jointly train a big model using all "distributed" datasets without actually sharing the underlying raw data (sharing of which has legal or logistic complications).
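The idea of training one model across sites without sharing raw data can be sketched roughly as follows. This is a minimal illustration that assumes a simple weight-averaging scheme (often called federated averaging) on a toy linear model; it is not a method described in this thread, and the names `local_update`, `federated_average`, and `train_federated`, along with the learning rate and round counts, are hypothetical choices for the example.

```python
from typing import List, Tuple

# One site's private data: (features, target) pairs that never leave the site.
Dataset = List[Tuple[List[float], float]]


def local_update(weights: List[float], data: Dataset,
                 lr: float = 0.1, epochs: int = 20) -> List[float]:
    """Run a few epochs of gradient descent on squared error using one site's data."""
    w = list(weights)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += 2.0 * err * xj / len(data)
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w


def federated_average(site_weights: List[List[float]],
                      site_sizes: List[int]) -> List[float]:
    """Average the sites' weights, weighting each site by its sample count."""
    total = sum(site_sizes)
    dim = len(site_weights[0])
    return [
        sum(w[j] * n for w, n in zip(site_weights, site_sizes)) / total
        for j in range(dim)
    ]


def train_federated(sites: List[Dataset], dim: int, rounds: int = 30) -> List[float]:
    """Alternate local training and averaging; only weights are exchanged."""
    global_w = [0.0] * dim
    for _ in range(rounds):
        updates = [local_update(global_w, data) for data in sites]
        global_w = federated_average(updates, [len(d) for d in sites])
    return global_w
```

Each site only ever sends model weights to the coordinator, so the raw records stay local. Whether weights alone leak information about the underlying data is a separate question that this sketch does not address.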

On section assignment:
I am reading the sections proposed so far, and hopefully will contribute soon this week. Meanwhile, I feel that using Google Docs (real-time online editing) for the section planning and initial draft stage would be far more productive and convenient for everyone. Maybe we can try it out?

  • Wei

Regarding my last point on "collaborative multi-center DNN"...

That sounds very interesting and would be a good contribution to the review.

@cgreene may be able to better explain the preference for markdown on GitHub over Google Docs than me. I believe the intention is that we can be fully open and allow anyone who is interested to contribute but still explicitly review and discuss all changes. That review step makes the process slower than editing a Google Doc, but it should help maintain a consistent voice and message throughout the article.

As @agitter says regarding writing via GitHub. Also, it gives us the ability to track changes by author so that we can attempt to manage authorship. This would be relatively difficult via Google Docs.

You can edit inline and create a pull request if you prefer a web-based interface. Thanks!

@agitter @cgreene It makes a lot of sense to write on GitHub. Thanks for your explanation.

I'll be working on the deep survival analysis and related section this week. I'm in Houston for a conference, so I'll focus on it at night.

Dear all,

Late to the party but I'd like to contribute. I'm going over the mass of material already here and having a little trouble seeing what sections are in flight or have been proposed.

Another question: we've got some work going on deep learning for systematic reviews. This is on point in as much as it contributes to evidence-based approaches, but isn't about health per se. Opinions?

Ninja edit: have some knowledge in imaging and deep learning, so could pick that up if no other takers.

Welcome @agapow. The first post of #188 has a fairly up-to-date outline of sections that haven't been drafted and need the most attention. For some, I annotated that no one who has contributed thus far has been interested enough to write something, but there is still (a little) time to take on one of these topics.

Alternatively, if you view the latest draft you'll see where there are large blocks of italic placeholder text. A couple of those have pending changes in the pull requests.

CONTRIBUTING.md has a few specific suggestions if you do see something you'd like to write. I also recommend sharing in #188 what you intend to work on so that we don't have parallel drafting of the same section.

Could you say more about deep learning for systematic reviews? I'm not sure whether we'd consider that in or out of scope. Our Discussion section does have some more general comments.

Right, I'll start reading up.

I'm not persuaded that the systematic review work is in scope but I'll throw it out there for judgement: There's a growing body of work around using text-mining and ML for doing systematic reviews. Given that SRs are such a massive job, people are looking for ways to select / screen literature using various ML methods, especially based on what literature has been found already (i.e. an adaptive approach). This has had a reasonable amount of success. The longer-term view (far more speculative and thus far less successful) is auto-summarization and extraction of findings.

Hello, (very) late to the party, but I should be able to contribute the drug repositioning subsection in Treat. Some more details in #188.

I'm working on study (promoters) and can add some material to the categorise (standards & integration). But: I can't push to the repo. Is this a matter of permissions / access or should I fork and work on that?

@agapow yes, the repo seems to work through a fork and pull request model, which allows pull requests to be peer reviewed. See the CONTRIBUTING file.

@agapow That's right, if you create a branch in your fork (or two if you're working on two different subsections) then you can create a pull request. We'll peer review and (squash and) merge as @enricoferrero said.

I noted above which sections you're working on.

Edit:
I noted the sections in #188, not here. Also, because there has been a lot of work on predicting promoters and enhancers, don't feel compelled to reference everything. Our strategy has been to cite and discuss only the papers needed to back up our claims about the successes, struggles, and opportunities for deep learning in each area.

Writing for the initial release has concluded. Closing this now 😄