MozillaFoundation / mozfest-program-2017

Mozilla Festival proposals for 2017

Home Page: https://mozillafestival.org


Science needs Open Source code - culture shift innovation workshop

mozfest-bot opened this issue

[ UUID ] 1689daa8-8769-4f66-af03-6d3ec2a4a3ea

[ Submitter's Name ] Yo Yehudi
[ Submitter's Affiliated Organisation ] InterMine (University of Cambridge)
[ Submitter's Github ] @yochannah

What will happen in your session?

Much of modern science is produced with the aid of computer software and dedicated analysis code, yet there are many examples of closed-source scientific computer programs. This session would aim to identify possible root causes, as well as ways to remedy this and encourage open-source, peer-reviewed code.
A short set of slides would provide an intro to the issue and to the academic publishing peer-review model. This would be followed by a group whiteboard / sticky-note workshop activity and a wrap-up discussion.

Discussion Topics:

  • Why is code so often seen as okay to be proprietary, even in science where peer review is vital?
  • Who do we need to influence?
  • How can we change things?

What is the goal or outcome of your session?

To identify root causes of closed-source software in science, along with ways to change culture in the future, and perhaps even ways to measure change along the way. I hope that discussion with open-source enthusiasts would be a good way to facilitate this.
A second but no less important goal would be to gather names of like-minded people with whom to collaborate to help push this forwards.
Finally, non-academic attendees would hopefully gain exposure to academic practices such as paper publishing and peer review, and to the importance of code in modern science. Not all science involves lab coats and test tubes!

If your session requires additional materials or electronic equipment, please outline your needs.

A1 poster-size paper sheets on a flipchart, or a whiteboard, for writing up points from the discussion;

Enough sticky notes and pens for everyone to be able to write down brainstormed ideas.

A projector for the (short) intro slides and, if relevant, to bring up websites that might be discussed during the workshop.

Time needed

60 mins

That sounds like a cool session proposal! 🎉

@yochannah - Great session idea.

Neil, Casey, Pjotr - I've tagged you here as I would very much welcome any input or feedback - if you've no time, please ignore! This is in relation to a session proposal for MozFest, London, late October. We are interested in discussing how to review code for research, taking into account that researchers may not be software developers, and we want code to be published in a way that's not a blocker but a means to encourage improvement through transparency. This may be complementary to @yochannah's proposed session or part of it.

Comments:

I am interested in expanding on your session idea, Yo - specifically, how do we review code for research? Do you see this fitting within your session, or shall I propose a complementary one? I think perhaps the latter, to follow on from your discussion?

Our thoughts:
Given that researchers are not software developers, how do we help to judge quality (so that journals can ensure some level of code quality) while also having a process that welcomes less-than-perfect code and offers an opportunity for the community to work together to improve it?

Is it necessary to involve developers from other domains, or are there code quality experts within academia who will be able to do this? (See note below.)

We're looking for input from developers, researchers, and others on the best mechanism to peer-review code or to provide feedback, crossing from software expertise into academic peer review.

--
Note on who could/should review: consider the Software Sustainability Institute (@npch) and key researchers identified as leaders in code quality, e.g. Casey and team (@greenelab). How does JOSS review? @pjotrp

Hey All!

Interesting session proposal.

You might be interested (if you haven't already come across it) in work published by Victoria Stodden and James Howison on (respectively) barriers and incentives to publishing scientific code.

From a personal perspective, the common reasons that I have heard are:

  • My code isn't good enough yet (and also see my writeup on the snobbishness in the area)
  • I have plans to commercialise it
  • I don't have the time to support it
  • If I release my code, other people will publish in my area before me, because they have better resourced groups

Wearing my JORS editor hat, I'll say that I believe peer review of code is both very easy and very difficult. It's easy because there are a few things that are simple to implement and check, and that make code reusable at a basic level: a license, an example of use, and sample data files. It's difficult because, in many areas, properly checking the code quality requires being both a domain expert and a software expert.
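
To make that "basic level" concrete, a minimal sketch might look like the following - the file names and the toy CSV column are hypothetical illustrations, not anything from this thread or from JORS. The layout bundles the three items above (license, example of use, sample data) around a single analysis script, so a reviewer can reproduce the example in one command before ever reading the source.

# analyse.py -- hypothetical sketch of a minimally reusable analysis script.
# Assumed repository layout (for illustration only):
#   LICENSE          an OSI-approved licence text (e.g. MIT)
#   README           containing the "example of use" shown below
#   sample_data.csv  a tiny input file the example can run against
#   analyse.py       this script
#
# Example of use (what a reviewer would try first):
#   python analyse.py sample_data.csv

import csv
import sys


def mean_of_column(path, column="value"):
    """Return the mean of a numeric column in a CSV file."""
    with open(path, newline="") as handle:
        values = [float(row[column]) for row in csv.DictReader(handle)]
    return sum(values) / len(values)


if __name__ == "__main__":
    print(mean_of_column(sys.argv[1]))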

Increasingly in the UK (and elsewhere) there's the idea of a research software engineer - the sort of person who can help a researcher check the quality of their code and perhaps help them improve it.

This is all very much what we are pushing for at the Software Sustainability Institute, and we've run two previous MozFest sessions on related aspects:

  1. What makes good code good (for science)?
  2. How successful is my (open use) scientific software?

Thanks so much Neil @npch - these are all really helpful resources, and a great background for me to dig into. I'll digest and see if we can come up with a proposition that builds on top of these discussions, and is relevant to the input we are trying to collect from the MozFest attendees. Specifically I think our angle is: if we provided a platform for researchers to get advice from software developers, what would/could/should happen?

@yochannah - may I ask for your feedback on an outline of our session proposal once I've developed it? (That'll be Wednesday now probably; I'm off tomorrow.) I'd love to hear if it's something you'd want to work together on (using your session proposal as a starting block), or if you'd like it to sit as a complement to your discussion (which I gather to be: why are people staying closed source?).

JOSS review is pretty light. We have a checklist, e.g. openjournals/joss-reviews#320. The main thing is that the software has the right license, installs, runs, and makes sense. Tests are optional - though we encourage best practices. We don't necessarily check the source code.

My personal view is that peer/community pressure works best. Long-lived projects tend to focus on quality at some point out of necessity. If a project is a one-off, it does not make sense to engineer it too heavily. I don't trust code I can't understand, so I tend to pass on projects that look bad. Good code should read like a story, in a nutshell.

@npch - Thanks again for those links. I think they'd provide a nice foundation for the proposal - we know what makes code good and what makes it successful - now how do we motivate people to make it open? Some of that may loop back into making people feel that their code is good (good enough to be released, that is), but I suspect there's more to it than that.

@npscience I would love to offer feedback on your outline if there's still time?

I agree with @pjotrp's point that peer pressure can help create better code quality. It raises a few other questions in my head, though:

  • Is part of the reason some scientific code is kept closed-source because of peer pressure, too? That they know or fear it's poor quality, and don't want others to see? The equivalent of having a tidy living room, with loads of mess hiding in the cupboard - it's still there, but we aren't letting the world see it.
  • Should it be ok to publish a paper about software where the author's too ashamed of their code to share it?

I don't trust code I can't understand, so I tend to pass on projects that look bad. Good code should read like a story, in a nutshell.

These are the same metrics that speak of maintainable code to me. Is well-maintained and maintainable software more likely to be correct?

This whole conversation thread is so useful, regardless of what happens at Mozfest - thanks for your contributions.

@yochannah I've just submitted my proposal [issue number to follow], and I'm absolutely open to suggestions as Mozilla refine these pitches. I'll link you in the comments section.

A couple of other examples of scientific tools that aren't open - in these cases it may simply result in them being used less, or not used at all (tweet thread): https://twitter.com/kaiblin/status/900321669482504194 - but it is hard to measure the use of proprietary software, so it's hard to be sure.

That's a different kind of open, though, I would argue, because in those cases the source code often is available for review; it's just not licensed under an open software license.

Which comes with the problems that Kai mentions, similar to the famous cases of kallisto and GATK, but I'm not sure it would qualify as not publishing the code?

Hmm, good point; while the two are related, peer-reviewed code doesn't necessarily result in open-source scientific tooling.

I feel like we need a set of explicit words (or phrases) for all the different but related open source science concepts.

In that Twitter thread @kblin mentioned two "free for academic use only" tools (EFICAz and MEME suite) and I mentioned SignalP, TMHMM, etc from http://www.cbs.dtu.dk/services/software.php - where I was unable to even get permission to package for automated installation.

I would argue that while "open for review" code is better than completely opaque from a scientific perspective, in order to be viable long-term, software needs to be under a FLOSS license. That way at least it has the chance to survive even if the initial creators abandon the project.

@kblin totally agree on that. I think it's important to frame it in this sense, and less as "otherwise we can't review it", because then people will happily use their "free for academic use" licenses and it still sucks 😉

A couple years back @ctb wrote a blog post with a perspective on the balance between getting a new method out vs. getting a new tool out (http://ivory.idyll.org/blog/2015-how-should-we-think-about-research-software.html). Back then I wrote a more elaborate version of my previous comment (http://ivory.idyll.org/blog/2015-how-should-we-think-about-research-software.html#comment-1977208787). It also explains why, as a reviewer, I'm extremely wary of reviewing code for a piece of software I might be interested in if it's under a "look, don't touch" license.

[ Facilitator 1 Name ] Yo Yehudi
[ Facilitator 1 Github ] @yochannah
[ Facilitator 1 Twitter ] @yoyehudi
[ Facilitator 2 Name ] Naomi Penfold
[ Facilitator 2 Github ] @npscience
[ Facilitator 2 Twitter ] @eLifeInnovation
[ Facilitator 3 Name ] Bastian Greshake Tzovaras
[ Facilitator 3 Github ] @gedankenstuecke
[ Facilitator 3 Twitter ] @gedankenstuecke

Requirements

  • Projector and screen
  • Whiteboard or A1 flipchart and markers
  • Post-it notes + pens