MozillaFoundation / mozfest-program-2017

Mozilla Festival proposals for 2017

Home Page: https://mozillafestival.org


Science needs Open Source code - culture shift innovation workshop

mozfest-bot opened this issue

[ UUID ] 1689daa8-8769-4f66-af03-6d3ec2a4a3ea

[ Submitter's Name ] Yo Yehudi
[ Submitter's Affiliated Organisation ] InterMine (University of Cambridge)
[ Submitter's Github ] @yochannah

What will happen in your session?

Much of modern science is produced with the aid of computer software and dedicated analysis code, yet there are many examples of closed-source scientific computer programs. This session would aim to identify possible root causes, as well as ways to remedy this and encourage open-source, peer-reviewed code.
A short set of slides would provide an intro to the issue and to the academic publishing peer-review model. This would be followed by a group whiteboard / sticky-note workshop activity and a wrap-up discussion.

Discussion Topics:

  • Why is code so often seen as okay to be proprietary, even in science where peer review is vital?
  • Who do we need to influence?
  • How can we change things?

What is the goal or outcome of your session?

To identify root causes of closed-source software in science, along with ways to change culture in the future, and perhaps even ways to measure change along the way. I hope that discussion with open-source enthusiasts would be a good way to facilitate this.
A second but no less important goal would be to gather names of like-minded people with whom to collaborate to help push this forwards.
Finally, non-academic attendees would hopefully gain exposure to academic practices such as paper publishing and peer review, and to the importance of code in modern science. Not all science involves lab coats and test tubes!

If your session requires additional materials or electronic equipment, please outline your needs.

A1 poster-size paper sheets on a flipchart, or a whiteboard, for writing up points from the discussion;

Enough sticky notes and pens for everyone to be able to write down brainstormed ideas.

A projector for the (short) intro slides and, if relevant, to bring up websites that might be discussed during the workshop.

Time needed

60 mins

That sounds like a cool session proposal! 🎉

@yochannah - Great session idea.

Neil, Casey, Pjotr - I've tagged you here as I would very much welcome any input or feedback - if you've no time, please ignore! This is in relation to a session proposal for MozFest, London, late October. We are interested in discussing how to review code for research, taking into account that researchers may not be software developers, and we want code to be published in a way that's not a blocker but a means to encourage improvement through transparency. This may be complementary to @yochannah's proposed session or part of it.

Comments:

I am interested in expanding on your session idea, Yo - specifically, how do we review code for research? Do you see this fitting within your session, or shall I propose a complementary one? I think perhaps the latter, to follow on from your discussion?

Our thoughts:
Given that researchers are not software developers, how do we help to judge quality (so that journals can ensure some level of code quality) while also having a process that welcomes less-than-perfect code and offers an opportunity for the community to work together to improve it?

Is it necessary to involve developers from other domains, or are there code quality experts within academia who will be able to do this? (See note below.)

We're looking for input from developers, researchers, and others on the best mechanism to peer-review code or to provide feedback, crossing from software expertise into academic peer review.

--
Note on who could/should review: consider the Software Sustainability Institute (@npch) and key researchers identified as leaders in code quality, e.g. Casey and team (@greenelab). How does JOSS review? @pjotrp

Hey All!

Interesting session proposal.

You might be interested (if you haven't already come across it) in work published by Victoria Stodden and James Howison on (respectively) barriers and incentives to publishing scientific code.

From a personal perspective, the common reasons that I have heard are:

  • My code isn't good enough yet (and also see my writeup on the snobbishness in the area)
  • I have plans to commercialise it
  • I don't have the time to support it
  • If I release my code, other people will publish in my area before me, because they have better resourced groups

Wearing my JORS editor hat, I'll say that I believe peer review of code is both very easy and very difficult. It's easy because there are a few things that are simple to implement and check, and that make code reusable at a basic level: a license, an example of use, and sample data files. It's difficult because, in many areas, properly checking the code quality requires being both a domain expert and a software expert.
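
To make that "basic level" concrete, a minimal sketch might look like the following - the file names and the toy CSV column are hypothetical illustrations, not anything from this thread or from JORS. The layout bundles the three items above (license, example of use, sample data) around a single analysis script, so a reviewer can reproduce the example in one command before ever reading the source.

# analyse.py -- hypothetical sketch of a minimally reusable analysis script.
# Assumed repository layout (for illustration only):
#   LICENSE          an OSI-approved licence text (e.g. MIT)
#   README           containing the "example of use" shown below
#   sample_data.csv  a tiny input file the example can run against
#   analyse.py       this script
#
# Example of use (what a reviewer would try first):
#   python analyse.py sample_data.csv

import csv
import sys


def mean_of_column(path, column="value"):
    """Return the mean of a numeric column in a CSV file."""
    with open(path, newline="") as handle:
        values = [float(row[column]) for row in csv.DictReader(handle)]
    return sum(values) / len(values)


if __name__ == "__main__":
    print(mean_of_column(sys.argv[1]))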

Increasingly in the UK (and elsewhere) there's the idea of a research software engineer - the sort of person who can help a researcher check the quality of their code and perhaps help them improve it.

This is all very much what we are pushing for at the Software Sustainability Institute, and we've run two previous MozFest sessions on related aspects:

  1. What makes good code good (for science)?
  2. How successful is my (open use) scientific software?

Thanks so much Neil @npch - these are all really helpful resources, and a great background for me to dig into. I'll digest and see if we can come up with a proposition that builds on top of these discussions, and is relevant to the input we are trying to collect from the MozFest attendees. Specifically I think our angle is: if we provided a platform for researchers to get advice from software developers, what would/could/should happen?

@yochannah - may I ask for your feedback on an outline of our session proposal once I've developed it? (That'll be Wednesday now probably; I'm off tomorrow.) I'd love to hear if it's something you'd want to work together on (using your session proposal as a starting block), or if you'd like it to sit as a complement to your discussion (which I gather to be: why are people staying closed source?).

JOSS review is pretty light. We have a checklist, e.g. openjournals/joss-reviews#320. The main thing is that the software has the right license, installs, runs, and makes sense. Tests are optional - though we encourage best practices. We don't necessarily check the source code.

My personal view is that peer/community pressure works best. Long-lived projects tend to focus on quality at some point out of necessity. If a project is a one-off, it does not make sense to engineer it too heavily. I don't trust code I can't understand, so I tend to pass on projects that look bad. Good code should read like a story, in a nutshell.

@npch - Thanks again for those links. I think they'd provide a nice foundation for the proposal - we know what makes code good and what makes it successful - now how do we motivate people to make it open? Some of that may loop back into making people feel that their code is good (good enough to be released, that is), but I suspect there's more to it than that.

@npscience I would love to offer feedback on your outline if there's still time?

I agree with @pjotrp's point that peer pressure can help create better code quality. It raises a few other questions in my head, though:

  • Is part of the reason some scientific code is kept closed-source because of peer pressure, too? That they know or fear it's poor quality, and don't want others to see? The equivalent of having a tidy living room, with loads of mess hiding in the cupboard - it's still there, but we aren't letting the world see it.
  • Should it be ok to publish a paper about software where the author's too ashamed of their code to share it?

I don't trust code I can't understand, so I tend to pass on projects that look bad. Good code should read like a story, in a nutshell.

These are the same metrics that speak of maintainable code to me. Is well-maintained and maintainable software more likely to be correct?

This whole conversation thread is so useful, regardless of what happens at Mozfest - thanks for your contributions.

@yochannah I've just submitted my proposal [issue number to follow], and I'm absolutely open to suggestions as Mozilla refine these pitches. I'll link you in the comments section.

A couple of other examples of scientific tools that aren't open - in these cases it may simply result in them being used less, or not used at all (tweet thread): https://twitter.com/kaiblin/status/900321669482504194 - but it is hard to measure the use of proprietary software, so it's hard to be sure.

That's a different kind of open, though, I would argue, because in those cases the source code often is available for review; it's just not licensed under an open software license.

Which comes with the problems that Kai mentions, similar to the famous cases of kallisto and GATK, but I'm not sure it would qualify as not publishing the code?

Hmm, good point; while the two are related, peer-reviewed code doesn't necessarily result in open-source scientific tooling.

I feel like we need a set of explicit words (or phrases) for all the different but related open source science concepts.

In that Twitter thread @kblin mentioned two "free for academic use only" tools (EFICAz and MEME suite) and I mentioned SignalP, TMHMM, etc from http://www.cbs.dtu.dk/services/software.php - where I was unable to even get permission to package for automated installation.

I would argue that while "open for review" code is better than completely opaque from a scientific perspective, in order to be viable long-term, software needs to be under a FLOSS license. That way at least it has the chance to survive even if the initial creators abandon the project.

@kblin totally agree on that. I think it's important to frame it in this sense, and less as "otherwise we can't review it", because then people will happily use their "free for academic use" licenses and it still sucks 😉

A couple years back @ctb wrote a blog post with a perspective on the balance between getting a new method out vs. getting a new tool out (http://ivory.idyll.org/blog/2015-how-should-we-think-about-research-software.html). Back then I wrote a more elaborate version of my previous comment (http://ivory.idyll.org/blog/2015-how-should-we-think-about-research-software.html#comment-1977208787). It also explains why, as a reviewer, I'm extremely wary of reviewing code for a piece of software I might be interested in if it's under a "look, don't touch" license.

[ Facilitator 1 Name ] Yo Yehudi
[ Facilitator 1 Github ] @yochannah
[ Facilitator 1 Twitter ] @yoyehudi
[ Facilitator 2 Name ] Naomi Penfold
[ Facilitator 2 Github ] @npscience
[ Facilitator 2 Twitter ] @eLifeInnovation
[ Facilitator 3 Name ] Bastian Greshake Tzovaras
[ Facilitator 3 Github ] @gedankenstuecke
[ Facilitator 3 Twitter ] @gedankenstuecke

Requirements

  • Projector and screen
  • Whiteboard or A1 flipchart and markers
  • Post-it notes + pens