By Kirsty, on 17 November 2020
The Research Software Development Group hosted the first ReproHack at UCL as part of the Open Access Week events run this year by the Office for Open Science and Scholarship. This was not only the first event of this type at UCL, but also the first time a ReproHack ran for a full week.
What’s a Reprohack?
A ReproHack is a hands-on reproducibility hackathon where participants attempt to reproduce the results of a research paper from published code and data and share their experiences with the group and the paper's authors.
As is usual at hackathons, this is also a learning experience! During a ReproHack the participants, besides helping to measure the reproducibility of published papers, also learn how to implement better reproducibility practices in their own research and appreciate the high value of sharing code for Open Science.
An important aspect of ReproHacks is that the authors themselves are the ones who put forward their papers to be tested. If you’ve published a paper and provided code and data with it, you can submit it for future editions of ReproHack! Your paper may be chosen by future reprohackers, who will then give you feedback about its reproducibility. The feedback form is well designed, so you get a complete overview of what went well and what could be improved.
ReproHacks are open to all domains! Papers using any programming language are accepted; however, papers whose code uses open source programming languages are more likely to be chosen by the participants, as the required software may be easier to install on their computers.
In this particular edition, the UCL Research Software Development Group was available throughout the week to provide support to the reprohackers.
What did I miss?
This was the first ReproHack at UCL! You missed all the excitement that first-time events bring with them! But do not worry, there will be more ReproHacks!
This event was particularly challenging, with the same difficulties we have been fighting for the last nine months while trying to run events online, but we had already gained some experience from other workshops and training sessions we have run, so everything went smoothly!
The event started with a brief introduction to what the week was going to be like, an ice-breaker to get the participants talking and a wonderful keynote by Daniela Ballari entitled “Why computational reproducibility is important?”. Daniela provided a great introduction to the event (did you know that only ~20% of the published literature has even the “potential” to be computationally reproduced, and that most of it can’t be because the software is not free, the data provided is incomplete, or the software version used is not stated? [Culina, 2020]), linking to resources like The Turing Way and offering five selfish reasons to work on reproducibility. She put these reasons in the context of our circles of influence, showing how these practices benefit the author, their team, the reviewers and the wider community. The questions and answers that followed the talk were also very insightful! Daniela is a researcher in Geoinformation and Geostatistics and never trained as a software developer, so she had to teach herself how to make her research reproducible, and her efforts on that front were reflected in the selfish reasons she proposed in her talk.
The rest of the event consisted of ReproHacking-hacking-hacking! We separated into groups and started to choose papers. We then disconnected from the call, and each participant or team worked as they preferred over the following days to try to reproduce the paper(s) they chose. At the end of the week we reconvened to share how far we’d got and what we had learned on the way.
In total we reviewed four papers. Only one participant managed to reproduce a whole paper; the rest (me included) got stuck along the way. We found that full reproducibility is not easy! If the version of a piece of software is not mentioned, it becomes very difficult to find out why something is not working as it should. But we also had a lot of fun, and the participants were happy that there is a community at UCL that fights for reproducibility!
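For anyone wondering what "mentioning the version" could look like in practice, here is a minimal sketch in Python. It is not something we used during the event; the function name `snapshot_environment` and the package names in the example are hypothetical, and the only library calls are from Python's standard library (`importlib.metadata`, available from Python 3.8). It simply writes down the interpreter version, the platform and the installed version of each package you name, so the record can be saved next to your results.

```python
import json
import platform
from importlib import metadata


def snapshot_environment(packages):
    """Return a dict with the Python version, platform and package versions.

    `packages` is a list of distribution names (as you would pass to pip).
    Packages that are not installed are recorded as None rather than skipped,
    so a missing dependency stays visible in the record.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }


if __name__ == "__main__":
    # "numpy" and "pandas" are placeholders: list whatever your analysis imports.
    print(json.dumps(snapshot_environment(["numpy", "pandas"]), indent=2))
```

Saving that output in a file alongside the paper's code gives future reprohackers a starting point when something does not behave as described. A fuller solution (a lock file or a container image, for example) would go further, but even this much would have saved some of us a good deal of guessing.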
This ReproHack also featured Peter Schmidt interviewing various participants for Code for Thought, a podcast that will be published soon! Right now he’s the person running RSE Stories, a podcast hosted by Vanessa Sochat, on this side of the Atlantic.
We will run this again! When? Not sure. We would like to run it twice a year, maybe once again during Open Access Week and another session sometime between March and April. Are you interested in helping to organise it? Give me a shout! We can make a ReproHack that better fits our needs (and our researchers!).
A million thanks to Daniela Ballari: her talk was very illustrative and helped to set the goals of the event!
A million thanks to Anna Krystalli too, a fellow Research Software Engineer at the University of Sheffield, as she created this event and provided a lot of help to get us ready! She’s a Software Sustainability Institute Fellow, and the SSI gave the initial push for this to exist. We also want to thank the RSE group at Sheffield, as we were using some of their resources to run the event!
I also want to thank the organisers of ReproHack in LatinR (thanks Florencia!), as their event was just weeks before ours and seeing how they organised it was super helpful!