Published on March 13, 2014
Recomputation in scientific experiments
Alexander Konovalov and Recomputation.org
Centre for Interdisciplinary Research in Computational Algebra, University of St Andrews
SPLS 2014 Winter Meeting, Dundee, February 26th, 2014
Quote from classics ...
‣ Practices in source code sharing in astrophysics, L. Shamir et al., Astronomy and Computing, vol. 1, Feb. 2013, 54–58
‣ One of the references is almost 24 centuries old
‣ This is “On Interpretation” by Aristotle
‣ “One of the important advantages of releasing source code is that it allows replication of the results, which is a key concept in science (Aristotle, 350BC).”
In practice ...
‣ But even the hired postdoc may not remember all the exact details of how the experiment was run
‣ Or may not know about a critical dependency on an obsolete version of some library
Case for change: Recomputation.org
It should be as easy to reproduce a computational experiment as to reproduce the chemical reaction from a textbook by mixing baking soda and lemon juice in your kitchen.
Key principles of recomputation of experiments are declared in The Recomputation Manifesto by Ian Gent (St Andrews).
Recomputation Manifesto by Ian Gent
‣ Computational experiments should be recomputable for all time
‣ Recomputation of recomputable experiments should be very easy
‣ It should be easier to make experiments recomputable than not to
‣ Tools and repositories can help recomputation become standard
‣ The only way to ensure recomputability is to provide virtual machines
‣ Runtime performance is a secondary issue
See Recomputation.org or arXiv:1304.3674
1. Computational experiments should be recomputable for all time
‣ It’s “recomputation”, not “replication”
‣ Exact replication of a computational experiment
‣ Why this is interesting:
  ‣ to verify that it works as described
  ‣ to answer questions that may arise later
‣ Undetected flaws may lead to misleading results lying in the literature for years
‣ Experiments curation: the fewer experiments are recomputed, the more important it is to preserve them by making them recomputable
‣ Storage gets exponentially cheaper
‣ There is work on system preservation
2. Recomputation of recomputable experiments should be very easy
‣ Simple and quick steps to get it running:
  ‣ Install free tools: Vagrant and VirtualBox
  ‣ mkdir anydir ; cd anydir
  ‣ vagrant init <experiment_id> <URL>
  ‣ vagrant up
  ‣ ... and in a few minutes the experiment should have run and the results should be in anydir
‣ Let’s check that, up to symmetry, there is only one way to place non-attacking 9 queens and a king of each colour (Symmetry in Constraint Programming, Gent, Petrie & Puget, in: Handbook of Constraint Programming (2006))
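The Vagrant steps on this slide can also be scripted. The following is a minimal sketch, not part of any Recomputation.org tooling: the function names `recompute_commands` and `recompute`, the example experiment id and the example URL are hypothetical placeholders. Only `vagrant init <experiment_id> <URL>` and `vagrant up` come from the slide. By default the script only prints the commands (a dry run), so it can be tried without Vagrant installed.

```python
import subprocess
from pathlib import Path


def recompute_commands(experiment_id, box_url):
    """Return the two Vagrant commands from the slide as argument lists."""
    return [
        ["vagrant", "init", experiment_id, box_url],
        ["vagrant", "up"],
    ]


def recompute(experiment_id, box_url, workdir="anydir", dry_run=True):
    """Run the commands inside workdir, or just print them when dry_run is True."""
    path = Path(workdir)
    commands = recompute_commands(experiment_id, box_url)
    if not dry_run:
        path.mkdir(parents=True, exist_ok=True)  # mkdir anydir
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            # cwd= plays the role of "cd anydir"; check=True stops on the first failure
            subprocess.run(argv, cwd=path, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical id and URL, used only for illustration.
    recompute("example-experiment", "http://example.org/example-experiment.box")
```

Calling `recompute(..., dry_run=False)` would actually start the virtual machine, which assumes Vagrant and VirtualBox are installed and the URL points to a real box.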
3. It should be easier to make experiments recomputable than not to
‣ “It’s not really for the benefit of other people. Experience shows the principal beneficiary of reproducible research is you the author yourself” [Jon Claerbout, “Reproducible computational research”]
‣ To be able to rerun it more easily for the final version of the paper
‣ Or rerun it with new data
‣ Or make a new experiment using the old one as a template
‣ and other “what if?” questions
4. Tools and repositories can help recomputation become standard
‣ Tools are important to ensure that it is easier to make the experiment recomputable from the start
‣ There are already a number of tools, but only a smaller number of repositories
‣ A collection is listed at Recomputation.org (RunMyCode, MyExperiment, and others)
‣ Our vision is of online repositories for experiments, not just for the data derived from experiments
‣ We want to work on establishing such a repository
5. The only way to ensure recomputability is to provide virtual machines
‣ The code you have today may be built with only minor pain on most machines
‣ This is unlikely to be true in 5 years’ time ...
‣ ... and hardly credible in 20 years
‣ Even now, an arbitrary change to the machine being used (e.g. a software update) may break the build
‣ A VM will preserve the exact conditions of the experiment
‣ Issues of bandwidth, storage, long term persistence...
‣ Tools should cater for that, e.g. providing VMs does not mean that every upload/download is a whole virtual machine
6. Runtime performance is a secondary issue
‣ Runtime is not the only metric of an experiment
‣ Deterministic metrics: data resulting from the experiment
  ‣ Data structure(s) constructed using a deterministic algorithm
  ‣ Calculation of all semigroups of order 8
  ‣ Four colour theorem
‣ If one can’t recompute an experiment at all, then one certainly can’t preserve its runtime performance
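One way to read the "deterministic metrics" point: when the output data comes from a deterministic algorithm, a recomputation can be checked by comparing the result files themselves rather than the timings. The following is a minimal sketch under that reading. It assumes each run writes its results as files under a directory. The function names and the directory layout are hypothetical and are not prescribed by the slides.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """SHA-256 of a file, read in chunks so large result files are handled."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def result_digests(results_dir):
    """Map each file's path (relative to results_dir) to its digest."""
    root = Path(results_dir)
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def same_results(original_dir, recomputed_dir):
    """True if both runs produced the same set of files with identical contents."""
    return result_digests(original_dir) == result_digests(recomputed_dir)
```

This comparison is byte-exact, so output that embeds timestamps or timings would have to be separated from the deterministic data before comparing.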
Abstract algebra, combinatorics, finite state automata, formal languages, artificial intelligence, optimisation and search, parallel computations, physics, chemistry, ...
My interest: Computational algebra system GAP (groups, algorithms and programming), distributed under the GPL, used at 2000+ sites worldwide for research and teaching
‣ GAP pioneered refereeing of user-contributed packages
‣ Now we would like to offer the service of recomputing mathematical experiments to the GAP community and beyond
[Chart, years 1987 to 2013, vertical axis 0 to 2000: GAP has been used in almost 2000 scientific publications]
“We used
“We used GAP version X with package version
“We used GAP and our own code from URL”
‣ Recomputation.org started by Ian Gent, Lars Kotthoff, John McDermott (St Andrews)
‣ 6 recomputable experiments for CP2013 already available
‣ Later joined by AK (Alexander Konovalov) as an SSI Fellow
‣ Partnership with the UK Software Sustainability Institute (SSI)
‣ Windows Azure for Research Award (32 core years, 10 TB storage)
‣ Look out for an announcement when the infrastructure is in place
‣ Looking for supporters
‣ Support us by making your experiments recomputable
Mission statement: If we can recompute it now, anyone can recompute it 20 years from now