Open source is a hard requirement for reproducibility
Open source is a hard requirement for reproducibility.
No ifs or buts. And I’m not only talking about the code you wrote for your research paper, report or analysis. I’m talking about the whole ecosystem that you used to write and run that code.
(I won’t be talking about making the data available, because I think this is another blog post on its own.)
Is your code open? That’s good. But is it code for a proprietary program, like Stata, SAS or MATLAB? Then your project is not reproducible. It doesn’t matter if this code is well written, well documented and available on GitHub. This project is not reproducible.
Why?
Because there is no way to re-execute your code with the exact same version of this proprietary program down the line. As I’m writing these lines, MATLAB, for example, is at version R2022b. And it is very unlikely that you can still buy, say, version R2008a. Maybe you can. Maybe MATLAB offers this option. But maybe they don’t. And maybe if they do today, they won’t in the future. There’s no guarantee. And if you’re running old code written for version R2008a, there’s no guarantee that it will produce the exact same results on version R2022b. And let’s not even mention the toolboxes (if you’re not familiar with MATLAB’s toolboxes, they’re the equivalent of packages or libraries in other programming languages). These evolve as well, and there’s no guarantee that you can purchase older versions of said toolboxes. And also, even if old versions can be purchased, is a project truly reproducible if it’s behind a paywall?