Software Freedom and Scholarship: Reproducible Research
Published 2019-10-30 on Anjan's Homepage
Reproducibility is a critical feature of science. The chaotic features of nature contribute to artifacts in our measurements, but these errors can be mitigated by taking more measurements 1. The subsequent trials can reveal the random variation, and hint at the true quantity being sought.
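To make the point about repeated trials concrete, here is a minimal sketch in Python. It is not taken from any of the cited sources: the true value, noise level, and function names are all assumptions chosen for illustration. It simulates noisy readings and shows how the standard error of the mean tends to shrink as more readings are taken.

```python
import random
import statistics

def simulate_measurements(true_value, noise_sd, n, seed=0):
    """Return n simulated readings: true_value plus Gaussian noise (hypothetical data)."""
    rng = random.Random(seed)
    return [rng.gauss(true_value, noise_sd) for _ in range(n)]

def mean_and_standard_error(readings):
    """Return the sample mean and the standard error of that mean."""
    mean = statistics.mean(readings)
    # Standard error = sample standard deviation / sqrt(number of readings).
    sem = statistics.stdev(readings) / len(readings) ** 0.5
    return mean, sem

if __name__ == "__main__":
    # The true value and noise level below are made up for the example.
    for n in (5, 50, 500):
        readings = simulate_measurements(true_value=9.81, noise_sd=0.2, n=n)
        mean, sem = mean_and_standard_error(readings)
        print(f"n={n:4d}  mean={mean:.3f}  standard error={sem:.3f}")
```

Because the seed is fixed, anyone who runs this script gets the same numbers, which is itself a small instance of reproducibility.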
Reproducibility is no guarantee of correctness. It will never indicate whether the correct thing is being measured, or whether it is even important. It does, however, allow for more reliable results in science.
1 Claerbout's Principle
An article about computational result is advertising, not scholarship. Actual scholarship is the full software environment, code and data, that produced the result. - Claerbout and Karrenbach, Proceedings of the 62nd Annual International Meeting of the Society of Exploration Geophysics. 1992
The issue of computational results being presented as scholarship is common today. A study that reviewed 613 papers from eight computer science conferences and five ACM journals could reproduce only 24.9% of them 2. As an anecdote: I have seen plenty of articles where the code is not available or the code is proprietary. The lack of reproducibility in science is related to a number of other issues, but in this post I would like to detail one: studies often present results that were generated using proprietary software. Because scholars use software that does not guarantee the four essential freedoms, distribution and investigation are limited. Consequently, reproducibility is hindered. Furthermore, scholars often neglect to publish the in-house code used to generate the results. As a result, readers cannot run and inspect the software on their own computers.
2 Reproducible Research
Reproducible research is academic research where the final result includes the publication of everything that was used in the project, including notebooks and the full computational environment. This computational environment includes the code, data, and anything else used to produce the academic work 3 , 4. In other words, reproducible research implores scholars to use and write software that guarantees its users the following essential freedoms 4 , 5:
- The freedom to run the program as you wish, for any purpose (freedom 0).
- The freedom to study how the program works, and change it so it does your computing as you wish (freedom 1). Access to the source code is a precondition for this.
- The freedom to redistribute copies so you can help your neighbour (freedom 2).
- The freedom to distribute copies of your modified versions to others (freedom 3). By doing this you can give the whole community a chance to benefit from your changes. Access to the source code is a precondition for this.
The prevalence of proprietary software in academia is a threat to the legitimacy of academia. The programs used and their source code should be expected in the Materials and Methods section of an academic article, released under terms that guarantee the four essential freedoms.
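As a rough illustration of what documenting the computational environment could look like in practice, here is a minimal Python sketch. The function name and the output layout are my own assumptions, not an established standard. It collects the interpreter version, the platform, and the versions of the installed Python packages so that they can be archived with the code and data.

```python
import json
import platform
import sys
from importlib import metadata

def describe_environment():
    """Collect interpreter, OS, and installed-package versions as a dict."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip malformed distributions with no recorded name
            packages[name] = dist.version
    return {
        "python": sys.version.split()[0],
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "packages": dict(sorted(packages.items())),
    }

if __name__ == "__main__":
    # Print the snapshot; redirect it to a file to archive it with the study.
    print(json.dumps(describe_environment(), indent=2))
```

A snapshot like this only covers the Python layer. The operating system and any other tools the study depends on would still need to be named separately.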
3 Case Study
In the May 2010 issue of the American Economic Review, economists Carmen Reinhart and Kenneth Rogoff (RR) published a paper whose conclusions favoured austerity. Specifically, it concluded that a country's annual growth declines by about two percent when "gross external debt reaches 60 percent of GDP", and that growth is roughly halved when external debt exceeds 90 percent of GDP 6. The paper was published in the aftermath of the 2008 economic crash and was widely cited, including in Paul Ryan's Republican Party budget and by British MP George Osborne 7 , 8.
Whatever your political leanings may be, the paper has multiple methodological flaws 9. Herndon, Ash and Pollin (HAP) requested the code and data, then published a rebuttal showing that a coding error caused the results to ignore data from several countries 7. RR conceded that this coding error had been made 10.
Beyond quality, this case study shows the threat that irreproducible research poses to society. If HAP had not had the freedom to check RR's research, we might still be operating under the conclusions of the paper.
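To show why an error of this kind matters, here is a small Python sketch. The numbers are entirely made up and are not the figures from the RR dataset, and the function name is hypothetical. It only demonstrates how an average changes when part of the data is silently left out of the calculation.

```python
from statistics import mean

# Hypothetical growth rates (percent) for illustration only;
# these are NOT the figures from the Reinhart-Rogoff dataset.
growth_by_country = {
    "A": 2.0,
    "B": 3.0,
    "C": -1.0,
    "D": 2.5,
    "E": 3.5,
}

def average_growth(data, countries=None):
    """Average growth over the given countries (all of them by default)."""
    selected = data if countries is None else {c: data[c] for c in countries}
    return mean(selected.values())

if __name__ == "__main__":
    full = average_growth(growth_by_country)
    # Simulate a selection that stops too early and silently drops D and E.
    truncated = average_growth(growth_by_country, countries=["A", "B", "C"])
    print(f"all countries:   {full:.2f}%")
    print(f"truncated range: {truncated:.2f}%")
```

Nothing in the truncated result warns the reader that data is missing. Only someone with access to the code and data can notice the difference.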
Free software is about having control over the technology we use in our homes, schools and businesses, where computers work for our individual and communal benefit. 11
Free software is necessary to create a free society in which we are able to criticize the decisions which are generated by computers.
4 Software Installed in Schools
On a larger scale, the issue of proprietary software restrictions appears in many ordinary computer labs. Students are at the mercy of their institutions and must use whatever software is installed. It may be the industry standard, but the benefits associated with being the industry standard end when students want to learn about and critique the decisions the software makes in their lives. Restrictions are rampant with proprietary software, and nonfree software often resists reverse engineering to prevent users from discovering how calculations are done. Indeed, the license restrictions threaten users with monetary and/or legal repercussions for investigating the system. Common examples are Matlab, Solidworks, Eagle, and ANSYS. In this way, proprietary software is a threat to the social mission of school itself: "to teach students to be citizens of a strong, capable, independent, cooperating and free society" 12.
Free software has an exclusive feature that it guarantees to students: the freedom to learn 13. Scholars are not to be blamed for the prevalence of proprietary software in schools. Proprietary software is entrenched in society today and we cannot control the circumstances we grow up in. Going forward, it is up to educators to recognize the threat proprietary software poses to society, and to promote software that does social good. Indeed, educators can lead in making software freedom the industry standard.
5 What You Can Do
In an ideal world, all the software used in a study (including its dependencies, i.e. the operating system and programming language) would be released as free software. From where we are now, software freedom will come gradually. For example, if you must use Matlab, please consider publishing your scripts under a free license 13 , 14.
If you are conducting a study, please commit to using as much free software as is available and as you have time to transition to. I hope this article serves as an introduction to why software freedom matters in academia. Put plainly, the future of academia, its legitimacy, and its ability to criticize the system we live under are at risk.
This post was inspired by this video. Special thanks to Evan Misshula for describing the issue of reproducible science and its relation to software freedom. Please note: I love Emacs, but reproducible research is not as difficult as learning Emacs and using it for your research. In the first half of Evan Misshula's talk, he recommends other free software projects with lower barriers to entry that are also committed to helping scholars create reproducible research. For notebooks that encourage radical reproducibility: Jupyter and R Markdown.
Of course, I am available via email: email@example.com for free software recommendations =).
6 Final Notes
I use proprietary software for my academic work. I use ANSYS, Matlab, Solidworks, etc. Proprietary software is entrenched in society and the free software movement has a lot of work to do. This article is meant to make other scholars aware of software freedom and what it can do for academia.
guest editor: Colin Leitner http://www3.telus.net/colinl333
1. JCGM 100:2008. Evaluation of measurement data – Guide to the expression of uncertainty in measurement (PDF), Joint Committee for Guides in Metrology, 2008. Available Online: http://www.bipm.org/utils/common/documents/jcgm/JCGM_100_2008_E.pdf
2. Christian Collberg, Todd Proebsting, Gina Moraila, Akash Shankaran, Zuoming Shi, and Alex M Warren, “Measuring Reproducibility in Computer Systems Research,” University of Arizona, Mar. 2014. Available Online: http://reproducibility.cs.arizona.edu/v1/tr.pdf
3. "Reproducible Research," in Computing in Science & Engineering, vol. 12, no. 5, pp. 8-13, Sept.-Oct. 2010. doi: 10.1109/MCSE.2010.113 Available Online: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5562471&isnumber=5562464
4. Buckheit, Jonathan B.; Donoho, David L. (May 1995). WaveLab and Reproducible Research (PDF) (Report). California, United States: Stanford University, Department of Statistics. Technical Report No. 474. Retrieved 2019-10-29. Available Online: https://statistics.stanford.edu/sites/default/files/EFS%20NSF%20474.pdf
5. GNU Operating System, “What is free software?,” GNU Operating System. [Online]. Available: https://www.gnu.org/philosophy/free-sw.html.en. [Accessed: 29-Oct-2019].
6. C. M. Reinhart and K. S. Rogoff, “Growth in a Time of Debt,” American Economic Review, vol. 100, no. 2, pp. 573–578, May 2010. Available Online: https://dash.harvard.edu/bitstream/handle/1/11129154/Reinhart_Rogoff_Growth_in_a_Time_of_Debt_2010.pdf?sequence=1
7. Thomas Herndon, Michael Ash, and Robert Pollin, “Does High Public Debt Consistently Stifle Economic Growth? A Critique of Reinhart and Rogoff,” no. 322, Apr. 2013. Available Online: https://web.archive.org/web/20130418125357if_/http://www.peri.umass.edu/fileadmin/pdf/working_papers/working_papers_301-350/WP322.pdf
8. James Lyons, “George Osborne’s favourite ‘godfathers of austerity’ economists admit to making error in research,” Mirror, 17-Apr-2013. Available Online: https://www.mirror.co.uk/news/uk-news/george-osbornes-favourite-economists-reinhart-1838219
9. Growth in a Time of Debt (n.d.). In Wikipedia. Retrieved 2019-10-30, from: https://en.wikipedia.org/wiki/Growth_in_a_Time_of_Debt#Methodological_flaws
10. Carmen Reinhart and Kenneth Rogoff, “Full Response From Reinhart and Rogoff,” 17-Apr-2013. Available Online: https://archive.nytimes.com/www.nytimes.com/interactive/2013/04/17/business/17economix-response.html
11. Big Brother Watch, “Free Software,” Big Brother Watch UK. [Online]. Available: https://bigbrotherwatch.org.uk/about/free-software/. [Accessed: 29-Oct-2019].
12. GNU Operating System, “Why Educational Institutions Should Use and Teach Free Software,” GNU Operating System. [Online]. Available: https://www.gnu.org/education/edu-why.html. [Accessed: 29-Oct-2019].
13. The four freedoms are guaranteed by free software licenses, e.g. the GPL, MIT, and BSD licenses. GNU Operating System, “Various Licenses and Comments about Them,” GNU Operating System. [Online]. Available: https://www.gnu.org/licenses/license-list.html. [Accessed: 29-Oct-2019].
14. V. Stodden et al., “Enhancing reproducibility for computational methods,” Science, vol. 354, no. 6317, pp. 1240–1241, Dec. 2016. Available Online: https://science.sciencemag.org/content/354/6317/1240/tab-pdf