SlideShare a Scribd company logo
1 of 27
Download to read offline
Vers une science reproductible
Roberto Di Cosmo
Software Heritage
INRIA and IRIF
roberto@dicosmo.org
December 6th, 2017
THE GREAT LIBRARY OF SOURCE CODE
Roberto Di Cosmo Towards reproducible science December 6th, 2017 1 / 23
Outline
1 The Science Crisis
2 The state of Software reproducibility
3 The Software Heritage initiative
4 Conclusion
Roberto Di Cosmo Towards reproducible science December 6th, 2017 2 / 23
Inconsistencies all around us
What causes cancer?
Is everything we eat associated with cancer?
Schoenfeld and Ioannidis, Amer. Jour. of Clinical Nutrition, 2013.
Inconsistency an incompatibility between two propositions that cannot
both be true
SEPT2 is Septin 2 or September 2nd?
Gene name errors are widespread in the scientific literature Zie-
mann, Eren and El-Osta, Genome Biology, 2016.
Corruption The process by which a computer database or program
becomes debased by alteration or the introduction of
errors
Roberto Di Cosmo Towards reproducible science December 6th, 2017 2 / 23
And it gets worse!
Doctored data?
Fraud wrongful or criminal deception intended to
result in financial or personal gain
What are drugs good for?
Non reproducibile results ...
Roberto Di Cosmo Towards reproducible science December 6th, 2017 3 / 23
We face a science crisis
"Sub-prime science"? (Nicholas Humprey)
inconsistencies
data corruption, fraud
non reproducible findings... (picture from Nature, Sep. 2015)
The world starts noticing
October 2013 John Oliver, Science May 2016
Roberto Di Cosmo Towards reproducible science December 6th, 2017 4 / 23
How we built our scientific knowledge
The experimental method
make an observation
formulate an hypothesis
set up an experiment
formulate a theory
And then we reproduce and verify.
Reproducibility is the key
non-reproducible single occurrences are of no significance to science
Karl Popper, The Logic of Scientific Discovery, 1934
Roberto Di Cosmo Towards reproducible science December 6th, 2017 5 / 23
Reproducibility in the digital age
For an experiment involving software, we need
open access to the scientific article describing it
open data sets used in the experiment
source code of all the components
environment of execution
stable references between all this
Remark
The first two items are already widely discussed!
... what about software?
Roberto Di Cosmo Towards reproducible science December 6th, 2017 6 / 23
Software is Knowledge
Software is an essential component of modern scientific research
Top 100 papers (Nature, October 2014)
[...] the vast majority describe experimental methods or
software that have become essential in their fields.
http://www.nature.com/news/
the-top-100-papers-1.16224
Roberto Di Cosmo Towards reproducible science December 6th, 2017 7 / 23
Pressure to make research code available is now raising
Evaluation of software artefacts (optional)
tools are usable, in line with expectations
started as a contest in 2011 (ESEC/FSE) (winner Vouillon and Di
Cosmo)
now going mainstream: POPL’17, POPL’16, ECOOP’16,
OOPSLA’16, CGO’16, VISSOFT’16, PLDI’16, CGO’15, PPoPP’15,
VISSOFT’15, ISSTA’15, OOPSLA’15, PLDI’15, POPL’15, CAV’15,
ECOOP’15, FSE’15, ISSTA’14, OOPSLA’14, PLDI’14, ECOOP’14,
FSE’14, SAS’13, OOPSLA’13, ECOOP’13, FSE’13, FSE’11
Roberto Di Cosmo Towards reproducible science December 6th, 2017 8 / 23
Use the Source, Luke!
Repeatability, replicability, reproducibility: having (all) the source code used in an
experiment is needed for the first two.
Some claim it is not worth the effort
“Replicability is not Reproducibility: Nor is it Good Science”, Chris Drummond, ICML 2009
Sure, diversity is important, but:
Source code is like the proof used in a theorem: can we really accept Fermat
statements like “the details are omitted due to lack of space”?
modern complex systems make even the simplest experiment depend on a wealth
of components and configuration options
access to all the source code is not just necessary to replicate, it is also useful to
evolve and modify, to build new experiments from the old ones
Roberto Di Cosmo Towards reproducible science December 6th, 2017 9 / 23
Source code matters!
"The source code for a work means the preferred form of the work for making
modifications to it." — GPL Licence
Hello World
Program (excerpt of binary)
4004e6: 55
4004e7: 48 89 e5
4004ea: bf 84 05 40 00
4004ef: b8 00 00 00 00
4004f4: e8 c7 fe ff ff
4004f9: 90
4004fa: 5d
4004fb: c3
Program (source code)
/* Hello World program */
#include<stdio.h>
void main()
{
printf("Hello World");
}
Roberto Di Cosmo Towards reproducible science December 6th, 2017 10 / 23
Software Source Code is special
Harold Abelson, Structure and Interpretation of Computer Programs (1st ed.) 1985
“Programs must be written for people to read, and only incidentally for machines to execute.”
Quake 2 source code (excerpt) Net. queue in Linux (excerpt)
Len Shustek, Computer History Museum
“Source code provides a view into the mind of the designer.”
Roberto Di Cosmo Towards reproducible science December 6th, 2017 11 / 23
~ 50 years, a lightning fast growth
Apollo 11 Guidance Computer (~60.000 lines), 1969
"When I first got into it, nobody knew what it was that we were doing. It
was like the Wild West."
Margaret Hamilton
Linux Kernel
... now in your pockets!
are we taking care of all this?
Roberto Di Cosmo Towards reproducible science December 6th, 2017 12 / 23
Outline
1 The Science Crisis
2 The state of Software reproducibility
3 The Software Heritage initiative
4 Conclusion
Roberto Di Cosmo Towards reproducible science December 6th, 2017 13 / 23
Collberg’s report from the trenches
Analysis of 613 papers
8 ACM conferences: ASPLOS’12,
CCS’12, OOPSLA’12, OSDI’12,
PLDI’12, SIGMOD’12, SOSP’11,
VLDB’12
5 journals: TACO’9, TISSEC’15,
TOCS’30, TODS’37, TOPLAS’34
all very practical oriented
The basic question
can we get the code to build and run?
The workflow
Roberto Di Cosmo Towards reproducible science December 6th, 2017 13 / 23
The result
This can be debated (see http:
//cs.brown.edu/~sk/Memos/Examining-Reproducibility/), but...
... that’s a whopping 81% of non reproducible works!
Roberto Di Cosmo Towards reproducible science December 6th, 2017 14 / 23
The reasons (or, “the dog ate my program”)
Why so much software fails to pass the test?
Many issues, nice anecdotes, and it finally boils down to
Availability
Traceability
Environment
Automation (do you use continuous integration?)
Documentation
Understanding (including free/open source software)
The first two are important software preservation issues
Yes, code is fragile:
it can be destroyed, and we can lose trace of it
Roberto Di Cosmo Towards reproducible science December 6th, 2017 15 / 23
Outline
1 The Science Crisis
2 The state of Software reproducibility
3 The Software Heritage initiative
4 Conclusion
Roberto Di Cosmo Towards reproducible science December 6th, 2017 16 / 23
The Software Heritage Project www.softwareheritage.org
Our mission
Collect, preserve and share the source code of all the software that is available
Past, present and future
Preserving the past, enhancing the present, preparing the future
Roberto Di Cosmo Towards reproducible science December 6th, 2017 16 / 23
Supporting more accessible and reproducible science
A global library referencing all software used in all research fields
completes the infrastructure for Open Access in science
provides intrinsic persistent identifiers needed for scientific reproducibility
enables large scale, verifiable software studies
Roberto Di Cosmo Towards reproducible science December 6th, 2017 17 / 23
The Knowledge Conservancy Magic Triangle
The Knowledge Conservancy Magic Triangle
Legenda (links are important!)
articles: ArXiv, HAL, ...
data: Zenodo, ...
software: Software Heritage to the rescue
Roberto Di Cosmo Towards reproducible science December 6th, 2017 18 / 23
Open parenthesis: reloading Open Access!
Brief summary
researchers do all the work... and publishers lock the results behind paywalls
public money is wasted in subscriptions... and subtracted to research
we reacted with Open access... and got caught in the "gold open access" trap
The essential missing bit in the conversation
Art. L. 131-4.
La cession par l’auteur de ses droits sur son oeuvre peut être totale ou partielle. Elle
doit comporter au profit de l’auteur la participation proportionnelle aux recettes
provenant de la vente ou de l’exploitation.
Yes, it’s the greatest copyright violation in history!
Authorities refuse to see... but spend 100+Millions in programs like ISTEX
Roberto Di Cosmo Towards reproducible science December 6th, 2017 19 / 23
Archive coverage
~150 TB blobs, ~5 TB database (as a graph: ~7 B nodes + ~60 B edges)
Our sources
GitHub — full, up-to-date mirror
Debian — automation in progress; GNU
Gitorious, Google Code — processing (Archive Team & Google)
Bitbucket — WIP
The richest source code archive already, ... and growing daily!
Roberto Di Cosmo Towards reproducible science December 6th, 2017 20 / 23
Going global
April 3rd, 2017: landmark Inria Unesco agreement...
https://www.softwareheritage.org/blog
September 28th, 2017
Mauritius Call on information access
Roberto Di Cosmo Towards reproducible science December 6th, 2017 21 / 23
An unique opportunity
Library of Alexandria of code
Take urgent action to
recover the past
structure the future
A CERN for Software
Photo: ALMA(ESO/NAOJ/NRAO), R. Hills
Build a common infrastructure
software research, better science
for society as a whole
Come in, we’re open www.softwareheritage.org
tons of research problems, and our code: forge.softwareheritage.org
Roberto Di Cosmo Towards reproducible science December 6th, 2017 22 / 23
Outline
1 The Science Crisis
2 The state of Software reproducibility
3 The Software Heritage initiative
4 Conclusion
Roberto Di Cosmo Towards reproducible science December 6th, 2017 23 / 23
Age of responsibility
Technology reshapes society
Code is law (Lawrence Lessig)
reproducitibility, replicability, science safety, security, privacy, anonimity
trust, justice, freedom, transparency ethical and social challenges
We need you!
value "responsible" research
contribute to Free Software
ask questions, join the conversation
Giambattista Vico (1688 - 1744)
Conoscere, è saper fare
Roberto Di Cosmo Towards reproducible science December 6th, 2017 23 / 23

More Related Content

Similar to #OSSPARIS17 - Logiciel libre pour une science reproductible, par ROBERTO DI COSMO

Software Heritage, a revolutionary infrastructure for software source code, O...
Software Heritage, a revolutionary infrastructure for software source code, O...Software Heritage, a revolutionary infrastructure for software source code, O...
Software Heritage, a revolutionary infrastructure for software source code, O...OW2
 
Software Heritage: let's build together the universal archive of our software...
Software Heritage: let's build together the universal archive of our software...Software Heritage: let's build together the universal archive of our software...
Software Heritage: let's build together the universal archive of our software...Codemotion
 
Collaborations in the Extreme: 
The rise of open code development in the scie...
Collaborations in the Extreme: 
The rise of open code development in the scie...Collaborations in the Extreme: 
The rise of open code development in the scie...
Collaborations in the Extreme: 
The rise of open code development in the scie...Kelle Cruz
 
La préservation des logiciels: défis et opportunités pour la reproductibilité...
La préservation des logiciels: défis et opportunités pour la reproductibilité...La préservation des logiciels: défis et opportunités pour la reproductibilité...
La préservation des logiciels: défis et opportunités pour la reproductibilité...Roberto Di Cosmo
 
Open Source Software (OSS) applications in libraries: Special Reference to Se...
Open Source Software (OSS) applications in libraries: Special Reference to Se...Open Source Software (OSS) applications in libraries: Special Reference to Se...
Open Source Software (OSS) applications in libraries: Special Reference to Se...dbpublications
 
GoOpen 2010: Sandro D'Elia
GoOpen 2010: Sandro D'EliaGoOpen 2010: Sandro D'Elia
GoOpen 2010: Sandro D'EliaFriprogsenteret
 
Computation and Knowledge
Computation and KnowledgeComputation and Knowledge
Computation and KnowledgeIan Foster
 
Building the iRODS Consortium
Building the iRODS ConsortiumBuilding the iRODS Consortium
Building the iRODS ConsortiumAll Things Open
 
Tds — big science dec 2021
Tds — big science dec 2021Tds — big science dec 2021
Tds — big science dec 2021Gérard Dupont
 
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...GigaScience, BGI Hong Kong
 
Go open2010 sde_20100417
Go open2010 sde_20100417Go open2010 sde_20100417
Go open2010 sde_20100417Sandro D'Elia
 
What Academia Can Learn from Open Source
What Academia Can Learn from Open SourceWhat Academia Can Learn from Open Source
What Academia Can Learn from Open SourceAll Things Open
 
R. Di Cosmo - Software Heritage
R. Di Cosmo - Software HeritageR. Di Cosmo - Software Heritage
R. Di Cosmo - Software HeritageLibreItalia
 
Python for Big Data Analytics
Python for Big Data AnalyticsPython for Big Data Analytics
Python for Big Data AnalyticsEdureka!
 
Community based software development: The GRASS GIS project
Community based software development: The GRASS GIS projectCommunity based software development: The GRASS GIS project
Community based software development: The GRASS GIS projectMarkus Neteler
 
"AI" for Blockchain Security (Case Study: Cosmos)
"AI" for Blockchain Security (Case Study: Cosmos)"AI" for Blockchain Security (Case Study: Cosmos)
"AI" for Blockchain Security (Case Study: Cosmos)npinto
 
Citizen science project list (Europe & worldwide) v1
Citizen science project list (Europe & worldwide) v1Citizen science project list (Europe & worldwide) v1
Citizen science project list (Europe & worldwide) v1Egle Marija Ramanauskaite
 
Reproducibility: 10 Simple Rules
Reproducibility: 10 Simple RulesReproducibility: 10 Simple Rules
Reproducibility: 10 Simple RulesAnnika Eriksson
 

Similar to #OSSPARIS17 - Logiciel libre pour une science reproductible, par ROBERTO DI COSMO (20)

Software Heritage, a revolutionary infrastructure for software source code, O...
Software Heritage, a revolutionary infrastructure for software source code, O...Software Heritage, a revolutionary infrastructure for software source code, O...
Software Heritage, a revolutionary infrastructure for software source code, O...
 
Software Heritage: let's build together the universal archive of our software...
Software Heritage: let's build together the universal archive of our software...Software Heritage: let's build together the universal archive of our software...
Software Heritage: let's build together the universal archive of our software...
 
Collaborations in the Extreme: 
The rise of open code development in the scie...
Collaborations in the Extreme: 
The rise of open code development in the scie...Collaborations in the Extreme: 
The rise of open code development in the scie...
Collaborations in the Extreme: 
The rise of open code development in the scie...
 
La préservation des logiciels: défis et opportunités pour la reproductibilité...
La préservation des logiciels: défis et opportunités pour la reproductibilité...La préservation des logiciels: défis et opportunités pour la reproductibilité...
La préservation des logiciels: défis et opportunités pour la reproductibilité...
 
Open Source Software (OSS) applications in libraries: Special Reference to Se...
Open Source Software (OSS) applications in libraries: Special Reference to Se...Open Source Software (OSS) applications in libraries: Special Reference to Se...
Open Source Software (OSS) applications in libraries: Special Reference to Se...
 
2015 genome-center
2015 genome-center2015 genome-center
2015 genome-center
 
GoOpen 2010: Sandro D'Elia
GoOpen 2010: Sandro D'EliaGoOpen 2010: Sandro D'Elia
GoOpen 2010: Sandro D'Elia
 
Computation and Knowledge
Computation and KnowledgeComputation and Knowledge
Computation and Knowledge
 
Building the iRODS Consortium
Building the iRODS ConsortiumBuilding the iRODS Consortium
Building the iRODS Consortium
 
Tds — big science dec 2021
Tds — big science dec 2021Tds — big science dec 2021
Tds — big science dec 2021
 
Open source technology
Open source technologyOpen source technology
Open source technology
 
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...
 
Go open2010 sde_20100417
Go open2010 sde_20100417Go open2010 sde_20100417
Go open2010 sde_20100417
 
What Academia Can Learn from Open Source
What Academia Can Learn from Open SourceWhat Academia Can Learn from Open Source
What Academia Can Learn from Open Source
 
R. Di Cosmo - Software Heritage
R. Di Cosmo - Software HeritageR. Di Cosmo - Software Heritage
R. Di Cosmo - Software Heritage
 
Python for Big Data Analytics
Python for Big Data AnalyticsPython for Big Data Analytics
Python for Big Data Analytics
 
Community based software development: The GRASS GIS project
Community based software development: The GRASS GIS projectCommunity based software development: The GRASS GIS project
Community based software development: The GRASS GIS project
 
"AI" for Blockchain Security (Case Study: Cosmos)
"AI" for Blockchain Security (Case Study: Cosmos)"AI" for Blockchain Security (Case Study: Cosmos)
"AI" for Blockchain Security (Case Study: Cosmos)
 
Citizen science project list (Europe & worldwide) v1
Citizen science project list (Europe & worldwide) v1Citizen science project list (Europe & worldwide) v1
Citizen science project list (Europe & worldwide) v1
 
Reproducibility: 10 Simple Rules
Reproducibility: 10 Simple RulesReproducibility: 10 Simple Rules
Reproducibility: 10 Simple Rules
 

More from Paris Open Source Summit

#OSSPARIS19 : Control your Embedded Linux remotely by using WebSockets - Gian...
#OSSPARIS19 : Control your Embedded Linux remotely by using WebSockets - Gian...#OSSPARIS19 : Control your Embedded Linux remotely by using WebSockets - Gian...
#OSSPARIS19 : Control your Embedded Linux remotely by using WebSockets - Gian...Paris Open Source Summit
 
#OSSPARIS19 : A virtual machine approach for microcontroller programming : th...
#OSSPARIS19 : A virtual machine approach for microcontroller programming : th...#OSSPARIS19 : A virtual machine approach for microcontroller programming : th...
#OSSPARIS19 : A virtual machine approach for microcontroller programming : th...Paris Open Source Summit
 
#OSSPARIS19 : RIOT: towards open source, secure DevOps on microcontroller-bas...
#OSSPARIS19 : RIOT: towards open source, secure DevOps on microcontroller-bas...#OSSPARIS19 : RIOT: towards open source, secure DevOps on microcontroller-bas...
#OSSPARIS19 : RIOT: towards open source, secure DevOps on microcontroller-bas...Paris Open Source Summit
 
#OSSPARIS19 : The evolving (IoT) security landscape - Gianluca Varisco, Arduino
#OSSPARIS19 : The evolving (IoT) security landscape - Gianluca Varisco, Arduino#OSSPARIS19 : The evolving (IoT) security landscape - Gianluca Varisco, Arduino
#OSSPARIS19 : The evolving (IoT) security landscape - Gianluca Varisco, ArduinoParis Open Source Summit
 
#OSSPARIS19: Construire des applications IoT "secure-by-design" - Thomas Gaza...
#OSSPARIS19: Construire des applications IoT "secure-by-design" - Thomas Gaza...#OSSPARIS19: Construire des applications IoT "secure-by-design" - Thomas Gaza...
#OSSPARIS19: Construire des applications IoT "secure-by-design" - Thomas Gaza...Paris Open Source Summit
 
#OSSPARIS19 : Detecter des anomalies de séries temporelles à la volée avec Wa...
#OSSPARIS19 : Detecter des anomalies de séries temporelles à la volée avec Wa...#OSSPARIS19 : Detecter des anomalies de séries temporelles à la volée avec Wa...
#OSSPARIS19 : Detecter des anomalies de séries temporelles à la volée avec Wa...Paris Open Source Summit
 
#OSSPARIS19 : Supervision d'objets connectés industriels - Eric DOANE, Zabbix
#OSSPARIS19 : Supervision d'objets connectés industriels - Eric DOANE, Zabbix#OSSPARIS19 : Supervision d'objets connectés industriels - Eric DOANE, Zabbix
#OSSPARIS19 : Supervision d'objets connectés industriels - Eric DOANE, ZabbixParis Open Source Summit
 
#OSSPARIS19: Introduction to scikit-learn - Olivier Grisel, Inria
#OSSPARIS19: Introduction to scikit-learn - Olivier Grisel, Inria#OSSPARIS19: Introduction to scikit-learn - Olivier Grisel, Inria
#OSSPARIS19: Introduction to scikit-learn - Olivier Grisel, InriaParis Open Source Summit
 
#OSSPARIS19 - Fostering disruptive innovation in AI with JEDI - André Loesekr...
#OSSPARIS19 - Fostering disruptive innovation in AI with JEDI - André Loesekr...#OSSPARIS19 - Fostering disruptive innovation in AI with JEDI - André Loesekr...
#OSSPARIS19 - Fostering disruptive innovation in AI with JEDI - André Loesekr...Paris Open Source Summit
 
#OSSPARIS19 : Comment ONLYOFFICE aide à organiser les travaux de recherches ...
#OSSPARIS19 : Comment ONLYOFFICE aide à organiser les travaux de recherches  ...#OSSPARIS19 : Comment ONLYOFFICE aide à organiser les travaux de recherches  ...
#OSSPARIS19 : Comment ONLYOFFICE aide à organiser les travaux de recherches ...Paris Open Source Summit
 
#OSSPARIS19 : MDPH : une solution collaborative open source pour l'instructio...
#OSSPARIS19 : MDPH : une solution collaborative open source pour l'instructio...#OSSPARIS19 : MDPH : une solution collaborative open source pour l'instructio...
#OSSPARIS19 : MDPH : une solution collaborative open source pour l'instructio...Paris Open Source Summit
 
#OSSPARIS19 - Understanding Open Source Governance - Gilles Gravier, Wipro Li...
#OSSPARIS19 - Understanding Open Source Governance - Gilles Gravier, Wipro Li...#OSSPARIS19 - Understanding Open Source Governance - Gilles Gravier, Wipro Li...
#OSSPARIS19 - Understanding Open Source Governance - Gilles Gravier, Wipro Li...Paris Open Source Summit
 
#OSSPARIS19 : Publier du code Open Source dans une banque : Mission impossibl...
#OSSPARIS19 : Publier du code Open Source dans une banque : Mission impossibl...#OSSPARIS19 : Publier du code Open Source dans une banque : Mission impossibl...
#OSSPARIS19 : Publier du code Open Source dans une banque : Mission impossibl...Paris Open Source Summit
 
#OSSPARIS19 : Libre à vous ! Raconter les libertés informatiques à la radio -...
#OSSPARIS19 : Libre à vous ! Raconter les libertés informatiques à la radio -...#OSSPARIS19 : Libre à vous ! Raconter les libertés informatiques à la radio -...
#OSSPARIS19 : Libre à vous ! Raconter les libertés informatiques à la radio -...Paris Open Source Summit
 
#OSSPARIS19 - Le logiciel libre : un enjeu politique et social - Etienne Gonn...
#OSSPARIS19 - Le logiciel libre : un enjeu politique et social - Etienne Gonn...#OSSPARIS19 - Le logiciel libre : un enjeu politique et social - Etienne Gonn...
#OSSPARIS19 - Le logiciel libre : un enjeu politique et social - Etienne Gonn...Paris Open Source Summit
 
#OSSPARIS19 - Conflits d’intérêt & concurrence : la place de l’éditeur dans l...
#OSSPARIS19 - Conflits d’intérêt & concurrence : la place de l’éditeur dans l...#OSSPARIS19 - Conflits d’intérêt & concurrence : la place de l’éditeur dans l...
#OSSPARIS19 - Conflits d’intérêt & concurrence : la place de l’éditeur dans l...Paris Open Source Summit
 
#OSSPARIS19 - Table ronde : souveraineté des données
#OSSPARIS19 - Table ronde : souveraineté des données #OSSPARIS19 - Table ronde : souveraineté des données
#OSSPARIS19 - Table ronde : souveraineté des données Paris Open Source Summit
 
#OSSPARIS19 - Comment financer un projet de logiciel libre - LUDOVIC DUBOST, ...
#OSSPARIS19 - Comment financer un projet de logiciel libre - LUDOVIC DUBOST, ...#OSSPARIS19 - Comment financer un projet de logiciel libre - LUDOVIC DUBOST, ...
#OSSPARIS19 - Comment financer un projet de logiciel libre - LUDOVIC DUBOST, ...Paris Open Source Summit
 
#OSSPARIS19 - BlueMind v4 : les dessous technologiques de 10 ans de travail p...
#OSSPARIS19 - BlueMind v4 : les dessous technologiques de 10 ans de travail p...#OSSPARIS19 - BlueMind v4 : les dessous technologiques de 10 ans de travail p...
#OSSPARIS19 - BlueMind v4 : les dessous technologiques de 10 ans de travail p...Paris Open Source Summit
 
#OSSPARIS19 - Tuto de première installation de VITAM, un système d'archivage ...
#OSSPARIS19 - Tuto de première installation de VITAM, un système d'archivage ...#OSSPARIS19 - Tuto de première installation de VITAM, un système d'archivage ...
#OSSPARIS19 - Tuto de première installation de VITAM, un système d'archivage ...Paris Open Source Summit
 

More from Paris Open Source Summit (20)

#OSSPARIS19 : Control your Embedded Linux remotely by using WebSockets - Gian...
#OSSPARIS19 : Control your Embedded Linux remotely by using WebSockets - Gian...#OSSPARIS19 : Control your Embedded Linux remotely by using WebSockets - Gian...
#OSSPARIS19 : Control your Embedded Linux remotely by using WebSockets - Gian...
 
#OSSPARIS19 : A virtual machine approach for microcontroller programming : th...
#OSSPARIS19 : A virtual machine approach for microcontroller programming : th...#OSSPARIS19 : A virtual machine approach for microcontroller programming : th...
#OSSPARIS19 : A virtual machine approach for microcontroller programming : th...
 
#OSSPARIS19 : RIOT: towards open source, secure DevOps on microcontroller-bas...
#OSSPARIS19 : RIOT: towards open source, secure DevOps on microcontroller-bas...#OSSPARIS19 : RIOT: towards open source, secure DevOps on microcontroller-bas...
#OSSPARIS19 : RIOT: towards open source, secure DevOps on microcontroller-bas...
 
#OSSPARIS19 : The evolving (IoT) security landscape - Gianluca Varisco, Arduino
#OSSPARIS19 : The evolving (IoT) security landscape - Gianluca Varisco, Arduino#OSSPARIS19 : The evolving (IoT) security landscape - Gianluca Varisco, Arduino
#OSSPARIS19 : The evolving (IoT) security landscape - Gianluca Varisco, Arduino
 
#OSSPARIS19: Construire des applications IoT "secure-by-design" - Thomas Gaza...
#OSSPARIS19: Construire des applications IoT "secure-by-design" - Thomas Gaza...#OSSPARIS19: Construire des applications IoT "secure-by-design" - Thomas Gaza...
#OSSPARIS19: Construire des applications IoT "secure-by-design" - Thomas Gaza...
 
#OSSPARIS19 : Detecter des anomalies de séries temporelles à la volée avec Wa...
#OSSPARIS19 : Detecter des anomalies de séries temporelles à la volée avec Wa...#OSSPARIS19 : Detecter des anomalies de séries temporelles à la volée avec Wa...
#OSSPARIS19 : Detecter des anomalies de séries temporelles à la volée avec Wa...
 
#OSSPARIS19 : Supervision d'objets connectés industriels - Eric DOANE, Zabbix
#OSSPARIS19 : Supervision d'objets connectés industriels - Eric DOANE, Zabbix#OSSPARIS19 : Supervision d'objets connectés industriels - Eric DOANE, Zabbix
#OSSPARIS19 : Supervision d'objets connectés industriels - Eric DOANE, Zabbix
 
#OSSPARIS19: Introduction to scikit-learn - Olivier Grisel, Inria
#OSSPARIS19: Introduction to scikit-learn - Olivier Grisel, Inria#OSSPARIS19: Introduction to scikit-learn - Olivier Grisel, Inria
#OSSPARIS19: Introduction to scikit-learn - Olivier Grisel, Inria
 
#OSSPARIS19 - Fostering disruptive innovation in AI with JEDI - André Loesekr...
#OSSPARIS19 - Fostering disruptive innovation in AI with JEDI - André Loesekr...#OSSPARIS19 - Fostering disruptive innovation in AI with JEDI - André Loesekr...
#OSSPARIS19 - Fostering disruptive innovation in AI with JEDI - André Loesekr...
 
#OSSPARIS19 : Comment ONLYOFFICE aide à organiser les travaux de recherches ...
#OSSPARIS19 : Comment ONLYOFFICE aide à organiser les travaux de recherches  ...#OSSPARIS19 : Comment ONLYOFFICE aide à organiser les travaux de recherches  ...
#OSSPARIS19 : Comment ONLYOFFICE aide à organiser les travaux de recherches ...
 
#OSSPARIS19 : MDPH : une solution collaborative open source pour l'instructio...
#OSSPARIS19 : MDPH : une solution collaborative open source pour l'instructio...#OSSPARIS19 : MDPH : une solution collaborative open source pour l'instructio...
#OSSPARIS19 : MDPH : une solution collaborative open source pour l'instructio...
 
#OSSPARIS19 - Understanding Open Source Governance - Gilles Gravier, Wipro Li...
#OSSPARIS19 - Understanding Open Source Governance - Gilles Gravier, Wipro Li...#OSSPARIS19 - Understanding Open Source Governance - Gilles Gravier, Wipro Li...
#OSSPARIS19 - Understanding Open Source Governance - Gilles Gravier, Wipro Li...
 
#OSSPARIS19 : Publier du code Open Source dans une banque : Mission impossibl...
#OSSPARIS19 : Publier du code Open Source dans une banque : Mission impossibl...#OSSPARIS19 : Publier du code Open Source dans une banque : Mission impossibl...
#OSSPARIS19 : Publier du code Open Source dans une banque : Mission impossibl...
 
#OSSPARIS19 : Libre à vous ! Raconter les libertés informatiques à la radio -...
#OSSPARIS19 : Libre à vous ! Raconter les libertés informatiques à la radio -...#OSSPARIS19 : Libre à vous ! Raconter les libertés informatiques à la radio -...
#OSSPARIS19 : Libre à vous ! Raconter les libertés informatiques à la radio -...
 
#OSSPARIS19 - Le logiciel libre : un enjeu politique et social - Etienne Gonn...
#OSSPARIS19 - Le logiciel libre : un enjeu politique et social - Etienne Gonn...#OSSPARIS19 - Le logiciel libre : un enjeu politique et social - Etienne Gonn...
#OSSPARIS19 - Le logiciel libre : un enjeu politique et social - Etienne Gonn...
 
#OSSPARIS19 - Conflits d’intérêt & concurrence : la place de l’éditeur dans l...
#OSSPARIS19 - Conflits d’intérêt & concurrence : la place de l’éditeur dans l...#OSSPARIS19 - Conflits d’intérêt & concurrence : la place de l’éditeur dans l...
#OSSPARIS19 - Conflits d’intérêt & concurrence : la place de l’éditeur dans l...
 
#OSSPARIS19 - Table ronde : souveraineté des données
#OSSPARIS19 - Table ronde : souveraineté des données #OSSPARIS19 - Table ronde : souveraineté des données
#OSSPARIS19 - Table ronde : souveraineté des données
 
#OSSPARIS19 - Comment financer un projet de logiciel libre - LUDOVIC DUBOST, ...
#OSSPARIS19 - Comment financer un projet de logiciel libre - LUDOVIC DUBOST, ...#OSSPARIS19 - Comment financer un projet de logiciel libre - LUDOVIC DUBOST, ...
#OSSPARIS19 - Comment financer un projet de logiciel libre - LUDOVIC DUBOST, ...
 
#OSSPARIS19 - BlueMind v4 : les dessous technologiques de 10 ans de travail p...
#OSSPARIS19 - BlueMind v4 : les dessous technologiques de 10 ans de travail p...#OSSPARIS19 - BlueMind v4 : les dessous technologiques de 10 ans de travail p...
#OSSPARIS19 - BlueMind v4 : les dessous technologiques de 10 ans de travail p...
 
#OSSPARIS19 - Tuto de première installation de VITAM, un système d'archivage ...
#OSSPARIS19 - Tuto de première installation de VITAM, un système d'archivage ...#OSSPARIS19 - Tuto de première installation de VITAM, un système d'archivage ...
#OSSPARIS19 - Tuto de première installation de VITAM, un système d'archivage ...
 

Recently uploaded

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 

Recently uploaded (20)

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 

#OSSPARIS17 - Logiciel libre pour une science reproductible, par ROBERTO DI COSMO

  • 1. Vers une science reproductible Roberto Di Cosmo Software Heritage INRIA and IRIF roberto@dicosmo.org December 6th, 2017 THE GREAT LIBRARY OF SOURCE CODE Roberto Di Cosmo Towards reproducible science December 6th, 2017 1 / 23
  • 2. Outline 1 The Science Crisis 2 The state of Software reproducibility 3 The Software Heritage initiative 4 Conclusion Roberto Di Cosmo Towards reproducible science December 6th, 2017 2 / 23
  • 3. Inconsistencies all around us What causes cancer? Is everything we eat associated with cancer? Schoenfeld and Ioannidis, Amer. Jour. of Clinical Nutrition, 2013. Inconsistency an incompatibility between two propositions that cannot both be true SEPT2 is Septin 2 or September 2nd? Gene name errors are widespread in the scientific literature Zie- mann, Eren and El-Osta, Genome Biology, 2016. Corruption The process by which a computer database or program becomes debased by alteration or the introduction of errors Roberto Di Cosmo Towards reproducible science December 6th, 2017 2 / 23
  • 4. And it gets worse! Doctored data? Fraud wrongful or criminal deception intended to result in financial or personal gain What are drugs good for? Non reproducibile results ... Roberto Di Cosmo Towards reproducible science December 6th, 2017 3 / 23
  • 5. We face a science crisis "Sub-prime science"? (Nicholas Humprey) inconsistencies data corruption, fraud non reproducible findings... (picture from Nature, Sep. 2015) The world starts noticing October 2013 John Oliver, Science May 2016 Roberto Di Cosmo Towards reproducible science December 6th, 2017 4 / 23
  • 6. How we built our scientific knowledge The experimental method make an observation formulate an hypothesis set up an experiment formulate a theory And then we reproduce and verify. Reproducibility is the key non-reproducible single occurrences are of no significance to science Karl Popper, The Logic of Scientific Discovery, 1934 Roberto Di Cosmo Towards reproducible science December 6th, 2017 5 / 23
  • 7. Reproducibility in the digital age For an experiment involving software, we need open access to the scientific article describing it open data sets used in the experiment source code of all the components environment of execution stable references between all this Remark The first two items are already widely discussed! ... what about software? Roberto Di Cosmo Towards reproducible science December 6th, 2017 6 / 23
  • 8. Software is Knowledge Software is an essential component of modern scientific research Top 100 papers (Nature, October 2014) [...] the vast majority describe experimental methods or software that have become essential in their fields. http://www.nature.com/news/ the-top-100-papers-1.16224 Roberto Di Cosmo Towards reproducible science December 6th, 2017 7 / 23
  • 9. Pressure to make research code available is now raising Evaluation of software artefacts (optional) tools are usable, in line with expectations started as a contest in 2011 (ESEC/FSE) (winner Vouillon and Di Cosmo) now going mainstream: POPL’17, POPL’16, ECOOP’16, OOPSLA’16, CGO’16, VISSOFT’16, PLDI’16, CGO’15, PPoPP’15, VISSOFT’15, ISSTA’15, OOPSLA’15, PLDI’15, POPL’15, CAV’15, ECOOP’15, FSE’15, ISSTA’14, OOPSLA’14, PLDI’14, ECOOP’14, FSE’14, SAS’13, OOPSLA’13, ECOOP’13, FSE’13, FSE’11 Roberto Di Cosmo Towards reproducible science December 6th, 2017 8 / 23
  • 10. Use the Source, Luke! Repeatability, replicability, reproducibility: having (all) the source code used in an experiment is needed for the first two. Some claim it is not worth the effort “Replicability is not Reproducibility: Nor is it Good Science”, Chris Drummond, ICML 2009 Sure, diversity is important, but: Source code is like the proof used in a theorem: can we really accept Fermat statements like “the details are omitted due to lack of space”? modern complex systems make even the simplest experiment depend on a wealth of components and configuration options access to all the source code is not just necessary to replicate, it is also useful to evolve and modify, to build new experiments from the old ones Roberto Di Cosmo Towards reproducible science December 6th, 2017 9 / 23
  • 11. Source code matters! "The source code for a work means the preferred form of the work for making modifications to it." — GPL Licence Hello World Program (excerpt of binary) 4004e6: 55 4004e7: 48 89 e5 4004ea: bf 84 05 40 00 4004ef: b8 00 00 00 00 4004f4: e8 c7 fe ff ff 4004f9: 90 4004fa: 5d 4004fb: c3 Program (source code) /* Hello World program */ #include<stdio.h> void main() { printf("Hello World"); } Roberto Di Cosmo Towards reproducible science December 6th, 2017 10 / 23
  • 12. Software Source Code is special Harold Abelson, Structure and Interpretation of Computer Programs (1st ed.) 1985 “Programs must be written for people to read, and only incidentally for machines to execute.” Quake 2 source code (excerpt) Net. queue in Linux (excerpt) Len Shustek, Computer History Museum “Source code provides a view into the mind of the designer.” Roberto Di Cosmo Towards reproducible science December 6th, 2017 11 / 23
  • 13. ~ 50 years, a lightning fast growth Apollo 11 Guidance Computer (~60.000 lines), 1969 "When I first got into it, nobody knew what it was that we were doing. It was like the Wild West." Margaret Hamilton Linux Kernel ... now in your pockets! are we taking care of all this? Roberto Di Cosmo Towards reproducible science December 6th, 2017 12 / 23
  • 14. Outline 1 The Science Crisis 2 The state of Software reproducibility 3 The Software Heritage initiative 4 Conclusion Roberto Di Cosmo Towards reproducible science December 6th, 2017 13 / 23
  • 15. Collberg’s report from the trenches Analysis of 613 papers 8 ACM conferences: ASPLOS’12, CCS’12, OOPSLA’12, OSDI’12, PLDI’12, SIGMOD’12, SOSP’11, VLDB’12 5 journals: TACO’9, TISSEC’15, TOCS’30, TODS’37, TOPLAS’34 all very practical oriented The basic question can we get the code to build and run? The workflow Roberto Di Cosmo Towards reproducible science December 6th, 2017 13 / 23
  • 16. The result This can be debated (see http: //cs.brown.edu/~sk/Memos/Examining-Reproducibility/), but... ... that’s a whopping 81% of non reproducible works! Roberto Di Cosmo Towards reproducible science December 6th, 2017 14 / 23
  • 17. The reasons (or, “the dog ate my program”) Why so much software fails to pass the test? Many issues, nice anecdotes, and it finally boils down to Availability Traceability Environment Automation (do you use continuous integration?) Documentation Understanding (including free/open source software) The first two are important software preservation issues Yes, code is fragile: it can be destroyed, and we can lose trace of it Roberto Di Cosmo Towards reproducible science December 6th, 2017 15 / 23
  • 18. Outline 1 The Science Crisis 2 The state of Software reproducibility 3 The Software Heritage initiative 4 Conclusion Roberto Di Cosmo Towards reproducible science December 6th, 2017 16 / 23
  • 19. The Software Heritage Project www.softwareheritage.org Our mission Collect, preserve and share the source code of all the software that is available Past, present and future Preserving the past, enhancing the present, preparing the future Roberto Di Cosmo Towards reproducible science December 6th, 2017 16 / 23
  • 20. Supporting more accessible and reproducible science A global library referencing all software used in all research fields completes the infrastructure for Open Access in science provides intrinsic persistent identifiers needed for scientific reproducibility enables large scale, verifiable software studies Roberto Di Cosmo Towards reproducible science December 6th, 2017 17 / 23
  • 21. The Knowledge Conservancy Magic Triangle The Knowledge Conservancy Magic Triangle Legenda (links are important!) articles: ArXiv, HAL, ... data: Zenodo, ... software: Software Heritage to the rescue Roberto Di Cosmo Towards reproducible science December 6th, 2017 18 / 23
  • 22. Open parenthesis: reloading Open Access! Brief summary researchers do all the work... and publishers lock the results behind paywalls public money is wasted in subscriptions... and subtracted to research we reacted with Open access... and got caught in the "gold open access" trap The essential missing bit in the conversation Art. L. 131-4. La cession par l’auteur de ses droits sur son oeuvre peut être totale ou partielle. Elle doit comporter au profit de l’auteur la participation proportionnelle aux recettes provenant de la vente ou de l’exploitation. Yes, it’s the greatest copyright violation in history! Authorities refuse to see... but spend 100+Millions in programs like ISTEX Roberto Di Cosmo Towards reproducible science December 6th, 2017 19 / 23
  • 23. Archive coverage ~150 TB blobs, ~5 TB database (as a graph: ~7 B nodes + ~60 B edges) Our sources GitHub — full, up-to-date mirror Debian — automation in progress; GNU Gitorious, Google Code — processing (Archive Team & Google) Bitbucket — WIP The richest source code archive already, ... and growing daily! Roberto Di Cosmo Towards reproducible science December 6th, 2017 20 / 23
  • 24. Going global April 3rd, 2017: landmark Inria Unesco agreement... https://www.softwareheritage.org/blog September 28th, 2017 Mauritius Call on information access Roberto Di Cosmo Towards reproducible science December 6th, 2017 21 / 23
  • 25. An unique opportunity Library of Alexandria of code Take urgent action to recover the past structure the future A CERN for Software Photo: ALMA(ESO/NAOJ/NRAO), R. Hills Build a common infrastructure software research, better science for society as a whole Come in, we’re open www.softwareheritage.org tons of research problems, and our code: forge.softwareheritage.org Roberto Di Cosmo Towards reproducible science December 6th, 2017 22 / 23
  • 26. Outline 1 The Science Crisis 2 The state of Software reproducibility 3 The Software Heritage initiative 4 Conclusion Roberto Di Cosmo Towards reproducible science December 6th, 2017 23 / 23
  • 27. Age of responsibility Technology reshapes society Code is law (Lawrence Lessig) reproducitibility, replicability, science safety, security, privacy, anonimity trust, justice, freedom, transparency ethical and social challenges We need you! value "responsible" research contribute to Free Software ask questions, join the conversation Giambattista Vico (1688 - 1744) Conoscere, è saper fare Roberto Di Cosmo Towards reproducible science December 6th, 2017 23 / 23