Coding for science and innovation
Gaël Varoquaux
to change the world!
Science
The process of discovering
knowledge and mechanisms
Computing is a central part of how we do science
G Varoquaux 2
Science + Computers = Computational science
Nuclear physics Fluid dynamics Chemistry
Psychology
Marketing
Data science: using data to acquire insights
Science
The process of discovering
knowledge and mechanisms
“Science is not a political construct or a belief system.
Scientific progress depends on openness, transparency,
and the free flow of ideas and people.”
— Dr. Rush Holt, CEO of AAAS,
testimony to the House Committee on Science, Space, and Technology,
Feb 8, 2017
Science helps shape society
Growth in a time of debt [Reinhart & Rogoff 2010]:
Wrong conclusions due to flawed Excel processing
⇒ Public debt blamed for financial crisis (Osborne UK MP)
Autism and vaccines:
forged study: [Wakefield et al, Lancet 1998]
⇒ Drop in vaccination, measles outbreak
Loss of trust in science is very costly
Innovation
Putting the right technology to the right use
Light bulb:
Invented ∼ 1835 by Lindsay
Extra progress: vacuum pumps (Swan ∼ 1880)
Economics: availability of electric power
⇒ Edison’s company
Outbox: company digitizing physical mail
But citizens aren’t the USPS customers, junk mailers are
⇒ No cooperation from USPS, Outbox dies
Power balances drive innovation as much as technology
Coding for science and innovation:
Computing is the new electricity:
a driver for change
With new data sources,
it reaches beyond physics & engineering
Coding for science and innovation:
1 Coding as a scientist
2 Building software for science
3 An ecosystem
1 Coding as a scientist
1 Data in brain sciences
The mental world
cognition, emotions
autism, depression
Historically studied
via verbal interactions
Psychology
The brain
an organ:
neurons, firing
Imaging brain activity
Quantitative data
1 One example of our work: biomarkers of Autism
[Abraham...Varoquaux, 2017]
Comparing the brain activity of many subjects
Supervised machine learning to discriminate Autism
1. Extract brain networks
Unsupervised feature learning
complex model fit to 1Tb data
2. Per-subject connections
Information geometry,
Lie algebra...
3. Supervised learning
Scikit-learn
Limits to impact:
Cannot outperform clinicians that define Autism/Control
Psychiatrists unhappy with current blurry definition
But not ready to accept black-box algorithmic definition
Lots of moving parts
Practitioners need to
make the tools theirs
1 A quest for trust: reproducible research
“if it’s not open and verifiable by others, it’s not science,
or engineering, or whatever it is you call what we do”
— V. Stodden, The scientific method in practice
Computational reproducibility:
Automate everything
Control the environment
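One modest way to start on "control the environment" is to record the environment next to the results, so that a later run can be compared against it. The sketch below uses only the Python standard library. The function name `snapshot_environment` and the fields it records are illustrative assumptions, not a tool mentioned in the talk.

```python
import json
import platform
import sys
from importlib import metadata


def snapshot_environment(packages=()):
    """Return a JSON-serializable description of the running environment.

    `packages` lists distribution names whose versions should be recorded;
    packages that are not installed are recorded as None.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Hypothetical usage: save this output alongside the analysis results.
    env = snapshot_environment(["numpy", "scikit-learn"])
    print(json.dumps(env, indent=2))
```

Recording versions does not pin them; it only makes a mismatch visible when a result fails to replay.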
1 Automate everything
Just a simple matter of programming
1 Automate everything...
Some operations work better with a human in the loop
Scientific research is an iterative process
Tension between needs for interaction and replay
Mayavi
Reflexivity between dialogs and objects
Record mode
Jupyter, and its widgets:
Exploring the space between interaction and code
1 Beyond computational reproducibility
Make every computational step reproducible,
and good science will emerge
Estimating the reproducibility of psychological science
[Science 2015] 36% of effects replicate
Reasons:
Statistical challenges: analysis degrees of freedom
Weak incentives: winner’s curse in publication
Seldom computational reproducibility
I think that reproducibility is a misnomer.
What matters is that operations be
verifiable or reusable.
1 Managing complexity
In practice, the best way to improve research
is to use the right (conceptual) tools.
The everyday roadblock is cognitive load
Machine learning, brain anatomy, psychology
R, Python, shell scripts
Funding agencies, reviewer 3, courting VCs
Coding as a scientist
Final code should be auditable,
ideally reusable
Tension between interactive computing
& automating
Main enemy: cognitive overload
In the industry
Reusable
Verifiable? Not for Silicon Valley,
but in insurance, healthcare, banking...
Moving data-scientist code
to production?
Software projects going over budget?
Code quality in exploratory work
(practices listed in order of increasing cost)
Use pyflakes in your editor (seriously)
Coding convention, good naming
Version control: use git + GitHub
Code review
Unit testing
If it’s not tested, it’s broken or soon will be
Make a package
controlled dependencies and compilation
...
Avoid premature software engineering
Over versus under engineering
Goal is generating insights / moving in new spaces
Experimentation for intuitions and proofs of concepts
⇒ new ideas
As the path becomes clear: consolidation
is great for that
Heavy engineering too early freezes bad ideas
2 Building software for science
The point of view of the developer
Libraries are what enable us to scale:
Abstractions reduce cognitive load
Code reuse gets us further
2 Examples of such libraries
scikit-learn
Make research in machine-learning
models and algorithms usable by people
who do not understand them
Challenges:
Variety of that space
Statistical concepts coding concepts
nilearn
Make it easy to answer neuroimaging
problems with them
Challenges: Onboarding technology-averse users
2 Tools that reduce cognitive overload
It’s a design problem
Jonathan Ive, an industrial designer, is #4 at Apple
Code different.
2 Some API design principles for the scipy stack
Consistency, consistency, consistency
Functions are easier to understand than classes
A library should hinge on a small number of concepts
Common data containers make the ecosystem stronger
Each function should have one and only one purpose
Code for interfaces, but don’t overdo duck typing
Properties are for impedance matching
Shallow is better than deep
Error messages matter
Be Pythonic
Consistency, consistency, consistency
np.save(file, obj) vs. pickle.dump(obj, file)
fmin(..., maxiter=10) vs. lsq_linear(..., max_iter=10)
Creates cognitive overload
Functions are easier to understand than classes
Objects have hidden states,
Objects have no universal interface, entry point, output
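A small sketch of the hidden-state point, using a made-up running-mean example (both `mean` and `RunningMean` are invented here for illustration). The function's result depends only on its arguments, while the object's result depends on what was called before.

```python
def mean(values):
    """Pure function: the output depends only on the input."""
    values = list(values)
    return sum(values) / len(values)


class RunningMean:
    """Stateful object: the result of `update` depends on earlier calls."""

    def __init__(self):
        self._total = 0.0
        self._count = 0

    def update(self, value):
        self._total += value
        self._count += 1
        return self._total / self._count


# Same argument, same answer, whatever happened before.
assert mean([2, 4]) == 3.0
assert mean([2, 4]) == 3.0

# Same argument, different answers: the reader has to track hidden state.
tracker = RunningMean()
first = tracker.update(4)        # 4 / 1
second = tracker.update(4)       # 8 / 2
third = tracker.update(1)        # 9 / 3
fresh = RunningMean().update(1)  # 1 / 1
assert third != fresh
```

This is not an argument against objects in general, only an illustration of why a function is often the easier entry point for a newcomer.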
A library should hinge on a small number of concepts
How much do usage patterns carry over across the library?
Common data containers make the ecosystem stronger
Facilitates working with multiple libraries together
Easier to get up to speed with a given library
Each function should have one and only one purpose
Change of behavior depending on input type
Code for interfaces, but don’t overdo duck typing
Interfaces define objects
Incompatible behaviors lead to bugs (e.g. np.matrix)
Properties are for impedance matching
Properties obfuscate the data model of the object
Properties can create hidden compute costs
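To make the hidden-cost point concrete, here is a minimal sketch with a hypothetical `Dataset` class (not taken from any library). The property reads like a stored attribute, yet it redoes a full sort on every access.

```python
class Dataset:
    """Hypothetical container used only to illustrate the point."""

    def __init__(self, values):
        self.values = list(values)
        self.n_sorts = 0  # counts how often the costly code runs

    @property
    def sorted_values(self):
        # Looks like a stored attribute, but sorts on every access.
        self.n_sorts += 1
        return sorted(self.values)

    def compute_sorted_values(self):
        # An explicit method name signals that work is being done.
        return sorted(self.values)


data = Dataset([3, 1, 2])
for _ in range(3):
    smallest = data.sorted_values[0]  # one full sort per loop iteration

assert data.n_sorts == 3
assert smallest == 1
```

With the explicit method, a caller is more likely to compute once and keep the result in a local variable.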
Shallow is better than deep
Objects are understood by their surface
Composition creates cognitive overload
Error messages matter
Explain the problem
Print the offending value
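As an illustration of these two rules, the sketch below validates a hypothetical `n_components` parameter. The function name, the parameter, and the wording of the messages are assumptions made for this example. Each message says what is wrong and echoes the value that triggered it.

```python
def check_n_components(n_components, n_features):
    """Validate a hypothetical `n_components` parameter."""
    if isinstance(n_components, bool) or not isinstance(n_components, int):
        raise TypeError(
            "n_components must be an integer, got "
            f"{n_components!r} of type {type(n_components).__name__}."
        )
    if not 1 <= n_components <= n_features:
        raise ValueError(
            f"n_components={n_components!r} is out of range: it must be "
            f"between 1 and the number of features ({n_features})."
        )
    return n_components


try:
    check_n_components(12, n_features=10)
except ValueError as error:
    message = str(error)
    print(message)
```

Using `!r` in the f-string keeps the distinction between `12` and `'12'` visible, which is often the whole bug.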
Be Pythonic
Avoid syntax hacks
2 Scikit-learn API
Scikit-learn cheat sheet
Scikit-learn
Fit and predict
>>> estimator = Estimator(param1=param1)
>>> estimator.fit(X_train, y_train)
>>> y_test = estimator.predict(X_test)
Transform data
>>> X_red = estimator.transform(X_test)
The estimator is a “contract”
(slightly more elaborate than above)
It has created an ecosystem of packages
Based on duck-typing, not inheritance
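The duck-typing idea can be sketched in pure Python, without scikit-learn itself. `MeanRegressor` and `mean_absolute_error_after_fit` below are hypothetical names written for this example, and the real scikit-learn contract is more elaborate than `fit` and `predict` alone. The point is only that the calling code never checks a base class.

```python
class MeanRegressor:
    """Toy estimator: predicts the mean of the training targets.

    It does not inherit from any library base class; it only provides
    the `fit` and `predict` methods that the calling code relies on.
    """

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self  # returning self allows fit(...).predict(...)

    def predict(self, X):
        return [self.mean_ for _ in X]


def mean_absolute_error_after_fit(estimator, X_train, y_train, X_test, y_test):
    """Works with any object that quacks like an estimator."""
    y_pred = estimator.fit(X_train, y_train).predict(X_test)
    return sum(abs(p - t) for p, t in zip(y_pred, y_test)) / len(y_test)


error = mean_absolute_error_after_fit(
    MeanRegressor(),
    X_train=[[0], [1], [2]], y_train=[1.0, 2.0, 3.0],
    X_test=[[3], [4]], y_test=[2.0, 4.0],
)
assert error == 1.0
```

Any third-party object exposing the same two methods could be passed to the same function, which is what lets separate packages interoperate.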
2 numpy arrays
ndarray
Abstraction over pointers & operations
Contract: the memory layout
IMHO, gone too far in number of methods (163)
The array protocol makes it easy to quack like an array
PS: The ecosystem needs categorical dtypes in numpy
2 Example-driven development
The 3-liner as the new cool
Teaching others
Teaching yourself
Write examples that solve end problems
Iterate on your API until these are simple
Mayavi scikit-learn nilearn
User flow on the scikit-learn website:
Examples
User flow on the nilearn website:
Examples
Sphinx-gallery: compiling scripts in an examples gallery
Restructured text formatting
Capturing outputs
Links to function docs
+ Creates Jupyter notebooks
Inserts links to the examples that use a given function
2 Building great documentation
Focus on explaining concepts (hint: write plain English)
Less is more: prioritize, avoid redundancy
Code examples must be short (link to full tutorial examples)
Links everywhere: users will land at the wrong place
Teach with the docs
Plan for maintenance of docs:
Continuous integration
Check links
Runs examples
Doctests
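A short sketch of the doctest item, using the standard-library `doctest` module. The `zscore` function is a made-up example; the idea is that the usage shown in the docstring is executed by the test run, so the documentation fails loudly when it drifts from the code.

```python
import doctest


def zscore(values):
    """Center and scale a list of numbers (sample standard deviation).

    >>> zscore([1.0, 2.0, 3.0])
    [-1.0, 0.0, 1.0]
    >>> zscore([5.0, 5.0])
    Traceback (most recent call last):
        ...
    ValueError: cannot scale values with zero spread, got [5.0, 5.0]
    """
    mean = sum(values) / len(values)
    spread = (sum((v - mean) ** 2 for v in values) / (len(values) - 1)) ** 0.5
    if spread == 0:
        raise ValueError(
            f"cannot scale values with zero spread, got {values!r}"
        )
    return [(v - mean) / spread for v in values]


if __name__ == "__main__":
    # A continuous-integration job can run this file and inspect the result.
    print(doctest.testmod())
```

Doctests work best for short, deterministic snippets; longer scenarios usually belong in the examples gallery or in unit tests.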
2 Reusable science
scikit-learn is the new machine-learning textbook
nilearn is the new neuroimaging review article
Experiments reproduced
at each commit
e.g. brain reading:
nilearn.github.io/auto_examples/02_decoding/plot_miyawaki_reconstruction.html
Resource intensive CI:
Data ⇒ Fight for good open data
Computation ⇒ Find good algorithms and tradeoffs
Forces us to distill the literature (as a review)
Package development consolidates
science and moves it outside the lab
3 An ecosystem
A bird’s eye view on scientific packages
3 Packages of the Python ecosystem
[Figure: number of PyPI downloads (log scale, 10^4 to 10^9) against package rank (log scale, 1 to 10000). Labeled packages: simplejson #1, six #2, setuptools #3, numpy #49, scikit-learn #110, joblib #431, nilearn #2877]
A small number of packages are used by many
1/f distribution, preferential attachment
nilearn relies on scikit-learn & joblib that rely on numpy...
3 Standing on the shoulders of maintainers
May 31st: pip broken
https://github.com/pypa/setuptools/pull/1043
Left-pad: how left-padding strings broke the Internet
A JavaScript package for left-padding strings was removed from
node’s package manager, breaking all the websites that depended on it.
3 Dependencies
Beyond installation, a challenge is to ensure that package
versions play well together: correctness of the code
Breaking backward compatibility
yields irreconcilable dependencies
3 Dependencies and their upgrade
It’s a fact: users hate upgrading
If it ain’t broken, don’t fix it
even if it is, apparently
3 Declaring undependence?
Monolithic packages with no dependencies...
But:
Scaling is hard
Complexity grows as square of codebase size
[Woodfield 1979]
User support grows with userbase size
3 Core software is infrastructure
Everybody uses it every day
In industry, education, & research
It needs maintenance
Like roads (or OpenSSL, to prevent Heartbleed)
Central infrastructure packages are “boring”
They are understaffed and underfunded
References: “Roads and Bridges”, Ford Foundation report
Excellent talk by Heather Miller
https://www.youtube.com/watch?v=17yy5BwIiTw
@GaelVaroquaux
Coding for science and innovation
New science
High value of bringing new methods to a field
⇒ Enable domain-specialists
Rapid iteration, but with automation & consolidation
Software tools
Scientists are limited by cognitive load
⇒ Design of API and documentation in libraries
Libraries make science reproducible and reusable
An ecosystem
Central packages hold the ecosystem together
Thanks to: the scipy community

More Related Content

Similar to Coding for science and innovation

Better neuroimaging data processing: driven by evidence, open communities, an...
Better neuroimaging data processing: driven by evidence, open communities, an...Better neuroimaging data processing: driven by evidence, open communities, an...
Better neuroimaging data processing: driven by evidence, open communities, an...Gael Varoquaux
 
Open Science and Executable Papers
Open Science and Executable PapersOpen Science and Executable Papers
Open Science and Executable PapersJose Enrique Ruiz
 
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven ScienceCapturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven Sciencedgarijo
 
CODATA International Training Workshop in Big Data for Science for Researcher...
CODATA International Training Workshop in Big Data for Science for Researcher...CODATA International Training Workshop in Big Data for Science for Researcher...
CODATA International Training Workshop in Big Data for Science for Researcher...Johann van Wyk
 
Digital Science: Towards the executable paper
Digital Science: Towards the executable paperDigital Science: Towards the executable paper
Digital Science: Towards the executable paperJose Enrique Ruiz
 
Research in Computer Science and Engineering
Research in Computer Science and EngineeringResearch in Computer Science and Engineering
Research in Computer Science and EngineeringOdiaPua1
 
Ict와 사회과학지식간 학제간 연구동향(23 march2013)
Ict와 사회과학지식간 학제간 연구동향(23 march2013)Ict와 사회과학지식간 학제간 연구동향(23 march2013)
Ict와 사회과학지식간 학제간 연구동향(23 march2013)Han Woo PARK
 
Gridforum David De Roure Newe Science 20080402
Gridforum David De Roure Newe Science 20080402Gridforum David De Roure Newe Science 20080402
Gridforum David De Roure Newe Science 20080402vrij
 
tools for communicating in the computational sciences
tools for communicating in the computational sciencestools for communicating in the computational sciences
tools for communicating in the computational sciencesBrian Bot
 
Computational Thinking - a 4 step approach and a new pedagogy
Computational Thinking - a 4 step approach and a new pedagogyComputational Thinking - a 4 step approach and a new pedagogy
Computational Thinking - a 4 step approach and a new pedagogyPaul Herring
 
Big data, Behavioral Change and IOT Architecture
Big data, Behavioral Change and IOT ArchitectureBig data, Behavioral Change and IOT Architecture
Big data, Behavioral Change and IOT ArchitectureYves Caseau
 
DevelopingDataScienceProfession
DevelopingDataScienceProfessionDevelopingDataScienceProfession
DevelopingDataScienceProfessionGary Rector
 
Jéssica Cohen, José M. Blanco, Yaiza Rubio, Félix Brezo
Jéssica Cohen, José M. Blanco, Yaiza Rubio, Félix BrezoJéssica Cohen, José M. Blanco, Yaiza Rubio, Félix Brezo
Jéssica Cohen, José M. Blanco, Yaiza Rubio, Félix BrezoJose María Blanco Navarro
 
The Role of Scientific Method in Software Development
The Role of Scientific Method in Software Development The Role of Scientific Method in Software Development
The Role of Scientific Method in Software Development Natalia Juristo
 
Introduction to AI (Artificial Intelligence).
Introduction to AI (Artificial Intelligence).Introduction to AI (Artificial Intelligence).
Introduction to AI (Artificial Intelligence).amolakkumar45
 
From Open Data to Open Science, by Geoffrey Boulton
 From Open Data to Open Science, by Geoffrey Boulton From Open Data to Open Science, by Geoffrey Boulton
From Open Data to Open Science, by Geoffrey BoultonLEARN Project
 
A data view of the data science process
A data view of the data science processA data view of the data science process
A data view of the data science processMathieu d'Aquin
 
4th_paradigm_book_complete_lr
4th_paradigm_book_complete_lr4th_paradigm_book_complete_lr
4th_paradigm_book_complete_lrDominic A Ienco
 
Lecture 1 Slides -Introduction to algorithms.pdf
Lecture 1 Slides -Introduction to algorithms.pdfLecture 1 Slides -Introduction to algorithms.pdf
Lecture 1 Slides -Introduction to algorithms.pdfRanvinuHewage
 
Increasing the Efficiency of Workflows: Use Cases in the Life Sciences
Increasing the Efficiency of Workflows: Use Cases in the Life SciencesIncreasing the Efficiency of Workflows: Use Cases in the Life Sciences
Increasing the Efficiency of Workflows: Use Cases in the Life SciencesSandra Gesing
 

Similar to Coding for science and innovation (20)

Better neuroimaging data processing: driven by evidence, open communities, an...
Better neuroimaging data processing: driven by evidence, open communities, an...Better neuroimaging data processing: driven by evidence, open communities, an...
Better neuroimaging data processing: driven by evidence, open communities, an...
 
Open Science and Executable Papers
Open Science and Executable PapersOpen Science and Executable Papers
Open Science and Executable Papers
 
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven ScienceCapturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
 
CODATA International Training Workshop in Big Data for Science for Researcher...
CODATA International Training Workshop in Big Data for Science for Researcher...CODATA International Training Workshop in Big Data for Science for Researcher...
CODATA International Training Workshop in Big Data for Science for Researcher...
 
Digital Science: Towards the executable paper
Digital Science: Towards the executable paperDigital Science: Towards the executable paper
Digital Science: Towards the executable paper
 
Research in Computer Science and Engineering
Research in Computer Science and EngineeringResearch in Computer Science and Engineering
Research in Computer Science and Engineering
 
Ict와 사회과학지식간 학제간 연구동향(23 march2013)
Ict와 사회과학지식간 학제간 연구동향(23 march2013)Ict와 사회과학지식간 학제간 연구동향(23 march2013)
Ict와 사회과학지식간 학제간 연구동향(23 march2013)
 
Gridforum David De Roure Newe Science 20080402
Gridforum David De Roure Newe Science 20080402Gridforum David De Roure Newe Science 20080402
Gridforum David De Roure Newe Science 20080402
 
tools for communicating in the computational sciences
tools for communicating in the computational sciencestools for communicating in the computational sciences
tools for communicating in the computational sciences
 
Computational Thinking - a 4 step approach and a new pedagogy
Computational Thinking - a 4 step approach and a new pedagogyComputational Thinking - a 4 step approach and a new pedagogy
Computational Thinking - a 4 step approach and a new pedagogy
 
Big data, Behavioral Change and IOT Architecture
Big data, Behavioral Change and IOT ArchitectureBig data, Behavioral Change and IOT Architecture
Big data, Behavioral Change and IOT Architecture
 
DevelopingDataScienceProfession
DevelopingDataScienceProfessionDevelopingDataScienceProfession
DevelopingDataScienceProfession
 
Jéssica Cohen, José M. Blanco, Yaiza Rubio, Félix Brezo
Jéssica Cohen, José M. Blanco, Yaiza Rubio, Félix BrezoJéssica Cohen, José M. Blanco, Yaiza Rubio, Félix Brezo
Jéssica Cohen, José M. Blanco, Yaiza Rubio, Félix Brezo
 
The Role of Scientific Method in Software Development
The Role of Scientific Method in Software Development The Role of Scientific Method in Software Development
The Role of Scientific Method in Software Development
 
Introduction to AI (Artificial Intelligence).
Introduction to AI (Artificial Intelligence).Introduction to AI (Artificial Intelligence).
Introduction to AI (Artificial Intelligence).
 
From Open Data to Open Science, by Geoffrey Boulton
 From Open Data to Open Science, by Geoffrey Boulton From Open Data to Open Science, by Geoffrey Boulton
From Open Data to Open Science, by Geoffrey Boulton
 
A data view of the data science process
A data view of the data science processA data view of the data science process
A data view of the data science process
 
4th_paradigm_book_complete_lr
4th_paradigm_book_complete_lr4th_paradigm_book_complete_lr
4th_paradigm_book_complete_lr
 
Lecture 1 Slides -Introduction to algorithms.pdf
Lecture 1 Slides -Introduction to algorithms.pdfLecture 1 Slides -Introduction to algorithms.pdf
Lecture 1 Slides -Introduction to algorithms.pdf
 
Increasing the Efficiency of Workflows: Use Cases in the Life Sciences
Increasing the Efficiency of Workflows: Use Cases in the Life SciencesIncreasing the Efficiency of Workflows: Use Cases in the Life Sciences
Increasing the Efficiency of Workflows: Use Cases in the Life Sciences
 

Coding for science and innovation

  • 1. Coding for science and innovation Gaël Varoquaux to change the world!
  • 2. Science The process of discovering knowledge and mechanisms Computing is a central part of how we do science G Varoquaux 2
  • 3. Science The process of discovering knowledge and mechanisms Computing is a central part of how we do science Science + Computers = Computational science Nuclear physics Fluid dynamics Chemistry
  • 4. Science The process of discovering knowledge and mechanisms Computing is a central part of how we do science Science + Computers = Computational science Psychology
  • 5. Science The process of discovering knowledge and mechanisms Computing is a central part of how we do science Science + Computers = Computational science Psychology Marketing Data science: using data to acquire insights
  • 6. Science The process of discovering knowledge and mechanisms “Science is not a political construct or a belief system. Scientific progress depends on openness, transparency, and the free flow of ideas and people.” Dr. Rush Holt, CEO of AAAS, testimony to the House Committee on Science, Space, and Technology, Feb 8, 2017
  • 7. Science The process of discovering knowledge and mechanisms Science helps shape society Growth in a time of debt [Reinhart & Rogoff 2010]: Wrong conclusions due to flawed Excel processing ⇒ Public debt blamed for financial crisis (Osborne UK MP) Autism and vaccines: forged study: [Wakefield et al, Lancet 1998] ⇒ Drop in vaccination, measles outbreak Loss of trust in science is very costly
  • 8. Innovation Putting the right technology to the right use
  • 9. Innovation Putting the right technology to the right use Light bulb: Invented ∼ 1835 by Lindsay Extra progress: vacuum pumps (Swan ∼ 1880) Economics: availability of electric power ⇒ Edison’s company
  • 10. Innovation Putting the right technology to the right use Light bulb: Invented ∼ 1835 by Lindsay Extra progress: vacuum pumps (Swan ∼ 1880) Economics: availability of electric power ⇒ Edison’s company Outbox: company digitizing physical mail But citizens aren’t the USPS customers, junk mailers are ⇒ No cooperation from USPS, Outbox dies Power balances drive innovation as much as technology
  • 11. Coding for science and innovation: Computing is the new electricity: a driver for change With new data sources, it reaches beyond physics & engineering
  • 12. Coding for science and innovation: 1 Coding as a scientist 2 Building software for science 3 An ecosystem
  • 13. 1 Coding as a scientist
  • 14. 1 Data in brain sciences The mental world cognition, emotions autism, depression Historically studied via verbal interactions Psychology
  • 15. 1 Data in brain sciences The mental world cognition, emotions autism, depression Historically studied via verbal interactions The brain an organ: neurons, firing Imaging brain activity Quantitative data
  • 16. 1 One example of our work: biomarkers of Autism [Abraham...Varoquaux, 2017] Comparing the brain activity of many subjects Supervised machine learning to discriminate Autism
  • 17. 1 One example of our work: biomarkers of Autism [Abraham...Varoquaux, 2017] 1. Extract brain networks Unsupervised feature learning complex model fit to 1Tb data
  • 18. 1 One example of our work: biomarkers of Autism [Abraham...Varoquaux, 2017] 1. Extract brain networks 2. Per-subject connections Information geometry, Lie algebra...
  • 19. 1 One example of our work: biomarkers of Autism [Abraham...Varoquaux, 2017] 1. Extract brain networks 2. Per-subject connections 3. Supervised learning Scikit-learn
  • 20. 1 One example of our work: biomarkers of Autism [Abraham...Varoquaux, 2017] 1. Extract brain networks 2. Per-subject connections 3. Supervised learning Scikit-learn Limits to impact: Cannot outperform clinicians who define Autism/Control Psychiatrists unhappy with current blurry definition But not ready to accept black-box algorithmic definition
  • 21. 1 One example of our work: biomarkers of Autism [Abraham...Varoquaux, 2017] 1. Extract brain networks 2. Per-subject connections 3. Supervised learning Scikit-learn Limits to impact: Cannot outperform clinicians who define Autism/Control Psychiatrists unhappy with current blurry definition But not ready to accept black-box algorithmic definition Lots of moving parts Practitioners need to make the tools theirs
  • 22. 1 A quest for trust: reproducible research “if it’s not open and verifiable by others, it’s not science, or engineering, or whatever it is you call what we do” (V. Stodden, The scientific method in practice) Computational reproducibility: Automate everything Control the environment
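
A minimal sketch of what "automate everything, control the environment" can mean in practice, assuming a toy analysis: the seed and the interpreter/platform details are stored next to the result so that a run can be replayed. The function names `run_analysis` and `run_with_provenance` are hypothetical and not taken from the slides.

```python
import json
import platform
import random
import sys


def run_analysis(seed):
    """Toy 'analysis': a seeded random draw standing in for a real computation."""
    rng = random.Random(seed)  # local generator, no dependence on global state
    return [round(rng.random(), 6) for _ in range(3)]


def run_with_provenance(seed=0):
    """Return the result together with the settings needed to replay it."""
    return {
        "seed": seed,
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "result": run_analysis(seed),
    }


if __name__ == "__main__":
    # The whole run is one command: nothing is typed by hand along the way
    print(json.dumps(run_with_provenance(seed=42), indent=2))
```

Replaying with the same seed gives the same result; the recorded versions help explain differences when it does not.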
  • 23. 1 Automate everything Just a simple matter of programming
  • 24. 1 Automate everything... Some operations work better with a human in the loop Scientific research is an iterative process Tension between needs for interaction and replay
  • 25. 1 Automate everything... Some operations work better with a human in the loop Scientific research is an iterative process Tension between needs for interaction and replay Mayavi Reflexivity between dialogs and objects Record mode
  • 26. 1 Automate everything... Some operations work better with a human in the loop Scientific research is an iterative process Tension between needs for interaction and replay Jupyter, and its widgets: Exploring the space between interaction and code
  • 27. 1 Beyond computational reproducibility Make every computational step reproducible, and good science will emerge
  • 28. 1 Beyond computational reproducibility Make every computational step reproducible, and good science will emerge Estimating the reproducibility of psychological science [Science 2015] 36% of effects replicate Reasons: Statistical challenges: analysis degrees of freedom Weak incentives: winner’s curse in publication Seldom computational reproducibility
  • 29. 1 Beyond computational reproducibility Make every computational step reproducible, and good science will emerge Estimating the reproducibility of psychological science [Science 2015] 36% of effects replicate Reasons: Statistical challenges: analysis degrees of freedom Weak incentives: winner’s curse in publication Seldom computational reproducibility I think that reproducibility is a misnomer. What matters is that operations be verifiable or reusable.
  • 30. In practice, the best way to improve research is to use the right (conceptual) tools.
  • 31. 1 Managing complexity In practice, the best way to improve research is to use the right (conceptual) tools. The everyday roadblock is cognitive load Machine learning, brain anatomy, psychology R, Python, shell scripts Funding agencies, reviewer 3, courting VCs
  • 32. Coding as a scientist Final code should be auditable, ideally reusable Tension between interactive computing & automating Main enemy: cognitive overload
  • 33. Coding as a scientist Final code should be auditable, ideally reusable Tension between interactive computing & automating Main enemy: cognitive overload In the industry Reusable Verifiable? Not for Silicon Valley, but in insurance, healthcare, banking... Moving data-scientist code to production? Software projects going over budget?
  • 34. Code quality in exploratory work Use pyflakes in your editor seriously Coding convention, good naming Version control Use git + github Code review Unit testing If it’s not tested, it’s broken or soon will be Make a package controlled dependencies and compilation ...
  • 35. Code quality in exploratory work (increasing cost) Use pyflakes in your editor seriously Coding convention, good naming Version control Use git + github Code review Unit testing If it’s not tested, it’s broken or soon will be Make a package controlled dependencies and compilation ... Avoid premature software engineering
  • 36. Code quality in exploratory work (increasing cost) Use pyflakes in your editor seriously Coding convention, good naming Version control Use git + github Code review Unit testing If it’s not tested, it’s broken or soon will be Make a package controlled dependencies and compilation ... Avoid premature software engineering Over versus under engineering Goal is generating insights / moving in new spaces Experimentation for intuitions and proofs of concepts ⇒ new ideas As the path becomes clear: consolidation is great for that Heavy engineering too early freezes bad ideas
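
As an illustration of "if it's not tested, it's broken or soon will be", here is a small sketch of a helper with its unit test. The helper `zscore` and the test are hypothetical examples, not code from the talk; a test like this is cheap enough to write even in exploratory work.

```python
def zscore(values):
    """Standardize a list of numbers to zero mean and unit variance."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0:
        # Explain the problem and show the offending value
        raise ValueError(f"zscore needs non-constant input, got {values!r}")
    std = var ** 0.5
    return [(v - mean) / std for v in values]


def test_zscore():
    out = zscore([1.0, 2.0, 3.0])
    # The output has zero mean ...
    assert abs(sum(out)) < 1e-12
    # ... and unit variance
    assert abs(sum(o * o for o in out) / len(out) - 1.0) < 1e-12


if __name__ == "__main__":
    test_zscore()
```

A test runner such as pytest would collect `test_zscore` automatically; calling it by hand works too.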
  • 37. 2 Building software for science The point of view of the developer Libraries are what enables us to scale: Abstractions reduce cognitive load Code reuse gets us further
  • 38. 2 Examples of such libraries scikit-learn Make research in machine-learning models and algorithms usable by people who do not understand them nilearn Make it easy to answer neuroimaging problems with them
  • 39. 2 Examples of such libraries scikit-learn Make research in machine-learning models and algorithms usable by people who do not understand them Challenges: Variety of that space Statistical concepts vs coding concepts nilearn Make it easy to answer neuroimaging problems with them Challenges: Onboarding technology-averse users
  • 40. 2 Tools that reduce cognitive overload It’s a design problem
  • 41. 2 Tools that reduce cognitive overload Jonathan Ive, an industrial designer, is #4 at Apple Code different.
  • 42. 2 Some API design principles for the scipy stack Consistency, consistency, consistency Functions are easier to understand than classes A library should hinge on a small number of concepts Common data containers make the ecosystem stronger Each function should have one and only one purpose Code for interfaces, but don’t overdo duck typing Properties are for impedance matching Shallow is better than deep Error messages matter Be Pythonic
  • 43. 2 Some API design principles for the scipy stack Consistency, consistency, consistency np.save(file, obj) pickle.dump(obj, file) fmin(...maxiter=10) lsq_linear(...max_iter=10) Creates cognitive overload Functions are easier to understand than classes A library should hinge on a small number of concepts Common data containers make the ecosystem stronger Each function should have one and only one purpose Code for interfaces, but don’t overdo duck typing Properties are for impedance matching Shallow is better than deep Error messages matter Be Pythonic
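
One possible way to picture the consistency principle, using only the standard library: the hypothetical wrappers below (`save_json`, `save_pickle`, `load_json`, `load_pickle` are illustrative names, not an existing API) all take their arguments in the same order, so the user has one pattern to remember instead of one per format.

```python
import json
import pickle
import tempfile
from pathlib import Path


# Every saver takes (obj, path); every loader takes (path) and returns obj.
def save_json(obj, path):
    with open(path, "w") as f:
        json.dump(obj, f)


def save_pickle(obj, path):
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_json(path):
    with open(path) as f:
        return json.load(f)


def load_pickle(path):
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    data = {"subject": 1, "scores": [0.5, 0.75]}
    with tempfile.TemporaryDirectory() as tmp:
        save_json(data, Path(tmp) / "data.json")
        save_pickle(data, Path(tmp) / "data.pkl")
        print(load_json(Path(tmp) / "data.json") == load_pickle(Path(tmp) / "data.pkl"))
```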
  • 44. 2 Some API design principles for the scipy stack Consistency, consistency, consistency Functions are easier to understand than classes Objects have hidden states, Objects have no universal interface, entry point, output A library should hinge on a small number of concepts Common data containers make the ecosystem stronger Each function should have one and only one purpose Code for interfaces, but don’t overdo duck typing Properties are for impedance matching Shallow is better than deep Error messages matter Be Pythonic
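
A small sketch of the "hidden states" point, with hypothetical names (`RunningScaler`, `scale`): the object's output depends on the calls made before, while the function's output depends only on its arguments.

```python
class RunningScaler:
    """Scales by the largest value seen so far: the result depends on history."""

    def __init__(self):
        self._max = 0.0  # hidden state, invisible at the call site

    def scale(self, values):
        self._max = max(self._max, max(values))
        return [v / self._max for v in values]


def scale(values, max_value):
    """Pure function: everything that influences the result is an argument."""
    return [v / max_value for v in values]


if __name__ == "__main__":
    used = RunningScaler()
    used.scale([10.0])               # an earlier call changes later results
    print(used.scale([1.0, 2.0]))    # scaled by 10.0, the remembered maximum
    print(RunningScaler().scale([1.0, 2.0]))  # same call on a fresh object: scaled by 2.0
    print(scale([1.0, 2.0], 2.0))    # always the same output for the same inputs
```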
  • 45. 2 Some API design principles for the scipy stack Consistency, consistency, consistency Functions are easier to understand than classes A library should hinge on a small number of concepts How much do usage patterns carry over across the library? Common data containers make the ecosystem stronger Each function should have one and only one purpose Code for interfaces, but don’t overdo duck typing Properties are for impedance matching Shallow is better than deep Error messages matter Be Pythonic
  • 46. 2 Some API design principles for the scipy stack Consistency, consistency, consistency Functions are easier to understand than classes A library should hinge on a small number of concepts Common data containers make the ecosystem stronger Facilitates working with multiple libraries together Easier to get up to speed with a given library Each function should have one and only one purpose Code for interfaces, but don’t overdo duck typing Properties are for impedance matching Shallow is better than deep Error messages matter Be Pythonic
  • 47. 2 Some API design principles for the scipy stack Consistency, consistency, consistency Functions are easier to understand than classes A library should hinge on a small number of concepts Common data containers make the ecosystem stronger Each function should have one and only one purpose Change of behavior depending on input type Code for interfaces, but don’t overdo duck typing Properties are for impedance matching Shallow is better than deep Error messages matter Be Pythonic
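
To make "change of behavior depending on input type" concrete, here is a hedged sketch with hypothetical functions: `total_ambiguous` does two jobs depending on what it receives, while `parse_numbers` and `total` each do one.

```python
def total_ambiguous(x):
    """Two purposes in one function: parses if given a string, sums otherwise."""
    if isinstance(x, str):
        x = [float(token) for token in x.split(",")]
    return sum(x)


def parse_numbers(text):
    """One purpose: turn a comma-separated string into a list of floats."""
    return [float(token) for token in text.split(",")]


def total(values):
    """One purpose: sum a sequence of numbers."""
    return sum(values)


if __name__ == "__main__":
    print(total_ambiguous("1,2,3"))       # the reader must know the input type to predict this
    print(total(parse_numbers("1,2,3")))  # each step is explicit
```

Splitting the two purposes makes each function easier to document and to test, at the cost of one extra call for the user.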
  • 48. 2 Some API design principles for the scipy stack Consistency, consistency, consistency Functions are easier to understand than classes A library should hinge on a small number of concepts Common data containers make the ecosystem stronger Each function should have one and only one purpose Code for interfaces, but don’t overdo duck typing Interfaces define objects Incompatible behaviors lead to bugs (eg np.matrix) Properties are for impedance matching Shallow is better than deep Error messages matter Be Pythonic G Varoquaux 27
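The np.matrix remark on this slide can be illustrated with a minimal sketch. The helper `scale_rows` is hypothetical and not part of the talk; it only assumes that the caller expects `*` to mean element-wise multiplication, which holds for ndarray but not for np.matrix.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
m = np.matrix(a)  # NumPy's own documentation no longer recommends np.matrix


def scale_rows(x, w):
    """Hypothetical helper written with element-wise `*` in mind."""
    return x * w


elementwise = scale_rows(a, a)  # ndarray: element-wise product
matmul = scale_rows(m, m)       # np.matrix: `*` is a matrix product

# Same code path, two different results depending on the container.
print(elementwise)
print(np.asarray(matmul))
```

Both objects "quack" like arrays, yet the same function silently computes two different things, which is the kind of bug the slide warns about.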
  • 49. 2 Some API design principles for the scipy stack Consistency, consistency, consistency Functions are easier to understand than classes A library should hinge on a small number of concepts Common data containers make the ecosystem stronger Each function should have one and only one purpose Code for interfaces, but don’t overdo duck typing Properties are for impedance matching Properties obfuscate the data model of the object Properties can create hidden compute costs Shallow is better than deep Error messages matter Be Pythonic G Varoquaux 28
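A small sketch of the hidden compute cost mentioned on this slide. The `Dataset` class and its call counter are hypothetical and exist only for illustration.

```python
class Dataset:
    """Hypothetical container whose `mean` looks like a plain attribute."""

    def __init__(self, values):
        self.values = list(values)
        self.n_mean_calls = 0  # counter added only to make the cost visible

    @property
    def mean(self):
        # Reads like attribute access, but does a full pass on every read.
        self.n_mean_calls += 1
        return sum(self.values) / len(self.values)


data = Dataset(range(1000))

# The loop looks cheap, yet it triggers 100 full passes over the data.
total = sum(data.mean for _ in range(100))
print(data.n_mean_calls)
```

An explicit method or function call would make the cost visible at the call site, leaving properties for cases where a cheap value must be exposed under another name.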
  • 50. 2 Some API design principles for the scipy stack Consistency, consistency, consistency Functions are easier to understand than classes A library should hinge on a small number of concepts Common data containers make the ecosystem stronger Each function should have one and only one purpose Code for interfaces, but don’t overdo duck typing Properties are for impedance matching Shallow is better than deep Objects are understood by their surface Composition creates cognitive overload Error messages matter Be Pythonic G Varoquaux 29
  • 51. 2 Some API design principles for the scipy stack Consistency, consistency, consistency Functions are easier to understand than classes A library should hinge on a small number of concepts Common data containers make the ecosystem stronger Each function should have one and only one purpose Code for interfaces, but don’t overdo duck typing Properties are for impedance matching Shallow is better than deep Error messages matter Explain the problem Print the offending value Be Pythonic G Varoquaux 30
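The two sub-points of this slide (explain the problem, print the offending value) can be sketched as follows. The function `check_n_components` and its parameter names are hypothetical, not taken from any library.

```python
def check_n_components(n_components, n_features):
    """Hypothetical validation helper with an informative error message."""
    if not 1 <= n_components <= n_features:
        raise ValueError(
            # Explain the constraint, then show the value that violated it.
            f"n_components must be between 1 and n_features={n_features}, "
            f"got n_components={n_components!r}."
        )
    return n_components


try:
    check_n_components(12, n_features=5)
except ValueError as exc:
    message = str(exc)
    print(message)
```

The message states both the rule and the value received, so the user can fix the call without opening the source.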
  • 52. 2 Some API design principles for the scipy stack Consistency, consistency, consistency Functions are easier to understand than classes A library should hinge on a small number of concepts Common data containers make the ecosystem stronger Each function should have one and only one purpose Code for interfaces, but don’t overdo duck typing Properties are for impedance matching Shallow is better than deep Error messages matter Be Pythonic Avoid syntax hacks G Varoquaux 31
  • 53. 2 Scikit-learn API Scikit-learn cheat sheet Scikit-learn Fit and predict >>> estimator = Estimator(param1=param1) >>> estimator.fit(X_train, y_train) >>> y_test = estimator.predict(X_test) Transform data >>> X_red = estimator.transform(X_test) G Varoquaux 32
  • 54. 2 Scikit-learn API Scikit-learn cheat sheet Scikit-learn Fit and predict >>> estimator = Estimator(param1=param1) >>> estimator.fit(X_train, y_train) >>> y_test = estimator.predict(X_test) Transform data >>> X_red = estimator.transform(X_test) The estimator is a “contract” (slightly more elaborate than above) It has created an ecosystem of packages Based on duck-typing, not inheritance G Varoquaux 32
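A minimal sketch of what "duck-typing, not inheritance" can look like: a class that follows the fit/predict pattern shown above without subclassing anything. `MeanRegressor` and its `shrinkage` parameter are hypothetical; as the slide notes, the real scikit-learn contract is slightly more elaborate than this.

```python
import numpy as np


class MeanRegressor:
    """Hypothetical estimator: predicts the (scaled) mean of the training targets."""

    def __init__(self, shrinkage=1.0):
        # The constructor only stores parameters; it never touches data.
        self.shrinkage = shrinkage

    def fit(self, X, y):
        # Learned state gets a trailing underscore, as in scikit-learn.
        self.mean_ = self.shrinkage * np.mean(y)
        return self  # returning self allows fit(...).predict(...) chaining

    def predict(self, X):
        return np.full(len(X), self.mean_)


X_train = np.array([[0.0], [1.0], [2.0]])
y_train = np.array([1.0, 2.0, 3.0])
X_test = np.array([[3.0], [4.0]])

estimator = MeanRegressor()
estimator.fit(X_train, y_train)
y_test = estimator.predict(X_test)
print(y_test)
```

Any code that only calls `fit` and `predict` can use this object, which is how third-party packages plug into the ecosystem without sharing a base class.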
  • 55. 2 numpy arrays [Figure: a buffer of digits in memory, viewed as an ndarray] ndarray Abstraction over pointers & operations Contract: the memory layout IMHO, gone too far in number of methods (163) The array protocol makes it easy to quack like an array PS: The ecosystem needs categorical dtypes in numpy G Varoquaux 33
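A sketch of quacking like an array, assuming the `__array__` method is the hook meant by "array protocol" here. The `Temperatures` class is hypothetical.

```python
import numpy as np


class Temperatures:
    """Hypothetical container that NumPy can convert to an array on demand."""

    def __init__(self, celsius):
        self._celsius = list(celsius)

    def __array__(self, dtype=None, copy=None):
        # NumPy calls this hook when it needs an ndarray view of the object.
        return np.array(self._celsius, dtype=dtype)


readings = Temperatures([20.5, 21.0, 19.5])

as_array = np.asarray(readings)   # explicit conversion
mean_reading = np.mean(readings)  # NumPy functions convert implicitly
print(as_array.shape, mean_reading)
```

The object stores its data however it likes and only commits to the ndarray memory layout when a NumPy function asks for it.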
  • 56. 2 Example-driven development The 3-liner as the new cool Teaching others Teaching yourself Write examples that solve end problems Iterate on your API until these are simple Mayavi scikit-learn nilearn G Varoquaux 34
  • 57. 2 Example-driven development The 3-liner as the new cool Teaching others Teaching yourself Write examples that solve end problems Iterate on your API until these are simple Mayavi scikit-learn nilearn User flow on the scikit-learn website: Examples G Varoquaux 34
  • 58. 2 Example-driven development The 3-liner as the new cool Teaching others Teaching yourself Write examples that solve end problems Iterate on your API until these are simple Mayavi scikit-learn nilearn User flow on the nilearn website: Examples G Varoquaux 34
  • 59. 2 Example-driven development The 3-liner as the new cool Teaching others Teaching yourself Write examples that solve end problems Iterate on your API until these are simple Mayavi scikit-learn nilearn Sphinx-gallery: compiling scripts in an examples gallery G Varoquaux 34
  • 60. 2 Example-driven development The 3-liner as the new cool Teaching others Teaching yourself Write examples that solve end problems Iterate on your API until these are simple Mayavi scikit-learn nilearn Sphinx-gallery: compiling scripts in an examples gallery Restructured text formatting Capturing outputs Links to function docs +Creates Jupyter notebooks G Varoquaux 34
  • 61. 2 Example-driven development The 3-liner as the new cool Teaching others Teaching yourself Write examples that solve end problems Iterate on your API until these are simple Mayavi scikit-learn nilearn Sphinx-gallery: compiling scripts in an examples gallery Insert links to examples containing a function G Varoquaux 34
  • 62. 2 Building great documentation Focus on explaining concepts (hint: write plain English) Less is more: prioritize, avoid redundancy Code examples must be short (link to full tutorial examples) Links everywhere: users will land at the wrong place Teach with the docs Plan for maintenance of docs: Continuous integration Check links Runs examples Doctests G Varoquaux 35
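The "Doctests" item can be sketched with the standard-library doctest module: the example in the docstring is both documentation and a test that a continuous-integration job could run. The function `rescale` is hypothetical.

```python
import doctest


def rescale(values, factor=2):
    """Multiply each value by ``factor``.

    >>> rescale([1, 2, 3])
    [2, 4, 6]
    >>> rescale([1, 2], factor=10)
    [10, 20]
    """
    return [v * factor for v in values]


# Collect and run the examples embedded in the docstring.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(rescale, name="rescale", globs={"rescale": rescale}):
    runner.run(test)

results = runner.summarize(verbose=False)
print(results.failed, results.attempted)
```

If the documented output drifts from the real behavior, the failure count becomes non-zero and the build can be made to fail, which keeps short code examples in the docs honest.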
  • 63. 2 Reusable science scikit-learn is the new machine-learning textbook nilearn is the new neuroimaging review article Experiments reproduced at each commit eg: brain reading nilearn.github.io/auto_examples/02_decoding/plot_miyawaki_reconstruction.html G Varoquaux 36
  • 64. 2 Reusable science scikit-learn is the new machine-learning textbook nilearn is the new neuroimaging review article Experiments reproduced at each commit eg: brain reading nilearn.github.io/auto_examples/02_decoding/plot_miyawaki_reconstruction.html Resource intensive CI: Data ⇒ Fight for good open data Computation ⇒ Find good algorithms and tradeoffs Forces us to distill the literature (as a review) G Varoquaux 36
  • 65. 2 Reusable science scikit-learn is the new machine-learning textbook nilearn is the new neuroimaging review article Experiments reproduced at each commit eg: brain reading nilearn.github.io/auto_examples/02_decoding/plot_miyawaki_reconstruction.html Package development consolidates science and moves it outside the lab G Varoquaux 36
  • 66. 3 An ecosystem A bird’s eye view on scientific packages G Varoquaux 37
  • 67. 3 Packages of the Python ecosystem [Figure: number of PyPI downloads (10^4 to 10^9) versus package rank (1 to 10000), log scales] A small number of packages are used by many 1/f distribution, preferential attachment G Varoquaux 38
  • 68. 3 Packages of the Python ecosystem [Figure: number of PyPI downloads (10^4 to 10^9) versus package rank (1 to 10000), log scales; labeled packages: simplejson #1, six #2, setuptools #3, numpy #49, scikit-learn #110, joblib #431, nilearn #2877] A small number of packages are used by many 1/f distribution, preferential attachment nilearn relies on scikit-learn & joblib that rely on numpy... G Varoquaux 38
  • 69. 3 Standing on the shoulders of maintainers May 31st: pip broken https://github.com/pypa/setuptools/pull/1043 Left-pad: How left-padding strings broke the Internet A JavaScript package for left-padding strings was removed from node’s package manager, breaking all the websites that depended on it. G Varoquaux 39
  • 70. 3 Dependencies Beyond installation, a challenge is to ensure package versions play well together: correctness of the code Breakage of backward compatibility yields irreconcilable dependencies G Varoquaux 40
  • 71. 3 Dependencies and their upgrade It’s a fact: users hate upgrading “If it ain’t broken, don’t fix it” (even if it is, apparently) G Varoquaux 41
  • 72. 3 Declaring undependence? Monolithic packages with no dependencies... But: Scaling is hard Complexity grows as square of codebase size [Woodfield 1979] User support grows with userbase size G Varoquaux 42
  • 73. 3 Core software is infrastructure Everybody uses it everyday In industry, education, & research G Varoquaux 43
  • 74. 3 Core software is infrastructure Everybody uses it everyday In industry, education, & research It needs maintenance Like roads (or OpenSSL, to prevent Heartbleed) Central infrastructure packages are “boring” They are understaffed and underfunded References: “Roads and Bridges” Ford Foundation report Excellent talk by Heather Miller https://www.youtube.com/watch?v=17yy5BwIiTw G Varoquaux 43
  • 75. @GaelVaroquaux Coding for science and innovation New science High value of bringing new methods to a field ⇒ Enable domain-specialists Rapid iteration, but with automation & consolidation Software tools Scientists are limited by cognitive load ⇒ Design of API and documentation in libraries Libraries make science reproducible and reusable An ecosystem Central packages hold the ecosystem together Thanks to: the scipy community