Data Science in Future Tense

GalvanizeU Launch!
2014-10-29
gulaunch.splashthat.com

Paco Nathan
@pacoid
Whither Data Science?
twitter.com/josh_wills/status/198093512149958656
FLAWED
issue: Aristotelian perspectives in a non-linear world…
Whither Data Science?
circa 2008: a large ad-tech firm, running one
of the largest Hadoop instances in the cloud,
execs said “Don’t bother” to dig into analysis
of geo, clustering, time series, etc.
FLAWED

We did anyway:
• people in SF don’t click online travel ads much,
however, people in Dodge City do… a lot!
• largest customer segment: flag poles, portable
generators, hammocks, sea salt, mail-order
steaks, defibrillators
Whither Data Science? 
primary sources for the notion: 
Cleveland, W. S., 
“Data Science: an Action Plan for Expanding 
the Technical Areas of the Field of Statistics,” 
International Statistical Review (2001), 69, 21-26. 
http://cm.bell-labs.com/stat/doc/datascience.ps 
Breiman, L.,
“Statistical Modeling: The Two Cultures,”
Statistical Science (2001), 16, 199-231.
http://projecteuclid.org/euclid.ss/1009213726 
…also good to mention John Tukey
Whither Data Science? 
we have a long, long way yet to go:
So many problems that we encounter
in industry can be represented as graphs…

Tensors provide means for representing
multiple-edge graphs, ostensibly solving
for a general case…

Even so, how much time have you spent
working with tensors for data science apps?
wikipedia.org
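To make the tensor remark concrete, here is a minimal sketch of a multiple-edge graph stored as a rank-3 adjacency tensor, indexed as [relation][source][target]. The node names, relation names, and helper functions are hypothetical, chosen only for illustration, and plain nested lists stand in for a real tensor library.

```python
# Hypothetical example data: 3 nodes and 2 edge types (relations).
NODES = ["alice", "bob", "carol"]
RELATIONS = ["follows", "emails"]
EDGES = [
    ("alice", "bob", "follows"),
    ("alice", "bob", "emails"),
    ("bob", "carol", "follows"),
]


def build_tensor(nodes, relations, edges):
    """Return a rank-3 adjacency tensor: tensor[relation][source][target]."""
    node_index = {name: i for i, name in enumerate(nodes)}
    relation_index = {name: i for i, name in enumerate(relations)}
    size = len(nodes)
    # One size x size adjacency matrix ("slice") per relation.
    tensor = [[[0] * size for _ in range(size)] for _ in relations]
    for source, target, relation in edges:
        tensor[relation_index[relation]][node_index[source]][node_index[target]] += 1
    return tensor


def collapse(tensor):
    """Sum over the relation axis, giving an ordinary adjacency matrix
    whose entries count the parallel edges between each pair of nodes."""
    size = len(tensor[0])
    return [
        [sum(layer[i][j] for layer in tensor) for j in range(size)]
        for i in range(size)
    ]


tensor = build_tensor(NODES, RELATIONS, EDGES)
flat = collapse(tensor)
print(flat[0][1])  # number of parallel edges from alice to bob
```

Each slice of the tensor is the adjacency matrix for one kind of edge, so the single-edge graph is the special case with one relation.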
Historical Arc 1: 
The Alchemists… 
“Who has the crystal ball?”
Arc 1: Who has the crystal ball? 
TL;DR: Nods to some people who envisaged 
and modeled our shared future…
Arc 1: Who has the crystal ball? 
Theory, Eight Decades Ago: 
what can be computed? 
Haskell Curry 
haskell.org 
Alonzo Church
wikipedia.org 
Praxis, Four Decades Ago: 
algebra for applicative systems 
John Backus 
acm.org 
David Turner 
wikipedia.org 
Reality, Two Decades Ago: 
web apps, ML, machine data 
Pattie Maes 
MIT Media Lab
spark.apache.org 
A Brief History: Functional Programming for Big Data
A Brief History: Smashing The Previous Petabyte Sort Record 
databricks.com/blog/2014/10/10/spark-petabyte-sort.html 
spark.apache.org
Historical Arc 2: 
An Oblivoir Of Origins… 
“Why are we here?”
Arc 2: Why are we here?
TL;DR: We share the delightful role of…

speaking truth to power
Arc 2: Why are we here?
Reason 1:
early 19th c. Prussian/Napoleonic “General Staff”
organization => corporate IT silos

translated:
too many people saying “That is not my concern.”

action:
interdisciplinary teams tear down silos,
surfacing insights
Arc 2: Why are we here?
Reason 2:
19th-20th c. statistics emphasized defensibility
in lieu of predictability

translated:
defend one’s job, not boost top-line revenue

action:
focus on predictability; if you need to defend
your job, you should be working elsewhere
Arc 2: Why are we here?
Reason 3:
machine learning derives from several disciplines,
but ultimately is a subset of optimization

translated:
those disciplines rarely talked to each other, so
we have difficulty understanding them collectively

action:
learn to leverage optimization theory, thoroughly
Arc 2: Why are we here?
Reason 4:
university math curricula are still tilted toward
Cold War priorities

translated:
2-3 years of calculus serve to select the mechanical
engineering candidates who can build the most
cost-effective ICBMs

action:
leadership must learn how to leverage advanced
math for business use cases
Arc 2: Why are we here?
Reason 5:
brogrammers tend to emphasize logical
reasoning over analytic reasoning

translated:
left-brained lopsidedness wins temporarily,
then fails spectacularly

action:
ask security to walk the brogrammers
back to their cave
Arc 2: Why are we here?
Reason 6:
people can make intuitive decisions in
~4 dimensions at most, period

translated:
product managers as Steve Jobs wannabes
are poisonous

action:
leverage data science, visualization, machine learning
with distributed systems at scale to address the high
dimensionality of data
Arc 2: Why are we here?
Reason 7:
embracing perpetual learning curves represents
a promethean challenge

translated:
learning is hard, and many organizations go to
great lengths to minimize it

action:
learn efficiently, continually, with a great thirst
Historical Arc 3: 
Be There Then… 
“What happens next?”
Arc 3: What happens next?
TL;DR: Brace yourselves…
Arc 3: What happens next?
• Full stack… no, really
• You’ll work with functional programming
and cloud-based notebooks
• Shift from modeling based on variance (batch)
towards probabilistic approximation
• Early data scientists displace the old-school
product managers
• IoT, drones, microsats: several orders of
magnitude more data up ahead
• Leave SF – the more interesting data science
work to be accomplished is not here
Arc 3: What happens next?
Full stack… no, really 
from visualization 
to virtualization, 
all points in-between 
source: Microsoft
Arc 3: What happens next?
You’ll work with functional programming 
and cloud-based notebooks 
http://databricks.com/product
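As an illustration of the functional style this slide points at, here is a minimal word-count sketch in plain Python: a map step followed by a reduce step with an associative combine. It is not the Spark API; the function names are hypothetical, and collections.Counter stands in for a distributed dataset.

```python
from collections import Counter
from functools import reduce


def count_words(line):
    """Map step: turn one line of text into partial word counts."""
    return Counter(line.lower().split())


def merge(left, right):
    """Reduce step: combine two partial counts. Counter addition is
    associative and has an identity (the empty Counter), so partial
    results can be merged in any grouping."""
    return left + right


def word_count(lines):
    """Fold the per-line counts together, starting from the identity."""
    return reduce(merge, map(count_words, lines), Counter())


lines = ["to be or not", "to be"]
totals = word_count(lines)
print(totals.most_common(2))
```

Because merge is associative, each partition of the input could be counted separately and the partial results merged afterwards, which is the property that lets this kind of job run in parallel.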
Arc 3: What happens next?
Shift from modeling based on variance (batch)
towards probabilistic approximation
highlyscalable.wordpress.com/2012/05/01/probabilistic-structures-web-analytics-data-mining/
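One family of probabilistic approximation is the probabilistic data structure, which trades a small, bounded error for far less memory. Below is a minimal sketch of a Bloom filter for approximate set membership. The class name, its parameters, and the SHA-256 hashing scheme are assumptions made for illustration, not taken from the linked article.

```python
import hashlib


class BloomFilter:
    """Approximate set membership: no false negatives, and a tunable
    rate of false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit, for clarity rather than compactness.
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        # Derive several hash positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for position in self._positions(item):
            self.bits[position] = 1

    def might_contain(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[position] for position in self._positions(item))


seen = BloomFilter()
for user in ["u1", "u2", "u3"]:
    seen.add(user)
print(seen.might_contain("u2"))
```

The filter never stores the items themselves, so memory stays fixed at num_bits no matter how many items are added; the cost is that the false-positive rate rises as the filter fills up.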
Arc 3: What happens next?
Early data scientists displace the old-school 
product managers
Arc 3: What happens next?
IoT, drones, microsats: several orders of magnitude
more data up ahead
Layered Sensing Networks:
• microsats, e.g., Planet Labs, 400 km
• airships, e.g., JP Aerospace, 40 km
• atmosats, e.g., Titan Aerospace, 20 km
• drones, e.g., HoneyComb, 120 m
• robots, e.g., Blue River, 1 m
• sensors, e.g., Hortau, -0.3 m
Arc 3: What happens next?
leave SF – the more interesting data science 
work to be accomplished is not here
Summary?
Vector Quantization: 
After we’ve cleaned up data, formulated workflows 
in terms of monoids, used graph representation, and 
parallelized with a wealth of linear algebra, much of
the heavy lifting that remains on the clusters is in
optimization
For example, deep learning @Google 
uses many layers of neural nets trained 
with gradient descent optimization 
Taming Latency Variability and Scaling Deep Learning 
Jeff Dean @Google (2013) 
youtu.be/S9twUcX1Zp0
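For readers who have not seen gradient descent written out, here is a minimal sketch that fits a line y = w*x + b by descending the gradient of the mean squared error. The data, learning rate, and step count are hypothetical, and this is batch gradient descent on a toy problem, not the training setup described in the talk cited above.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w*x + b by minimizing mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of mean((w*x + b - y)^2) w.r.t. w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical data generated from y = 2x + 1, with no noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = gradient_descent(xs, ys)
print(round(w, 3), round(b, 3))
```

Every step touches every data point, which is why this loop dominates cluster cost at scale and why stochastic variants, which sample a subset per step, are common.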
Vector Quantization:
One advantage of quantum algorithms: a numerical
gradient can be estimated with a single black-box
query, regardless of the number of dimensions…
If high-ROI apps get reworked to leverage lots
of ML and large clusters, then SGD represents
the datacenter cost basis, notably the part
that scales…
Want to slash costs exponentially?
Plug in quantum for a game-changer,
maybe
Fast quantum algorithm for
numerical gradient estimation
Stephen P. Jordan
Phys. Rev. Lett. 95, 050501 (2005)
arxiv.org/abs/quant-ph/0405146
dwavesys.com
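For comparison, the classical baseline that the cited paper improves on can be sketched as follows: a forward-difference gradient estimate needs d + 1 evaluations of a black-box function of d variables. This is only the classical side, not the quantum algorithm; the function names and the step size h are assumptions for illustration.

```python
def forward_difference_gradient(f, point, h=1e-6):
    """Estimate the gradient of f at point and count black-box calls."""
    calls = 0

    def counted(x):
        nonlocal calls
        calls += 1
        return f(x)

    base = counted(point)  # one evaluation at the point itself
    gradient = []
    for i in range(len(point)):
        shifted = list(point)
        shifted[i] += h  # nudge one coordinate at a time
        gradient.append((counted(shifted) - base) / h)
    return gradient, calls


def sum_of_squares(x):
    return sum(value * value for value in x)


gradient, calls = forward_difference_gradient(sum_of_squares, [1.0, 2.0, 3.0])
print(calls)  # d + 1 evaluations for d = 3 dimensions
```

The call count grows linearly with the number of dimensions, which is the cost the single-query quantum estimate would remove.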
Vector Quantization: 
Proposal: let’s drop clusters of quantum 
devices into lunar polar craters, so we 
can handle massive vector quantization 
workloads 
• micro-kelvin environs 
• near perpetual sunlight 
for energy sources 
• park routers at L4 
• approx. $15B to finance, 
i.e., ~6 days DoD budget
Vector Quantization: 
We’ll just put this here… 
a couple o’ Googly projects in progress: 
qCraft: Quantum Physics In Minecraft 
plus.google.com/u/1/+QuantumAILab/posts/grMbaaDGChH
lunar.xprize.org 
“We’re going back to the Moon. For good.”
Resources
Apache Spark community: 
• spark.apache.org/community.html 
• databricks.com/spark-training 
• oreilly.com/go/sparkcert
events: 
Strata EU 
Barcelona, Nov 19-21 
strataconf.com/strataeu2014 
Data Day Texas 
Austin, Jan 10 
datadaytexas.com 
Strata CA 
San Jose, Feb 18-20 
strataconf.com/strata2015 
Spark Summit East 
NYC, Mar 18-19 
spark-summit.org/east 
Spark Summit 2015 
SF, Jun 15-17 
spark-summit.org
presenter: 
monthly newsletter for updates, 
events, conf summaries, etc.: 
liber118.com/pxn/ 
Just Enough Math 
O’Reilly, 2014 
justenoughmath.com 
preview: youtu.be/TQ58cWgdCpA 
Enterprise Data Workflows 
with Cascading 
O’Reilly, 2013 
shop.oreilly.com/product/0636920028536.do

Weitere ähnliche Inhalte

Was ist angesagt?

Architecture in action 01
Architecture in action 01Architecture in action 01
Architecture in action 01Krishna Sankar
 
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and More
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and MoreStrata 2015 Data Preview: Spark, Data Visualization, YARN, and More
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and MorePaco Nathan
 
Augury and Omens Aside, Part 1:
 The Business Case for Apache Mesos
Augury and Omens Aside, Part 1:
 The Business Case for Apache MesosAugury and Omens Aside, Part 1:
 The Business Case for Apache Mesos
Augury and Omens Aside, Part 1:
 The Business Case for Apache MesosPaco Nathan
 
Gephi, Graphx, and Giraph
Gephi, Graphx, and GiraphGephi, Graphx, and Giraph
Gephi, Graphx, and GiraphDoug Needham
 
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark StreamingTiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark StreamingPaco Nathan
 
Graph Analytics in Spark
Graph Analytics in SparkGraph Analytics in Spark
Graph Analytics in SparkPaco Nathan
 
QCon São Paulo: Real-Time Analytics with Spark Streaming
QCon São Paulo: Real-Time Analytics with Spark StreamingQCon São Paulo: Real-Time Analytics with Spark Streaming
QCon São Paulo: Real-Time Analytics with Spark StreamingPaco Nathan
 
SF Python Meetup: TextRank in Python
SF Python Meetup: TextRank in PythonSF Python Meetup: TextRank in Python
SF Python Meetup: TextRank in PythonPaco Nathan
 
An excursion into Graph Analytics with Apache Spark GraphX
An excursion into Graph Analytics with Apache Spark GraphXAn excursion into Graph Analytics with Apache Spark GraphX
An excursion into Graph Analytics with Apache Spark GraphXKrishna Sankar
 
Strata EU 2014: Spark Streaming Case Studies
Strata EU 2014: Spark Streaming Case StudiesStrata EU 2014: Spark Streaming Case Studies
Strata EU 2014: Spark Streaming Case StudiesPaco Nathan
 
Big Graph Analytics on Neo4j with Apache Spark
Big Graph Analytics on Neo4j with Apache SparkBig Graph Analytics on Neo4j with Apache Spark
Big Graph Analytics on Neo4j with Apache SparkKenny Bastani
 
Exploring Titan and Spark GraphX for Analyzing Time-Varying Electrical Networks
Exploring Titan and Spark GraphX for Analyzing Time-Varying Electrical Networks Exploring Titan and Spark GraphX for Analyzing Time-Varying Electrical Networks
Exploring Titan and Spark GraphX for Analyzing Time-Varying Electrical Networks DataWorks Summit/Hadoop Summit
 
GraphX: Graph analytics for insights about developer communities
GraphX: Graph analytics for insights about developer communitiesGraphX: Graph analytics for insights about developer communities
GraphX: Graph analytics for insights about developer communitiesPaco Nathan
 
Agile data science: Distributed, Interactive, Integrated, Semantic, Micro Ser...
Agile data science: Distributed, Interactive, Integrated, Semantic, Micro Ser...Agile data science: Distributed, Interactive, Integrated, Semantic, Micro Ser...
Agile data science: Distributed, Interactive, Integrated, Semantic, Micro Ser...Andy Petrella
 
Scalable Data Science and Deep Learning with H2O
Scalable Data Science and Deep Learning with H2OScalable Data Science and Deep Learning with H2O
Scalable Data Science and Deep Learning with H2Oodsc
 
Data Science at Scale by Sarah Guido
Data Science at Scale by Sarah GuidoData Science at Scale by Sarah Guido
Data Science at Scale by Sarah GuidoSpark Summit
 
Apache Spark GraphX highlights.
Apache Spark GraphX highlights. Apache Spark GraphX highlights.
Apache Spark GraphX highlights. Doug Needham
 
The Future of Data Science
The Future of Data ScienceThe Future of Data Science
The Future of Data ScienceDataWorks Summit
 
Big Data HPC Convergence and a bunch of other things
Big Data HPC Convergence and a bunch of other thingsBig Data HPC Convergence and a bunch of other things
Big Data HPC Convergence and a bunch of other thingsGeoffrey Fox
 

Was ist angesagt? (20)

Architecture in action 01
Architecture in action 01Architecture in action 01
Architecture in action 01
 
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and More
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and MoreStrata 2015 Data Preview: Spark, Data Visualization, YARN, and More
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and More
 
Augury and Omens Aside, Part 1:
 The Business Case for Apache Mesos
Augury and Omens Aside, Part 1:
 The Business Case for Apache MesosAugury and Omens Aside, Part 1:
 The Business Case for Apache Mesos
Augury and Omens Aside, Part 1:
 The Business Case for Apache Mesos
 
Gephi, Graphx, and Giraph
Gephi, Graphx, and GiraphGephi, Graphx, and Giraph
Gephi, Graphx, and Giraph
 
Spark streaming
Spark streamingSpark streaming
Spark streaming
 
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark StreamingTiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
 
Graph Analytics in Spark
Graph Analytics in SparkGraph Analytics in Spark
Graph Analytics in Spark
 
QCon São Paulo: Real-Time Analytics with Spark Streaming
QCon São Paulo: Real-Time Analytics with Spark StreamingQCon São Paulo: Real-Time Analytics with Spark Streaming
QCon São Paulo: Real-Time Analytics with Spark Streaming
 
SF Python Meetup: TextRank in Python
SF Python Meetup: TextRank in PythonSF Python Meetup: TextRank in Python
SF Python Meetup: TextRank in Python
 
An excursion into Graph Analytics with Apache Spark GraphX
An excursion into Graph Analytics with Apache Spark GraphXAn excursion into Graph Analytics with Apache Spark GraphX
An excursion into Graph Analytics with Apache Spark GraphX
 
Strata EU 2014: Spark Streaming Case Studies
Strata EU 2014: Spark Streaming Case StudiesStrata EU 2014: Spark Streaming Case Studies
Strata EU 2014: Spark Streaming Case Studies
 
Big Graph Analytics on Neo4j with Apache Spark
Big Graph Analytics on Neo4j with Apache SparkBig Graph Analytics on Neo4j with Apache Spark
Big Graph Analytics on Neo4j with Apache Spark
 
Exploring Titan and Spark GraphX for Analyzing Time-Varying Electrical Networks
Exploring Titan and Spark GraphX for Analyzing Time-Varying Electrical Networks Exploring Titan and Spark GraphX for Analyzing Time-Varying Electrical Networks
Exploring Titan and Spark GraphX for Analyzing Time-Varying Electrical Networks
 
GraphX: Graph analytics for insights about developer communities
GraphX: Graph analytics for insights about developer communitiesGraphX: Graph analytics for insights about developer communities
GraphX: Graph analytics for insights about developer communities
 
Agile data science: Distributed, Interactive, Integrated, Semantic, Micro Ser...
Agile data science: Distributed, Interactive, Integrated, Semantic, Micro Ser...Agile data science: Distributed, Interactive, Integrated, Semantic, Micro Ser...
Agile data science: Distributed, Interactive, Integrated, Semantic, Micro Ser...
 
Scalable Data Science and Deep Learning with H2O
Scalable Data Science and Deep Learning with H2OScalable Data Science and Deep Learning with H2O
Scalable Data Science and Deep Learning with H2O
 
Data Science at Scale by Sarah Guido
Data Science at Scale by Sarah GuidoData Science at Scale by Sarah Guido
Data Science at Scale by Sarah Guido
 
Apache Spark GraphX highlights.
Apache Spark GraphX highlights. Apache Spark GraphX highlights.
Apache Spark GraphX highlights.
 
The Future of Data Science
The Future of Data ScienceThe Future of Data Science
The Future of Data Science
 
Big Data HPC Convergence and a bunch of other things
Big Data HPC Convergence and a bunch of other thingsBig Data HPC Convergence and a bunch of other things
Big Data HPC Convergence and a bunch of other things
 

Andere mochten auch

#MesosCon 2014: Spark on Mesos
#MesosCon 2014: Spark on Mesos#MesosCon 2014: Spark on Mesos
#MesosCon 2014: Spark on MesosPaco Nathan
 
OSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine LearningOSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine LearningPaco Nathan
 
How Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscapeHow Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscapePaco Nathan
 
Big data & data science challenges and opportunities
Big data & data science   challenges and opportunitiesBig data & data science   challenges and opportunities
Big data & data science challenges and opportunitiesJose Quesada
 
Future of data science as a profession
Future of data science as a professionFuture of data science as a profession
Future of data science as a professionJose Quesada
 
Data Science Reinvents Learning?
Data Science Reinvents Learning?Data Science Reinvents Learning?
Data Science Reinvents Learning?Paco Nathan
 
How Apache Spark fits in the Big Data landscape
How Apache Spark fits in the Big Data landscapeHow Apache Spark fits in the Big Data landscape
How Apache Spark fits in the Big Data landscapePaco Nathan
 
Databricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User GroupDatabricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User GroupPaco Nathan
 
How Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscapeHow Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscapePaco Nathan
 
Microservices, Containers, and Machine Learning
Microservices, Containers, and Machine LearningMicroservices, Containers, and Machine Learning
Microservices, Containers, and Machine LearningPaco Nathan
 
What's new with Apache Spark?
What's new with Apache Spark?What's new with Apache Spark?
What's new with Apache Spark?Paco Nathan
 

Andere mochten auch (11)

#MesosCon 2014: Spark on Mesos
#MesosCon 2014: Spark on Mesos#MesosCon 2014: Spark on Mesos
#MesosCon 2014: Spark on Mesos
 
OSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine LearningOSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine Learning
 
How Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscapeHow Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscape
 
Big data & data science challenges and opportunities
Big data & data science   challenges and opportunitiesBig data & data science   challenges and opportunities
Big data & data science challenges and opportunities
 
Future of data science as a profession
Future of data science as a professionFuture of data science as a profession
Future of data science as a profession
 
Data Science Reinvents Learning?
Data Science Reinvents Learning?Data Science Reinvents Learning?
Data Science Reinvents Learning?
 
How Apache Spark fits in the Big Data landscape
How Apache Spark fits in the Big Data landscapeHow Apache Spark fits in the Big Data landscape
How Apache Spark fits in the Big Data landscape
 
Databricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User GroupDatabricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User Group
 
How Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscapeHow Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscape
 
Microservices, Containers, and Machine Learning
Microservices, Containers, and Machine LearningMicroservices, Containers, and Machine Learning
Microservices, Containers, and Machine Learning
 
What's new with Apache Spark?
What's new with Apache Spark?What's new with Apache Spark?
What's new with Apache Spark?
 

Ähnlich wie Data Science in Future Tense

(Big) Data (Science) Skills
(Big) Data (Science) Skills(Big) Data (Science) Skills
(Big) Data (Science) SkillsOscar Corcho
 
Cloud Programming Models: eScience, Big Data, etc.
Cloud Programming Models: eScience, Big Data, etc.Cloud Programming Models: eScience, Big Data, etc.
Cloud Programming Models: eScience, Big Data, etc.Alexandru Iosup
 
Rental Cars and Industrialized Learning to Rank with Sean Downes
Rental Cars and Industrialized Learning to Rank with Sean DownesRental Cars and Industrialized Learning to Rank with Sean Downes
Rental Cars and Industrialized Learning to Rank with Sean DownesDatabricks
 
Data science presentation
Data science presentationData science presentation
Data science presentationMSDEVMTL
 
Data Science with Spark
Data Science with SparkData Science with Spark
Data Science with SparkKrishna Sankar
 
IBM Strategy for Spark
IBM Strategy for SparkIBM Strategy for Spark
IBM Strategy for SparkMark Kerzner
 
Data Science with Spark - Training at SparkSummit (East)
Data Science with Spark - Training at SparkSummit (East)Data Science with Spark - Training at SparkSummit (East)
Data Science with Spark - Training at SparkSummit (East)Krishna Sankar
 
Hadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG GridHadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG GridEvert Lammerts
 
Big Data Everywhere Chicago: Apache Spark Plus Many Other Frameworks -- How S...
Big Data Everywhere Chicago: Apache Spark Plus Many Other Frameworks -- How S...Big Data Everywhere Chicago: Apache Spark Plus Many Other Frameworks -- How S...
Big Data Everywhere Chicago: Apache Spark Plus Many Other Frameworks -- How S...BigDataEverywhere
 
AMP Camp 5 Intro
AMP Camp 5 IntroAMP Camp 5 Intro
AMP Camp 5 Introjeykottalam
 
HPC Cluster Computing from 64 to 156,000 Cores 
HPC Cluster Computing from 64 to 156,000 Cores HPC Cluster Computing from 64 to 156,000 Cores 
HPC Cluster Computing from 64 to 156,000 Cores inside-BigData.com
 
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015Big Data Spain
 
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...Big Data Spain
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceeRic Choo
 
Continuum Analytics and Python
Continuum Analytics and PythonContinuum Analytics and Python
Continuum Analytics and PythonTravis Oliphant
 
Data science training institute in hyderabad
Data science training institute in hyderabadData science training institute in hyderabad
Data science training institute in hyderabadKelly Technologies
 
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...Mihai Criveti
 
Chaos engineering open science for software engineering - kube con north am...
Chaos engineering   open science for software engineering - kube con north am...Chaos engineering   open science for software engineering - kube con north am...
Chaos engineering open science for software engineering - kube con north am...Sylvain Hellegouarch
 

Ähnlich wie Data Science in Future Tense (20)

Graph Realities
Graph RealitiesGraph Realities
Graph Realities
 
(Big) Data (Science) Skills
(Big) Data (Science) Skills(Big) Data (Science) Skills
(Big) Data (Science) Skills
 
Cloud Programming Models: eScience, Big Data, etc.
Cloud Programming Models: eScience, Big Data, etc.Cloud Programming Models: eScience, Big Data, etc.
Cloud Programming Models: eScience, Big Data, etc.
 
Rental Cars and Industrialized Learning to Rank with Sean Downes
Rental Cars and Industrialized Learning to Rank with Sean DownesRental Cars and Industrialized Learning to Rank with Sean Downes
Rental Cars and Industrialized Learning to Rank with Sean Downes
 
Data science presentation
Data science presentationData science presentation
Data science presentation
 
Data Science with Spark
Data Science with SparkData Science with Spark
Data Science with Spark
 
IBM Strategy for Spark
IBM Strategy for SparkIBM Strategy for Spark
IBM Strategy for Spark
 
Data Science with Spark - Training at SparkSummit (East)
Data Science with Spark - Training at SparkSummit (East)Data Science with Spark - Training at SparkSummit (East)
Data Science with Spark - Training at SparkSummit (East)
 
Hadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG GridHadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG Grid
 
Big Data Everywhere Chicago: Apache Spark Plus Many Other Frameworks -- How S...
Big Data Everywhere Chicago: Apache Spark Plus Many Other Frameworks -- How S...Big Data Everywhere Chicago: Apache Spark Plus Many Other Frameworks -- How S...
Big Data Everywhere Chicago: Apache Spark Plus Many Other Frameworks -- How S...
 
AMP Camp 5 Intro
AMP Camp 5 IntroAMP Camp 5 Intro
AMP Camp 5 Intro
 
HPC Cluster Computing from 64 to 156,000 Cores 
HPC Cluster Computing from 64 to 156,000 Cores HPC Cluster Computing from 64 to 156,000 Cores 
HPC Cluster Computing from 64 to 156,000 Cores 
 
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015
 
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data Science
 
Continuum Analytics and Python
Continuum Analytics and PythonContinuum Analytics and Python
Continuum Analytics and Python
 
Data science training institute in hyderabad
Data science training institute in hyderabadData science training institute in hyderabad
Data science training institute in hyderabad
 
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
 
Chaos engineering open science for software engineering - kube con north am...
Chaos engineering   open science for software engineering - kube con north am...Chaos engineering   open science for software engineering - kube con north am...
Chaos engineering open science for software engineering - kube con north am...
 
Zarneger "Supporting AI: Best Practices for Content Delivery Platforms"
Zarneger "Supporting AI: Best Practices for Content Delivery Platforms"Zarneger "Supporting AI: Best Practices for Content Delivery Platforms"
Zarneger "Supporting AI: Best Practices for Content Delivery Platforms"
 

Data Science in Future Tense

  • 1. Data Science in Future Tense: GalvanizeU Launch! 2014-10-29, gulaunch.splashthat.com, Paco Nathan, @pacoid
  • 3. Whither Data Science? twitter.com/josh_wills/status/198093512149958656
  • 4. Whither Data Science? FLAWED twitter.com/josh_wills/status/198093512149958656 issue: Aristotelian perspectives in a non-linear world…
  • 5. Whither Data Science? circa 2008: a large ad-tech firm, running one of the largest Hadoop instances in the cloud, execs said “Don’t bother” to dig into analysis of geo, clustering, time series, etc.
  • 6. Whither Data Science? circa 2008: a large ad-tech firm, running one of the largest Hadoop instances in the cloud, execs said “Don’t bother” to dig into analysis of geo, clustering, time series, etc. We did anyway.
  • 7. Whither Data Science? FLAWED circa 2008: a large ad-tech firm, running one of the largest Hadoop instances in the cloud, execs said “Don’t bother” to dig into analysis of geo, clustering, time series, etc. We did anyway: • people in SF don’t click online travel ads much, however, people in Dodge City do… a lot! • largest customer segment: flag poles, portable generators, hammocks, sea salt, mail-order steaks, defibrillators
  • 8. Whither Data Science? primary sources for the notion: Cleveland, W. S., “Data Science: an Action Plan for Expanding the Technical Areas of the Field of Statistics,” International Statistical Review (2001), 69, 21-26. http://cm.bell-labs.com/stat/doc/datascience.ps Breiman, L., “Statistical Modeling: The Two Cultures,” Statistical Science (2001), 16, 199-231. http://projecteuclid.org/euclid.ss/1009213726 …also good to mention John Tukey
  • 9. Whither Data Science? we have a long, long way yet to go: So many problems that we encounter in industry can be represented as graphs… Tensors provide means for representing multiple-edge graphs, ostensibly solving for a general case… Even so, how much time have you spent working with tensors for data science apps? wikipedia.org
  • 10. Historical Arc 1: The Alchemists… “Who has the crystal ball?”
  • 11. Arc 1: Who has the crystal ball? TL;DR: Nods to some people who envisaged and modeled our shared future…
  • 12. Arc 1: Who has the crystal ball? Theory, Eight Decades Ago: what can be computed? Haskell Curry (haskell.org), Alonzo Church (wikipedia.org). Praxis, Four Decades Ago: algebra for applicative systems. John Backus (acm.org), David Turner (wikipedia.org). Reality, Two Decades Ago: web apps, ML, machine data. Pattie Maes (MIT Media Lab)
  • 13. spark.apache.org A Brief History: Functional Programming for Big Data
  • 14. A Brief History: Smashing The Previous Petabyte Sort Record databricks.com/blog/2014/10/10/spark-petabyte-sort.html spark.apache.org
  • 15. Historical Arc 2: An Oblivoir Of Origins… “Why are we here?”
  • 16. Arc 2: Why are we here? TL;DR: We share the delightful role of… speaking truth to power
  • 17. Arc 2: Why are we here? Reason 1: early 19th c. Prussian/Napoleonic “General Staff” organization => corporate IT silos; translated: too many people saying “That is not my concern.”; action: interdisciplinary teams tear down silos, surfacing insights
  • 18. Arc 2: Why are we here? Reason 2: 19th-20th c. statistics emphasized defensibility in lieu of predictability; translated: defend one’s job, not boost top-line revenue; action: focus on predictability; if you need to defend your job, you should be working elsewhere
  • 19. Arc 2: Why are we here? Reason 3: machine learning derives from several disciplines, but ultimately is a subset of optimization; translated: they couldn’t talk to each other very much, we have difficulty understanding them collectively; action: learn to leverage optimization theory, thoroughly
  • 20. Arc 2: Why are we here? Reason 4: university math curricula are still tilted toward Cold War priorities; translated: 2-3 years calculus weeds out the better mechanical engineering candidates who can build the most cost-effective ICBMs; action: leadership must embrace how to leverage advanced math for business use cases
  • 21. Arc 2: Why are we here? Reason 5: brogrammers tend to emphasize logical reasoning over analytic reasoning; translated: left-brained lopsidedness wins temporarily, then fails spectacularly; action: ask security to walk the brogrammers back to their cave
  • 22. Arc 2: Why are we here? Reason 6: people can make intuitive decisions in ~4 dimensions at most, period; translated: product managers as Steve Jobs wannabes are poisonous; action: leverage data science, visualization, machine learning with distributed systems at scale to address the high dimensionality of data
  • 23. Arc 2: Why are we here? Reason 7: embracing perpetual learning curves represents a promethean challenge; translated: learning is hard, and many organizations go to great lengths to minimize it; action: learn efficiently, continually, with a great thirst
  • 24. Historical Arc 3: Be There Then… “What happens next?”
  • 25. Arc 4: What happens next? TL;DR: Brace yourselves…
  • 26. Arc 4: What happens next? • Full stack… no, really • You’ll work with functional programming and cloud-based notebooks • Shift from modeling based on variance (batch) towards probabilistic approximation • Early data scientists displace the old-school product managers • IoT, drones, microsats: several orders of magnitude more data up ahead • leave SF – the more interesting data science work to be accomplished is not here
  • 27. Arc 4: What happens next? Full stack… no, really from visualization to virtualization, all points in-between source: Microsoft
  • 29. Arc 4: What happens next? You’ll work with functional programming and cloud-based notebooks http://databricks.com/product
  • 31. Arc 4: What happens next? Shift from modeling based on variance (batch) towards probabilistic approximation highlyscalable.wordpress.com/2012/05/01/probabilistic-structures-web-analytics-data-mining/
  • 33. Arc 4: What happens next? Early data scientists displace the old-school product managers
  • 35. Arc 4: What happens next? IoT, drones, microsats: several orders of magnitude more data up ahead. Layered Sensing Networks: microsats (e.g., Planet Labs, 400 km); airships (e.g., JP Aerospace, 40 km); atmosats (e.g., Titan Aerospace, 20 km); drones (e.g., HoneyComb, 120 m); robots (e.g., Blue River, 1 m); sensors (e.g., Hortau, -0.3 m)
  • 37. Arc 4: What happens next? leave SF – the more interesting data science work to be accomplished is not here
  • 39. Vector Quantization: After we’ve cleaned up data, formulated workflows in terms of monoids, used graph representation, and parallelized with a wealth of linear algebra, much of the heavy-lifting that remains on the clusters is in optimization. For example, deep learning @Google uses many layers of neural nets trained with gradient descent optimization. Taming Latency Variability and Scaling Deep Learning, Jeff Dean @Google (2013), youtu.be/S9twUcX1Zp0
  • 40. Vector Quantization: One advantage of quantum algorithms is to run large gradient descent problems in constant time… Once high-ROI apps are reworked to leverage lots of ML and large clusters, SGD (stochastic gradient descent) represents the datacenter cost basis, notably the part that scales… Want to slash costs exponentially? Plug in quantum for a game-changer, maybe. Fast quantum algorithm for numerical gradient estimation, Stephen P. Jordan, Phys. Rev. Lett. 95, 050501 (2005), arxiv.org/abs/quant-ph/0405146. dwavesys.com
  • 41. Vector Quantization: Proposal: let’s drop clusters of quantum devices into lunar polar craters, so we can handle massive vector quantization workloads • micro-kelvin environs • near perpetual sunlight for energy sources • park routers at L4 • approx. $15B to finance, i.e., ~6 days DoD budget
  • 42. Vector Quantization: We’ll just put this here… a couple o’ Googly projects in progress: qCraft: Quantum Physics In Minecraft, plus.google.com/u/1/+QuantumAILab/posts/grMbaaDGChH; lunar.xprize.org “We’re going back to the Moon. For good.”
  • 44. Apache Spark community: • spark.apache.org/community.html • databricks.com/spark-training • oreilly.com/go/sparkcert
  • 45. events: Strata EU, Barcelona, Nov 19-21, strataconf.com/strataeu2014; Data Day Texas, Austin, Jan 10, datadaytexas.com; Strata CA, San Jose, Feb 18-20, strataconf.com/strata2015; Spark Summit East, NYC, Mar 18-19, spark-summit.org/east; Spark Summit 2015, SF, Jun 15-17, spark-summit.org
  • 46. presenter: monthly newsletter for updates, events, conf summaries, etc.: liber118.com/pxn/; Just Enough Math, O’Reilly, 2014, justenoughmath.com, preview: youtu.be/TQ58cWgdCpA; Enterprise Data Workflows with Cascading, O’Reilly, 2013, shop.oreilly.com/product/0636920028536.do
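
A rough illustration of what slide 31 calls the shift toward probabilistic approximation: instead of keeping an exact count for every key, a Count-Min sketch keeps a small fixed grid of counters and accepts a bounded overestimate. The slides do not name this structure, so its use here is an assumption made for illustration. The class name, the `width` and `depth` defaults, and the `ad_a`/`ad_b` click events are all hypothetical. Sketches of the same shape merge by adding counters cell by cell, which may also hint at why slide 39 talks about formulating workflows in terms of monoids.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counter in fixed memory (depth x width counters).

    Hypothetical illustration; not code from the talk.
    """

    def __init__(self, width=1000, depth=5):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One hash function per row, derived by salting a stable digest
        # with the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).hexdigest()
        return int(digest, 16) % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions can only inflate a counter, so the minimum across rows
        # is the tightest estimate and never undercounts.
        return min(
            self.table[row][self._index(item, row)] for row in range(self.depth)
        )

    def merge(self, other):
        """Return a new sketch whose counters are the cell-wise sums."""
        if (self.width, self.depth) != (other.width, other.depth):
            raise ValueError("sketches must have the same shape")
        merged = CountMinSketch(self.width, self.depth)
        for r in range(self.depth):
            merged.table[r] = [a + b for a, b in zip(self.table[r], other.table[r])]
        return merged


# Made-up click events, counted on two separate "partitions" and then merged.
left = CountMinSketch()
right = CountMinSketch()
for event in ["ad_a"] * 30 + ["ad_b"] * 2:
    left.add(event)
for event in ["ad_a"] * 12:
    right.add(event)

combined = left.merge(right)
print(combined.estimate("ad_a"))  # never below the true count of 42
```

Memory stays at `width * depth` counters no matter how many distinct keys arrive, and because merging is associative, partial sketches built on separate partitions can be combined in any order.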