Future of Data Science
as a profession
Jose Quesada, Director, Data Science Retreat
@datascienceret
http://datascienceretreat.com/
The promise
The machine learning promise
People should be able to predict:
• Which employee will leave in the next 6 months
• Which electric generator is likely to die in the next 2 weeks
• Which sales lead has the highest potential to close in the next 3
months
• What each new website visitor is likely to buy based on past visitors
http://www.slideshare.net/bigml/the-past-present-and-future-of-machine-learning-apis
Jao. The Past, Present, and Future of Machine Learning APIs
http://www.enlitic.com/healthcare.html
Smile detection
Example Graduate portfolio project from DSR
03. Smile detection on video streams. Works
reliably with multiple people on cam.
Applications: YouTube funny video evaluation
Data analysis has become super easy.
But has it?
• Great libraries exist with every algorithm under the sun
The machine learning promise
(Anyone who can turn on a computer) should be able
to predict:
• Which employee will leave in the next 6 months
• Which electric generator is likely to die in the next 2 weeks
• Which sales lead has the highest potential to close in the next 3 months
• What each new website visitor is likely to buy based on past visitors
Paco Nathan: Data Science in future tense
Why data analysis is still
hard, after all the libraries
and APIs
Andreas Mueller’s map
Trent McConaghy’s riff on Andy
http://trent.st/ffx/
Two machine learners, two maps
Andreas Mueller, PhD
Andy is an Assistant Research Scientist
at the NYU Center for Data Science,
building a group to work on open
source software for data science.
Previously he was a Machine Learning
Scientist at Amazon, working on
computer vision and forecasting
problems. He is one of the core
developers of the scikit-learn machine
learning library, and has maintained
it for several years.
Authored the now famous model
picker image from scikit-learn
Trent McConaghy, PhD
Trent is co-founder & CTO of ascribe,
which uses modern crypto, ML, and
big data to tackle challenges in digital
property ownership. His two startups
applied ML in the enterprise semi-
conductor space: ADA was acquired in
2004 and Solido is going strong. His
interests include large scale
regression, automating creativity,
anything labeled "impossible", and
thousand-fold improvements. He was
raised on a pig farm in Canada.
Why data analysis is still hard, after
all the libraries and APIs
• It’s too easy to lie to yourself about it working
• It’s very hard to tell whether it could work if it doesn’t
• There is no free lunch
http://blog.mikiobraun.de/2014/02/data-analysis-hard-parts.html
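A minimal sketch of the first bullet, "too easy to lie to yourself about it working". Everything here is hypothetical: the data is synthetic, the labels are coin flips, and the model is a hand-written 1-nearest-neighbour memorizer rather than any particular library. The point is only that accuracy measured on the training data can look perfect when there is nothing to learn.

```python
import random

def nearest_neighbor_predict(train_x, train_y, x):
    """Return the label of the training point closest to x (1-NN)."""
    best = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[best]

def accuracy(train_x, train_y, xs, ys):
    """Fraction of points in (xs, ys) that the 1-NN model labels correctly."""
    hits = sum(nearest_neighbor_predict(train_x, train_y, x) == y
               for x, y in zip(xs, ys))
    return hits / len(xs)

rng = random.Random(0)
# Hypothetical data: one random feature, labels that are pure coin flips.
xs = [rng.random() for _ in range(400)]
ys = [rng.randint(0, 1) for _ in range(400)]
train_x, train_y = xs[:200], ys[:200]
test_x, test_y = xs[200:], ys[200:]

# Scoring on the data the model memorized looks perfect...
train_acc = accuracy(train_x, train_y, train_x, train_y)
# ...while held-out data shows the model is guessing.
test_acc = accuracy(train_x, train_y, test_x, test_y)
print(f"train accuracy: {train_acc:.2f}, held-out accuracy: {test_acc:.2f}")
```

Evaluating only on held-out data is the simplest guard against this kind of self-deception.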
No free lunch theorem
• There is no universally optimal learning algorithm as
shown by the No Free Lunch Theorem: There is no
algorithm which is better than all the rest for all kinds
of data.
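An informal illustration of the slide's claim (not a proof of the theorem): two hand-written learners, each of which wins on one synthetic dataset and loses on the other. The datasets, the function names, and the 30% label-noise rate are all assumptions made up for this sketch.

```python
import random

def fit_stump(xs, ys):
    """Pick the threshold and orientation with the best training accuracy."""
    best = (0.0, 0.0, 1)  # (training accuracy, threshold, label above threshold)
    for t in xs:
        for above in (0, 1):
            acc = sum((above if x > t else 1 - above) == y
                      for x, y in zip(xs, ys)) / len(xs)
            if acc > best[0]:
                best = (acc, t, above)
    _, t, above = best
    return lambda x: above if x > t else 1 - above

def fit_nearest_neighbor(xs, ys):
    """Memorize the training set; predict the label of the closest point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def noisy_threshold(x, rng):
    """Label is 1 above 0.5, but 30% of labels are flipped at random."""
    label = 1 if x > 0.5 else 0
    return label if rng.random() > 0.3 else 1 - label

def stripes(x, rng):
    """Label alternates between ten equal-width bands; no noise."""
    return int(x * 10) % 2

def held_out_accuracy(fit, labeler, seed):
    """Train on 500 synthetic points, score on 2000 fresh ones."""
    rng = random.Random(seed)
    train_x = [rng.random() for _ in range(500)]
    train_y = [labeler(x, rng) for x in train_x]
    test_x = [rng.random() for _ in range(2000)]
    test_y = [labeler(x, rng) for x in test_x]
    model = fit(train_x, train_y)
    return sum(model(x) == y for x, y in zip(test_x, test_y)) / len(test_x)

results = {
    (data.__name__, fit.__name__): held_out_accuracy(fit, data, seed=0)
    for data in (noisy_threshold, stripes)
    for fit in (fit_stump, fit_nearest_neighbor)
}
for key, acc in results.items():
    print(key, round(acc, 2))
```

The stump ignores label noise but cannot represent stripes; the nearest-neighbour model follows the stripes but also memorizes the noise. Which one is "better" depends entirely on the data.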
“Toolified”
• As more and more ML techniques become "toolified" the
problem is that the business doesn't understand that the
hard work is still ahead of them.
• Home Depot sells hammers and lumber, and while some
people have the skill and dedication to build their own
house, most folks are smart enough to hire someone that
knows what they're doing so the thing doesn't fall in and kill
their family.
• Blind faith in the power of tools is not helpful
80 % data mangling 20 % building & testing
models
Is model building automatable?
How about the data wrangling part? It’s actually a larger chunk
Automating the data
scientist
Machine learning APIs
Machine learning for data wrangling
• Zoubin Ghahramani, Automatic statistician
• It's easy to shoot yourself in the foot with automated
tools — and convince yourself that the results are
meaningful when they're not
Alternative:
interfaces that draw
the most useful
information out of
people
Aka ‘The Luis von Ahn trick’.
Human computation: combine
human brainpower with computers
to solve problems that neither could
solve alone.
ReCAPTCHA: Computer-generated
tests that humans are routinely able
to pass but that computers have not
yet mastered.
Actionable advice for
individuals
Goal
• Become a full-stack problem solver
• AKA the unicorn data scientist
How to get there
• Focus on delivering business value
How to get there
Only after the business side is covered: focus on the tech
stack.
• Machine learning
• Big data/ engineering
• When to use ML at scale, when to sample and run on a single
machine
Constant learning
• The field changes faster than any other in technology
• If you are not willing to allocate ‘time outside work’ to
learn new things you will stagnate fast
Not being the equivalent of a code
monkey
• MOOCs have lowered the barrier to entry to machine
learning.
• Nowadays, you cannot be ‘the guy who knows how to
run (insert off-the-shelf-algo-here)’. In dataland, that’s
the equivalent of being a code monkey. MOOCs and
superb libraries (scikit-learn, R’s ecosystem) made sure
there are plenty of people who can throw, say, a random
forest at a problem. In the modern world, this is not
adding that much value.
Picking problems to add the most
value
• Sometimes beating what the company is already doing
(often, nothing) offers a lot of value. Detecting fraud
poorly is better than not detecting fraud
Data Science will continue to be
democratized
• There’s no shortage of data
scientists.
• 1900: Number of cars on the
road would be limited by the
supply of trained chauffeurs.
Machine learning can very quickly get
you, say, 80% of the way to solving just
about any (real world) problem
You want to apply ML to contexts that are fault tolerant:
• Online ad targeting
• Ranking search results
• Recommendations
• Spam filtering
ML quickly hits a point of
diminishing returns
“The gain is not worth the pain."
Actionable advice for
companies
Talent: invest in it
• The hunt for the 10x programmer continues (although
few companies succeed)
• In data science, the equivalent is the unicorn data
scientist
• A unicorn data scientist should generate more business
value than a 10x programmer
• Market agrees: supersalaries of >200k are common for
unicorn data scientists
Talent: beware of the fake data
scientist
• Each LinkedIn job ad for a data scientist gets ~150
applications
• Often people who just rebranded themselves but have no
real experience
• Very common among people bailing out of academia
• HR managers cannot tell the difference
• It’s a common mistake to hire one, and never be able to
produce business value
Talent: easier to find than you may
think
• Online courses have raised the bar
• Intensive bootcamps do work, as long as people have
built something at the end
• You will still get 150 fake data scientists for each decent
one
A future where ML has
been popular for years.
What does it look like?
Next 3 years
• ML APIs will enable people with less and less skill to run
quite sophisticated analyses
• Startups doing ML as a service will grow up, then
contract. ML will stop being a key competitive
advantage in most (not all) domains
• Blind faith in the power of tools will lead to wrong
decisions, which will lead to a backlash
Next 10 years
• Prediction: C-level people will be data scientists in the
future
• Product managers will become data scientists, or get
replaced by one
DS is a chaotic field and
people don’t really know
what they want (much less
what they need)
Interested in Data Science Retreat?
Apply to either of our two tracks
http://datascienceretreat.com/
Thank You!
Jose Quesada, PhD
Director, Data Science Retreat
@datascienceret
me@josequesada.com
References
• Paco Nathan. Data science in future tense
• Chris Dixon. Machine learning is really good at partially
solving just about any problem
• Jao. The Past, Present, and Future of Machine Learning
APIs

Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 

Kürzlich hochgeladen (20)

Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 

Future of data science as a profession

  • 1. Future of Data Science as a profession Jose Quesada, Director, Data Science Retreat @datascienceret http://datascienceretreat.com/
  • 3. The machine learning promise People should be able to predict: • Which employee will leave in the next 6 months • Which electric generator is likely to die in the next 2 weeks • Which sales lead has the highest potential to close in the next 3 months • What each new website visitor is likely to buy based on past visitors
  • 6. Smile detection Example graduate portfolio project from DSR 03. Smile detection on video streams. Works reliably with multiple people on cam. Applications: YouTube funny video evaluation
  • 7. Data analysis has become super easy. But has it? • Great libraries exist with every algorithm under the sun
  • 8. The machine learning promise (Anyone who can turn on a computer) should be able to predict: • Which employee will leave in the next 6 months • Which electric generator is likely to die in the next 2 weeks • Which sales lead has the highest potential to close in the next 3 months • What each new website visitor is likely to buy based on past visitors
  • 10. Paco Nathan: Data Science in future tense
  • 12. Why data analysis is still hard, after all the libraries and APIs
  • 14. Trent McConaghy’s riff on Andy http://trent.st/ffx/
  • 15. Two machine learners, two maps Andreas Mueller, PhD Andy is an Assistant Research Scientist at the NYU Center for Data Science, building a group to work on open source software for data science. Previously he was a Machine Learning Scientist at Amazon, working on computer vision and forecasting problems. He is one of the core developers of the scikit-learn machine learning library, and has maintained it for several years. He authored the now-famous model picker image from scikit-learn. Trent McConaghy, PhD Trent is co-founder & CTO of ascribe, which uses modern crypto, ML, and big data to tackle challenges in digital property ownership. His two startups applied ML in the enterprise semiconductor space: ADA was acquired in 2004 and Solido is going strong. His interests include large scale regression, automating creativity, anything labeled "impossible", and thousand-fold improvements. He was raised on a pig farm in Canada.
  • 16. Why data analysis is still hard, after all the libraries and APIs • It’s too easy to lie to yourself about it working • It’s very hard to tell whether it could work if it doesn’t • There is no free lunch http://blog.mikiobraun.de/2014/02/data-analysis-hard-parts.html
  • 17. No free lunch theorem • There is no universally optimal learning algorithm as shown by the No Free Lunch Theorem: There is no algorithm which is better than all the rest for all kinds of data.
  • 18. “Toolified” • As more and more ML techniques become "toolified" the problem is that the business doesn't understand that the hard work is still ahead of them. • Home Depot sells hammers and lumber, and while some people have the skill and dedication to build their own house, most folks are smart enough to hire someone that knows what they're doing so the thing doesn't fall in and kill their family. • Blind faith in the power of tools is not helpful
  • 19. 80% data mangling, 20% building and testing models. Is model building automatable? How about the data wrangling part? It is actually the larger chunk.
  • 22. Machine learning for data wrangling
  • 23. • Zoubin Ghahramani, Automatic statistician • It's easy to shoot yourself in the foot with automated tools — and convince yourself that the results are meaningful when they're not
  • 24. Alternative: interfaces that draw the most useful information out of people Aka ‘The Luis von Ahn trick’. Human computation: combine human brainpower with computers to solve problems that neither could solve alone. ReCAPTCHA: Computer-generated tests that humans are routinely able to pass but that computers have not yet mastered.
  • 26. Goal • Become a full-stack problem solver • AKA the unicorn data scientist
  • 27. How to get there • Focus on delivering business value
  • 28. How to get there Only after the business side is covered: focus on the tech stack. • Machine learning • Big data / engineering • When to use ML at scale, when to sample and run on a single machine
  • 29. Constant learning • The field changes faster than any other in technology • If you are not willing to allocate ‘time outside work’ to learn new things you will stagnate fast
  • 30. Not being the equivalent of a code monkey • MOOCs have lowered the barrier to entry to machine learning. • Nowadays, you cannot be ‘the guy who knows how to run (insert off-the-shelf-algo-here)’. In dataland, that’s the equivalent of being a code monkey. MOOCs and superb libraries (scikit-learn, R’s ecosystem) made sure there are plenty of people who can throw, say, a random forest at a problem. In the modern world, this does not add that much value.
  • 31. Picking problems to add the most value • Sometimes beating what the company is already doing (often, nothing) offers a lot of value. Detecting fraud poorly is better than not detecting fraud.
  • 32. Data Science will continue to be democratized • There’s no shortage of data scientists. • Early 1900s: people thought the number of cars on the road would be limited by the supply of trained chauffeurs.
  • 33. Machine learning can very quickly get you, say, 80% of the way to solving just about any (real-world) problem. You want to apply ML to contexts that are fault tolerant: • Online ad targeting • Ranking search results • Recommendations • Spam filtering
  • 34. ML quickly hits a point of diminishing returns: “The gain is not worth the pain.”
  • 36. Talent: invest in it • The hunt for the 10x programmer continues (although few companies succeed) • In data science, the equivalent is the unicorn data scientist • Unicorn data scientist should generate more business value than a 10x programmer • Market agrees: supersalaries of >200k are common for unicorn data scientists
  • 37. Talent: beware of the fake data scientist • Each LinkedIn job ad for a data scientist gets ~150 applications • Often people who just rebranded themselves but have no real experience • Very common in guys bailing out of academia • HR managers cannot tell the difference • It’s a common mistake to hire one and then never see any business value
  • 38. Talent: easier to find than you may think • Online courses have raised the bar • Intensive bootcamps do work, as long as people have built something at the end • You will still get 150 fake data scientists for each decent one
  • 39. A future where ML has been popular for years. What does it look like?
  • 40. Next 3 years • ML APIs will enable people with less and less skill to run quite sophisticated analyses • Startups doing ML as a service will grow up, then contract. ML will stop being a key competitive advantage in most (not all) domains • Blind faith in the power of tools will lead to wrong decisions, which will lead to a backlash
  • 41. Next 10 years • Prediction: C-level people will be data scientists in the future • Product managers will become data scientists, or get replaced by one
  • 42. DS is a chaotic field and people don’t really know what they want (much less what they need)
  • 43. Interested in Data Science Retreat? Apply to either of our two tracks http://datascienceretreat.com/
  • 45. Thank You! Jose Quesada, PhD Director, Data Science Retreat @datascienceret me@josequesada.com
  • 46. References • Paco Nathan. Data science in future tense • Chris Dixon. Machine learning is really good at partially solving just about any problem • Jao. The Past, Present, and Future of Machine Learning APIs
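
Slide 16's first point, that it is too easy to lie to yourself about a model working, can be made concrete with a toy sketch. The setup is an assumption chosen for illustration and is not from the talk: features and labels are pure noise, and a 1-nearest-neighbour "model" (the helper names `nearest_label`, `accuracy` and `make_noise` are hypothetical) simply memorises the training set. Scored on the data it was trained on, it looks perfect; scored on held-out data, it should do no better than a coin flip.

```python
import random

def nearest_label(train, x):
    """Return the label of the training point closest to x (1-nearest neighbour)."""
    best = min(train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)))
    return best[1]

def accuracy(train, rows):
    """Fraction of rows whose label matches the nearest training point's label."""
    hits = sum(nearest_label(train, x) == y for x, y in rows)
    return hits / len(rows)

def make_noise(n, rng, dims=5):
    # Features and labels are drawn independently: there is nothing to learn.
    return [([rng.random() for _ in range(dims)], rng.randint(0, 1)) for _ in range(n)]

rng = random.Random(0)  # fixed seed so the run is repeatable
train = make_noise(200, rng)
held_out = make_noise(200, rng)

train_acc = accuracy(train, train)        # each point is its own nearest neighbour
held_out_acc = accuracy(train, held_out)  # should hover around chance level

print(f"training accuracy: {train_acc:.2f}")
print(f"held-out accuracy: {held_out_acc:.2f}")
```

The gap between the two numbers is the self-deception the slide warns about: only the held-out figure says anything about whether the model works.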

Editor's notes

  1. It was almost a joke. Too much email asking the ‘when to do what’ question.
  2. If you thought scikit-learn was convenient
  3. What is business value? If you have been in academia or away from a customer-facing role for most of your career, you probably don’t have good intuitions about this. A sure-fire way to learn is to start a business, or take a customer-facing role. Even so, it may take years to know your market.
  4. The discussion about the shortage of data scientists reminds me that in the early 1900s people thought that the number of cars on the road would be limited by the supply of trained chauffeurs. Then Henry Ford and others built cars that owners could drive themselves. New tools are going to be available that business owners can use themselves without needing data scientists.
  5. You need to apply ML to contexts that are fault tolerant: online ad targeting, ranking search results, recommendations, spam filtering.
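
The no-free-lunch point from slide 17 can be illustrated (not proven) with a small sketch. Everything here is an assumption made for the example: two deliberately simple learners, a 1-nearest-neighbour rule and a majority-class rule, are compared on two synthetic datasets, one where the label is fully determined by the feature and one where the label ignores the feature. The function names are hypothetical.

```python
import random

def one_nn(train, x):
    """Predict the label of the training point nearest to x (1-D features)."""
    return min(train, key=lambda row: abs(row[0] - x))[1]

def majority(train, x):
    """Ignore x and predict the most common training label."""
    ones = sum(y for _, y in train)
    return 1 if 2 * ones >= len(train) else 0

def accuracy(predict, train, test):
    """Fraction of test rows that the given learner labels correctly."""
    return sum(predict(train, x) == y for x, y in test) / len(test)

def threshold_data(n, rng):
    # The label is fully determined by the feature.
    return [(x, int(x > 0.5)) for x in (rng.random() for _ in range(n))]

def noisy_data(n, rng):
    # The label ignores the feature; class 1 appears about 80% of the time.
    return [(rng.random(), int(rng.random() < 0.8)) for _ in range(n)]

rng = random.Random(1)  # fixed seed so the run is repeatable
results = {}
for name, make in [("threshold", threshold_data), ("noise", noisy_data)]:
    train, test = make(500, rng), make(500, rng)
    results[name] = {
        "1-NN": accuracy(one_nn, train, test),
        "majority": accuracy(majority, train, test),
    }
    print(name, results[name])
```

On the threshold data the nearest-neighbour rule should win easily, while on the noisy data the cruder majority rule should come out ahead. Neither learner is better on both, which is the practical reading of the slide: the choice of algorithm depends on the data.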