To be successful as a data science team, we need to continuously deliver data-driven insights and data products that generate business value. Identifying the best opportunities and building solutions that actually get used in production requires very close collaboration with business users and subject matter experts. What can we learn from agile software development methodologies, and how can we apply them to data science projects?
3. What is Data Science?
• Data science, also known as data-driven science, is an interdisciplinary field about scientific methods, processes and systems to extract knowledge or insights from data in various forms, either structured or unstructured (Wikipedia)
4. Business Goals
Why do companies hire data scientists?
• Reduce costs
• Increase revenue
• Reduce risk
• Create innovation
5. Deliverables
How do data scientists deliver?
• Actionable insights (reports)
• Data products
• New product features
• Trials, A/B Testing
6. Challenges
Why do many data science projects fail?
• Lack of Business Understanding
• Data Access (Security, Privacy)
• Deployment and Operation (Scalability, Acceptance)
• Time to market (Competition, Budget)
7. Case Study: Data Science for Sales Department
• Sales department: "I want a recommender system for my Sales Reps."
• Data scientist: "Sure, we can use Alternating Least Squares Singular Value Decomposition!"
8. Case Study: Data Science for Sales Department
• Sales department: "Show me what you can do with Deep Learning."
• Data scientist: "Cool, we can do something with TensorFlow on your data."
9. Case Study: Data Science for Sales Department
• Sales department: "I want a dashboard of sales by country and product."
• Data scientist: "Well, we can do visualizations, but that's actually not my job!"
10. Typical pitfalls during project execution
Project phases: Modeling, Trial/Pilot, Operationalization
Timeline: 12 months
Pitfalls:
• No access to data
• Not enough signal
• Model does not scale
• Users don't accept solution
• Fails to meet business objective
• Out of budget
12. Agile Data Science
How can we implement CRISP-DM in practice?
• Agile Product Management
• Agile Development
• Data Science Platform / Data Lake
13. Agile Product Management – The Product Vision Statement [1]
Vision: "Leverage data science to increase sales team productivity"
• Target Group: Sales Reps, Sales Manager
• Needs: Close deals, Prioritize leads, Prevent churn, Acquire new leads, Up-sell, Cross-sell
• Product: ?
• Business Goals: Increase conversion rate, Increase average basket size, Reduce churn rate, Grow customer base
[1] Roman Pichler: Agile Product Management with Scrum
14. User Stories – Bridging the gap between algorithms and business needs
Association Rules:
As a sales rep, I need to understand which products are often bought together, so that I can recommend additional products during sales calls and increase upselling.
Churn Factor Analysis:
As a sales rep, I need to understand the factors that drive churn, so that I can select customers to call, make sure they are satisfied with our products and reduce churn.
Recommender system:
As a sales rep, for each customer I need to understand which products were bought by customers with similar purchase history, so that I can make personalized recommendations and increase upselling.
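To make the first user story concrete, here is a minimal sketch of pairwise association rules in plain Python. It is not taken from the slides: the function name `pair_rules`, the `min_support` threshold and the toy baskets are all hypothetical and chosen for illustration only. It counts how often two products appear in the same order and reports support, confidence and lift for each direction of the pair.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.2):
    """Return (lhs, rhs, support, confidence, lift) tuples for item pairs."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # de-duplicate items within one order
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of orders containing both items
        if support < min_support:
            continue
        # Emit the rule in both directions: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            lift = confidence / (item_counts[rhs] / n)
            rules.append((lhs, rhs, support, confidence, lift))
    # Highest lift first: the strongest "bought together" signal.
    return sorted(rules, key=lambda r: r[4], reverse=True)


# Hypothetical toy order history, for illustration only.
baskets = [
    ["printer", "ink", "paper"],
    ["printer", "ink"],
    ["paper", "stapler"],
    ["printer", "ink", "stapler"],
    ["paper"],
]
rules = pair_rules(baskets, min_support=0.4)
for lhs, rhs, support, confidence, lift in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f} "
          f"confidence={confidence:.2f} lift={lift:.2f}")
```

A sales rep would read a rule such as "ink -> printer" as a prompt for a follow-up offer. A "good enough" first version like this can be shown to users early, before investing in a full frequent-itemset library.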
15. Story Mapping and Release Planning
Themes: Up/Cross-Selling, Churn Prevention, Leads Prioritization
Releases: Release 1, Release 2, Release 3
Up/Cross-Selling:
• Association Rules
• Item-Item Recommender
• Content-based recommender for cold-start (incl. CRM data)
Churn Prevention:
• Factor Analysis
• Simple predictive model for churn (sales history data)
• Improved predictive model for churn (incl. CRM data)
Leads Prioritization:
• Conversion Factor Analysis
User Interface/Deployment:
• Up/Cross-Selling: Viz: Top N items per customer, A/B Testing
• Churn Prevention: Viz: Top N customers likely to churn, A/B Testing
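The "Item-Item Recommender" and "Top N items per customer" stories can be sketched as follows. This is an illustrative assumption rather than the implementation behind the slides: the helper names `item_similarities` and `top_n`, the choice of cosine similarity over sets of buyers, and the sample purchase data are all hypothetical.

```python
from collections import defaultdict
from math import sqrt


def item_similarities(purchases):
    """Cosine similarity between items, based on which customers bought them."""
    buyers = defaultdict(set)
    for customer, items in purchases.items():
        for item in items:
            buyers[item].add(customer)
    sims = defaultdict(dict)
    for a in buyers:
        for b in buyers:
            if a == b:
                continue
            overlap = len(buyers[a] & buyers[b])
            if overlap:
                sims[a][b] = overlap / sqrt(len(buyers[a]) * len(buyers[b]))
    return sims


def top_n(customer, purchases, sims, n=3):
    """Rank items the customer has not bought by similarity to owned items."""
    owned = set(purchases.get(customer, []))
    scores = defaultdict(float)
    for item in owned:
        for other, sim in sims.get(item, {}).items():
            if other not in owned:
                scores[other] += sim
    # Sort by descending score, then by name for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]


# Hypothetical purchase history, for illustration only.
purchases = {
    "c1": ["printer", "ink", "paper"],
    "c2": ["printer", "ink"],
    "c3": ["paper", "stapler"],
    "c4": ["printer"],
}
sims = item_similarities(purchases)
recs = top_n("c4", purchases, sims, n=2)
print(recs)
```

A customer without any purchase history gets an empty list from `top_n`, which is the cold-start gap that the story map covers with a content-based recommender using CRM data.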
17. Data Lake / Agile Platform
Data sources: CRM, Purchase Data, Call Center Tickets
Platform Layer:
• Unstructured Data, Structured Data
• Scalable Job Execution / Query Engine
• ETL, Scheduling
• Security/Auth, Auditing, Monitoring
Application Layer:
• Docker/VMs
• Apps, REST
• Query Interface / Notebooks
• Visualization Tools
Connected systems and users: Legacy Systems, Business Users, Analysts / Data Scientists
18. Summary / Call to Action
• Data science projects rarely fail because of insufficient modeling skills
• Focus on business value, deliver "good enough" models first
• Deliver in small increments that already provide value end-to-end, and present them in Sprint Reviews to all stakeholders
• Manage stakeholders using a clear product vision, a user story backlog and release plans
• Deploy as early as possible to ensure user acceptance, and declare the deployment as "beta" mode
• Build an infrastructure that enables agile development