Building Data Teams: data scientists, engineers, and product managers working together to create innovative data products, by Anu Tewary, Director of Product Management at Intuit.
Playing Nice in the Product Playground
1. 2015
Playing Nice in the
Product Playground
Building Data Teams:
data scientists, engineers, and product managers
working together to create innovative data products
Anu Tewary
October 15, 2015
#GHC15
2015 Anu Tewary
@anutewary
4. 2015
product vision
business impact
success measures
effective architecture
scalability & robustness
metrics & monitoring
not sure what this product does, but
look at the 2% lift I can get from this
model...
ooh, ooh, a Dirichlet prior is what
this needs!!
is this good for an ICML or KDD
paper?
Data scientists navel gazing in a corner?!
[ 1 ]
5. 2015
product vision
business impact
success measures
rapid experimentation
simple models first
right metrics
let’s write a new streaming
framework for the weekly dashboard!
we’re not meeting our SLAs, let’s
write a faster json parser!
let’s write an optimized distributed
graph database for our data scientist.
[ 2 ]
6. 2015
product vision
business impact
success measures
rapid experimentation
simple models first
right metrics
let’s write a new streaming
framework for the weekly dashboard!
let’s write a faster json parser in
Clojure!
silver bullet: graph database, fp,
lambda arch
[ 2 ]
Engineers reinventing the tech wheel?!
8. 2015
rapid experimentation
simple models first
right metrics
forget A/B testing, my gut tells me
this is the way to go...
revenue impact? Who cares! Build it
anyway!
no time to instrument! Let’s go to
market and we’ll do that later - I’m
sure that the numbers will look good!
[ 3 ]
effective architecture
scalability & robustness
metrics & monitoring
Product in a bubble?!
14. 2015
find the right mix
minimum: good at prod, ds, and eng
target: great in at least one of prod, ds, eng; good in the rest
15. 2015
form pods around product
personalization & reco pod
real time data capture & stream proc. pod
business search pod
real time commerce graph pod
26. 2015
Three Steps to Risa**
1 awesome team (pods)
2 solve a big problem (pods)
3 get out of the way (pods)
** Risa is to Nirvana as Spark is to Hadoop
28. 2015
Example 1: Multinational banking and
financial services company
Took a “technology first” approach: wanted
to build a hadoop cluster, because they had
heard they should
No product vision, but tremendous (!)
possibilities
Not connected closely with business needs
No data science
build an awesome team
solve a big problem
engage
prod: tiny
ds: tiny
eng: good
29. 2015
Example 2: Large media company
Excellent engineering team
Good product team, but not data driven
Good metrics and beginning data science. Did
not iterate quickly; data and product were
too decoupled
build an awesome team
solve a big problem
engage
?
prod: good
ds: tiny
eng: amazing
30. 2015
Example 3: Large advertising firm
Data-driven product team, but limited vision
Engineering team not product focused. Could
not iterate quickly
Non-existent data science
build an awesome team
solve a big problem
engage
good
tiny
ok
prod
ds
eng
31. 2015
Example 4: Attempt at Introspection
An awesome team with data, product and
engineering working together
Solving hard problems – for individuals and
small businesses
Need to do a lot more work to get the right
metrics in place – need more work to be
100% eyes on, hands off.
build an awesome team
solve a big problem
engage
34. 2015
Got Feedback?
Rate and review the session on our mobile app
Download at http://ddut.ch/ghc15
or search GHC 2015 in the app store
Editor's notes
Give broad examples for companies worked for:
Professional networking
Music streaming
and world’s largest hedge fund
among others
Lucian:
Data scientists navel gazing in a corner?
How many of you have seen this type of situation before in a company? Not necessarily your company, of course :D
Interesting - roughly xx%.
Lucian
1) Let’s write a new streaming framework for the weekly dashboard!
2) we’re not meeting our SLAs, let’s write a faster json parser! I’m sure that’s the bottleneck.
3) Let’s write an optimized distributed graph database for our data scientist. It may come in handy.
Product
what’s the big product vision | product vision: check
what’s the business impact | biz impact: check
how do we measure success | how to measure success: check
Engineering
blah
blah
blah
Data Science
1. I wonder if Dirichlet .... paper publishing to KDD
The fact that many of us have experienced at least some of these issues shows that it’s not easy to have cross-functional teams working together. In this talk, I’ll share what has worked and what hasn’t when it comes to building data teams that play nicely together.
I’ve been working in the data space for several years, including time as a senior data scientist at LinkedIn, building a data science startup called Level Up Analytics, which was acquired by Intuit, and now working at Intuit with the incredible data that we have across all of our products.
During my time at Level Up Analytics, we worked with two dozen companies and consulted with over a hundred different companies. Our customers included a large professional network, a popular music streaming site, and the world’s largest hedge fund, among others.
So, my co-founders and I saw many different companies and groups approach data products and data teams, and we started to identify patterns. And today, I’d like to share these patterns and observations on how data science, product, and engineering can work together to build successful data products.
A three-step framework based on lessons learned in building data teams: what we learned while building the team at Level Up Analytics and working with all of our clients, and my past two years at Intuit.
The first step we need to take is to build an awesome team
It should be no surprise that the most important step is building a great team. And by team, I mean the joint team, not the individual product, engineering, and data science groups.
What qualities do we look for in our team?
Great communication skills
Nice to work with – no a$$hole policy
Amazing technically.
It’s not so much about: do they know a specific technology or not? It’s more about solid problem solving, critical thinking, strong fundamentals, aptitude and interest in learning.
We wouldn’t drop a candidate from consideration simply because they don’t know Spark, Kafka, or Pandas, or R.
We solve this in part by how we interview: we always have representation from each function during the interview process, so we make sure that data scientists interview the PM candidates, PMs are part of each engineering interview, and so on. And we make sure that we have gender diversity for each set of interviews: not only do we have multiple women interview the female candidates, we also have multiple women interview the male candidates.
Never settle!
This is something we strongly believe in! In big companies, there’s often pressure to hire fast, before the req disappears, but it is so important to wait for the right candidate.
Even if it takes 6 months to find that one amazing person, it’s worth the wait, instead of hiring 10 reasonably good people.
Never settle!
Another aspect of building an awesome team we found is to find people who at a minimum have a solid understanding of product, engineering, and data science. But we are not shooting for minimum here! Imagine being able to work with:
Product managers who are also good technically. For example, many product managers we’ve hired have an engineering background.
Engineers who understand product well and who know what it means to build predictive models.
Data scientists who understand product and who you’d hire as junior engineers any day
“Pink unicorns”, you say? Not quite and it’s worth the effort.
To some this doesn’t make sense: why not hire people who are very good at one thing and one thing only: “the javascript guy”, “the stream processing gal”, and “the research scientist” rebranded data scientist? It will become apparent later in the talk.
Ok, so now the fun begins.
Rather than assigning projects and products to be built to pre-existing teams, we form small, on-demand teams that we call PODS.
The idea is to pick the right people for the job, regardless of what job function they are part of – for example: 1 technical PM, 3-4 engineers, 1-2 data scientists.
Here are a few project examples and corresponding pods: “business search”, “real time commerce graph” (about 25% of the US GDP flows through Intuit), “stream processing”, and so on.
So at this point: we have the right people, with the right skills, on the right pod for the project
encourage them to blur the boundaries ... This works *only* if you have the right people in place
PODS: since everybody in the pod has a minimal set of skills in each area, gives flexibility to take on tasks to move proj forward.
Everybody is there through the whole process: talk to customers, design, brainstorm, build solution, instrument, deploy, analyze…
Example: DS working on integration tests and product features
Example: Eng and Prod guys helping with devops
Meaningful,
People will want to go off and do problems: engr, prod, ds … all.
Don’t solve problems solved somewhere else for the sake of doing it.
Solve the right problems that you’re uniquely positioned to solve instead of spending a year making incremental progress OR, even worse, reinventing the wheel.
Need time to re-assess. Frequently reassess.
keep metrics, keep score, go
the right metrics (non vanity metrics)
PODS: have your big vision, figure out the right metrics, and empower the PODS to keep score to make sure they are heading in the right direction
iterate quickly and be willing to change
PODS, AGAIN – epoch…, as you frequently reassess, be open and willing to stop, amplify, scrap … projects
Concept: Puzzle pieces relaxing, having fun -- alien planet Risa (as mentioned in Star Trek)
trust them to learn more about the problem than you; become the experts, grow, learn.
PODS: since you are working on the same problem as the rest of the pod (w/ different skills), you’re getting better at that.
Cross pollination:
--Graph expert applies knowledge to fraud detection.
--Push-button deployment from project to project
Growing your own experts is OK
Any member of the team can give a demo or speak to the customers--no one should be such that you can only use them in the “back office”
PODS: demos, client, troubleshoot … not front/back office. “back office analytics”
eyes on, hands off
remove roadblocks, evangelize, advocate for them
PR, investor, advisor, coach
not: simply “boss”
Example: attend weekly informal demo, provide feedback and guidance, remove roadblocks, help build connections
Emphasize:
Key point: This works *only* if you’ve done “build a great…” and “solve a big pr ..” right
team uses the metrics to measure progress
finds the next problems to solve on their own
3: get out of the way
PODS: w/ the right team in place, the right metrics, and rapid iteration, empower the team and get out of the way…
Example (if we come up with good one)
To re-cap, how do we get to Risa?
Build an awesome team
Identify and solve a big problem, and
Empower the team by getting out of the way.
And the way we make it all work in practice is via PODS.
… We’re going to share some high-level examples of patterns we’ve noticed and rank them according to the
“So what ended up happening was: they weren’t building a good team.”
“Had all of this amazing data they were trying to bring together to solve amazing things, but …”
And because of not having a good team and not having product vision, they didn’t even know where to begin…
The *engineering* team they had was awesome, no data science.
Because they didn’t have all 3 facets together, and because they didn’t have the ability to rapidly iterate, they were solving the wrong problem.
But they were “on it” ??
Additional:
-- Lucian: Build awesome team “and having fun while doing it”
-- Anu: uniquely positioned to solve for our users; continually reassess ourselves to make sure we’re solving the right problem.
--
And this is how the pieces fit together! Thank you!
[…]
We have time for a few questions.