Under Sundar Pichai, Google is doubling down on machine learning and artificial intelligence, and it is not the only one. The robot revolution's impact will not be limited to the ranking of search results; its effects on the job market are the subject of endless speculation. Will has been researching the parts of our digital marketing jobs that computers can do better than we can. In this talk, he explores the boundaries of human and computer capabilities and shows you how to combine the strengths of both.
SearchLove San Diego 2017 | Will Critchlow | Knowing Ranking Factors Won't Be Enough: How To Avoid Losing Your Job to a Robot
1. Knowing ranking factors won’t
be enough
How to avoid losing your job to a robot
@willcritchlow
2. I’m going to tell you about a robot
that understands ranking factors
better than any of you
...but before I get to that, let’s look at a bit of history...
9. We used to have a pretty good
understanding of ranking factors
11. My mental model for ~2009 ranking factors had three different modes: one in the hyper-competitive head, one in the competitive mid-tail, and one in the long-tail.
13. In the hyper-competitive head, there were tons of perfectly on-topic pages to choose from
14. So pick only perfectly-on-topic pages
15. ...and rank by authority (*)
(*) Page authority, but the domain inevitably factors into that calculation. This is why so many homepages ranked.
16. This resulted in a mix of homepages of mid-size sites, and inner pages on huge sites
17. But the general way to move up was through increased authority
23. In the competitive mid-tail, move up with better targeting or more authority
29. Kind of search result | Pages ranking | To move up...
Head | Homepages of mid-size sites and inner pages of massive sites. All perfectly-targeted. | Improve authority.
Mid-tail | Perfectly on-topic pages on relatively weak sites plus roughly on-topic pages on bigger sites. | Improve targeting or authority.
Long-tail | Arbitrarily-weak on-topic pages and roughly-targeted deep pages on massive sites. | Improve targeting.
30. So that was ~2009
31. It’s not so simple any more.
Google is harder to understand these days.
38. I was thinking about it like it was a
math puzzle and if I just thought
really hard it would all make sense.
-- Kevin Lacker (@lacker)
39. Hey why don't you take the square
root?
-- Amit Singhal according to Kevin Lacker (@lacker)
40. oh... am I allowed to write code that
doesn't make any sense?
-- Kevin Lacker (@lacker)
41. Multiply by 2 if it helps, add 5,
whatever, just make things work
and we can make it make sense
later.
-- Amit Singhal according to Kevin Lacker (@lacker)
42. Why does this make the algorithm so
hard to understand?
47. You might know what any one of
the levers does, but they can
interact with each other in complex
ways
This is what a high-dimensional function looks like
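As a toy sketch of why interacting levers are hard to reason about (entirely invented, not Google's actual algorithm), consider a scoring function with an interaction term: the effect of the "authority" lever depends on the current value of the "relevance" lever.

```python
# Toy illustration (entirely invented, not Google's algorithm): two
# ranking "levers" whose effects interact through a product term.

def score(relevance, authority):
    # Each lever helps on its own, but the interaction term means the
    # effect of one lever depends on the current value of the other.
    return 2 * relevance + 1 * authority + 5 * relevance * authority

# Raising authority from 0.1 to 0.9 helps far more on a highly
# relevant page than on a barely relevant one:
gain_low_relevance = score(0.1, 0.9) - score(0.1, 0.1)    # ~1.2
gain_high_relevance = score(0.9, 0.9) - score(0.9, 0.1)   # ~4.4
print(gain_low_relevance < gain_high_relevance)
```

Knowing what "authority" does in isolation tells you little about what it does in combination, and that's with just two levers rather than hundreds.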
49. We sell custom cigar humidors. Our
custom cigar humidors are handmade. If
you’re thinking of buying a custom cigar
humidor, please contact our custom
cigar humidor specialists at
custom.cigar.humidors@example.com
What this needs is another mention of [cigar humidors]
50. With no mentions of [cigar] or [humidor] this
page would be unlikely to rank
And yet you can clearly go too far, and have the effect turn negative.
This is called nonlinearity.
The cigar example is taken directly from Google’s quality guidelines.
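The nonlinearity can be sketched with a toy curve (my illustration, not Google's formula): mentions of a term help up to a point, after which the effect turns negative.

```python
# Toy sketch of nonlinearity (invented for illustration): extra
# mentions of a term help at first, then hurt past a threshold.

def topical_score(mentions, sweet_spot=4):
    # Quadratic penalty either side of a hypothetical "natural"
    # mention count; score is clamped at zero.
    return max(0.0, 1.0 - ((mentions - sweet_spot) / sweet_spot) ** 2)

for m in [0, 2, 4, 8, 12]:
    print(m, topical_score(m))
# 0 mentions scores 0.0 (off-topic), 4 scores 1.0 (the sweet spot),
# and 12 scores 0.0 again (keyword stuffing).
```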
57. No, but I’m still pretty good at this
You’re thinking this to yourself right now.
74. I promised to tell you about a robot
that is better than even
experienced SEOs...
Well. It turns out all we needed was a coin to flip. You’re all fired.
80. John Giannandrea - Google’s head of search
Sundar’s choice to lead search after Amit. Previously running machine learning.
81. ...and of course Jeff Dean is doing Jeff Dean things
(cf. Chuck Norris)
82. Jeff Dean puts his pants on one leg
at a time, but if he had more legs,
you would see that his approach is
O(log n).
Source: Jeff Dean facts
83. Once, in early 2002, when the
search back-ends went down, Jeff
Dean answered user queries
manually for two hours.
Result quality improved markedly during this time
84. When Jeff Dean goes on vacation,
production services across Google
mysteriously stop working within a
few days.
This was reportedly actually true
85. The original Google Translate was the result
of the work of hundreds of engineers over 10
years.
86. Director of Translate, Macduff Hughes said
that it sounded to him as if maybe they could
pull off a neural-network-based replacement
in three years.
87. Jeff Dean said “we can do it by the end of the
year, if we put our minds to it”.
88. Hughes: “I’m not going to be the one to say
Jeff Dean can’t deliver speed.”
89. A month later, the work of a team of 3
engineers was tested against the existing
system. The improvement was roughly
equivalent to the improvement of the old
system over the previous 10 years.
90. Hughes sent his team an email. All projects
on the old system were to be suspended
immediately.
[Read the whole story]
97. Computers are better than humans at
classification, but struggle with adversaries
Read more about this here -- Cheetah, Leopard, Jaguar
98. Lesson:
We expect adversarial abilities to
take a step backwards
They will remain good at classifying bad links but are likely to fall prey to weird outcomes in adversarial situations
99. Example:
Remember Tay, the Microsoft
chatbot that Twitter taught to be
racist and sexist in less than a day?
Read more here
101. Rules of ML [PDF] outlines engineering lessons
from getting ML into production at Google
102. Example lesson: There will be silent failures
“This is a problem that occurs more for machine learning systems than for other
kinds of systems. Suppose that a particular table that is being joined is no longer
being updated. The machine learning system will adjust, and behavior will
continue to be reasonably good, decaying gradually. Sometimes tables are found
that were months out of date, and a simple refresh improved performance more
than any other launch that quarter! For example, the coverage of a feature may
change due to implementation changes: for example a feature column could be
populated in 90% of the examples, and suddenly drop to 60% of the examples.
Play once had a table that was stale for 6 months, and refreshing the table alone
gave a boost of 2% in install rate. If you track statistics of the data, as well as
manually inspect the data on occasion, you can reduce these kinds of failures.”
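A minimal sketch of the monitoring this lesson suggests: track each feature's coverage (the share of examples where it is populated) and alert on sudden drops. Function and field names here are hypothetical.

```python
# Minimal coverage-drift check of the kind the "Rules of ML" lesson
# describes. All names here are hypothetical illustrations.

def feature_coverage(examples, feature):
    # Share of examples where the feature is populated.
    populated = sum(1 for ex in examples if ex.get(feature) is not None)
    return populated / len(examples)

def check_coverage(examples, baselines, max_drop=0.1):
    """Return features whose coverage fell more than max_drop below baseline."""
    alerts = []
    for feature, baseline in baselines.items():
        coverage = feature_coverage(examples, feature)
        if coverage < baseline - max_drop:
            alerts.append((feature, baseline, coverage))
    return alerts

# A feature that used to be populated 90% of the time drops to 60%:
examples = [{"links": 5}] * 6 + [{"links": None}] * 4
print(check_coverage(examples, {"links": 0.9}))  # [('links', 0.9, 0.6)]
```

Run on a schedule against fresh training data, a check like this turns the "silent" failure into a noisy one.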
105. That document also has a section on trying to
understand what the machines are doing
106. But human explainability may not
even be possible
Not every concept a neural network uses fits neatly into a concept for
which we have a word. It’s not clear this is a weakness per se, but...
107. ...this means that engineers won’t
always know more than we do
about why a page does or doesn’t
rank
The big knowledge gap of the future is data - clickthrough rates,
bounce rates etc.
108. As Tom Capper said, engineers’ statements can
already be misleading
109. ...and remember the confounding split-tests
It’s already not always as simple as “feature X is good”
Which all means we may need to be more independent-minded and do
more of our own research
111. Michael Lewis’ latest book is about Kahneman and Tversky.
It recounts a story about a piece of medical software that existed in the 1960s.
112. It was designed to encapsulate
how a range of doctors
diagnosed stomach cancer from
x-rays.
113. It proceeded to outperform those
same doctors despite only
containing their expertise.
Real people have biases, and fool
themselves.
Encapsulate your own expert
knowledge.
114. At Distilled, we use a
methodology we call the
balanced digital scorecard.
This encapsulates our beliefs
about how to build a
high-performing business.
Applying it helps avoid our own
biases.
115. Also, while we are talking about
books, The Checklist Manifesto is
an important part of avoiding the
same cognitive biases.
116. Focus on consulting skills
I’ve written a few things about this (DistilledU module, writing better business documents, using split-tests to consult better).
This means: getting things done, convincing organizations, applying general knowledge, learning new things.
Use case studies and creativity. Computers are better at diagnosis than cure.
117. We are going to need to be
better than ever at debugging
things.
I wrote about debugging skills for
non-developers here.
A lot of the story of enterprise
consulting is going to be about
figuring out why things have
gone wrong in the face of sparse
or incorrect information from
Google.
119. Disregard expert surveys
Firstly, there are all the problems
outlined in the search result pairs
study - both in the ability of
experts to understand factors,
and in your ability to use the
information even if they do.
Secondly, they are undermined by another bias, the “law of small numbers”, from Lewis’ book.
PS - I say this as a participant in
many of them
120. Equally, building your digital
strategy on what Google tells you
to do will become an even worse
idea than it already is.
121. This is why we have been investing so much in split-testing
Check out www.distilledodn.com if you haven’t already.
The team will be happy to demo for you.
We’re now serving ~1.5 billion requests / month, and recently published
information covering everything from response times to our +£100k /
month split test.
129. Let’s recap
1. Even in a world of 200+ “classical” ranking factors, humans were bad at
understanding the algorithm
2. Machine learning will make this worse, and is accelerating under Sundar
3. There are things computers remain bad at, and rankings will become more
opaque even to Google engineers
4. We remain relevant by:
a. Using methodologies and checklists to capture human capabilities and
avoid our biases
b. Becoming great consultants and change agents
c. Debugging the heck out of everything
d. Avoiding being misled by experts or Google
e. Testing!
135. The specifics of DeepRank: gather and process training data
We started with a broad range of unbranded keywords from our STAT rank tracking.
For each of the URLs ranking in the top 10, we gathered key metrics about the domain and page, both from direct crawling and various APIs.
We turned this into a set of pairs of URLs {A,B} with their associated keyword, metrics, and their rank ordering.
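The pair-construction step described above might look something like this sketch (data structures are illustrative assumptions, not DeepRank's actual code):

```python
# Sketch of turning per-keyword top-10 rankings into ordered {A,B}
# pairs for pairwise training. Structures are illustrative only.
from itertools import combinations

def make_pairs(rankings):
    """rankings: {keyword: [(url, metrics), ...] ordered best-first}."""
    pairs = []
    for keyword, results in rankings.items():
        # combinations preserves list order, so in every emitted pair
        # page A ranked above page B for this keyword.
        for (url_a, m_a), (url_b, m_b) in combinations(results, 2):
            pairs.append({
                "keyword": keyword,
                "a": (url_a, m_a),
                "b": (url_b, m_b),
                "a_outranks_b": True,
            })
    return pairs

rankings = {"cigar humidors": [("site1.example/a", {"links": 40}),
                               ("site2.example/b", {"links": 12}),
                               ("site3.example/c", {"links": 3})]}
print(len(make_pairs(rankings)))  # 3 pairs from a top-3 list
```

In practice you would also randomly swap A and B in half the pairs so the label is balanced rather than always true.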
137. The specifics of DeepRank: train the model
We have so far trained on just 10 metrics for a relatively small sample (hundreds) of keywords.
Our current version is only a few layers deep with only 10 hidden dimensions.
The current training samples 30 pairs at a time and trains against them for 500 epochs.
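A RankNet-style sketch of this training setup, assuming 10 metrics per page and batches of 30 pairs trained for 500 epochs. The real model is a deeper network; this one uses a single linear layer for brevity, and the data is synthetic.

```python
# RankNet-style pairwise training sketch (synthetic data, single
# linear layer). Metric count, batch size and epoch count follow the
# numbers on the slide; everything else is an illustrative assumption.
import math
import random

N_METRICS = 10

def predict(w, a, b):
    # P(A outranks B) = sigmoid(score(A) - score(B))
    diff = sum(wi * (ai - bi) for wi, ai, bi in zip(w, a, b))
    return 1.0 / (1.0 + math.exp(-diff))

def train(pairs, epochs=500, batch=30, lr=0.1):
    w = [0.0] * N_METRICS
    for _ in range(epochs):
        for a, b in random.sample(pairs, min(batch, len(pairs))):
            # Every pair is ordered so that A outranked B: target = 1.
            err = 1.0 - predict(w, a, b)  # gradient of the log-loss
            for i in range(N_METRICS):
                w[i] += lr * err * (a[i] - b[i])
    return w

# Synthetic training pairs where the first metric drives rank order:
random.seed(0)
pairs = []
for _ in range(200):
    a = [random.random() for _ in range(N_METRICS)]
    b = [random.random() for _ in range(N_METRICS)]
    if a[0] < b[0]:
        a, b = b, a  # ensure A "outranks" B
    pairs.append((a, b))

w = train(pairs)
print(w[0] > max(abs(x) for x in w[1:]))  # expect the first metric's weight to dominate
```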
138. The specifics of DeepRank: the model
The next task is to get way more metrics for thousands of keywords.
This will enable us to train a much deeper model for much longer without overfitting.
We also have some more hyperparameter tuning to do.
139. To run the model, we input a new pair of pages with their associated metrics.
141. We get back a probability-weighted prediction: the probability of page A outranking page B.
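The inference step can be sketched as a logistic function of the score difference, as in standard pairwise ranking models. `model_score` and its weights below are invented stand-ins for the trained network.

```python
# Sketch of inference: feed both pages' metric vectors through the
# model and read off P(A outranks B). Weights are invented.
import math

def model_score(metrics):
    # Stand-in for the trained network's scalar output.
    weights = [0.9, 0.4, -0.2]
    return sum(w * m for w, m in zip(weights, metrics))

def prob_a_outranks_b(metrics_a, metrics_b):
    # Logistic of the score difference.
    diff = model_score(metrics_a) - model_score(metrics_b)
    return 1.0 / (1.0 + math.exp(-diff))

page_a = [0.8, 0.3, 0.1]  # e.g. normalised link, content, speed metrics
page_b = [0.4, 0.5, 0.4]
print(round(prob_a_outranks_b(page_a, page_b), 2))  # about 0.58
```

A probability near 0.5 means the model can't separate the pages; the further from 0.5, the more confident the rank-order prediction.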
144. The goal is a winning combination
of human and machine
Human + computer beats computer (for now)
145. Let’s recap
1. Even in a world of 200+ “classical” ranking factors, humans were bad at
understanding the algorithm
2. Machine learning will make this worse, and is accelerating under Sundar
3. There are things computers remain bad at, and rankings will become more
opaque even to Google engineers
4. We remain relevant by:
a. Using methodologies and checklists to capture human capabilities and
avoid our biases
b. Becoming great consultants and change agents
c. Debugging the heck out of everything
d. Avoiding being misled by experts or Google
e. Testing!
5. Human + robot is the only thing that has a chance of beating the robots