Avoid duplicate content and don’t leave money on the table with unoptimized groups of pages linked by canonical declarations! Particularly in e-commerce, you can increase Google’s confidence by making sure your groups of product URLs are perfectly canonicalized and clear to search engines.
13. Ø Duplicate content consolidation can be executed relatively quickly, as it requires a small set of technical changes
Ø You will likely see improved rankings within weeks after the corrections are in place
Ø New changes and improvements to your site are picked up faster by Google
14. Ø Natzir found that the total traffic to separate pages ranking for the same keyword was lower than the traffic after those pages were consolidated with redirects
Ø Same idea, but from a keyword perspective
https://www.youtube.com/watch?v=zI_jkhSyAew
16. Ø Finding repeatable success
Ø Searching for a machine learning model to connect new visits to technical SEO changes
Ø We focused on the impact of links, indexing, and canonical clustering
18. Our best predictive model achieved 85% test accuracy
Ø Canonicalization drives repeatable success
Ø The size of the canonical cluster turned out to be a strong predictor
19. One oversimplified way to think about a machine learning model is to picture a linear regression function in Excel/Sheets.
We predicted new users (Y) within canonicalized clusters as a function of the size of the clusters (X).
Machine Learning 101
https://bit.ly/3lGyeqA
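The Excel/Sheets analogy can be sketched in a few lines of Python. This is only an illustration of fitting a straight line Y = slope * X + intercept by ordinary least squares; it is not the model from the presentation, and the cluster sizes and new-user counts below are made-up numbers, not results.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical data: X = URLs per canonical cluster, Y = new users per cluster.
cluster_sizes = [1, 2, 3, 4, 5]
new_users = [12, 19, 31, 42, 48]

slope, intercept = fit_line(cluster_sizes, new_users)

# Predict new users for a cluster of 6 URLs using the fitted line.
predicted_for_six = slope * 6 + intercept
```

A positive slope in such a fit is what "cluster size is a strong predictor" would look like in the simplest possible case.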
22. Their optimal canonical setup is the inverse.
Most clusters should canonicalize to one product “leader”.
23. For some products, people specify the color they want directly in Google. But for other products, they don’t.
They decide the color they want after seeing the options available on the site.
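One way to read this slide as a rule, sketched under assumptions: a color variant keeps its own canonical only when people search for that color directly, and otherwise points to the cluster leader. The function name, the `min_volume` threshold, and the example.com URLs are hypothetical and are not taken from the presentation.

```python
def choose_canonicals(variant_volumes, leader_url, min_volume=100):
    """Map each variant URL to its canonical target.

    variant_volumes: dict of URL -> monthly search volume for the
    color-specific query (assumed to come from a keyword tool export).
    A variant with enough direct demand stays self-canonical; the rest
    canonicalize to the leader. The threshold is an arbitrary assumption.
    """
    return {
        url: (url if volume >= min_volume else leader_url)
        for url, volume in variant_volumes.items()
    }


# Hypothetical cluster: only the gold variant is searched for by color.
volumes = {
    "https://example.com/ring": 900,
    "https://example.com/ring?color=gold": 350,
    "https://example.com/ring?color=silver": 10,
}
canonicals = choose_canonicals(volumes, leader_url="https://example.com/ring")
```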
25. Technical Plan
Ø Build clusters using OnCrawl
Ø Get search demand using SEMrush
Ø Canonicalization algorithm
Ø Experiment on CDN using RankSense
Ø Automate everything using Cloud Functions and Pub/Sub queues
28. Pub/Sub is an asynchronous messaging service that decouples services that produce events from services that process events.
It allows us to connect OnCrawl, SEMrush, and RankSense asynchronously to complete a custom workflow.
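The decoupling idea can be illustrated with a small in-memory stand-in. This sketch does not use the Google Cloud Pub/Sub client; `InMemoryBroker`, the topic names, and the handler functions are all hypothetical. It only shows that a producer publishes to a topic without knowing which service will process the message.

```python
from collections import defaultdict, deque


class InMemoryBroker:
    """Toy publish/subscribe broker (a stand-in, not Cloud Pub/Sub)."""

    def __init__(self):
        self._subscribers = defaultdict(list)
        self._pending = deque()

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        # The producer only queues the message; nothing is processed yet.
        self._pending.append((topic, message))

    def deliver_pending(self):
        """Deliver queued messages, including ones published by handlers."""
        delivered = 0
        while self._pending:
            topic, message = self._pending.popleft()
            for handler in self._subscribers[topic]:
                handler(message)
            delivered += 1
        return delivered


broker = InMemoryBroker()
log = []


def fetch_search_demand(crawl_id):
    log.append(f"fetch demand for {crawl_id}")
    broker.publish("demand-ready", crawl_id)  # hand off to the next step


def build_clusters(crawl_id):
    log.append(f"build clusters for {crawl_id}")


broker.subscribe("crawl-finished", fetch_search_demand)
broker.subscribe("demand-ready", build_clusters)

broker.publish("crawl-finished", "crawl-001")
delivered = broker.deliver_pending()
```

Each step only knows the topic it listens to and the topic it publishes to, so a step can be replaced without touching the others.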
30. Cloud Scheduler acts as a single pane of glass, allowing us to manage all our automation tasks from one place.
It allows us to trigger our custom workflow on a recurring schedule as search demand changes with the seasons.
41. Ø Cloud Scheduler triggers the OnCrawl Cloud Function, which uploads each crawl export to Cloud Storage
Ø The Cloud Storage update triggers the SEMrush Cloud Function, which then exports search demand data to Cloud Storage
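The second step of this chain might look roughly like the sketch below. It follows the `(event, context)` shape of a Python background Cloud Function triggered by a Cloud Storage upload, where the event carries the `bucket` and `name` of the new object. The function name, the `crawls/` and `search-demand/` prefixes, and the omitted SEMrush call are assumptions for illustration, not the code from the presentation.

```python
def on_crawl_export_uploaded(event, context=None):
    """Handle a Cloud Storage upload event for a crawl export.

    Returns the (hypothetical) destination path for the search demand
    export, or None when the uploaded object is not a crawl export.
    """
    bucket = event["bucket"]
    name = event["name"]

    # Both steps write to Cloud Storage, so ignore anything outside the
    # assumed crawls/ prefix to avoid the function re-triggering itself.
    if not name.startswith("crawls/"):
        return None

    output_name = name.replace("crawls/", "search-demand/", 1)

    # A real function would fetch search demand here and write the
    # result to Cloud Storage; both calls are left out of this sketch.
    return f"gs://{bucket}/{output_name}"


# Simulated events with hypothetical bucket and object names.
result = on_crawl_export_uploaded({"bucket": "seo-data", "name": "crawls/export.csv"})
ignored = on_crawl_export_uploaded({"bucket": "seo-data", "name": "search-demand/export.csv"})
```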
44. Ø We are going to perform an intermediate step and force all product groups to canonicalize to the “leader” URL in the group.
Ø The “leader” could be the URL with the most search traffic, the most internal/external links, or the highest crawl frequency
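A minimal sketch of this intermediate step, assuming each URL in a product group comes with the three signals named above. The slide lists the signals as alternatives; ranking by search traffic first and using links and crawl frequency as tie-breakers is an assumption made here, as are the field names and example URLs.

```python
def pick_leader(cluster):
    """Return the leader URL of a cluster.

    cluster: list of dicts with hypothetical keys 'url', 'search_traffic',
    'links', and 'crawl_frequency'. Search traffic decides first; links
    and crawl frequency only break ties (an assumed ordering).
    """
    best = max(
        cluster,
        key=lambda page: (
            page["search_traffic"],
            page["links"],
            page["crawl_frequency"],
        ),
    )
    return best["url"]


def canonical_map(cluster):
    """Force every URL in the cluster to canonicalize to the leader."""
    leader = pick_leader(cluster)
    return {page["url"]: leader for page in cluster}


# Hypothetical product group with made-up signal values.
product_group = [
    {"url": "https://example.com/bracelet?color=blue", "search_traffic": 40, "links": 3, "crawl_frequency": 5},
    {"url": "https://example.com/bracelet", "search_traffic": 40, "links": 12, "crawl_frequency": 9},
    {"url": "https://example.com/bracelet?color=red", "search_traffic": 15, "links": 1, "crawl_frequency": 2},
]
mapping = canonical_map(product_group)
```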
46. We end up with one cluster that we need to update, which means that David Yurman is leaving a lot of money on the table with their current setup that relies on self-referential canonicals.
49. We are going to use the RankSense API to publish our new canonical clusters as experiments in the Cloudflare CDN
https://bit.ly/3jWm4JP
50. Ø We automatically populate a Google Sheet with the changes
Ø We submit the Sheet to RankSense’s PRODUCTION environment
52. Resources to Learn More
Ø Python code covered in this presentation
https://github.com/ranksense/weloveseo
Ø Advanced Duplicate Content Consolidation with Python
https://www.searchenginejournal.com/advanced-duplicate-content-consolidation-python/314471/
Ø Cloud Functions https://cloud.google.com/functions
Ø Google PubSub https://cloud.google.com/pubsub
Ø Introduction to Python for SEO Pros
https://www.searchenginejournal.com/introduction-to-python-seo-spreadsheets/342779/