Last updated 5/15/17. How do you use unique data to create SEO-driven landing pages? Programmatic SEO (large websites, 25k+ URLs) presentation for 500 Startups Distro Dojo Toronto by Mushi Labs.
Topics covered include:
* What is programmatic SEO?
* Elements of programmatic SEO
* SEO Research & discovery
* Commonly overlooked technical mistakes
* Does Google actually like your content?
* How to fix your SEO content woes
* Recommended SEO Tools
2. Hello!
I’m Bernard Huang
SEO specialist and partner at Mushi Labs.
Growth advisor in residence at 500 Startups.
You can find me at:
bernard@mushilabs.com
linkedin.com/in/bernardjhuang
4. Overview
● What is programmatic SEO?
● Elements of programmatic SEO
● Research & discovery
● Commonly overlooked technical mistakes
● Does Google actually like your content?
● How to fix your SEO content woes
6. Programmatic SEO
Creating SEO landing pages that
target search queries using metadata
on a large scale.
Websites that have successful
programmatic SEO:
● Yelp
● TripAdvisor
● Zillow
9. Elements of Programmatic SEO
Research & discovery
● How’re people searching for things related to your business?
● What types of pages does Google think are relevant for search queries?
● How many searches are happening per month?
Technical
● Can Google crawl all the pages on your website?
● Can Google actually view the content on each of the pages?
● How do we get Google to constantly crawl the pages we want?
Content
● How trustworthy does Google think our website is?
● Does Google think our content is relevant for the search queries we’re targeting?
11. Research & discovery
How’re people searching for keywords that will be relevant to your business?
● quantify searcher intent
● competitive analysis
● identify modifiers
12. R&D - quantify searcher intent
SEO is about figuring out how to capture
the most relevant search traffic for your
business.
Search intent varies based on keywords:
● [housekeeper] - informational
● [house cleaning ny] - location
● [need house cleaning services near me] -
transactional + location
13. R&D - quantify searcher intent
Start by identifying the head keywords you
want to rank for:
● [house cleaning]
● [home cleaning]
● [maid services]
● [kitchen cleaners]
14. R&D - competitive analysis
Learn what keywords your competitors are
ranking for using competitive analysis
tools:
● SEMrush
● SimilarWeb
● SpyFu
20. R&D - identify modifiers
Bucket keyword modifiers by categories
and intent:
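As a rough illustration of bucketing, here is a minimal Python sketch that tags a keyword with every bucket whose modifier appears in it as a whole word or phrase. The bucket names and modifier terms are hypothetical placeholders for a house-cleaning business. They are not research output.

```python
# Hypothetical modifier buckets for a house-cleaning business. The bucket
# names and terms are illustrative placeholders, not research output.
MODIFIER_BUCKETS = {
    "location": ["near me", "ny", "nyc", "toronto"],
    "transactional": ["hire", "book", "services", "rates", "cost"],
    "room": ["kitchen", "bathroom", "garage"],
}


def bucket_keyword(keyword: str) -> list:
    """Return every bucket whose modifier appears as a whole word/phrase."""
    padded = f" {keyword.lower()} "
    return [
        bucket
        for bucket, modifiers in MODIFIER_BUCKETS.items()
        # Padding with spaces stops "ny" from matching inside "any".
        if any(f" {m} " in padded for m in modifiers)
    ]


print(bucket_keyword("need house cleaning services near me"))
# ['location', 'transactional']
```

A single keyword can land in several buckets. This matches the [need house cleaning services near me] example, which is transactional + location.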
21. R&D - quantify searcher intent
Determine the size of the search opportunity by
looking at monthly search volume and
keyword intent.
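Once keywords are bucketed, sizing the opportunity comes down to summing monthly volume per intent bucket. The sketch below assumes you have exported (keyword, intent, monthly searches) rows from a keyword tool. The rows shown are made-up placeholders.

```python
from collections import defaultdict

# Placeholder rows: (keyword, intent bucket, monthly searches). These numbers
# are made up for illustration; use real volumes from your keyword tool.
KEYWORDS = [
    ("house cleaning rates", "pricing", 1000),
    ("house cleaning cost", "pricing", 500),
    ("kitchen cleaning service", "room", 300),
    ("bathroom cleaning service", "room", 200),
]


def volume_by_intent(rows):
    """Sum monthly search volume per intent bucket, largest bucket first."""
    totals = defaultdict(int)
    for _keyword, intent, volume in rows:
        totals[intent] += volume
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


print(volume_by_intent(KEYWORDS))  # [('pricing', 1500), ('room', 500)]
```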
22. R&D - quantify searcher intent
Now we know how people are searching
for things related to our business.
Looks like people searching for [home
cleaning] want to know rates and see
options for kitchens, bathrooms, and
garages.
24. Common technical mistakes
Large websites usually suffer from these
technical mistakes:
● internal linking
● URL params
● pagination
● crawl errors
● sitemaps
26. Internal linking - problems
Coverage - not linking to every page from
somewhere on your website.
Prioritization - not linking often or prominently
enough to your high SEO value pages.
Depth - the SEO pages you want to
rank in Google take too many clicks to
get to from the homepage.
29. Internal linking - coverage
If Google can’t find a
way to get to pages on
your website, how will
Google know the page
exists?
When you don’t link to a
page on your website,
that’s like a vote of no
confidence for that page.
30. Internal linking - coverage [fix]
Make sure that every page on your
website is linked to within your website.
This provides search engine bots a crawl
path to discover all of the content on your
site.
● Common places to stuff links include:
footer, nav bar, user sitemap
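One way to check coverage is to compare the list of pages you want indexed against the pages that actually receive internal links. Here is a minimal sketch. It assumes you already have a crawl of your own site stored as a dict of page path to outgoing links; that data shape and the example paths are hypothetical.

```python
def find_orphans(all_pages, link_graph, homepage="/"):
    """Return pages that no other page on the site links to.

    all_pages:  every URL path you want indexed (e.g. pulled from your database)
    link_graph: dict mapping each page path -> list of paths it links to
    """
    linked = {
        target
        for source, targets in link_graph.items()
        for target in targets
        if target != source  # a self-link doesn't give bots a way in
    }
    # The homepage is the crawl entry point, so it is never an orphan.
    return sorted(set(all_pages) - linked - {homepage})


pages = ["/", "/toronto/maids", "/toronto/kitchen", "/toronto/garage"]
links = {"/": ["/toronto/maids"], "/toronto/maids": ["/toronto/kitchen"]}
print(find_orphans(pages, links))  # ['/toronto/garage']
```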
31. Internal linking - prioritization
Internal links are valued differently.
For example,
● STRONG - homepage, above the fold link
● MEDIUM - every page, nav bar links
● LOW - some pages, side bar links
● TRASH - every page, footer links
32. Internal linking - prioritization
The type and # of incoming links to each of
the webpages are internal votes.
Vote wisely:
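To make the "internal votes" idea concrete, this sketch sums weighted votes per page, where each link is weighted by where it appears. The placement names and weights are hypothetical. They loosely mirror the STRONG / MEDIUM / LOW / TRASH ranking on the previous slide, and Google does not publish any such numbers.

```python
from collections import defaultdict

# Hypothetical weights that mirror the STRONG / MEDIUM / LOW / TRASH ranking
# above. Google publishes no such numbers; treat them as a tuning knob.
PLACEMENT_WEIGHTS = {"above_fold": 1.0, "nav": 0.6, "sidebar": 0.3, "footer": 0.1}


def internal_votes(links):
    """Sum weighted internal 'votes' per target page.

    links: iterable of (source_path, target_path, placement) tuples.
    """
    votes = defaultdict(float)
    for source, target, placement in links:
        if source != target:  # ignore self-links
            votes[target] += PLACEMENT_WEIGHTS.get(placement, 0.0)
    return dict(votes)


links = [
    ("/", "/toronto/maids", "above_fold"),
    ("/blog/post", "/toronto/maids", "footer"),
    ("/", "/toronto/kitchen", "nav"),
]
print(internal_votes(links))
```

Sorting pages by this score and comparing the result against your list of high-value SEO pages shows where the votes are going.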
34. Internal linking - prioritization [fix]
Make sure SEO pages you want to rank
are linked to properly throughout your
website.
● Use a nav bar menu to target your
highest value SEO pages.
● Strengthen your internal linking with
breadcrumbs and breadcrumb schema markup.
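Breadcrumb markup is usually expressed as schema.org `BreadcrumbList` JSON-LD. The minimal Python sketch below builds it from (name, URL) pairs. The function name and example URLs are hypothetical.

```python
import json


def breadcrumb_jsonld(crumbs):
    """Build schema.org BreadcrumbList JSON-LD.

    crumbs: (name, url) pairs ordered from the homepage down to the current page.
    """
    return json.dumps(
        {
            "@context": "https://schema.org",
            "@type": "BreadcrumbList",
            "itemListElement": [
                {"@type": "ListItem", "position": i, "name": name, "item": url}
                for i, (name, url) in enumerate(crumbs, start=1)
            ],
        },
        indent=2,
    )


# Embed the output in <script type="application/ld+json"> on the page.
print(breadcrumb_jsonld([
    ("Home", "https://example.com/"),
    ("Toronto", "https://example.com/toronto/"),
    ("House cleaning", "https://example.com/toronto/house-cleaning/"),
]))
```

The visible breadcrumb trail on the page should link to the same URLs. The visible links provide the internal linking, and the markup describes that trail to search engines.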
35. Internal linking - depth
Link depth is the # of clicks that it takes to get
from your homepage to other pages.
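Link depth can be measured with a breadth-first search from the homepage over your internal link graph. Here is a minimal sketch, assuming the same hypothetical path-to-outgoing-links dict produced by crawling your own site.

```python
from collections import deque


def link_depths(link_graph, homepage="/"):
    """Breadth-first search from the homepage: fewest clicks to reach each page.

    link_graph: dict mapping each page path -> list of paths it links to.
    Pages absent from the result cannot be reached through internal links.
    """
    depths = {homepage: 0}
    queue = deque([homepage])
    while queue:
        page = queue.popleft()
        for target in link_graph.get(page, []):
            if target not in depths:  # BFS reaches each page first via a shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


site = {"/": ["/toronto/", "/about"], "/toronto/": ["/toronto/maids"],
        "/toronto/maids": ["/toronto/maids/jane"]}
print(link_depths(site))
# {'/': 0, '/toronto/': 1, '/about': 1, '/toronto/maids': 2, '/toronto/maids/jane': 3}
```

Pages whose depth exceeds your target for their value tier are the ones that need more prominent internal links.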
36. “Keep important pages within
several clicks from the homepage.”
https://webmasters.googleblog.com/2008/10/importance-of-link-architecture.html
37. Internal linking - depth
The deeper pages are in your website, the less
valuable they are to Google.
39. Amazon
Nav bar mega menus give Google a link
depth 1 crawl path to many important pages.
40. Internal linking - depth [fix]
Make sure you can get to your highest
value SEO pages within 2 clicks and
almost all of your pages within 5 clicks
from the homepage.
● Create a user sitemap to decrease the
link depth of all your pages.
● Use a nav bar mega menu to target your
highest value SEO pages.
41. URL parameters
URL parameters are common ways to pass
data to a page through its URL.
For example,
https://website.com/page?data=value
https://website.com/page,data=value
42. URL parameters
Google treats every URL parameter variation as a
separate and unique URL.
Which means these are 3 different pages in
Google’s eyes:
● https://website.com/page
● https://website.com/page?data=value
● https://website.com/page?data=value2
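To find where parameters are multiplying pages, you can group crawled URLs by their parameter-free form. This sketch uses only `urllib.parse` from the standard library. The input list stands in for a hypothetical crawl export.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def group_param_variants(urls):
    """Group URLs that differ only by query string (and fragment)."""
    groups = defaultdict(list)
    for url in urls:
        parts = urlsplit(url)
        base = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
        groups[base].append(url)
    # A base with several variants is a candidate for duplicate content.
    return {base: found for base, found in groups.items() if len(found) > 1}


crawled = [
    "https://website.com/page",
    "https://website.com/page?data=value",
    "https://website.com/page?data=value2",
    "https://website.com/other",
]
print(group_param_variants(crawled))
```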
43. URL parameters - problems
Duplicate content - URL params can
accidentally create lots of duplicate pages.
Crawl inefficient - Googlebot resources
are spent crawling URL param pages, which
may bottleneck the crawling of your other pages.
URL clutter - URLs may look messier, which
can lower click-through rates (CTR).
45. URL parameters - duplicate content
Google tries the different URL param pages in
the search engine results page.
Most of the time, it’s duplicate content…
46. URL parameters - crawl inefficient
Google only crawls a certain % of pages on
your website every day:
Letting your crawl budget go to unexpected
URL param pages bottlenecks the crawling
of your other content!
47. URL parameters - crawl inefficient
Do you have URL param pages accidentally being shown in the SERPs?
Search Console: use the Pages filter to find URLs containing “?”
48. URL parameters - URL clutter
Which URL looks most friendly for a Canon
PowerShot SD400 camera?
Amazon.com -
http://www.amazon.com/gp/product/B0007TJ5OG/102-8372974-4064145?v=glance&n=502394&m=ATVPDKIKX0DER&n=3031001&s=photo&v=glance
Canon.com -
http://consumer.usa.canon.com/ir/controller?act=ModelDetailAct&fcategoryid=145&modelid=11158
DPReview.com -
http://www.dpreview.com/reviews/canonsd400/
49. URL parameters - [fixes]
Depending on how your website uses URL
parameters, you could:
● add a <link rel="canonical"> on URLs with
params pointing back to the non-params URLs
● disallow URL params in robots.txt
● configure URL params in Search
Console
● replace params pages with unique
non-params URLs
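For the canonical option, here is a minimal Python sketch that emits the tag for a parameterized URL. It assumes a hypothetical whitelist of parameters that genuinely change page content. Everything else, such as sort or tracking params, is dropped, so `/page?data=value` canonicalizes back to `/page`.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical whitelist: params that truly change the page content and so
# deserve their own indexable URL. Every other param is dropped.
CONTENT_PARAMS = {"category"}


def canonical_link(url):
    """Return a <link rel="canonical"> tag pointing at the cleaned-up URL."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k in CONTENT_PARAMS]
    clean = urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
    # escape() turns "&" into "&amp;", as required inside an HTML attribute.
    return f'<link rel="canonical" href="{escape(clean)}">'


print(canonical_link("https://website.com/page?data=value&sort=asc"))
# <link rel="canonical" href="https://website.com/page">
```

With an empty whitelist, this reduces to the slide's advice of pointing every params URL at its non-params URL.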
51. Pagination - problems
Crawl inefficient - Google will crawl all the
paginated pages if you let it, which can
bottleneck your crawl rate for other pages.
Duplicate content - if implemented
improperly, the content on your website
could look duplicate on paginated pages.
Thin content - paginated pages often don’t
have significant amounts of quality content.
52. Pagination - [fixes]
The preferred pagination implementation
referenced by Google, as it would appear
on page 2 of a series:
<link rel="prev"
href="http://www.example.com/?page=1">
<link rel="next"
href="http://www.example.com/?page=3">
https://support.google.com/webmasters/answer/1663744?hl=en
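Generating these tags is mechanical once you know the current page number and the last page. This sketch assumes pages are addressed as `?page=N`, as in Google's example URL. The function name is hypothetical. The first page gets only a `next` tag, and the last page gets only a `prev` tag.

```python
def pagination_links(base_url, page, last_page):
    """rel="prev"/"next" tags for page `page` of a series numbered 1..last_page.

    Assumes pages are addressed as base_url?page=N.
    """
    tags = []
    if page > 1:
        tags.append(f'<link rel="prev" href="{base_url}?page={page - 1}">')
    if page < last_page:
        tags.append(f'<link rel="next" href="{base_url}?page={page + 1}">')
    return tags


for tag in pagination_links("http://www.example.com/", 2, 5):
    print(tag)
```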
53. Thumbtack
Do you even need pagination? Purposely no
pagination, just “top 10” lists.
https://www.thumbtack.com/ca/san-francisco/roofing/
54. Yelp
Pagination for user experience, but doesn’t
allow Google to index paginated pages.
view-source:https://www.yelp.com/search?find_desc=lunch&find_loc=San+Francisco,+CA&start=10
55. Sitemaps
Creating a valid sitemap and submitting
it to Google and Bing will help the
search engines better understand the site.
Bing relies more heavily than Google on
sitemaps to crawl websites.
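A minimal valid sitemap is a `<urlset>` in the sitemaps.org namespace with one `<url><loc>` per page. The Python sketch below builds one with the standard library's ElementTree. The sitemaps.org protocol caps a single sitemap file at 50,000 URLs, so larger sites need several files. The function name and URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # limit set by the sitemaps.org protocol


def build_sitemap(urls):
    """Return a sitemaps.org <urlset> XML document listing the given URLs."""
    if len(urls) > MAX_URLS_PER_SITEMAP:
        raise ValueError("split large sites across several sitemaps")
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        # ElementTree escapes "&" in query strings to "&amp;" automatically.
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(urlset, encoding="unicode")


print(build_sitemap(["https://example.com/", "https://example.com/toronto/maids"]))
```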
56. Sitemaps - problem
Cleanliness - your sitemap should only
contain URLs of SEO value.
Search Console > Crawl > Sitemaps > Specific Sitemap
57. “Your Sitemaps need to be clean. We
have a 1% allowance for dirt in a
Sitemap. Examples of dirt are if we
click on a URL and we see a redirect,
a 404 or a 500 code. If we see more
than a 1% level of dirt, we begin
losing trust in the Sitemap.”
https://www.stonetemple.com/search-algorithms-and-bing-webmaster-tools-with-duane-forrester/
58. Sitemaps - cleanliness [fixes]
Make sure the URLs in the sitemap:
● are unique
● return status code 200
● aren’t canonicalized (<link rel="canonical">)
to a different URL
● aren’t set to noindex, whether in robots.txt
or on the page
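Once you have fetched each sitemap URL, the cleanliness rules become a simple audit. This sketch assumes a hypothetical `SitemapEntry` record holding what your own crawler saw for each URL: status code, canonical href and a noindex flag. It reports the dirty entries and the dirt ratio, which you can compare with the 1% allowance quoted earlier. It does not check robots.txt.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SitemapEntry:
    url: str
    status: int               # HTTP status code seen when the URL was fetched
    canonical: Optional[str]  # href of the page's <link rel="canonical">, if any
    noindex: bool             # True if the page is set to noindex


def dirty_entries(entries):
    """List (url, problems) for every sitemap entry that breaks a cleanliness rule."""
    seen, dirty = set(), []
    for entry in entries:
        problems = []
        if entry.url in seen:
            problems.append("duplicate")
        seen.add(entry.url)
        if entry.status != 200:
            problems.append(f"status {entry.status}")
        if entry.canonical and entry.canonical != entry.url:
            problems.append("canonical points elsewhere")
        if entry.noindex:
            problems.append("noindex")
        if problems:
            dirty.append((entry.url, problems))
    return dirty


def dirt_ratio(entries):
    """Share of sitemap entries that are dirty (0.01 == 1%)."""
    return len(dirty_entries(entries)) / len(entries) if entries else 0.0
```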
59. Crawl errors
Crawl errors occur when Googlebot crawls
a page on your website and receives an
error.
Google correlates the # of crawl errors with
website quality.
60. Crawl errors - problems [fix]
Go to Google Search Console and resolve
the underlying cause of your crawl errors.
61. Other potential technical issues
site load speed - how
fast is your website
loading?
mobile friendliness -
Is your website
mobile friendly?
62. 5. DOES GOOGLE LIKE YOUR CONTENT?
Indexation %s, last cache date, crawl budget
63. Does Google like your content?
There are quite a few ways to figure out if
Google likes your content:
● Indexation %s
● Last cache date
● Crawl budget
64. Content - indexation %s
Just because Google crawls all the pages
on your website does not mean it will put
all those pages into the SERPs.
Indexation % is the number of pages Google
has chosen to index divided by the total #
of pages on your website.
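As arithmetic, indexation % is indexed pages / total pages × 100. Here is a tiny Python sketch; the 18,000 / 25,000 figures are made up for illustration.

```python
def indexation_pct(indexed_pages, total_pages):
    """Indexed pages as a percentage of all pages on the site."""
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    return 100.0 * indexed_pages / total_pages


# Made-up example: 18,000 of 25,000 pages indexed.
print(indexation_pct(18_000, 25_000))  # 72.0
```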
65. Content - indexation %s
● Search Google for “site:yourwebsite.com”
How does the # compare to how many
pages you have total?
66. Content - indexation %s
● Check Search Console > Google Index >
Index Status
How does the # compare to how many pages
you have in total?
68. Content - indexation %s
Split up sitemaps into different buckets for
granular content analysis of your pages:
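One simple way to form buckets is to split URLs by their first path segment. This assumes your URL structure reflects page type, such as a geo or category prefix. Each bucket then becomes its own sitemap file, so submitted vs. indexed counts can be read per page type. The function name and file-naming scheme below are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def bucket_urls(urls):
    """Group URLs by first path segment, one bucket per sitemap file."""
    buckets = defaultdict(list)
    for url in urls:
        segments = [s for s in urlsplit(url).path.split("/") if s]
        buckets[segments[0] if segments else "root"].append(url)
    return dict(buckets)


urls = [
    "https://example.com/",
    "https://example.com/toronto/house-cleaning",
    "https://example.com/toronto/maids",
    "https://example.com/blog/pricing-guide",
]
for name, group in bucket_urls(urls).items():
    print(f"sitemap-{name}.xml: {len(group)} URLs")
```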
69. Content - last cache date
Every time Google crawls your page, it
saves a snapshot of what it finds in its
database.
Last cache date is the last date on which
Google came to a page and saved its
contents into its database.
70. Content - last cache date
You can find the last cache date in the
SERPs:
71. Content - last cache date
Which’ll bring up a page where you can see
the last date Google cached your page:
screenshot taken 6/10
72. Content - last cache date
For competitive queries, most search
results on the first page have been cached
within the last 14 days.
screenshots taken 6/10
73. Content - last cache date
When last cache dates are more than 1 month old, it’s an
indication of poor content on the page.
Or if Google hasn’t even cached your page:
screenshots taken 6/10
74. Content - crawl budget
Depending on a variety of factors, Google
will only crawl a certain % of pages on your
website every day:
75. Content - crawl budget
Crawl budget isn’t distributed evenly.
Internal linking and content quality play big
roles in deciding where the crawlers go:
76. Content - problems
You can see if you have content problems by
looking at the following stats:
● avg. pages crawled per day / total pages
● submitted / indexed pages (sitemap)
● pages with last cache dates > 1 month old
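These three stats are simple ratios once you have exported the raw numbers from Search Console and sampled the cache dates. The sketch below assumes "> 1 month" means more than 30 days. It reports indexed pages as a share of submitted pages. The inputs are made up.

```python
from datetime import date

STALE_AFTER_DAYS = 30  # "> 1 month", approximated as 30 days


def content_health(avg_crawled_per_day, total_pages, submitted, indexed,
                   cache_dates, today):
    """Compute the three content-problem stats.

    cache_dates: last cache date for each page you sampled (non-empty list).
    """
    stale = sum(1 for d in cache_dates if (today - d).days > STALE_AFTER_DAYS)
    return {
        "crawled_per_day_share": avg_crawled_per_day / total_pages,
        "indexed_of_submitted": indexed / submitted,
        "stale_cache_share": stale / len(cache_dates),
    }


# Made-up inputs for illustration only.
print(content_health(500, 25_000, 20_000, 15_000,
                     [date(2017, 5, 1), date(2017, 3, 1)], date(2017, 5, 15)))
```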
77. 6. HOW TO TACKLE YOUR SEO CONTENT WOES
de-index / remove pages, enhance existing content, get
more user generated content
78. Content - [fixes]
The common fixes to content problems
include:
● de-index / remove pages based on
content criteria
● enhance existing content with more
unique data, editorial content, etc.
● get more user generated content
through onboarding or incentives
79. Content - de-index / remove pages
Google grades your website as a whole,
which means poor content on some pages
actually drags down the quality of your
entire site.
De-indexing, removing, or consolidating
pages is a great way to recover crawl
budget and improve website quality.
80. Content - de-index / remove pages
Determine poor performing SEO pages by
looking at:
● last cache date
● organic traffic to page (Google Analytics)
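Combining the two signals can be automated. This sketch flags a page only when it is both stale (never cached, or cached more than 30 days ago) and getting essentially no organic traffic. The AND rule, thresholds and dict keys are assumptions to tune against your own data.

```python
from datetime import date


def poor_performers(pages, today, max_cache_age_days=30, min_sessions=1):
    """Flag pages that look weak on both cache date and organic traffic.

    pages: dicts with 'url', 'last_cache' (date, or None if never cached)
           and 'organic_sessions' (e.g. last 30 days from Google Analytics).
    """
    flagged = []
    for page in pages:
        cache = page["last_cache"]
        stale = cache is None or (today - cache).days > max_cache_age_days
        no_traffic = page["organic_sessions"] < min_sessions
        if stale and no_traffic:
            flagged.append(page["url"])
    return flagged


pages = [
    {"url": "/a", "last_cache": date(2017, 5, 10), "organic_sessions": 0},
    {"url": "/b", "last_cache": None, "organic_sessions": 0},
    {"url": "/c", "last_cache": date(2017, 1, 1), "organic_sessions": 50},
]
print(poor_performers(pages, date(2017, 5, 15)))  # ['/b']
```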
81. Content - de-index / remove pages
Axe the pages that aren’t getting search
traffic:
de-index
● <meta name="robots" content="noindex">
remove
● return status code 410
consolidate
● can category / geo pages be grouped
into more content-rich pages?
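Here is a minimal sketch of how a server could act on these decisions for each path. The paths and sets are hypothetical audit output. The slide doesn't name a consolidation method, so using a 301 redirect to the merged page is an assumption, although it is a common choice.

```python
# Hypothetical audit decisions; the paths below are placeholders.
NOINDEX = {"/toronto/garage-cleaning"}
GONE = {"/listing/closed-business-123"}
CONSOLIDATED = {"/toronto/east-york/maids": "/toronto/maids"}


def seo_response(path):
    """Return (status code, extra headers, robots meta tag or None) for a path."""
    if path in GONE:
        return 410, {}, None  # removed for good
    if path in CONSOLIDATED:
        # Assumption: a 301 redirect to the merged page (the deck names no method).
        return 301, {"Location": CONSOLIDATED[path]}, None
    if path in NOINDEX:
        return 200, {}, '<meta name="robots" content="noindex">'
    return 200, {}, None


print(seo_response("/listing/closed-business-123"))  # (410, {}, None)
```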
82. Content - enhance existing content
Find ways to enhance existing content that’ll
benefit the user based on what types of
content Google is already showing in SERPs:
● photos
● videos
● market reports
● more listings
● editorial content
86. Yelp
Having almost all the restaurants in Yelp’s
database has secured its Google rankings.
87. Content - user generated content
Find ways to get more user generated content:
● extract more information from the
onboarding process
● incentivize users to create content
● display anonymized user content