2. Topics for the Masterclass
• Correlation analysis of search results
• Changes from Google Instant
• Information architecture & navigation structure
• Overcoming Twitter’s cannibalization of the link graph
• Making Analytics Actionable
• New Research: Topic Modeling in the Search Results
4. Correlation ≠ Causation
The more I wear suits, the more I speak on panels.
Therefore: wearing suits causes me to speak on panels.
5. Understanding Correlation Significance
No Correlation
Exact Match Domain
Perfect Correlation
Most of our data for search rankings falls in this region
(which we’d expect given algorithms w/ 200+ ranking factors)
7. Methodology
• 11,351 SERPs via Google AdWords Suggest
• 1st Page Only (usually ~10 results per page)
• Correlations are w/ Higher Position on Page 1
• Controlled for SERPs Where All (or None) of the
Results Matched the Metric
8. Methodology
Looking for elements that higher ranking pages have that lower ones do not
NOT looking at raw counts of how many pages featured a given element
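The approach on the two Methodology slides can be sketched roughly as follows. This is a minimal illustration, not the study's actual code: the use of Spearman rank correlation, the standard-error formula (sample standard deviation divided by the square root of the number of SERPs), and all function names are assumptions made for the example. SERPs where every result has the same value for the metric are skipped, mirroring the "all (or none) matched" control.

```python
from math import sqrt
from statistics import mean, stdev


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(xs, ys):
    """Plain Pearson correlation; callers must avoid constant inputs."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def serp_correlation(metric_by_position):
    """Spearman correlation (assumed) between a metric and ranking on one SERP.

    metric_by_position[0] is the metric value for the #1 result.
    Returns None when all results share the same value (nothing to compare).
    """
    if len(set(metric_by_position)) < 2:
        return None
    # Negate positions so a positive correlation means
    # "more of the metric goes with a higher position on page 1".
    goodness = [-p for p in range(1, len(metric_by_position) + 1)]
    return pearson(ranks(metric_by_position), ranks(goodness))


def mean_correlation(serps):
    """Mean per-SERP correlation, its standard error, and the SERPs used."""
    rhos = [r for r in map(serp_correlation, serps) if r is not None]
    se = stdev(rhos) / sqrt(len(rhos)) if len(rhos) > 1 else float("nan")
    return mean(rhos), se, len(rhos)
```

For a binary metric such as "exact match domain", each SERP would be a list of 1s and 0s ordered by position; the third return value shows how many SERPs survived the control.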
9. Contains All Query Terms in Domain Name
Exact Match Hyphenated Domain
Exact Match Domain
Highest Stderr = 0.0241804
10. Our Interpretation
• Exact match domains remain powerful in both
engines (anchor text could be a factor, too)
• Hyphenated versions are less powerful, though
more frequent in Bing (G: 271 vs. B: 890)
• Just having keywords in the domain name has
substantial positive correlation
11. Highest Stderr = 0.00350211
KWs in Body
KWs in Alt Attribute
KWs in H1 Tag
KWs in URL
KWs in Title
12. Our Interpretation
• The Alt attribute of images is interesting
• Putting KWs in URLs is likely a best practice
• Everyone optimizes titles (G: 11,115 vs. B: 11,143).
Differentiating here is hard.
• (Simplistic) on-page optimization isn’t a huge factor
14. Our Interpretation
• More reasons to believe Google when they say
.gov, .info and .edu are not special cased
• The .org TLD extension is surprising – do they earn
more links? Less spam? More non-commercial?
• Don’t forget about branding/user behavior - .com is
still probably a very good thing (at least own it)
15. Highest Stderr = 0.0033353
Content Length (tokens in body)
URL Length (chars.)
Domain Name Length (chars.)
16. Our Interpretation
• Shorter URLs are likely a good best practice
(especially on Bing)
• Long domains may not be ideal, but aren’t awful
• Raw content length seems marginal in correlation
18. Highest Stderr = 0.00335677
# of Linking Root Domains to URL
# of Links to URL
19. Our Interpretation
• Links are likely still a major part of the algorithms
• Bing may be slightly more naïve in their usage of
link data than Google, but better than before
• Diversity of link sources remains more important
than raw link quantity
20. Highest Stderr = 0.00415058
# of Links w/ exact match anchor text
# of linking root domains w/ exact match anchor text
21. Our Interpretation
• Many anchor text links from the same domain likely
don’t add much value
• Anchor text links from diverse domains, however,
appear highly correlated
• Bing and Google are relatively similar in evaluating
these metrics
23. Our Interpretation
• PageRank (and similar algorithms) are not particularly
representative of rankings (but are somewhat correlated)
• Linking domains are likely a better metric than raw links
• Page Authority is reasonably good, but has a way to go
25. Our Interpretation
• No single domain valuation metric is especially well
correlated with rankings
• Rankings of individual pages may be more
disparate than we typically think re: “domain authority”
• Overall, we’re still very naïve when it comes to
understanding how links influence search rankings
28. Methodology: Keyword Referral Search Data
• Look at keywords sending traffic via analytics
• Distribute into groups by word-length
• Analyze shifts in demand by keywords that
brought visits to the site
• Compare from period prior to Google Instant
and directly after
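The steps above could be sketched as below. This is only an illustration under stated assumptions: the input is assumed to be a mapping of keyword phrase to visits exported from an analytics package, and the function names and the "5 or more words" cap are hypothetical choices, not something the cited studies specify.

```python
from collections import Counter


def share_by_word_length(keyword_visits, max_bucket=5):
    """Share of search visits per keyword word count.

    keyword_visits: dict of keyword phrase -> visits (assumed export format).
    Phrases with max_bucket or more words are pooled into one bucket.
    """
    totals = Counter()
    for keyword, visits in keyword_visits.items():
        bucket = min(len(keyword.split()), max_bucket)
        totals[bucket] += visits
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {n: totals[n] / grand_total for n in sorted(totals)}


def shift_in_share(before, after, max_bucket=5):
    """Change in each bucket's share from the 'before' to the 'after' period."""
    b = share_by_word_length(before, max_bucket)
    a = share_by_word_length(after, max_bucket)
    return {n: a.get(n, 0.0) - b.get(n, 0.0) for n in sorted(set(a) | set(b))}
```

Comparing shares rather than raw visit counts keeps the comparison meaningful when overall traffic differs between the two periods.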
30. Via Distilled Consulting (UK)
11 Sites, Various Sizes (3.5K – 75K weekly visits), 75K+ Keywords
http://www.distilled.co.uk/blog/seo/impact-of-google-instant/
31. Via Conductor
Multiple sites, 880K visits, 10Ks of keywords
http://blog.conductor.com/2010/09/what%E2%80%99s-been-the-impact-of-google-instant-on-searcher-behavior-so-far-not-much/
32. Interesting Takeaways
• Google Instant seems not to have shifted keyword
demand by much (if at all)
• Google “suggest” has been out for a long time
already; users are likely accustomed to this feature
• The “long tail” may get longer/shorter over time, but
Instant seems less responsible than other factors
108. Turn Tweets from industry leaders into
content, too (this entices them to share)
http://www.seomoz.org/blog/bing-vs-google-prominence-of-ranking-elements
116. Takeaways
• # of Followers DEFINITELY isn’t everything
• Tweeting heavily may not necessarily hurt the attention
people pay to your tweets
• Klout score probably isn’t great for predicting the reach of
an individual’s tweets/messages
• TwitterGrader Rank may be useful for predicting the
traffic you’d get from a tweeter
• Avg. CTR across 254 shared links = 1.17% (don’t feel
bad if only 1/100 followers clicks a link)
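As a rough illustration of the CTR figure, the sketch below averages a per-link click-through rate. It assumes CTR is defined as clicks on a shared link divided by the sharer's follower count; the slides do not spell out the exact definition, and the function name and input shape are hypothetical.

```python
def average_ctr(shares):
    """Mean click-through rate over shared links.

    shares: iterable of (clicks, followers) pairs, one per shared link.
    Links shared by accounts with no followers are skipped.
    """
    rates = [clicks / followers for clicks, followers in shares if followers > 0]
    return sum(rates) / len(rates) if rates else 0.0
```

Averaging per-link rates weights every link equally; dividing total clicks by total followers would instead weight large accounts more heavily.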
120. Action: Measure against search engine market shares &
volume to determine whether you’re making positive strides
121. # Pages Getting Search Referrals Over Time
Measure this number on a
weekly/monthly basis
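One way to track this number is sketched below. It assumes you can export search-referred visits as (date, landing page) pairs from your analytics package; the function name and input format are hypothetical.

```python
from collections import defaultdict
from datetime import date


def pages_with_search_referrals(visits):
    """Count distinct landing pages receiving search visits per ISO week.

    visits: iterable of (datetime.date, landing_page_url) pairs.
    Returns a dict keyed by (ISO year, ISO week), in chronological order.
    """
    pages = defaultdict(set)
    for day, landing_page in visits:
        iso = day.isocalendar()
        pages[(iso[0], iso[1])].add(landing_page)
    return {week: len(urls) for week, urls in sorted(pages.items())}


weekly = pages_with_search_referrals([
    (date(2010, 10, 4), "/blog/a"),
    (date(2010, 10, 5), "/blog/a"),
    (date(2010, 10, 6), "/blog/b"),
    (date(2010, 10, 11), "/blog/a"),
])
```

Grouping by calendar month instead only requires replacing the ISO week key with `(day.year, day.month)`.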
122. Action: Discover if indexation is an issue worth effort
This number sucks. Learn more about why at:
www.seomoz.org/blog/indexation-for-seo-real-numbers-in-5-easy-steps
123. # of Keywords Sending Traffic from a
Search Engine over Time
124. Action: Determine if content additions are accretive
and what drives growth/shrinkage in search traffic
Did rankings fall? Or is
demand down?
127. Action: Analyze top traffic drivers from a value perspective, check
rankings for potential easy wins & get answers if traffic dips
“SEO Tools” is a big win and
we could rank higher
129. Action: Determine value of reaching new visitors vs. converting branded
users (focus efforts on the more valuable one)
This metric speaks to business strategy about converting existing fans
vs. reaching new customer segments
138. Action: Compare to ROI metrics; if they correlate, improve on
keywords/landing pages with low time on site
Average “upgrade to PRO” visitor spent a
whopping 44 minutes on SEOmoz!
140. Action: Depending on your metrics, a “sweet spot” of pages browsed
often dictates a conversion event – optimize towards it
Average “upgrade to PRO” visitor visits 12X the
pages of an average visitor
144. Action: Find patterns/sources that predict sharing activities (both
content and CTAs) and make them testable conversion events
GA allows you to set custom actions as “goals” then filter,
monitor and improve on these metrics
160. Methodology: LDA (Latent Dirichlet Allocation)
• Build an LDA model based on the English
language Wikipedia dataset (8mil+ pages)
• Generate scores for top 10 rankings across
several thousand search results
• Look at correlation of search rankings with
scores (in process)
161. Simplified LDA Formula
Chance that a word is there because of a topic
=
(How much the document already uses that topic)
×
(Number of times that word has been in that topic)
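The simplified formula can be made concrete with a small sketch. This uses the standard collapsed Gibbs sampling form of the per-word topic probability, with smoothing parameters alpha and beta; that choice, the parameter defaults, and all names here are assumptions for illustration and not a description of how the SEOmoz Labs tool is implemented.

```python
def topic_weights(word, doc_topic_counts, topic_word_counts, topic_totals,
                  vocab_size, alpha=0.1, beta=0.01):
    """Probability that `word` in a document is explained by each topic.

    doc_topic_counts[k]:  words in this document already assigned to topic k
    topic_word_counts[k]: dict of word -> times the word was assigned to topic k
    topic_totals[k]:      total words assigned to topic k across the corpus
    alpha, beta:          smoothing so unseen topics/words keep a small chance
    """
    weights = []
    for k in range(len(doc_topic_counts)):
        # "How much the document already uses that topic"
        doc_part = doc_topic_counts[k] + alpha
        # "Number of times that word has been in that topic", normalized
        word_part = (topic_word_counts[k].get(word, 0) + beta) / (
            topic_totals[k] + vocab_size * beta
        )
        weights.append(doc_part * word_part)
    total = sum(weights)
    return [w / total for w in weights]
```

A word is most likely attributed to a topic that the document already leans on and in which the word has frequently appeared.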
162. Tool to Test it Out
http://www.seomoz.org/labs/lda
164. Tool to Test it Out
We might need to work on the “relevance” of our content
http://www.seomoz.org/labs/lda
165. Interesting Takeaways
• There may be more to “on-page” optimization than
just using target keywords in the right places / ways
• Search engines keep saying “make relevant content”
– perhaps we can get more scientific and precise about
what “relevant” means
• Our LDA topic modeling work is still in its infancy.
Expect more data, correlations, etc. in weeks to come.