Some time ago, we fell asleep at the switch. Search engines are now “evaluating the merit” of
our content and are not entirely clear about the criteria that they are using.




This presentation is about Google’s latest updates, Panda and Penguin, and how they impact
the content that is retained by the search engines and presented in search results. We will
look at:

1. What has happened with search engine technology over the years and what it is today
2. Why we should care. How search engine technology impacts what we do. How what we
   do can impact the performance of search engines.
3. What we can do about it.




Search engines came first. They have been around for over 70 years, since the early
days of “information retrieval,” when text began to be electronically transformed in the late
1940s. However, information organization and retrieval goes back even further than that…




An argument could be made that “search engine” optimization came first: early on,
great care was taken to present information in a “findable” fashion, e.g. great
care by a designated few to make information available in a limited format to the
limited few who would consume it and make it available to the masses. People optimized
text for people.




Then came the beautiful places where information was organized in a standardized way
so that people could find it, along with helpful people to ask if we got lost.

Early search engines used traditional information retrieval concepts and structured content
repositories that were mediated by human-generated metadata. Think Dialog and ProQuest, where
SQL queries ruled and thought-processing bipeds associated tags, categories and abstracts with
each content item. Database methods of linear query construction delivered the most success.




The first web page can still be found here:
http://www.w3.org/History/19921103-hypertext/hypertext/WWW/TheProject.html

Then came the World Wide Web, altruistically developed by Tim Berners-Lee so that the
military, industrial and scientific complexes could communicate with each other, be on the
same page and save money in the long-distance exchange of information.
This worked well until the medium was made available to the rest of us.

The result…




Then limitless growth, questionable quality and zero governance with no end in sight
• 1997: 15 million pages
• 2010: Google announces its 100 billion+ page index
• 2012: rumored 1 trillion URLs found




© Tefko Saracevic




          Source: Saracevic 1997, Information Today

          One thing that did not change was information retrieval (IR). Despite the technology
          advancements, the IR process remained the same.




Slide from LIS 544 IMT 542 INSC 544 by Jeff Huang lazyjeff@uw.edu and Shawn Walker stw3@uw.edu

1. Documents were selected from the index based on the presence of query terms in
   document text.
2. Documents containing more of the term(s) scored higher.
3. Longer documents were discounted.
4. Rare terms were weighted higher.
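
The four rules above can be sketched in a few lines of Python. This is only an illustration of the general idea (a TF-IDF style score with a crude length discount), not the formula any particular search engine used. The tokenizer, the function names and the exact weighting are assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately naive tokenizer."""
    return text.lower().split()


def score_documents(query, documents):
    """Rank documents for a query with a simple TF-IDF style score.

    Mirrors the four rules in the notes:
      1. only documents containing a query term are selected
      2. more occurrences of a term score higher (term frequency)
      3. longer documents are discounted (divide by document length)
      4. rare terms are weighted higher (inverse document frequency)
    The weighting itself is an illustrative assumption.
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    query_terms = set(tokenize(query))

    # Document frequency: how many documents contain each query term.
    doc_freq = {t: sum(1 for toks in tokenized if t in toks) for t in query_terms}

    results = []
    for idx, toks in enumerate(tokenized):
        counts = Counter(toks)
        score = 0.0
        for term in query_terms:
            if counts[term] == 0:
                continue
            tf = counts[term] / len(toks)                 # rules 2 and 3
            idf = math.log(1 + n_docs / doc_freq[term])   # rule 4
            score += tf * idf
        if score > 0:                                     # rule 1
            results.append((idx, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)
```

Called with a query and a list of page texts, it returns (document index, score) pairs, best first. Notice that nothing in it knows what the words mean.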




The environment, devices, participants and content have changed. What does that
mean for IR? For search engines?




IR’s locked-in legacies are centered on:
• text deconstruction
• the capacity for sequential instructions to derive meaning
• a reliance on systems that do not scale well and that, while incorporating human
   behavior, do not fully understand it

Search engines today believe that it is perfectly natural for them to abstract the
whole based on the nature of a small subset = “digital Maoism”




Using Google’s Latent Semantic Indexing, a machine-learning technique that maps
relationships between terms statistically, a search for ~vacation turns up results for: hotels,
rentals, travel, tourism, resorts…

Machines know only what they are trained to know. Rules are based on an analysis of
a subset and applied to the content corpus writ large. Machines have no sense of
accountability when things go bad.




PageRank: a Stanford research project that was once greeted as a savior due to its simplicity and seeming
incorruptibility.
Both creators were PhD students in data mining.
Standard IR with the introduction of 2 human elements:
        1. Random Surfer model
                  •At any time t, the surfer is on some page P
                  •At time t+1, the surfer follows an outlink from P uniformly at random
                  •Ends up on some page Q (linked from page P)
                  •Process repeats indefinitely
        2. Link = vote
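
For anyone who wants to see the random surfer in motion, here is a minimal sketch of the usual power-iteration approach to computing PageRank. It is a textbook-style illustration, not Google's production code. The damping factor (the chance the surfer follows a link instead of jumping to a random page) and the handling of pages with no outlinks are standard assumptions that the notes above do not spell out, and the function and parameter names are made up for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration.

    links maps each page to the list of pages it links to.
    damping is the probability that the surfer follows a link rather
    than jumping to a random page (0.85 is a common textbook value,
    assumed here).
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page gets a small baseline from random jumps.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            outlinks = links.get(page, [])
            if outlinks:
                # Link = vote: a page splits its rank across its outlinks.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank
```

With `links = {"A": ["C"], "B": ["C"], "C": ["A"]}`, page C collects two votes and comes out on top, while B, which nobody links to, ends up last. That is also the weakness: whoever can manufacture votes can manufacture rank.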

Unfortunately, flaws in this system were soon revealed:
1. Those who were able to build links dictated relevance for the rest
2. The cottage industry of SEO started building links for reasons other than endorsing the
   merits of site content




Google goes public around this time and the cash infusion enables expansion.
It starts acquiring top computer scientists.
Google purchases technology (Kaltix: personalized search, context-sensitive search).

This is the first step away from the PageRank model, though not entirely, as PageRank
is part of Google’s locked-in technology foundation.

And the response from us thought-processing bipeds?




We’re constructing worse queries but feel that we’re getting better results.
Which canary in what coal mine just died?




Using the Internet: Skill Related Problems in User Online Behavior; van Deursen & van Dijk; 2009
Pew Internet Trust Study of Search engine behavior
http://www.pewinternet.org/Reports/2012/Search-Engine-Use-2012/Summary-of-findings.aspx

In January 2002, 52% of all Americans used search engines. In February 2012 that figure grew to 73%
of all Americans. On any given day in early 2012, more than half of adults using the internet use a
search engine (59%). That is double the 30% of internet users who were using search engines on a
typical day in 2004. And people’s frequency of using search engines has jumped dramatically.

Moreover, users report generally good outcomes and relatively high confidence in the capabilities of
search engines:
91% of search engine users say they always or most of the time find the information they are seeking
when they use search engines
73% of search engine users say that most or all the information they find as they use search engines is
accurate and trustworthy
66% of search engine users say search engines are a fair and unbiased source of information
55% of search engine users say that, in their experience, the quality of search results is getting better
over time, while just 4% say it has gotten worse
52% of search engine users say search engine results have gotten more relevant and useful over time,
while just 7% report that results have gotten less relevant

And Google’s response…




Location on the page = good quality content
       “The goal of many of our ranking changes is to help searchers find sites that
       provide a great user experience and fulfill their information needs. We also
       want the “good guys” making great sites for users, not just algorithms, to see
       their effort rewarded. To that end we’ve launched Panda changes that
       successfully returned higher-quality sites in search results. And earlier this
       year we launched a page layout algorithm that reduces rankings for sites that
       don’t make much content available “above the fold.”
       Matt Cutts http://googlewebmastercentral.blogspot.com/2012/04/another-step-to-reward-high-quality.html

UX run Amok: if not enough content appears above the fold, the page will be seen as
less relevant? How many are dictating this for the rest of us? Where did they get this
from?
        “As we’ve mentioned previously, we’ve heard complaints from users that if
        they click on a result and it’s difficult to find the actual content, they aren’t
        happy with the experience. Rather than scrolling down the page past a slew of
        ads, users want to see content right away. So sites that don’t have much
        content “above-the-fold” can be affected by this change. If you click on a
        website and the part of the website you see first either doesn’t have a lot of
        visible content above-the-fold or dedicates a large fraction of the site’s initial
        screen real estate to ads, that’s not a very good user experience. Such sites
        may not rank as highly going forward.”
        http://insidesearch.blogspot.com/2012/01/page-layout-algorithm-improvement.html




Panda 1.0: Google’s first salvo against “spam” (shallow, thin content sites) in the form of content duplication and low value
original content (i.e. “quick, give me 200 words on Britney Spears’s vacation in the Maldives”). The biggest target was content
farms. Biggest impact: keyword optimization and link building.

Keyword optimization: The shift in focus from text on the page to user experience makes optimizing for keywords
counterintuitive. Biggest impact: shift from developer/shady SEO influence to a usability/user experience focus. Average loss in
positioning (% of keywords falling out of the top 10 search results) was 70 to 90% for sites like merchantcircle.com, findarticles.com,
buzzle.com, mahalo.com and ezinearticles.com (SISTRIX).
Link building: PageRank does not scale well to a 1 trillion page Web. Google cannot calculate PR fast enough to rerank
sites. PR is now devalued as the strongest influence behind ranking. Biggest impact: link building for higher PR = “what’s the
point?”

Panda 2.0: Change rolled out to all English-language queries in English-speaking countries (UK, Australia, etc.) and in
countries where English-language results are stipulated. Ranking incorporates searcher “blocking” data (from a Google
Chrome feature).

Panda 2.1: Having unique content is not enough. Quality factors introduced (some below):
                   Trustworthiness: would I trust this site with my credit card information?
                   Uniqueness: is this saying what I’ve found somewhere else?
                   Origination: does the person writing the content have “street cred”? Do I believe that this is an
                   authoritative resource on this topic?
                   Display: does the site look professional and polished?
                   Professional: is the content well constructed, well edited and without grammatical or spelling errors?
Panda 2.2: Google goes after site scrapers that repurpose content not their own, or those who “outsource” content
development and maintenance.
Panda 2.3: Bounce rate (whether the user engages with the page at all), click-through and conversion.



And sort of blames SEO for it (not outright, but in a passive-aggressive kind of way).

2007 Google Patent: Methods and Systems for Identifying Manipulated Articles (November
2007)
Manipulation:
• Keyword stuffing (article text or metadata)
• Unrelated links
• Unrelated redirects
• Auto-generated in-links
• Guestbook pages (blog post comments)
Followed up: Google Patent: Content Entity Management (May 2012)
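
To make the first item on that list concrete, here is a rough sketch of how keyword stuffing might be spotted with nothing more than word counts. It is not the method described in the patent. The function names are invented for this example, and the threshold and minimum length are illustrative guesses.

```python
import re
from collections import Counter


def keyword_density(text, top_n=3):
    """Return the top_n most frequent words with their share of all words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return []
    counts = Counter(words)
    return [(word, count / len(words)) for word, count in counts.most_common(top_n)]


def looks_stuffed(text, threshold=0.2, min_words=10):
    """Flag text in which any single word exceeds `threshold` of all words.

    threshold and min_words are illustrative guesses, not values taken
    from the patent or from Google.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < min_words:
        return False  # too short to judge
    top = keyword_density(text, top_n=1)
    return top[0][1] > threshold
```

Run over a title tag, a meta description or body copy, a check like this gives a quick first pass before a human looks at the page.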




February 2011: algorithm focused on content quality - originally thought to be aimed at content
farms
June 2011: update to identify scraped or duplicated content
October 2011: unannounced update to rectify sites “unfairly impacted” by original updates
January 2012: sites with too much ad space above the fold are devalued

The slide lists approximately 10% of the changes that Google told us about, and what they tell us
about likely represents 0.10% of the changes that they actually make. (source:
http://insidesearch.blogspot.com)

Re: freshness bug fix: “This change turns off a freshness algorithm component in certain cases
when it should not be affecting the search results.”
Google will serve up the newer document when choosing between two (from a given site).




Where’s Heidi Klum when we need her? Google’s quality content bar is higher and more
subjective than Project Runway.

Google: Arbiter of Content & Relevance
http://www.stonetemple.com/matt-cutts-and-eric-talk-about-what-makes-a-quality-site/
“Those other sites are not bringing additional value. While they’re not duplicates they bring
nothing new to the table.”

Google’s advice to site owners:
“If it is already a crowded space with entrenched players, consider focusing on a niche area
initially, instead of going head to head with the existing leaders of the space.”




The Penguin update is a bit different because it is an aggressive move on Google’s part that
starts with an algorithmic review. If a threshold is crossed, a human review takes place and
most sites are then significantly demoted in rankings or removed altogether.

• Overly repetitive anchor text (“manipulative, repetitive anchor text”)
• Blog comments filled with spam (reviews/comments that contain links to “spam”).
  Google’s definition of spam is similar to the Supreme Court’s definition of porn: there is no
  explanation of what it is. The search engine spiders just know it when they see it.
• Obscene content
• Web “clusters”: multiple Web sites on the same host, from the same domain owner, linking
  to an article in an artificial way




Targets “exact match” keyword-ed links or aggressive anchor text to Google
        • Sites penalized had “moneyed keywords” in 65% of their incoming links
        • Obviously aimed at the long-standing practice of outsourcing link building to 3rd
           world countries and the weed-like growth of useless directories (i.e. link farms)
Too many links from “related” sites
        • Same niche
        • Same domain host
        • Same domain owner
Standard SEO signals
        • Stuffed <title> and meta description
        • Hidden text
        • Unrelated links on and pointing to the page
        • Computer-generated text (i.e. dynamically rendered product pages)
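
The 65% figure above suggests a simple self-check: what share of your own inbound links use exact-match “money” anchor text? Below is a minimal sketch, assuming you already have a list of inbound links exported from some backlink tool. The data shape, the function name and the example keywords are hypothetical, and nothing here reflects how Google itself measures this.

```python
from collections import Counter


def anchor_text_profile(inbound_links, money_keywords):
    """Summarize how concentrated a site's inbound anchor text is.

    inbound_links: list of (source_url, anchor_text) pairs.
    money_keywords: commercial phrases the site is trying to rank for.
    Returns the share of links whose anchor exactly matches a money
    keyword, plus the single most repeated anchor and its share.
    """
    if not inbound_links:
        return {"exact_match_share": 0.0, "top_anchor": None, "top_anchor_share": 0.0}

    keywords = {k.strip().lower() for k in money_keywords}
    anchors = [anchor.strip().lower() for _, anchor in inbound_links]
    total = len(anchors)

    exact = sum(1 for a in anchors if a in keywords)
    top_anchor, top_count = Counter(anchors).most_common(1)[0]
    return {
        "exact_match_share": exact / total,
        "top_anchor": top_anchor,
        "top_anchor_share": top_count / total,
    }
```

A natural link profile tends to be dominated by brand names and bare URLs, so a high exact-match share is a reasonable cue to take a closer look.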




The search engines think that we’re superfluous because we don’t “get search.” That’s what
I’m here to end. I want you to “get search.” We are information professionals, not mice!
We’re going to use every neuron, synapse and gray cell to fight back.

We will shift from trying to optimize search engine behavior to optimizing what the search
engines consume, and move from search engine optimization to information optimization.
• We will Focus
• We will be Collaborative
• We will get Connected
• We will stay Current

Because we are user experience professionals, not Matt Cutts, Sergey Brin or Larry Page.




Tools:
Core metadata: 20-30 terms that represent the intersection between client objectives and how
their customers search for the product/service
Content analytics: top pages, bounce rate, visitor flow
Content audit: keep/kill/revise based on a thorough review, using a manual audit or tools
available through resources such as those from @content_insight
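
As a starting point for the keep/kill/revise pass, the three tools can be combined into a crude first sort before the manual review. The sketch below is only that: the page fields, the thresholds and the decision rules are all hypothetical placeholders to tune per site, and they do not replace a thorough human audit.

```python
def audit_page(page, core_terms, max_bounce=0.7, min_views=50):
    """Suggest keep / revise / kill for one page.

    page: dict with 'text', 'views' and 'bounce_rate' (0.0 to 1.0).
    core_terms: the core metadata vocabulary (the 20-30 terms).
    max_bounce and min_views are placeholder thresholds, not
    recommendations.
    """
    text = page["text"].lower()
    on_topic = any(term.lower() in text for term in core_terms)
    popular = page["views"] >= min_views
    engaging = page["bounce_rate"] <= max_bounce

    if on_topic and popular and engaging:
        return "keep"
    if on_topic or popular:
        return "revise"  # has something going for it, but needs work
    return "kill"
```

Anything that lands in "revise" or "kill" still deserves a human look before it is changed or removed.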




Stronger G+ profile = more organic search traffic
http://www.portent.com/blog/seo/google-plus-will-build-your-search-traffic.htm




If it barks, sings, dances, plays, changes, whatever: annotate it with something the
search engine can crawl, deconstruct, associate with a surrogate and store in the index.

• Relational content model: Next Steps as well as More Information using guided
  tours, Best Bets, produced views, etc.
• Best Bets: editorially assigned results that may not be chosen by the search engine
• Guided Tours: built on analysis of other user pathways and knowledge of the corpus
• Produced Views: pages of assembled content items focused on a single subject
• Task List Drop Downs: “I Want To…” links to pages of assembled content focused
  on a single common task
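
Best Bets are simple to picture in code: an editor-maintained lookup that is consulted before the engine's own ranking. Here is a minimal sketch for a site search, assuming the underlying engine is available as a function that returns URLs. The function and parameter names are made up for this example.

```python
def search_with_best_bets(query, best_bets, organic_search):
    """Place editorially assigned results ahead of the engine's own ranking.

    best_bets: dict mapping a normalized query to a list of URLs
               chosen by an editor.
    organic_search: any callable that takes a query and returns URLs
                    (a stand-in for the real search engine).
    """
    # Normalize case and whitespace so "  Vacation  Policy" still matches.
    normalized = " ".join(query.lower().split())
    pinned = best_bets.get(normalized, [])
    # Avoid showing a pinned page twice if the engine also returns it.
    organic = [url for url in organic_search(query) if url not in pinned]
    return pinned + organic
```

The point of the design is that a person, not the algorithm, decides what answers the most common questions.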




This is a team effort.




It is not too soon to get started.





SEO and IA: The Beginning of a Beautiful FriendshipSEO and IA: The Beginning of a Beautiful Friendship
SEO and IA: The Beginning of a Beautiful FriendshipMarianne Sweeny
 

More from Marianne Sweeny (10)

Connection and Context: ROI of AI for Digital Marketing
Connection and Context: ROI of AI for Digital MarketingConnection and Context: ROI of AI for Digital Marketing
Connection and Context: ROI of AI for Digital Marketing
 
Smx toronto adv-kw-research-final
Smx toronto adv-kw-research-finalSmx toronto adv-kw-research-final
Smx toronto adv-kw-research-final
 
Widj social media-is-not-search-v1-1
Widj social media-is-not-search-v1-1Widj social media-is-not-search-v1-1
Widj social media-is-not-search-v1-1
 
Uw Digital Communications Social Media Is Not Search
Uw Digital Communications Social Media Is Not SearchUw Digital Communications Social Media Is Not Search
Uw Digital Communications Social Media Is Not Search
 
Sweeny Seo30 Web20 Finalversion
Sweeny Seo30 Web20 FinalversionSweeny Seo30 Web20 Finalversion
Sweeny Seo30 Web20 Finalversion
 
Enterprise Search Share Point2009 Best Practices Final
Enterprise Search Share Point2009 Best Practices FinalEnterprise Search Share Point2009 Best Practices Final
Enterprise Search Share Point2009 Best Practices Final
 
Share Point2007 Best Practices Final
Share Point2007 Best Practices FinalShare Point2007 Best Practices Final
Share Point2007 Best Practices Final
 
Univ Washington Social Media Marketing
Univ Washington Social Media MarketingUniv Washington Social Media Marketing
Univ Washington Social Media Marketing
 
Incentive Architecture 1224362486736986 8
Incentive Architecture 1224362486736986 8Incentive Architecture 1224362486736986 8
Incentive Architecture 1224362486736986 8
 
SEO and IA: The Beginning of a Beautiful Friendship
SEO and IA: The Beginning of a Beautiful FriendshipSEO and IA: The Beginning of a Beautiful Friendship
SEO and IA: The Beginning of a Beautiful Friendship
 

Recently uploaded

Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 

Recently uploaded (20)

Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 

Google's Panda and Penguin Updates Impact Content Optimization

  • 1. Some time ago, we fell asleep at the switch. Search engines are now “evaluating the merit” of our content and are not entirely clear about the criteria that they are using.
  • 2. This presentation is about Google’s latest updates, Panda and Penguin, and how they impact the content that is retained by the search engines and presented in search results. We will look at: 1. What has happened with search engine technology over the years and what it is today. 2. Why we should care: how search engine technology impacts what we do, and how what we do can impact the performance of search engines. 3. What we can do about it.
  • 3. Search engines came first. They have been around for over 70 years, since their early days of “information retrieval” when text began to be electronically transformed in the late 1940s. However, information organization and retrieval goes back even further than that…
  • 4. An argument could be made that “search engine” optimization came first: early on, great care was taken to present information in a “findable” fashion, e.g. great care by a designated few to make information available in limited format to the limited few who would consume it and make it available to the masses. People optimized text for people.
  • 5. Then came the beautiful places where the information was organized in a standardized way so that people could find it, along with helpful people to ask for help finding information if we got lost. Early search engines used traditional information retrieval concepts and structured content repositories that were mediated by human-generated metadata. In Dialog and ProQuest, where SQL queries ruled, thought-processing bipeds associated tags, categories and abstracts with each content item. Database methods of linear query construction delivered the most success.
  • 6. The first web page can still be found here: http://www.w3.org/History/19921103-hypertext/hypertext/WWW/TheProject.html Then came the World Wide Web, altruistically developed by Tim Berners-Lee so that the military, industrial and scientific complexes could communicate with each other, be on the same page and save money in the long distance exchange of information. This worked well until the medium was made available to the rest of us. The result….
  • 7. Then limitless growth, questionable quality and zero governance with no end in sight: • 1997: 15 million pages • 2010: Google announces its 100 billion+ page index • 2012: rumored 1 trillion URLs found
  • 8. © Tefko Saracevic. Source: Saracevic 1997, Information Today. One thing that did not change was information retrieval (IR). Despite the technological advances, the IR process remained the same.
  • 9. Slide from LIS 544 / IMT 542 / INSC 544 by Jeff Huang (lazyjeff@uw.edu) and Shawn Walker (stw3@uw.edu). 1. Documents were selected from the index based on the presence of query terms in document text. 2. Documents containing more of the term(s) scored higher. 3. Longer documents were discounted. 4. Rare terms were weighted higher.
  • 10. The environment, devices, participants and content have changed. What does that mean for IR? For search engines?
  • 11. IR’s locked-in legacies are centered on: • text deconstruction • the capacity for sequential instructions to derive meaning • its reliance on systems that do not scale well and that, while incorporating human behavior, do not fully understand it. Search engines today believe that it is perfectly natural for them to abstract the whole based on the nature of a small subset = “digital Maoism”.
  • 12. Using Google’s Latent Semantic Indexing, a machine-learning technique that maps relationships between terms, a search for ~vacation turns up results for: hotels, rentals, travel, tourism, resorts… Machines know only what they are trained to know. Rules are based on an analysis of a subset and applied to the content corpus writ large. Machines have no sense of accountability when things go bad.
  • 13. A Stanford research project that was once greeted as a savior due to its simplicity and seeming incorruptibility. Both creators were PhD students in data mining. Standard IR with the introduction of 2 human elements: 1. Random Surfer model: • At any time t, the surfer is on some page P • At time t+1, the surfer follows an outlink from P uniformly at random • Ends up on some page Q (from page P) • Process repeats indefinitely. 2. Link = vote. Unfortunately, flaws in this system were soon revealed: 1. Those who were able to build links dictated relevance for the rest. 2. The cottage industry of SEO started building links for reasons other than endorsing the merits of site content.
  • 14. Google goes public around this time and the cash infusion enables expansion. It starts acquiring top computer scientists. Google purchases technology (Kaltix – personalized search, context-sensitive search). This is the first step away from the PageRank model, though not entirely, as PageRank is part of Google’s locked-in technology foundation. And the response from us thought-processing bipeds?
  • 15. We’re constructing worse queries but feel that we’re getting better results. Which canary in what coal mine just died?
  • 16. Using the Internet: Skill Related Problems in User Online Behavior; van Deursen & van Dijk; 2009. Pew Internet Trust Study of Search engine behavior: http://www.pewinternet.org/Reports/2012/Search-Engine-Use-2012/Summary-of-findings.aspx In January 2002, 52% of all Americans used search engines. In February 2012 that figure grew to 73% of all Americans. On any given day in early 2012, more than half of adults using the internet use a search engine (59%). That is double the 30% of internet users who were using search engines on a typical day in 2004. And people’s frequency of using search engines has jumped dramatically. Moreover, users report generally good outcomes and relatively high confidence in the capabilities of search engines: • 91% of search engine users say they always or most of the time find the information they are seeking when they use search engines • 73% of search engine users say that most or all the information they find as they use search engines is accurate and trustworthy • 66% of search engine users say search engines are a fair and unbiased source of information • 55% of search engine users say that, in their experience, the quality of search results is getting better over time, while just 4% say it has gotten worse • 52% of search engine users say search engine results have gotten more relevant and useful over time, while just 7% report that results have gotten less relevant. And Google’s response…
  • 17. Location on the page = good quality content. “The goal of many of our ranking changes is to help searchers find sites that provide a great user experience and fulfill their information needs. We also want the ‘good guys’ making great sites for users, not just algorithms, to see their effort rewarded. To that end we’ve launched Panda changes that successfully returned higher-quality sites in search results. And earlier this year we launched a page layout algorithm that reduces rankings for sites that don’t make much content available ‘above the fold.’” Matt Cutts, http://googlewebmastercentral.blogspot.com/2012/04/another-step-to-reward-high-quality.html UX run amok: if not enough content appears above the fold, the page will be seen as less relevant? How many are dictating this for the rest of us? Where did they get this from? “As we’ve mentioned previously, we’ve heard complaints from users that if they click on a result and it’s difficult to find the actual content, they aren’t happy with the experience. Rather than scrolling down the page past a slew of ads, users want to see content right away. So sites that don’t have much content ‘above-the-fold’ can be affected by this change. If you click on a website and the part of the website you see first either doesn’t have a lot of visible content above-the-fold or dedicates a large fraction of the site’s initial screen real estate to ads, that’s not a very good user experience.
  • 18. Such sites may not rank as highly going forward.” http://insidesearch.blogspot.com/2012/01/page-layout-algorithm-improvement.html
  • 19. Panda 1.0: Google’s first salvo against “spam” (shallow, thin-content sites) in the form of content duplication and low-value original content (i.e. “quick, give me 200 words on Britney Spears’s vacation in the Maldives”). The biggest target was content farms; the biggest impact was on keyword optimization and link building. Keyword optimization: the shift in focus from text on the page to user experience makes optimizing for keywords counterintuitive. Biggest impact: a shift from developer/shady SEO influence to a usability/user experience focus. Average loss in positioning (% of keywords falling out of the top 10 search results): 70 to 90% for sites like merchantcircle.com, findarticles.com, buzzle.com, mahalo.com and ezinearticles.com (SISTRIX). Link building: PageRank does not scale well to a 1 trillion page Web. Google cannot calculate PR fast enough to rerank sites. PR is now devalued as the strongest influence behind ranking. Biggest impact: link building for higher PR = “what’s the point?” Panda 2.0: change rolled out to all English-language queries in English-speaking countries (UK, Australia, etc.) and in countries where English-language results are stipulated. Ranking incorporates searcher “blocking” data (from a Google Chrome feature). Panda 2.1: having unique content is not enough; quality factors introduced (some below). Trustworthiness: would I trust this site with my credit card information? Uniqueness: is this saying what I’ve found somewhere else? Origination: does the person writing the content have “street cred”; do I believe that this is an authoritative resource on this topic? Display: does the site look professional and polished? Professional: is the content well constructed, well edited and without grammatical or spelling errors? Panda 2.2: Google goes after site scrapers that repurpose content not their own, or those who “outsource” content development and maintenance. Panda 2.3: bounce rate (whether the user engages with the page at all), click-through, conversion.
  • 20. And sort of blames SEO for it (not outright, but in a passive-aggressive kind of way). Google Patent: Methods and Systems for Identifying Manipulated Articles (November 2007). Manipulation: • Keyword stuffing (article text or metadata) • Unrelated links • Unrelated redirects • Auto-generated in-links • Guestbook pages (blog post comments). Followed up by Google Patent: Content Entity Management (May 2012).
  • 21. February 2011: algorithm focused on content quality, originally thought to be aimed at content farms. June 2011: update to identify scraped or duplicated content. October 2011: unannounced update to rectify sites “unfairly impacted” by the original updates. January 2012: sites with too much ad space above the fold are devalued. The slide lists approximately 10% of the changes that Google told us about, and what they tell us about likely represents 0.10% of the changes that they actually make (source: http://insidesearch.blogspot.com). Re: freshness bug fix: “This change turns off a freshness algorithm component in certain cases when it should not be affecting the search results.” Will serve up the newer document when choosing between two (from a given site).
  • 22. Where’s Heidi Klum when we need her? Google’s quality content bar is higher and more subjective than Project Runway. Google: Arbiter of Content & Relevance. http://www.stonetemple.com/matt-cutts-and-eric-talk-about-what-makes-a-quality-site/ “Those other sites are not bringing additional value. While they’re not duplicates they bring nothing new to the table.” Google’s advice to site owners: “If it is already a crowded space with entrenched players, consider focusing on a niche area initially, instead of going head to head with the existing leaders of the space.”
  • 23. The Penguin update is a bit different because it is an aggressive move on Google’s part that starts with an algorithmic review. If a threshold is crossed, a human review takes place and most sites are then significantly demoted in rankings or removed altogether. • Overly repetitive anchor text (“manipulative, repetitive anchor text”) • Blog comments filled with spam (reviews/comments that contain links to “spam”). Google’s definition of spam is similar to the Supreme Court’s for porn: no explanation of what this is; the search engine spiders just know it when they see it • Obscene content • Web “clusters”: multiple Web sites on the same host, from the same domain owner, linking to an article in an artificial way
  • 24. Targets “exact match” keyword-ed links or aggressive anchor text to Google • Sites penalized had “moneyed keywords” in 65% of their incoming links • Obviously aimed at the long-standing practice of outsourcing link building to third-world countries and the weed-like growth of useless directories (i.e. link farms). Too many links from “related” sites: • Same niche • Same domain host • Same domain owner. Standard SEO signals: • Stuffed <title> and meta description • Hidden text • Unrelated links on and pointing to the page • Computer-generated text (i.e. dynamically rendered product pages)
  • 25.
  • 26. The search engines think that we’re superfluous because we don’t “get search.” That’s what I’m here to end. I want you to “get search.” We are information professionals, not mice! We’re going to use every neuron, synapse and gray cell to fight back. We will shift from trying to optimize search engine behavior to optimizing what the search engines consume, and move from search engine optimization to information optimization. • We will Focus • We will be Collaborative • We will get Connected • We will stay Current. Because we are user experience professionals, not Matt Cutts, Sergey Brin or Larry Page.
  • 27.
  • 28. Tools: Core Metadata: 20-30 terms that represent the intersection between client objectives and how their customers search for the product/service. Content analytics: top pages, bounce rate, visitor flow. Content audit: keep/kill/revise based on a thorough review, using a manual audit or tools available through resources such as those from @content_insight.
  • 29. Stronger G+ profile = more organic search traffic http://www.portent.com/blog/seo/google-plus-will-build-your-search-traffic.htm
  • 30. If it barks, sings, dances, plays, changes, whatever, annotate it with something the search engine can crawl, deconstruct, associate with a surrogate and store in the index. • Relational content model: Next Steps as well as More Information using Guided Tours, Best Bets, Produced Views, etc. • Best Bets: editorially assigned result that may not be chosen by the search engine • Guided Tours: built on analysis of other user pathways and knowledge of the corpus • Produced Views: page of assembled content items focused on a single subject • Task List Drop Downs: “I Want To…” links to pages of assembled content focused on a single common task
  • 31.
  • 32. This is a team effort.
  • 33. It is not too soon to get started.
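
The four scoring rules on slide 9 (match on query terms, reward repeated terms, discount long documents, weight rare terms higher) can be illustrated with a small TF-IDF-style sketch. This is not how any particular search engine scores documents; the function name `score_documents`, the whitespace tokenizer, the `log(1 + N/df)` rarity weight and the square-root length discount are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def score_documents(query, docs):
    """Return one relevance score per document (hypothetical helper).

    Illustrates the four classic IR rules from slide 9 under simple
    assumptions: whitespace tokenization and a sqrt length discount.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        total = 0.0
        for term in query.lower().split():
            if term_freq[term] == 0:
                continue  # Rule 1: only documents containing the term score
            # Rule 4: rare terms (low document frequency) weigh more
            rarity = math.log(1 + n_docs / doc_freq[term])
            # Rule 2: more occurrences of the term score higher
            total += term_freq[term] * rarity
        # Rule 3: longer documents are discounted
        scores.append(total / math.sqrt(len(tokens)) if tokens else 0.0)
    return scores


docs = [
    "panda update targets thin content",
    "penguin update targets link spam and link farms",
    "content content content strategy",
]
print(score_documents("content", docs))
```

Note that the third document wins simply by repeating the term, which is the kind of weakness keyword stuffing exploited.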
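
The Random Surfer model from slide 13 can be sketched as a repeated probability update: at each step, the chance of being on a page is split evenly among its outlinks. This is a minimal illustration, not Google's implementation. The function name `surfer_distribution`, the three-page example graph and the `damping` parameter (set to 1.0 here so the code follows the slide's pure model, with no random jumps) are assumptions, and the sketch assumes every page has at least one outlink and appears as a key in `links`.

```python
def surfer_distribution(links, steps=100, damping=1.0):
    """Estimate where a random surfer ends up in the long run.

    links maps each page to the list of pages it links to. With
    damping=1.0 the surfer always follows an outlink chosen uniformly
    at random, as on the slide; lower values add random jumps
    (an assumption not taken from the slides).
    """
    pages = sorted(links)
    n = len(pages)
    prob = {page: 1.0 / n for page in pages}  # start anywhere with equal chance

    for _ in range(steps):
        nxt = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            # Each outlink is equally likely, so it receives an equal share
            share = damping * prob[page] / len(outlinks)
            for target in outlinks:
                nxt[target] += share  # a link acts as a vote for its target
        prob = nxt
    return prob


# Hypothetical three-page web: C is linked to by both A and B.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = surfer_distribution(links)
print(ranks)
```

In this toy graph, B receives only half of A's vote and ends up with the lowest share, which shows how whoever controls the links controls the ranking.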