Information Retrieval

   James Melzer

   June 15, 2006




How Does Search Work?




The basics of search

• A search engine mediates between the user’s query and metadata surrogates for
  documents


• Documents are reduced to metadata


• The user’s need is translated into a query


• Query terms are used to find matching metadata terms


• Lots and lots of room for error...




The search process

1. Crawl content for metadata


2. Index document terms into an inverted file;
   an inverted file is very fast to search


3. Search the index to identify the result set;
   search the index - not the documents


4. Rank the results for display;
   ranking is the hardest part
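
The index and search steps above can be sketched in a few lines. This is a minimal illustration, not how any particular engine is built: the whitespace tokenizer, the AND-only matching and the sample documents are all assumptions made for the example.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately naive tokenizer."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query term (AND semantics)."""
    postings = [index.get(term, set()) for term in tokenize(query)]
    if not postings:
        return set()
    return set.intersection(*postings)


# Hypothetical sample collection.
docs = {
    1: "the quick brown fox",
    2: "the lazy brown dog",
    3: "a quick red fox",
}
index = build_inverted_index(docs)
result = search(index, "quick fox")  # only the index is consulted, not the documents
```

The lookup touches one small set per query term, which is why searching the index is fast compared with scanning every document. Ranking is left out here on purpose.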




Search algorithm 1

Term-based Ranking (tf/idf)


• tf = term frequency
  documents that use the query terms most often are presumed to be most relevant


• idf = inverse document frequency
  rarer terms are better indicators of relevance


• Assumptions
  1) relevance can be measured with document terms
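
A rough sketch of term-based ranking, assuming one common tf/idf variant: tf is the term count divided by document length, and idf is log(N / df). Real engines use many other weightings; the function name and sample documents are made up for illustration.

```python
import math
from collections import Counter


def tf_idf_scores(docs, query):
    """Score each document as the sum of tf * idf over the query terms."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for terms in tokenized.values():
        df.update(set(terms))

    scores = {}
    for doc_id, terms in tokenized.items():
        counts = Counter(terms)
        score = 0.0
        for term in query.lower().split():
            if not terms or df[term] == 0:
                continue  # empty document, or term absent from the collection
            tf = counts[term] / len(terms)        # how heavily this document uses the term
            idf = math.log(n_docs / df[term])     # rarer terms get a larger weight
            score += tf * idf
        scores[doc_id] = score
    return scores


# Hypothetical sample collection.
docs = {
    "a": "search engine search index",
    "b": "search results page",
    "c": "cooking recipes",
}
scores = tf_idf_scores(docs, "search index")
ranked = sorted(scores, key=scores.get, reverse=True)
```

Document "a" ranks first because it uses "search" more heavily and also contains the rarer term "index".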




Search algorithm 2

PageRank (Google)


• Relevant set is still identified by term matching


• A revolution in ranking:
  based on linking between documents


• Assumptions:
  1) important sites link to other important sites
  2) if many people link to a site, it is important
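
The two assumptions above can be illustrated with the textbook power-iteration form of PageRank. This is a sketch of the published idea only, not Google's production ranking. The damping factor of 0.85, the fixed iteration count and the tiny link graph are assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict of page -> list of outgoing links."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly among its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page (no outgoing links): spread its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: A, B and D all link to C; C links back to A.
links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
```

C scores highest because many pages link to it, and A outranks B and D because its one inbound link comes from the important page C.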




Citation Analysis

• Authors carefully select articles to cite


• The more citations an article gets,
  the better it must be


• Citations from authors who are themselves highly cited confer more authority on
  the articles they cite


• Aggregate and leverage all these small individual decisions...




How Complex is Google?

    Google has about 36 ranking algorithms

    Examples:

    Citation Analysis

    Statistical Clustering

    Parsing Document Structure

    Parsing Data in the Document

    Microcontent Parsing

How to Make Search Better?




Evaluating Search

Recall


the percentage of all relevant documents retrieved


100% recall means every relevant document is retrieved


Precision


the percentage of documents retrieved that are relevant


100% precision means only relevant documents are retrieved
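
Both measures reduce to simple set arithmetic once someone has judged which documents are relevant. A minimal sketch, with made-up document ids and a hypothetical helper name:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a result set against a judged relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were actually retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical judgment: 4 documents retrieved, 5 relevant in the collection, 2 in common.
p, r = precision_recall(retrieved=[1, 2, 3, 4], relevant=[2, 4, 6, 8, 10])
```

Here precision is 2/4 = 50% and recall is 2/5 = 40%. Note that recall needs the full list of relevant documents, which is only available for small, fully judged collections.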



Thoughts & Reservations about Evaluating Search

• Precision and Recall usually trade off against each other, so improving one often
  reduces the other.


• For a corpus as large as the web (tens of billions of items),
  Recall cannot be measured, and is thus essentially meaningless


• What is relevance?


• Measuring Precision depends on an agreed definition of relevance, which is
  tricky (human cataloging is only about 80% ‘accurate’ - relevance is very hard
  to quantify)

Best Bets

[Figure: Zipf]

• Manually selected results, tied to specific query terms or phrases


• User-driven phrases
  select the most-used phrases from search traffic;
  go for easy wins, because returns diminish sharply


• Business-driven phrases
  select phrases important to the business;
  such as product names or office locations;
  or politically sensitive phrases, so you can control the message people see
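
One simple way to wire in best bets is a lookup table that is consulted before the regular results. The sketch below assumes exact matching on a normalized query phrase. The table contents, paths and function names are hypothetical.

```python
# Hypothetical best-bet table: normalized query phrase -> manually selected results.
BEST_BETS = {
    "office locations": ["/about/locations"],
    "expense report": ["/forms/expense-report"],
}


def normalize(query):
    """Lowercase and collapse whitespace so trivial variants hit the same entry."""
    return " ".join(query.lower().split())


def search_with_best_bets(query, organic_search):
    """Put manually selected results first, then the engine's own results."""
    pinned = BEST_BETS.get(normalize(query), [])
    organic = [r for r in organic_search(query) if r not in pinned]
    return pinned + organic


def fake_engine(query):
    """Stand-in for the real search engine, used only for this example."""
    return ["/news/new-office", "/about/locations", "/contact"]


results = search_with_best_bets("  Office   Locations ", fake_engine)
```

Because the table is maintained by hand, it only pays off for the small number of phrases that account for most of the traffic or matter most to the business.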




Relevance Feedback

• The user provides direct or indirect feedback on the search results


• Click tracking


• “More like this” or “Find similar”


• Clustering




Structured Search

• Designers use patterns in search behavior to guess the user’s intent;
  this requires a substantial understanding of user behavior;
  it may require structured content (although not necessarily)


Examples

• Zip Code -> Zip Code Lookup Tool

• Person’s name -> Directory Listing

• Product Name -> Shop or Support?

• Address -> Map this?

• Topic -> Introduction, Forms, Policies or Reports?
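
The first and fourth examples can be approximated with pattern matching on the query string. This is a sketch under loose assumptions: US-style zip codes, a very rough street-address pattern, and made-up intent labels. Names and products would need a directory or catalog lookup rather than a regular expression.

```python
import re

# Hypothetical rules: (pattern, intent label). Order matters; first match wins.
INTENT_RULES = [
    # Five digits, optionally followed by a four-digit extension.
    (re.compile(r"^\d{5}(-\d{4})?$"), "zip_code_lookup"),
    # Very rough street address: a number, some words, then a street suffix.
    (re.compile(r"^\d+\s+\w+.*\b(st|street|ave|avenue|rd|road|blvd)\.?$", re.I), "map"),
]


def guess_intent(query):
    """Return the first matching intent label, or fall back to ordinary search."""
    q = query.strip()
    for pattern, intent in INTENT_RULES:
        if pattern.match(q):
            return intent
    return "full_text_search"


examples = {q: guess_intent(q) for q in ["20001", "1600 Pennsylvania Ave", "refrigerator"]}
```

A guess like this is best offered as a suggestion above the normal results ("Map this?") rather than as a redirect, since the pattern can easily be wrong.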


Controlled Vocabularies

• Classification with a controlled vocabulary is the best way to ensure 100%
  Recall


• Lead-in synonyms
  enter “fridge”; get “refrigerator” instead;
  best if the collection is well-cataloged;
  increases precision (e.g. in a library)


• Term-expansion synonyms
  enter “refrigerator”; get “fridge” too;
  best if the collection is not well-cataloged;
  increases recall at the cost of precision (e.g. on eBay)


• Spell check on query phrases
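
The difference between the two synonym strategies is easy to see in code. A minimal sketch, assuming a tiny hand-built vocabulary. The table and function names are hypothetical.

```python
# Hypothetical controlled vocabulary: entry term -> preferred term.
PREFERRED_TERM = {
    "fridge": "refrigerator",
    "icebox": "refrigerator",
}


def lead_in(term):
    """Lead-in synonym: replace the entry term with the preferred term."""
    term = term.lower()
    return PREFERRED_TERM.get(term, term)


def expand(term):
    """Term-expansion synonym: search the preferred term and all known variants."""
    preferred = lead_in(term)
    variants = {entry for entry, pref in PREFERRED_TERM.items() if pref == preferred}
    return {preferred} | variants


single = lead_in("fridge")        # one query term; relies on consistent cataloging
many = expand("refrigerator")     # several query terms joined with OR
```

Lead-in rewriting keeps the query narrow, so it only works when documents are tagged with the preferred term. Expansion casts a wider net, which finds more documents but also lets in more irrelevant ones.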

Why is search important?

IF:
About half of all users prefer to search first*


THEN:
What percentage of a content site’s development effort should be devoted to search?


* This statistic is highly context-dependent. People’s behavior depends on the
context of their actions. The stat is from Jared Spool.

Questions?
James Melzer
Information Architect
SRA International
james_melzer@sra.com




