Trying to predict search results using an enterprise search tool with out-of-the-box settings
• Understand enterprise search capabilities at a granular level
• From a user's perspective
• Address general queries, including:
  - Does the position of a word in a document weigh more heavily than the number of times the word appears?
  - Does singular vs. plural affect results or rank?
  - Does lemmatization work? Will a search on “good” pick up documents with the term “better” or “best”?
• No obvious logic determines how documents are ranked
• Relevance order changes when filters are applied
• Additional filters, or the order in which filters are applied, do not change relevancy
• Relevance order changes as additional documents are added
• Document name and SharePoint fields are weighted more heavily than terms in the document
• The number of word hits outranks hit density (hits as a share of the total words)
  - A document with “dog” 20 times in an 80-word document (25%) is more relevant than a document with “dog” 15 times in a 15-word document (100%)
• Case sensitivity / Casedex is not a factor in finding documents, but could be a factor in relevance
• Natural wildcarding is not present
  - “Dog” does not pick up “dogma”; “cat” does not pick up “catholic”
• No synonym matching / no thesaurus
  - “Canine” and “K9” did not bring up documents with the word “dog”
• Lemmatization is present but erratic
  - Searches on “good”, “better” and “best” all hit the document with “better” in the text. All missed the document with “good” in the text, even though other document results highlighted the terms “good”, “goods”, “better” and “best”.
• Search within search does not exist
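The hit-count versus hit-density finding above can be made concrete with a small sketch. It is only an illustration of the two ranking signals, not the engine's actual scoring; the `Doc` class and both helper functions are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    name: str
    words: list


def hit_count(doc, term):
    """Number of case-insensitive exact matches of term in the document."""
    return sum(1 for w in doc.words if w.lower() == term.lower())


def hit_density(doc, term):
    """Hits as a fraction of total words (0.0 for an empty document)."""
    return hit_count(doc, term) / len(doc.words) if doc.words else 0.0


# The two documents described in the slide.
long_doc = Doc("80 words, 20 hits", "I am a dog".split() * 20)
short_doc = Doc("15 words, 15 hits", ["dog"] * 15)
docs = [short_doc, long_doc]

# Ranking by raw count puts the 80-word document first (the observed behavior);
# ranking by density would put the 15-word document first.
by_count = sorted(docs, key=lambda d: hit_count(d, "dog"), reverse=True)
by_density = sorted(docs, key=lambda d: hit_density(d, "dog"), reverse=True)

print([d.name for d in by_count])
print([d.name for d in by_density])
```

Under these assumptions, the observed ordering matches a count-based signal rather than a density-based one.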
• Basic wildcard searching worked appropriately (asterisk [*] at end of word)
• Advanced wildcard searching did not work
• When searching on the singular, plurals were not always returned (‘dog’ returned some documents containing ‘dogs’ but missed others)
• When searching on the singular, documents containing the plural sometimes had higher relevancy
• Searching on the plural did not return the singular (‘dogs’ never returned documents containing ‘dog’)
• Misspelled words in documents were not picked up
• Identical documents in different formats came back in the order: .doc, .docx, .pdf, .xls, .xlsx
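For comparison with the findings above, here is a minimal sketch of what symmetric plural stemming and trailing-asterisk wildcard matching look like. The stemmer is deliberately naive (it only strips a final "s") and is an assumption for illustration, not the engine's algorithm; `naive_stem`, `stem_match` and `wildcard_match` are hypothetical helpers.

```python
from fnmatch import fnmatchcase


def naive_stem(word):
    """Lowercase and strip a single trailing 's' from words longer than 3 letters."""
    w = word.lower()
    if len(w) > 3 and w.endswith("s"):
        return w[:-1]
    return w


def stem_match(query, token):
    """True when query and token reduce to the same naive stem."""
    return naive_stem(query) == naive_stem(token)


def wildcard_match(pattern, token):
    """Case-insensitive shell-style match, e.g. 'dog*' against 'dogma'."""
    return fnmatchcase(token.lower(), pattern.lower())


print(stem_match("dog", "dogs"))       # singular query, plural token
print(stem_match("dogs", "dog"))       # plural query, singular token
print(stem_match("dog", "dogma"))      # stemming should not reach 'dogma'
print(wildcard_match("dog*", "dogma")) # an explicit wildcard does
```

A symmetric stemmer like this one matches in both directions, which is the behavior the tested engine did not show when searching on the plural.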
| Feature | Enterprise Search Engine |
| --- | --- |
| Search within search | No |
| Lemmatization | dog does not pick up synonyms, better picks up good and best, warm does not pick up hot, USB does not pick up Universal Serial Bus, car does not pick up automobile |
| Stemming | Searches singular and plural, but not lemma, i.e. dog picks up dog and dogs, but not dogma |
| Case sensitivity / Casedex | Does not impact results |
| SharePoint fields | Weighs SP Title, Description, then Comments heavily |
| Multi-word search | Defaults to "and" |
| Boolean | Must be capitalized… includes different cases (singular vs. plural) |
| OR | dog OR cat, cat OR dog, comes back in a different order |
| NEAR | Yes… includes singular and plural case |
| WITHIN | No |
| AND | dog AND cat, cat AND dog return same documents in different relevancy |
| NOT | Yes... excluded word cannot be in text, title, doc name, description, comments, etc. |
| Wildcard matching | Yes |
| Advanced wildcard | No |
| Anti-phrasing | Searches your request but asks "did you mean" |
| Typo / Levenshtein | No |
| Accent normalization | Search with all variations of words, picks up all variations of words |
| Periods within abbreviations | usb pulls up usb and u.s.b., but u.s.b. does not pull up usb |
| Abbreviations | No |
| Dates / Entity normalization | See appendix |
| Punctuation variations | See appendix |
| Soundex / sound alike | No |
| Thesaurus / synonym matching | No |
| Duplicates | Modifies search result to show 1 document (def.docx) and show there is another document with the same information and allows you |
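The AND, OR and NOT rows above say that swapping operand order returns the same documents in a different relevancy order. A set-based sketch shows why the result set itself should not depend on operand order. The postings in `INDEX` are made-up document IDs, and the three functions are hypothetical helpers, not the engine's query API.

```python
# Hypothetical inverted index: term -> set of document IDs containing it.
INDEX = {
    "dog": {1, 2, 3},
    "cat": {2, 3, 4},
}


def search_and(a, b):
    """Documents containing both terms."""
    return INDEX.get(a, set()) & INDEX.get(b, set())


def search_or(a, b):
    """Documents containing either term."""
    return INDEX.get(a, set()) | INDEX.get(b, set())


def search_not(a, b):
    """Documents containing a but not b."""
    return INDEX.get(a, set()) - INDEX.get(b, set())


print(search_and("dog", "cat") == search_and("cat", "dog"))
print(search_or("dog", "cat") == search_or("cat", "dog"))
print(search_not("dog", "cat"))
```

Set intersection and union are commutative, so any order dependence must come from the relevance scoring step, which this sketch does not model.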
Enterprise Search Engine
• Weighting (heaviest to lightest)
  - Document name
  - SharePoint Title
  - SharePoint Description
  - SharePoint Comments vs. number of word hits in a document
  - Number of hits on a word
* Create date, modified date, upload date and crawl order not fully factored
Ranking starts to become harder to predict.
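One way to picture the weighting order above is a field-weighted hit count. This is a sketch only: the deck gives no numeric weights, so the values in `FIELD_WEIGHTS` are invented for illustration, and the field names and sample documents are hypothetical stand-ins modeled on the test documents described in the deck.

```python
import re

# Hypothetical weights, chosen only to respect the heaviest-to-lightest order.
FIELD_WEIGHTS = {"name": 100, "title": 50, "description": 25, "comments": 1, "body": 1}


def count_hits(text, term):
    """Whole-word, case-insensitive occurrences of term in text."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def score(doc, term):
    """Sum of per-field hit counts, each multiplied by its field weight."""
    return sum(
        weight * count_hits(doc.get(field, ""), term)
        for field, weight in FIELD_WEIGHTS.items()
    )


docs = [
    {"name": "comments.docx", "body": "two words", "comments": "dog dog dog dog dog"},
    {"name": "ten-hits.docx", "body": " ".join(["dog"] * 10)},
    {"name": "farm.docx", "body": "farm", "description": "dog"},
    {"name": "dog.docx", "body": " ".join(["Cat"] * 10)},
]

ranked = sorted(docs, key=lambda d: score(d, "dog"), reverse=True)
print([d["name"] for d in ranked])
```

With these assumed weights, a single hit in the document name or description outranks ten hits in the body, while comments and body hits compete on roughly equal terms.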
| Document details | Original search: dog | After applying filter | After adding documents |
| --- | --- | --- | --- |
| 11 words, 9th word “dog” | 5 | 8 | 11 (7) |
| 11 words, 9th word “dogs” | 8 | 6 | 12 (8) |
| 10 words, “Cat”, doc name dog.docx | 1 | 1 | 1 (1) |
| 10 words, all “dog” | 3 | 3 | 6 (3) |
| 10 words, “Cat”, 4th word replaced w/ “Dog” | 7 | 5 | 10 (6) |
| 10 words, “Cat”, 10th word replaced w/ “Dog” | 6 | 7 | 9 (5) |
| 1 word “farm”, SP Description = “dog” | 2 | 2 | 2 (2) |
| 10 words, “Catholic”, 4th word replaced w/ “Dogma” | X | X | X |
| 2 words, SP Comments contains 5 words, all “dog” | 4 | 4 | 7 (4) |
| 15 words, all “dog” | - | - | 4 |
| 20 phrases, “I am a dog” (80 words, 20 words “dog”) | - | - | 3 |
| 12 words, all “dog” | - | - | 5 |
| 12 words, all “Dog” | - | - | 8 |
| 12 words, all “dogs” | - | - | X |
| 12 words, all “Dogs” | - | - | X |

Documents with “-” in the first two columns were not added until the following day.
The number signifies the rank. For 11 (7), the 11 is the overall rank and the 7 is the rank amongst the original 9 docs.
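The parenthesized figures can be derived mechanically from the overall ranks: sort the original documents by overall rank and number them from 1. The sketch below does that for the eight original documents that were returned after the new documents were added; `rank_within` and the shortened document labels are hypothetical names.

```python
def rank_within(overall_ranks):
    """Map each document to its position when sorted by overall rank."""
    ordered = sorted(overall_ranks, key=overall_ranks.get)
    return {doc: position for position, doc in enumerate(ordered, start=1)}


# Overall ranks of the original documents after the extra documents were added.
overall = {
    "11 words, 9th word dog": 11,
    "11 words, 9th word dogs": 12,
    "doc name dog.docx": 1,
    "10 words, all dog": 6,
    "4th word replaced with Dog": 10,
    "10th word replaced with Dog": 9,
    "SP Description = dog": 2,
    "SP Comments, 5 x dog": 7,
}

relative = rank_within(overall)
for doc, rank in overall.items():
    print(f"{doc}: {rank} ({relative[doc]})")
```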
| Search term: | Good | Better | Best |
| --- | --- | --- | --- |
| Document containing “Good” | x | ✓ | x |
| Document containing “Better” | x | ✓ | ✓ |
| Document containing “Best” | x | ✓ | ✓ |
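| Search term: | goose | geese | Goose | Geese | gooses | geeses |
| --- | --- | --- | --- | --- | --- | --- |
| Document contains “Goose” | 2 | X | 2 | X | X | X |
| Document contains “Geese” | 1 | 1 | 1 | 1 | X | X |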
| Search term: | chicken | chickin | chikcen |
| --- | --- | --- | --- |
| Document contains “chikcen” | X | X | 1 |
| Document contains “chickin” | X | 1 | X |
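The misspelling results above are what exact-term matching produces. For contrast, here is a minimal sketch of edit-distance (Levenshtein) matching, which the tested engine did not offer. The `max_distance` threshold of 2 is an arbitrary assumption, and `levenshtein` and `fuzzy_match` are hypothetical helper names.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def fuzzy_match(query, token, max_distance=2):
    """True when token is within max_distance edits of query (case-insensitive)."""
    return levenshtein(query.lower(), token.lower()) <= max_distance


print(levenshtein("chicken", "chickin"))
print(levenshtein("chicken", "chikcen"))
print(fuzzy_match("chicken", "chikcen"))
```

Plain Levenshtein distance counts a transposition such as “chikcen” as two edits, so the threshold has to be at least 2 for a search on “chicken” to reach that document.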
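Date Formats

| Doc contains \ Search: | 6/21/14 | 6/21/2014 | June 21st, 2014 | 21 June 2014 | June 21, 2014 | 06/21/14 | 6-21-14 | 06-21-2014 | 6-21-2014 | 21-Jun-14 | 21-Jun-2014 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| June 21, 2014 | X | X | X | 3 | 3 | X | X | X | X | X | X |
| 6/21/14 | 1 | X | X | X | X | X | 1 | X | X | X | X |
| 06-21-14 | X | X | X | X | X | X | X | 1 | X | X | X |
| June 21st, 2014 | X | X | 1 | 1 | 1 | X | X | X | X | X | X |
| 21-Jun-14 | X | X | X | 2 | 2 | X | X | X | X | X | X |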
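The table shows that the engine treats most date spellings as unrelated strings. A sketch of what date normalization could look like follows, assuming US month-first order for numeric dates and English month names (so the process locale must be English or C). The format list and `normalize_date` are illustrative assumptions, not a description of any SharePoint feature.

```python
import re
from datetime import datetime

# Candidate formats, tried in order. Numeric dates are assumed month-first.
FORMATS = [
    "%m/%d/%y", "%m/%d/%Y",
    "%m-%d-%y", "%m-%d-%Y",
    "%B %d, %Y", "%d %B %Y",
    "%d-%b-%y", "%d-%b-%Y",
]


def normalize_date(text):
    """Return a datetime.date for a recognized date string, else None."""
    # Drop ordinal suffixes such as the 'st' in '21st'.
    cleaned = re.sub(r"(?<=\d)(st|nd|rd|th)\b", "", text.strip())
    for fmt in FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue
    return None


variants = [
    "6/21/14", "6/21/2014", "June 21st, 2014", "21 June 2014",
    "June 21, 2014", "06/21/14", "6-21-14", "06-21-2014",
    "6-21-2014", "21-Jun-14", "21-Jun-2014",
]
print({normalize_date(v) for v in variants})
```

If both the query and the indexed text were normalized this way, all eleven search variants would resolve to the same date value and could match any of the document spellings.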
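Special Characters

| Doc contains \ Search: | PS3 | PS/3 | PS 3 | PS-3 |
| --- | --- | --- | --- | --- |
| PS/3 | X | 3 | 1 | 3 |
| PS-3 | X | 2 | 4 | 2 |
| PS3 | 1 | X | X | X |
| PS 3 | X | 4 | 2 | 4 |
| Ps-3 | X | 1 | 3 | 1 |

Special Characters cont.

| Doc contains \ Search: | AB12345 | AB 12345 | AB.12345 | AB-12345 |
| --- | --- | --- | --- | --- |
| AB-12345 | X | 2 | 2 | 2 |
| AB 12345 | X | 1 | 1 | 1 |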
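In both tables the unpunctuated form (PS3, AB12345) never matches the punctuated or spaced forms. One common remedy is to compare a normalized key with punctuation and spaces removed. The sketch below assumes such a key is acceptable for product codes and identifiers; `normalize_token` is a hypothetical helper, not an engine setting.

```python
import re


def normalize_token(text):
    """Lowercase and drop everything except ASCII letters and digits."""
    return re.sub(r"[^0-9a-z]", "", text.lower())


for variant in ["PS3", "PS/3", "PS 3", "PS-3", "Ps-3"]:
    print(variant, "->", normalize_token(variant))

for variant in ["AB12345", "AB 12345", "AB.12345", "AB-12345"]:
    print(variant, "->", normalize_token(variant))
```

The same key would also collapse “usb” and “u.s.b.” into one form, at the cost of merging strings that differ only in punctuation.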
Enterprise search results predictability analysis
