SlideShare ist ein Scribd-Unternehmen logo
1 von 51
Structured Search Dan McCreary President Dan McCreary & Associates [email_address] (952) 931-9198 Version 4
Presentation Description ,[object Object],[object Object],[object Object]
After This Presentation Users Will Be Able To: ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Background for Dan McCreary ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
How Many People… ,[object Object],[object Object],[object Object],[object Object],[object Object]
Structured Search ,[object Object],[object Object],But first a story……
Information Retrieval Textbook ,[object Object],[object Object],[object Object],http://nlp.stanford.edu/IR-book/information-retrieval-book.html
117 Citations in Computer Science With 117 citations, the "Intro to IR" book is the second most cited Computer Science reference published in 2008.
Table 10.1 XML - Table 10.1 and structured information retrieval.  SQLRDB (relational database) search, unstructured information retrieval   RDB search unstructured retrieval structured retrieval objects records unstructured documents trees with text at leaves model relational model vector space & others ? main data structure table inverted index ? queries SQL free text queries ?
Excerpt from IR Book… ,[object Object],[object Object],[object Object]
eXist Native XML Developers eXist Meeting Prague March 12 th , 2010
Presentation Outline ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Table 10.1 - Revised XML - Table 10.1 and structured information retrieval.  SQLRDB (relational database) search, unstructured information retrieval   RDB search unstructured retrieval structured retrieval objects records unstructured documents trees with text at leaves model relational model vector space & others XML hierarchy main data structure table inverted index trees with node-ids for document ids queries SQL free text queries XQuery fulltext
Relational DB Boolean Search ,[object Object],[object Object],[object Object],SELECT * FROM PERSON WHERE TITLE = 'manager' ORDER BY SALARY Note that the "order" is not the quality of a mach but another column in a table
Vector Model ,[object Object],[object Object],Your search keyword (green) Other documents (blue) Search score is distance measurement (red) Keyword 1 Keyword 2
Reverse Index For each word, a reverse index tells you what documents contain that word.  Word Document IDs hate 12344, 34235, 43513,  love 12344, 34235, 43513, 22345, 12313, 42345, 12313, 13124
Reverse Index in eXist 1.5 Terms that start with "love"
Sample Keyword Search ,[object Object],[object Object],Keyword Search: Resulting Hits: Code (XQuery):
Calculating Score ,[object Object],[object Object],[object Object],[object Object],[object Object]
How is "Structured Search" Different? ,[object Object],[object Object],[object Object]
Two Models ,[object Object],[object Object],[object Object],[object Object],[object Object],'love' 'hate' 'new' 'fear' keywords keywords keywords keywords keywords keywords doc-id
Keywords and Node IDs ,[object Object],Node-id Node-id Node-id Node-id Node-id Node-id keywords keywords keywords keywords keywords keywords document-id
Subdocuments ,[object Object],[object Object],[object Object],[object Object]
Books Have Structure Book Title Book Metadata
Presentations Have Structure Find all slides with the word "XML" in their title ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
E-mail has structure ,[object Object],[object Object],[object Object],[object Object]
Sample of XML ,[object Object]
Many Objects Have Structure Spreadsheets Find all forms with a label of "Zipcode" Find all spreadsheets with a first row cell that contains the word "SSN"
But What About Microsoft Office? ,[object Object],Office Open XML (also informally known as OOXML or OpenXML) is a zipped, XML-based file format developed by Microsoft  for representing spreadsheets, charts, presentations  and word processing documents.  File extensions: .docx, .xlsx, .pptx are zipped folders that contain XML files  ECMA-376
Open Document XML Formats ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Benefits ,[object Object],[object Object],[object Object],Target documents Other documents Actual Search Results  Missed Document
Results from Studies ,[object Object],[object Object],[object Object],[object Object],Source: INEX 2003/2004 "Bag-of-words" vs. "full structure"
Tibetan Buddhist Resource Center (TBRC) ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],1 (low) 5 (high)
Woodruff Library, Emory University ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],1 (low) 5 (high)
Challenges ,[object Object],[object Object],[object Object],[object Object]
Getting Data into XML ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
What to Return in a Hit ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Allow searchers to specify level with search options ,[object Object],[object Object],[object Object],[object Object],[object Object]
Steps in Testing Structured Search ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Steps in Structured Search Project ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Sample Queries ,[object Object],Find all "SPEECH" elements that contains the keyword 'love' (predicate or "WHERE" clause)
Near Operator ,[object Object],[object Object]
Skillsets Needed for Pilot Project ,[object Object],[object Object],[object Object],[object Object]
Predictions ,[object Object],[object Object],[object Object],[object Object],[object Object]
Steps to Run Examples ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Sample Configuration File ,[object Object],Example of boost on title
XQuery Fulltext
XQuery/Lucene Search Wikibook
Acknowledgements ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
References ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Send e-mail to dan@danmccreary.com for extended list of "getting started" resources.
Questions? Dan McCreary President Dan McCreary & Associates [email_address] (952) 931-9198

Weitere ähnliche Inhalte

Was ist angesagt?

Conclusions - Linked Data
Conclusions - Linked DataConclusions - Linked Data
Conclusions - Linked DataJuan Sequeda
 
Graph Databases and Graph Data Science in Neo4j
Graph Databases and Graph Data Science in Neo4jGraph Databases and Graph Data Science in Neo4j
Graph Databases and Graph Data Science in Neo4jijtsrd
 
Natural Language Search with Knowledge Graphs (Chicago Meetup)
Natural Language Search with Knowledge Graphs (Chicago Meetup)Natural Language Search with Knowledge Graphs (Chicago Meetup)
Natural Language Search with Knowledge Graphs (Chicago Meetup)Trey Grainger
 
One Ontology, One Data Set, Multiple Shapes with SHACL
One Ontology, One Data Set, Multiple Shapes with SHACLOne Ontology, One Data Set, Multiple Shapes with SHACL
One Ontology, One Data Set, Multiple Shapes with SHACLConnected Data World
 
How Semantics Solves Big Data Challenges
How Semantics Solves Big Data ChallengesHow Semantics Solves Big Data Challenges
How Semantics Solves Big Data ChallengesDATAVERSITY
 
What Is GDS and Neo4j’s GDS Library
What Is GDS and Neo4j’s GDS LibraryWhat Is GDS and Neo4j’s GDS Library
What Is GDS and Neo4j’s GDS LibraryNeo4j
 
Searching for Meaning
Searching for MeaningSearching for Meaning
Searching for MeaningTrey Grainger
 
Entity-Centric Data Management
Entity-Centric Data ManagementEntity-Centric Data Management
Entity-Centric Data ManagementeXascale Infolab
 
Enterprise Knowledge Graph
Enterprise Knowledge GraphEnterprise Knowledge Graph
Enterprise Knowledge GraphLukas Masuch
 
4. Document Discovery with Graph Data Science
 4. Document Discovery with Graph Data Science 4. Document Discovery with Graph Data Science
4. Document Discovery with Graph Data ScienceNeo4j
 
Graph Data Science DEMO for fraud analysis
Graph Data Science DEMO for fraud analysisGraph Data Science DEMO for fraud analysis
Graph Data Science DEMO for fraud analysisNeo4j
 
Ethics & (Explainable) AI – Semantic AI & the Role of the Knowledge Scientist
Ethics & (Explainable) AI – Semantic AI & the Role of the Knowledge ScientistEthics & (Explainable) AI – Semantic AI & the Role of the Knowledge Scientist
Ethics & (Explainable) AI – Semantic AI & the Role of the Knowledge ScientistStratos Kontopoulos
 
TehranDB Meet-up April 2018 Introduction to Graph Database
TehranDB Meet-up April 2018 Introduction to Graph DatabaseTehranDB Meet-up April 2018 Introduction to Graph Database
TehranDB Meet-up April 2018 Introduction to Graph DatabaseHamoon Mohammadian Pour
 
SKOS and Linked Data
SKOS and Linked DataSKOS and Linked Data
SKOS and Linked DataAntoine Isaac
 
Enterprise knowledge graphs
Enterprise knowledge graphsEnterprise knowledge graphs
Enterprise knowledge graphsSören Auer
 
Using the Semantic Web Stack to Make Big Data Smarter
Using the Semantic Web Stack to Make  Big Data SmarterUsing the Semantic Web Stack to Make  Big Data Smarter
Using the Semantic Web Stack to Make Big Data SmarterMatheus Mota
 
The Bounties of Semantic Data Integration for the Enterprise
The Bounties of Semantic Data Integration for the Enterprise The Bounties of Semantic Data Integration for the Enterprise
The Bounties of Semantic Data Integration for the Enterprise Ontotext
 
AI, Search, and the Disruption of Knowledge Management
AI, Search, and the Disruption of Knowledge ManagementAI, Search, and the Disruption of Knowledge Management
AI, Search, and the Disruption of Knowledge ManagementTrey Grainger
 

Was ist angesagt? (20)

Conclusions - Linked Data
Conclusions - Linked DataConclusions - Linked Data
Conclusions - Linked Data
 
Graph Databases and Graph Data Science in Neo4j
Graph Databases and Graph Data Science in Neo4jGraph Databases and Graph Data Science in Neo4j
Graph Databases and Graph Data Science in Neo4j
 
Natural Language Search with Knowledge Graphs (Chicago Meetup)
Natural Language Search with Knowledge Graphs (Chicago Meetup)Natural Language Search with Knowledge Graphs (Chicago Meetup)
Natural Language Search with Knowledge Graphs (Chicago Meetup)
 
One Ontology, One Data Set, Multiple Shapes with SHACL
One Ontology, One Data Set, Multiple Shapes with SHACLOne Ontology, One Data Set, Multiple Shapes with SHACL
One Ontology, One Data Set, Multiple Shapes with SHACL
 
How Semantics Solves Big Data Challenges
How Semantics Solves Big Data ChallengesHow Semantics Solves Big Data Challenges
How Semantics Solves Big Data Challenges
 
What Is GDS and Neo4j’s GDS Library
What Is GDS and Neo4j’s GDS LibraryWhat Is GDS and Neo4j’s GDS Library
What Is GDS and Neo4j’s GDS Library
 
Searching for Meaning
Searching for MeaningSearching for Meaning
Searching for Meaning
 
Entity-Centric Data Management
Entity-Centric Data ManagementEntity-Centric Data Management
Entity-Centric Data Management
 
Enterprise Knowledge Graph
Enterprise Knowledge GraphEnterprise Knowledge Graph
Enterprise Knowledge Graph
 
4. Document Discovery with Graph Data Science
 4. Document Discovery with Graph Data Science 4. Document Discovery with Graph Data Science
4. Document Discovery with Graph Data Science
 
Sebastian Hellmann
Sebastian HellmannSebastian Hellmann
Sebastian Hellmann
 
Graph Data Science DEMO for fraud analysis
Graph Data Science DEMO for fraud analysisGraph Data Science DEMO for fraud analysis
Graph Data Science DEMO for fraud analysis
 
Ethics & (Explainable) AI – Semantic AI & the Role of the Knowledge Scientist
Ethics & (Explainable) AI – Semantic AI & the Role of the Knowledge ScientistEthics & (Explainable) AI – Semantic AI & the Role of the Knowledge Scientist
Ethics & (Explainable) AI – Semantic AI & the Role of the Knowledge Scientist
 
TehranDB Meet-up April 2018 Introduction to Graph Database
TehranDB Meet-up April 2018 Introduction to Graph DatabaseTehranDB Meet-up April 2018 Introduction to Graph Database
TehranDB Meet-up April 2018 Introduction to Graph Database
 
SKOS and Linked Data
SKOS and Linked DataSKOS and Linked Data
SKOS and Linked Data
 
Graph db
Graph dbGraph db
Graph db
 
Enterprise knowledge graphs
Enterprise knowledge graphsEnterprise knowledge graphs
Enterprise knowledge graphs
 
Using the Semantic Web Stack to Make Big Data Smarter
Using the Semantic Web Stack to Make  Big Data SmarterUsing the Semantic Web Stack to Make  Big Data Smarter
Using the Semantic Web Stack to Make Big Data Smarter
 
The Bounties of Semantic Data Integration for the Enterprise
The Bounties of Semantic Data Integration for the Enterprise The Bounties of Semantic Data Integration for the Enterprise
The Bounties of Semantic Data Integration for the Enterprise
 
AI, Search, and the Disruption of Knowledge Management
AI, Search, and the Disruption of Knowledge ManagementAI, Search, and the Disruption of Knowledge Management
AI, Search, and the Disruption of Knowledge Management
 

Andere mochten auch

Cookies1 passwords
Cookies1 passwordsCookies1 passwords
Cookies1 passwordssmgibbs
 
2011 Search Query Rewrites - Synonyms & Acronyms
2011 Search Query Rewrites - Synonyms & Acronyms2011 Search Query Rewrites - Synonyms & Acronyms
2011 Search Query Rewrites - Synonyms & AcronymsBrian Johnson
 
Interactive Query and Search for your Big Data
Interactive Query and Search for your Big DataInteractive Query and Search for your Big Data
Interactive Query and Search for your Big DataDataWorks Summit
 
Keyword-based Search and Exploration on Databases (SIGMOD 2011)
Keyword-based Search and Exploration on Databases (SIGMOD 2011)Keyword-based Search and Exploration on Databases (SIGMOD 2011)
Keyword-based Search and Exploration on Databases (SIGMOD 2011)weiw_oz
 
PayrollAdmin - Attendance and Payroll Management ERP Software
PayrollAdmin - Attendance and Payroll Management ERP SoftwarePayrollAdmin - Attendance and Payroll Management ERP Software
PayrollAdmin - Attendance and Payroll Management ERP SoftwareRanganath Shivaram
 
Information retrival system and PageRank algorithm
Information retrival system and PageRank algorithmInformation retrival system and PageRank algorithm
Information retrival system and PageRank algorithmRupali Bhatnagar
 
Naive Bayesian Text Classifier Event Models
Naive Bayesian Text Classifier Event ModelsNaive Bayesian Text Classifier Event Models
Naive Bayesian Text Classifier Event ModelsDKALab
 
E-Learning Baseline, UCL
E-Learning Baseline, UCLE-Learning Baseline, UCL
E-Learning Baseline, UCLJisc
 
The eBay Architecture: Striking a Balance between Site Stability, Feature Ve...
The eBay Architecture:  Striking a Balance between Site Stability, Feature Ve...The eBay Architecture:  Striking a Balance between Site Stability, Feature Ve...
The eBay Architecture: Striking a Balance between Site Stability, Feature Ve...Randy Shoup
 
Text classification & sentiment analysis
Text classification & sentiment analysisText classification & sentiment analysis
Text classification & sentiment analysisM. Atif Qureshi
 
Hadoop World 2011 Keynote: Ebay - Hugh Williams
Hadoop World 2011 Keynote: Ebay - Hugh WilliamsHadoop World 2011 Keynote: Ebay - Hugh Williams
Hadoop World 2011 Keynote: Ebay - Hugh WilliamsCloudera, Inc.
 
eBay Architecture
eBay Architecture eBay Architecture
eBay Architecture Tony Ng
 
HBaseCon 2013: Near Real Time Indexing for eBay Search
HBaseCon 2013: Near Real Time Indexing for eBay SearchHBaseCon 2013: Near Real Time Indexing for eBay Search
HBaseCon 2013: Near Real Time Indexing for eBay SearchCloudera, Inc.
 
Keyword proximity search in xml trees andrada astefanoaie - presentation
Keyword proximity search in xml trees   andrada astefanoaie - presentationKeyword proximity search in xml trees   andrada astefanoaie - presentation
Keyword proximity search in xml trees andrada astefanoaie - presentationAndrada Astefanoaie
 

Andere mochten auch (16)

Cookies1 passwords
Cookies1 passwordsCookies1 passwords
Cookies1 passwords
 
2011 Search Query Rewrites - Synonyms & Acronyms
2011 Search Query Rewrites - Synonyms & Acronyms2011 Search Query Rewrites - Synonyms & Acronyms
2011 Search Query Rewrites - Synonyms & Acronyms
 
Ebay search
Ebay searchEbay search
Ebay search
 
Presentation
PresentationPresentation
Presentation
 
Interactive Query and Search for your Big Data
Interactive Query and Search for your Big DataInteractive Query and Search for your Big Data
Interactive Query and Search for your Big Data
 
Keyword-based Search and Exploration on Databases (SIGMOD 2011)
Keyword-based Search and Exploration on Databases (SIGMOD 2011)Keyword-based Search and Exploration on Databases (SIGMOD 2011)
Keyword-based Search and Exploration on Databases (SIGMOD 2011)
 
PayrollAdmin - Attendance and Payroll Management ERP Software
PayrollAdmin - Attendance and Payroll Management ERP SoftwarePayrollAdmin - Attendance and Payroll Management ERP Software
PayrollAdmin - Attendance and Payroll Management ERP Software
 
Information retrival system and PageRank algorithm
Information retrival system and PageRank algorithmInformation retrival system and PageRank algorithm
Information retrival system and PageRank algorithm
 
Naive Bayesian Text Classifier Event Models
Naive Bayesian Text Classifier Event ModelsNaive Bayesian Text Classifier Event Models
Naive Bayesian Text Classifier Event Models
 
E-Learning Baseline, UCL
E-Learning Baseline, UCLE-Learning Baseline, UCL
E-Learning Baseline, UCL
 
The eBay Architecture: Striking a Balance between Site Stability, Feature Ve...
The eBay Architecture:  Striking a Balance between Site Stability, Feature Ve...The eBay Architecture:  Striking a Balance between Site Stability, Feature Ve...
The eBay Architecture: Striking a Balance between Site Stability, Feature Ve...
 
Text classification & sentiment analysis
Text classification & sentiment analysisText classification & sentiment analysis
Text classification & sentiment analysis
 
Hadoop World 2011 Keynote: Ebay - Hugh Williams
Hadoop World 2011 Keynote: Ebay - Hugh WilliamsHadoop World 2011 Keynote: Ebay - Hugh Williams
Hadoop World 2011 Keynote: Ebay - Hugh Williams
 
eBay Architecture
eBay Architecture eBay Architecture
eBay Architecture
 
HBaseCon 2013: Near Real Time Indexing for eBay Search
HBaseCon 2013: Near Real Time Indexing for eBay SearchHBaseCon 2013: Near Real Time Indexing for eBay Search
HBaseCon 2013: Near Real Time Indexing for eBay Search
 
Keyword proximity search in xml trees andrada astefanoaie - presentation
Keyword proximity search in xml trees   andrada astefanoaie - presentationKeyword proximity search in xml trees   andrada astefanoaie - presentation
Keyword proximity search in xml trees andrada astefanoaie - presentation
 

Ähnlich wie Structured Document Search and Retrieval

You Don't Know SEO
You Don't Know SEOYou Don't Know SEO
You Don't Know SEOMichael King
 
Introduction to Search Engines
Introduction to Search EnginesIntroduction to Search Engines
Introduction to Search EnginesNitin Pande
 
Being RDBMS Free -- Alternate Approaches to Data Persistence
Being RDBMS Free -- Alternate Approaches to Data PersistenceBeing RDBMS Free -- Alternate Approaches to Data Persistence
Being RDBMS Free -- Alternate Approaches to Data PersistenceDavid Hoerster
 
Business Intelligence Solution Using Search Engine
Business Intelligence Solution Using Search EngineBusiness Intelligence Solution Using Search Engine
Business Intelligence Solution Using Search Engineankur881120
 
Advanced full text searching techniques using Lucene
Advanced full text searching techniques using LuceneAdvanced full text searching techniques using Lucene
Advanced full text searching techniques using LuceneAsad Abbas
 
SharePoint Jumpstart #2 Making Basic SharePoint Search Work
SharePoint Jumpstart #2 Making Basic SharePoint Search WorkSharePoint Jumpstart #2 Making Basic SharePoint Search Work
SharePoint Jumpstart #2 Making Basic SharePoint Search WorkEarley Information Science
 
Demystifying analytics in e discovery white paper 06-30-14
Demystifying analytics in e discovery   white paper 06-30-14Demystifying analytics in e discovery   white paper 06-30-14
Demystifying analytics in e discovery white paper 06-30-14Steven Toole
 
Introduction to Elasticsearch for Business Intelligence and Application Insights
Introduction to Elasticsearch for Business Intelligence and Application InsightsIntroduction to Elasticsearch for Business Intelligence and Application Insights
Introduction to Elasticsearch for Business Intelligence and Application InsightsData Works MD
 
Module 9: Natural Language Processing Part 2
Module 9:  Natural Language Processing Part 2Module 9:  Natural Language Processing Part 2
Module 9: Natural Language Processing Part 2Sara Hooker
 
Using metadata repositories with search
Using metadata repositories with searchUsing metadata repositories with search
Using metadata repositories with searchJean Graef
 
Search for Clarify/Dovetail
Search for Clarify/DovetailSearch for Clarify/Dovetail
Search for Clarify/DovetailGary Sherman
 
Search Analytics: Conversations with Your Customers
Search Analytics: Conversations with Your CustomersSearch Analytics: Conversations with Your Customers
Search Analytics: Conversations with Your Customersrichwig
 
Smartlogic, Semaphore and Semantically Enhanced Search – For “Discovery”
Smartlogic, Semaphore and Semantically Enhanced Search –  For “Discovery”Smartlogic, Semaphore and Semantically Enhanced Search –  For “Discovery”
Smartlogic, Semaphore and Semantically Enhanced Search – For “Discovery”voginip
 
Smartlogic, Semaphore and Semantically Enhanced Search – For “Discovery”
Smartlogic, Semaphore and Semantically Enhanced Search –  For “Discovery”Smartlogic, Semaphore and Semantically Enhanced Search –  For “Discovery”
Smartlogic, Semaphore and Semantically Enhanced Search – For “Discovery”VOGIN-academie
 
Mark Logic StrangeLoop 2010
Mark Logic StrangeLoop 2010Mark Logic StrangeLoop 2010
Mark Logic StrangeLoop 2010Christopher Biow
 
Making things findable
Making things findableMaking things findable
Making things findablePeter Mika
 
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and VocabulariesHaystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and VocabulariesMax Irwin
 
How to Build a Semantic Search System
How to Build a Semantic Search SystemHow to Build a Semantic Search System
How to Build a Semantic Search SystemTrey Grainger
 

Ähnlich wie Structured Document Search and Retrieval (20)

You Don't Know SEO
You Don't Know SEOYou Don't Know SEO
You Don't Know SEO
 
Introduction to Search Engines
Introduction to Search EnginesIntroduction to Search Engines
Introduction to Search Engines
 
Document repositories-and-metadata
Document repositories-and-metadataDocument repositories-and-metadata
Document repositories-and-metadata
 
Being RDBMS Free -- Alternate Approaches to Data Persistence
Being RDBMS Free -- Alternate Approaches to Data PersistenceBeing RDBMS Free -- Alternate Approaches to Data Persistence
Being RDBMS Free -- Alternate Approaches to Data Persistence
 
Business Intelligence Solution Using Search Engine
Business Intelligence Solution Using Search EngineBusiness Intelligence Solution Using Search Engine
Business Intelligence Solution Using Search Engine
 
Advanced full text searching techniques using Lucene
Advanced full text searching techniques using LuceneAdvanced full text searching techniques using Lucene
Advanced full text searching techniques using Lucene
 
SharePoint Jumpstart #2 Making Basic SharePoint Search Work
SharePoint Jumpstart #2 Making Basic SharePoint Search WorkSharePoint Jumpstart #2 Making Basic SharePoint Search Work
SharePoint Jumpstart #2 Making Basic SharePoint Search Work
 
Sweo talk
Sweo talkSweo talk
Sweo talk
 
Demystifying analytics in e discovery white paper 06-30-14
Demystifying analytics in e discovery   white paper 06-30-14Demystifying analytics in e discovery   white paper 06-30-14
Demystifying analytics in e discovery white paper 06-30-14
 
Introduction to Elasticsearch for Business Intelligence and Application Insights
Introduction to Elasticsearch for Business Intelligence and Application InsightsIntroduction to Elasticsearch for Business Intelligence and Application Insights
Introduction to Elasticsearch for Business Intelligence and Application Insights
 
Module 9: Natural Language Processing Part 2
Module 9:  Natural Language Processing Part 2Module 9:  Natural Language Processing Part 2
Module 9: Natural Language Processing Part 2
 
Using metadata repositories with search
Using metadata repositories with searchUsing metadata repositories with search
Using metadata repositories with search
 
Search for Clarify/Dovetail
Search for Clarify/DovetailSearch for Clarify/Dovetail
Search for Clarify/Dovetail
 
Search Analytics: Conversations with Your Customers
Search Analytics: Conversations with Your CustomersSearch Analytics: Conversations with Your Customers
Search Analytics: Conversations with Your Customers
 
Smartlogic, Semaphore and Semantically Enhanced Search – For “Discovery”
Smartlogic, Semaphore and Semantically Enhanced Search –  For “Discovery”Smartlogic, Semaphore and Semantically Enhanced Search –  For “Discovery”
Smartlogic, Semaphore and Semantically Enhanced Search – For “Discovery”
 
Smartlogic, Semaphore and Semantically Enhanced Search – For “Discovery”
Smartlogic, Semaphore and Semantically Enhanced Search –  For “Discovery”Smartlogic, Semaphore and Semantically Enhanced Search –  For “Discovery”
Smartlogic, Semaphore and Semantically Enhanced Search – For “Discovery”
 
Mark Logic StrangeLoop 2010
Mark Logic StrangeLoop 2010Mark Logic StrangeLoop 2010
Mark Logic StrangeLoop 2010
 
Making things findable
Making things findableMaking things findable
Making things findable
 
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and VocabulariesHaystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
 
How to Build a Semantic Search System
How to Build a Semantic Search SystemHow to Build a Semantic Search System
How to Build a Semantic Search System
 

Mehr von Optum

Building Bi Dashboards With Sas
Building Bi Dashboards With SasBuilding Bi Dashboards With Sas
Building Bi Dashboards With SasOptum
 
An Ontology for K-12 Education and the NIEM
An Ontology for K-12 Education and the NIEMAn Ontology for K-12 Education and the NIEM
An Ontology for K-12 Education and the NIEMOptum
 
Promoting the Semantic Web
Promoting the Semantic WebPromoting the Semantic Web
Promoting the Semantic WebOptum
 
Patterns of Semantic Integration
Patterns of Semantic IntegrationPatterns of Semantic Integration
Patterns of Semantic IntegrationOptum
 
Semantics In Declarative Systems
Semantics In Declarative SystemsSemantics In Declarative Systems
Semantics In Declarative SystemsOptum
 
XRX Presentation to Minnesota OTUG
XRX Presentation to Minnesota OTUGXRX Presentation to Minnesota OTUG
XRX Presentation to Minnesota OTUGOptum
 

Mehr von Optum (6)

Building Bi Dashboards With Sas
Building Bi Dashboards With SasBuilding Bi Dashboards With Sas
Building Bi Dashboards With Sas
 
An Ontology for K-12 Education and the NIEM
An Ontology for K-12 Education and the NIEMAn Ontology for K-12 Education and the NIEM
An Ontology for K-12 Education and the NIEM
 
Promoting the Semantic Web
Promoting the Semantic WebPromoting the Semantic Web
Promoting the Semantic Web
 
Patterns of Semantic Integration
Patterns of Semantic IntegrationPatterns of Semantic Integration
Patterns of Semantic Integration
 
Semantics In Declarative Systems
Semantics In Declarative SystemsSemantics In Declarative Systems
Semantics In Declarative Systems
 
XRX Presentation to Minnesota OTUG
XRX Presentation to Minnesota OTUGXRX Presentation to Minnesota OTUG
XRX Presentation to Minnesota OTUG
 

Kürzlich hochgeladen

The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 

Kürzlich hochgeladen (20)

The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 

Structured Document Search and Retrieval

  • 1. Structured Search Dan McCreary President Dan McCreary & Associates [email_address] (952) 931-9198 Version 4
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8. 117 Citations in Computer Science With 117 citations, the "Intro to IR" book is the second most cited Computer Science reference published in 2008.
  • 9. Table 10.1 XML - Table 10.1 and structured information retrieval. SQLRDB (relational database) search, unstructured information retrieval   RDB search unstructured retrieval structured retrieval objects records unstructured documents trees with text at leaves model relational model vector space & others ? main data structure table inverted index ? queries SQL free text queries ?
  • 10.
  • 11. eXist Native XML Developers eXist Meeting Prague March 12 th , 2010
  • 12.
  • 13. Table 10.1 - Revised XML - Table 10.1 and structured information retrieval. SQLRDB (relational database) search, unstructured information retrieval   RDB search unstructured retrieval structured retrieval objects records unstructured documents trees with text at leaves model relational model vector space & others XML hierarchy main data structure table inverted index trees with node-ids for document ids queries SQL free text queries XQuery fulltext
  • 14.
  • 15.
  • 16. Reverse Index For each word, a reverse index tells you what documents contain that word. Word Document IDs hate 12344, 34235, 43513, love 12344, 34235, 43513, 22345, 12313, 42345, 12313, 13124
  • 17. Reverse Index in eXist 1.5 Terms that start with "love"
  • 18.
  • 19.
  • 20.
  • 21.
  • 22.
  • 23.
  • 24. Books Have Structure Book Title Book Metadata
  • 25.
  • 26.
  • 27.
  • 28. Many Objects Have Structure Spreadsheets Find all forms with a label of "Zipcode" Find all spreadsheets with a first row cell that contains the word "SSN"
  • 29.
  • 30.
  • 31.
  • 32.
  • 33.
  • 34.
  • 35.
  • 36.
  • 37.
  • 38.
  • 39.
  • 40.
  • 41.
  • 42.
  • 43.
  • 44.
  • 45.
  • 46.
  • 49.
  • 50.
  • 51. Questions? Dan McCreary President Dan McCreary & Associates [email_address] (952) 931-9198