Jurriaan Persyn
@oemebamo – CTO Engagor
SELECT *
FROM myauwesomewebsite
   WHERE `text` LIKE
      '%shizzle%'
TEXT SEARCH WITH PHP/MYSQL
• FULLTEXT index
  • Only for CHAR, VARCHAR & TEXT columns
    • For MyISAM & InnoDB tables
  • Configurable Stop Words
  • Types:
   • Natural Language
   • Natural Language with Query Expansion
   • Boolean Full Text
MYSQL FULLTEXT BOOLEAN MODE
Operators:
 • +          AND
 • -          NOT
 • (none)     OR implied
 • ()         Nesting
 • *          Wildcard
 • ""         Literal matches
TEXT SEARCH WITH PHP/MYSQL (CONT'D)
• Typical columns for search table:
  • Type
  • Id
  • Text
  • Priority

• Process:
  • Blog posts, comments, …
    • Save (filtered) duplicate of text in search table.
  • When searching …
    • Search table and translate to original data via type/id

This is how most PHP/MySQL sites implement their search, right?
SELECT *
FROM mysearchtable
WHERE MATCH(text)
 AGAINST ('shizzle')
SELECT *
    FROM mysearchtable
    WHERE MATCH(text)
  AGAINST ('+shizzle -"ma nizzle"' IN BOOLEAN MODE)
SELECT * FROM jobs
WHERE role = 'DEVELOPER'
AND MATCH(job_description)
   AGAINST ('node.js')
SELECT * FROM jobs j
JOIN jobs_benefits jb ON j.id = jb.job_id
WHERE j.role = 'DEVELOPER'
AND MATCH(j.job_description)
 AGAINST ('node.js -asp' IN BOOLEAN MODE)
AND jb.free_espresso = TRUE
WHAT IS A SEARCH ENGINE?
• Efficient indexing of data
  • On all fields / combination of fields
• Analyzing data
  • Text Search
     • Tokenizing
     • Stemming
     • Filtering
  • Understanding locations
  • Date parsing
• Relevance scoring
TOKENIZING
• Finding word boundaries
  • Not just explode(' ', $text);
  • Chinese has no spaces. (Not every single character is a
    word.)
• Understand patterns:
  • URLs
  • Emails
  • #hashtags
  • Twitter @mentions
  • Currencies (EUR, €, …)
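A rough sketch of pattern-aware tokenizing, in Python rather than PHP. The regular expressions and the `tokenize` name are illustrative assumptions, not what Lucene or Elasticsearch use internally, and languages without spaces (such as Chinese) need a dictionary-based or statistical tokenizer instead.

```python
import re

# Order matters: specific patterns (URLs, emails) are tried before plain words.
# These patterns are simplified assumptions for illustration only.
TOKEN_PATTERN = re.compile(
    r"""
    (?P<url>https?://\S+)
  | (?P<email>[\w.+-]+@[\w-]+(?:\.[\w-]+)+)
  | (?P<hashtag>\#\w+)
  | (?P<mention>@\w+)
  | (?P<word>\w+(?:'\w+)?)
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Return (kind, token) pairs; whitespace and punctuation are skipped."""
    return [(m.lastgroup, m.group()) for m in TOKEN_PATTERN.finditer(text)]


if __name__ == "__main__":
    for kind, token in tokenize("Ask @oemebamo about #elasticsearch at http://elasticsearch.org"):
        print(kind, token)
```

A plain explode(' ', $text) would hand "@oemebamo" and "#elasticsearch" to the index as ordinary words; here they keep their type, so they can be indexed or filtered differently.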
STEMMING
• “Stemming is the process for reducing inflected (or sometimes
  derived) words to their stem, base or root form.”
   • Conjugations
   • Plurals
• Example:
   • Fishing, Fished, Fish, Fisher > Fish
   • Better > Good
• Several ways to find the stem:
   • Lookup tables
   • Suffix-stripping
   • Lemmatization
   • …
• Different stemmers for every language.
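A toy sketch that combines two of the approaches above: a lookup table for irregular forms and naive suffix-stripping for the rest. The suffix list and the `stem` function are made up for this example and only cover the words on this slide; real stemmers (Porter, Snowball, …) apply far more careful rules.

```python
# Lookup table for irregular forms that suffix-stripping cannot handle.
IRREGULAR = {"better": "good"}

# Suffixes are tried in this order; this tiny list is an illustrative assumption.
SUFFIXES = ("ing", "ed", "er", "es", "s")


def stem(word):
    """Reduce an English word to a crude stem."""
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    for suffix in SUFFIXES:
        # Keep at least three characters so short words stay intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    for w in ("Fishing", "Fished", "Fish", "Fisher", "Better"):
        print(w, ">", stem(w))
```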
FILTERING
• Remove stop words
  • Different for every language
• HTML
  • If you're indexing web content, not every character is
    meaningful.
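Both filters can be sketched with the Python standard library. The stop word list below is a tiny made-up English sample, and `strip_html` / `filter_tokens` are hypothetical helper names; a real analyzer ships per-language lists and a more robust HTML stripper.

```python
from html.parser import HTMLParser

# Tiny illustrative English stop word list (an assumption, not a standard list).
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "is", "to"}


class _TextExtractor(HTMLParser):
    """Collects only the text between tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_html(markup):
    """Return the text content of an HTML fragment, without the tags."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    return " ".join(parser.chunks)


def filter_tokens(markup):
    """Strip HTML, lowercase, split on whitespace and drop stop words."""
    words = strip_html(markup).lower().split()
    return [w for w in words if w not in STOP_WORDS]


if __name__ == "__main__":
    print(filter_tokens("<p>The <b>history</b> of planking</p>"))
```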
UNDERSTANDING LOCATIONS
• Geocoding of locations to longitude & latitude
• Search on location:
  • Bounding box searches
  • Distance searches
    • Searching nearby
  • Geo Polygons
    • Searching a country

(Note: MySQL also has geospatial indexes.)
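To make the two simplest location searches concrete, here is a minimal sketch of a distance check (haversine formula on a spherical Earth) and a bounding box check. The function names are assumptions for this example, and the bounding box ignores boxes that cross the 180° meridian.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a sphere is an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def in_bounding_box(lat, lon, south, west, north, east):
    """True if the point lies inside the box (no antimeridian handling)."""
    return south <= lat <= north and west <= lon <= east


if __name__ == "__main__":
    # Approximate coordinates for Ghent and Brussels.
    print(round(haversine_km(51.05, 3.72, 50.85, 4.35), 1), "km")
    print(in_bounding_box(51.05, 3.72, south=49.5, west=2.5, north=51.5, east=6.4))
```

A "searching nearby" query is then a filter on `haversine_km(...) <= radius`; a search engine does the same thing, but against an index instead of scanning every document.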
RELEVANCE SCORING
• From the matched documents, which ones do you show first?
• Several strategies:
  • How many matches in document?
  • How many matches in document as percentage of length?
  • Custom scoring algorithms
    • At index time
    • At search time
  • … A combination

Think of Google PageRank.
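The first two strategies fit in a few lines. This is only a sketch under simple assumptions (documents are already tokenized, every match counts equally); Lucene's actual scoring is considerably more involved, and the function names here are hypothetical.

```python
def match_count(query_terms, doc_tokens):
    """Strategy 1: how many tokens in the document match a query term?"""
    wanted = set(query_terms)
    return sum(1 for token in doc_tokens if token in wanted)


def density_score(query_terms, doc_tokens):
    """Strategy 2: matches as a fraction of the document length."""
    if not doc_tokens:
        return 0.0
    return match_count(query_terms, doc_tokens) / len(doc_tokens)


def rank(query_terms, docs):
    """Return document names, best density score first."""
    return sorted(docs, key=lambda name: density_score(query_terms, docs[name]), reverse=True)


if __name__ == "__main__":
    docs = {
        "long": ["planking", "is", "a", "hype", "planking", "again"],
        "short": ["planking", "hype"],
    }
    print(rank(["planking"], docs))
```

Note how the two strategies disagree: the long document has more matches, but the short one has the higher share of matching words and therefore ranks first by density.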
“There's software for that.”
APACHE LUCENE
•   “Information retrieval software library”
•   Free/open source
•   Supported by the Apache Software Foundation
•   Created by Doug Cutting
•   Written in 1999
“There's a Java library for that.”
ELASTICSEARCH
• “You know, for Search”
• Also Free & Open Source
• Built on top of Lucene
• Created by Shay Banon @kimchy
• Versions
   • First public release, v0.4 in February 2010
     • A rewrite of earlier “Compass” project, now with scalability
       built-in from the very core
   • Now stable version at 0.20.6
   • Beta branch at 0.90 (working towards 1.0 release)
• In Java, so inherently cross-platform
WHAT DOES IT ADD TO LUCENE?
• RESTful Service
  • JSON API over HTTP
  • Want to use it from PHP?
     • cURL requests, just as you would call the Facebook
       Graph API.
• High Availability & Performance
  • Clustering
• Long-Term Persistence
  • Write-through to a persistent storage system.
$ cd ~/Downloads
$ wget https://github.com/…/elasticsearch-0.20.5.tar.gz
$ tar -xzf elasticsearch-0.20.5.tar.gz
$ cd elasticsearch-0.20.5/
$ ./bin/elasticsearch
$ cd ~/Downloads
$ wget https://github.com/…/elasticsearch-0.20.5.tar.gz
$ tar -xzf elasticsearch-0.20.5.tar.gz
$ git clone https://github.com/elasticsearch/elasticsearch-servicewrapper.git elasticsearch-servicewrapper
$ sudo mv elasticsearch-0.20.5 /usr/local/share
$ cd elasticsearch-servicewrapper
$ sudo mv service /usr/local/share/elasticsearch-0.20.5/bin
$ cd /usr/local/share
$ sudo ln -s elasticsearch-0.20.5 elasticsearch
$ sudo chown -R root:wheel elasticsearch
$ cd /usr/local/share/elasticsearch
$ sudo bin/service/elasticsearch start
$ sudo bin/service/elasticsearch start
Starting ElasticSearch...
Waiting for ElasticSearch...
.
.
.
running: PID:83071
$
$ curl -XPUT http://localhost:9200/test/stupid-hypes/planking \
  -d '{"name":"Planking", "stupidity_level":"5"}'

{"ok":true,"_index":"test","_type":"stupid-hypes","_id":"planking","_version":1}
$ curl -XPUT http://localhost:9200/test/stupid-hypes/gallon-smashing \
  -d '{"name":"Gallon Smashing", "stupidity_level":"5"}'

{"ok":true,"_index":"test","_type":"stupid-hypes","_id":"gallon-smashing","_version":1}
$ curl -XPUT http://localhost:9200/test/stupid-hypes/gallon-smashing \
  -d '{"name":"Gallon Smashing", "stupidity_level":"10"}'

{"ok":true,"_index":"test","_type":"stupid-hypes","_id":"gallon-smashing","_version":2}
$ curl -XPUT http://localhost:9200/test/stupid-hypes/gallon-smashing \
  -d '{"name":"Gallon Smashing", "stupidity_level":"10", "lifetime":30}'

{"ok":true,"_index":"test","_type":"stupid-hypes","_id":"gallon-smashing","_version":3}
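The same PUT can be issued from any language that speaks HTTP. As a sketch, this is how the request from the curl examples could be built with Python's standard library; `build_index_request` is a hypothetical helper, and the URL assumes a node listening on localhost:9200 as in the slides.

```python
import json
import urllib.request


def build_index_request(base_url, index, doc_type, doc_id, document):
    """Build (but do not send) a PUT request that indexes one document."""
    url = "{}/{}/{}/{}".format(base_url.rstrip("/"), index, doc_type, doc_id)
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    req = build_index_request(
        "http://localhost:9200", "test", "stupid-hypes", "planking",
        {"name": "Planking", "stupidity_level": "5"},
    )
    print(req.get_method(), req.full_url)
    # Sending it needs a running node:
    # with urllib.request.urlopen(req) as response:
    #     print(response.read().decode("utf-8"))
```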
SCHEMALESS, DOCUMENT ORIENTED
• No need to configure schema upfront
• No need for slow ALTER TABLE-like operations
• You can define a mapping (schema) to customize the indexing
  process
  • Require fields to be of certain type
  • If you want text fields that should not be analyzed (stemming,
    …)
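As an illustration, a mapping for the stupid-hypes type from the earlier examples might look roughly like this. It is a sketch based on the pre-1.0 mapping format (a "string" type with "index": "not_analyzed"), and the choice of field types is an assumption; check the documentation of the version you run before relying on it.

```python
import json

# Hypothetical mapping for the "stupid-hypes" type used in the curl examples.
# Format assumed from pre-1.0 Elasticsearch; verify against your version's docs.
mapping = {
    "stupid-hypes": {
        "properties": {
            # Store the name as one exact term: no tokenizing or stemming.
            "name": {"type": "string", "index": "not_analyzed"},
            # Require numeric values for these fields.
            "stupidity_level": {"type": "integer"},
            "lifetime": {"type": "integer"},
        }
    }
}

if __name__ == "__main__":
    # The JSON below would be the body of a PUT to
    # http://localhost:9200/test/stupid-hypes/_mapping
    print(json.dumps(mapping, indent=2))
```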
“Ok, so it's a NoSQL store?”
TERMINOLOGY

MySQL                   Elastic Search
Database                Index
Table                   Type
Row                     Document
Column                  Field
Schema                  Mapping
Index                   Everything is indexed
SQL                     Query DSL
SELECT * FROM table …   GET http://…
UPDATE table SET …      PUT http://…
DISTRIBUTED & HIGHLY AVAILABLE
• Multiple servers (nodes) running in a cluster
  • Acting as single service
  • Nodes in cluster that store data or nodes that just help in
    speeding up search queries.
• Sharding
  • Indices are sharded (# shards is configurable)
  • Each shard can have zero or more replicas
     • Replicas on different servers (server pools) for failover
     • One node in the cluster goes down? No problem.
• Master
  • Automatic Master detection + failover
  • Responsible for distribution/balancing of shards
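The idea behind sharding can be sketched as "hash the document id, take it modulo the number of shards". The hash below (CRC32) is a stand-in chosen for this example, not the function Elasticsearch uses internally, and `shard_for` is a hypothetical name.

```python
import zlib


def shard_for(doc_id, number_of_shards):
    """Pick a shard for a document id.

    crc32 is only a stand-in hash for illustration; any stable hash shows
    the principle that the same id always lands on the same shard.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


if __name__ == "__main__":
    for doc_id in ("planking", "gallon-smashing"):
        print(doc_id, "-> shard", shard_for(doc_id, 5))
```

Because the shard is derived from the id, any node can work out where a document lives without asking a central directory. It also hints at why the number of shards is something to think about up front: changing the modulus moves documents to different shards.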
SCALING ISSUES?
• No need for an external load balancer
  • Since the cluster does its own routing.
  • Ask any server in the cluster; it will delegate to the correct node.

• What if …
  • More data             >       More shards.
  • More availability     >       More replicas per shard.
PERFORMANCE TWEAKING
• Bulk Indexing
• Multi-Get
  • Avoids network latency (HTTP API)
• API with administrative & monitoring interface
  • Cluster's availability state
  • Health
  • Nodes' memory footprint
• Alternatives to the HTTP API?
  • Java library
  • PHP wrappers (Sherlock, Elastica, …)
     • But the simplicity of the HTTP API is brilliant to work with; latency is
       hardly an issue.
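Bulk indexing saves round trips by sending many documents in one request. The body is newline-delimited JSON: one action line, then one line with the document itself. A minimal sketch of building such a body for the _bulk endpoint follows; `bulk_body` is a hypothetical helper and the index and type names are taken from the earlier curl examples.

```python
import json


def bulk_body(index, doc_type, docs):
    """Build a newline-delimited body: an action line, then a source line, per document."""
    lines = []
    for doc_id, source in docs.items():
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The body has to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    body = bulk_body("test", "stupid-hypes", {
        "planking": {"name": "Planking", "stupidity_level": "5"},
        "gallon-smashing": {"name": "Gallon Smashing", "stupidity_level": "10"},
    })
    print(body, end="")
```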
Still with me?
Some Examples
Query DSL Example:

(language:nl OR location.country:be OR location.country:aa)
(tag:sentiment.negative) author.followers:[1000 TO *]
(-sub_category:like) ((-status:857.assigned) (-status:857.done))
FACETS
• Instead of returning the matching documents …
• … return data about the distribution of values in the set of
  matching documents
   • Or a subset of the matching documents
• Possibilities:
   • Totals per unique value
   • Averages of values
   • Distributions of values
   • …
TERMINOLOGY (CONT'D)

MySQL                       Elastic Search
SELECT field, COUNT(*)      Facet
FROM table GROUP BY field
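What a facet computes can be mimicked in memory. This sketch shows "totals per unique value" (the GROUP BY / COUNT(*) analogy) and "averages of values" over a set of matching documents; the function names and the sample documents are made up for illustration.

```python
from collections import Counter


def terms_facet(documents, field):
    """Totals per unique value, like SELECT field, COUNT(*) ... GROUP BY field."""
    return Counter(doc[field] for doc in documents if field in doc)


def average_facet(documents, field):
    """Average of a numeric field over the documents that have it."""
    values = [doc[field] for doc in documents if field in doc]
    return sum(values) / len(values) if values else None


if __name__ == "__main__":
    # Made-up sample of "matching documents".
    matches = [
        {"name": "Planking", "stupidity_level": 5},
        {"name": "Gallon Smashing", "stupidity_level": 10},
        {"name": "Harlem Shake", "stupidity_level": 10},
    ]
    print(terms_facet(matches, "stupidity_level"))
    print(average_facet(matches, "stupidity_level"))
```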
ADVANCED FEATURES
• Nested documents (Child-Parent)
   • Like MySQL joins?
• Percolation Index
   • Store queries in Elastic
   • Send it documents
   • Get returned which queries match
• Index Warming
   • Register search queries that cause heavy load
   • New data added to index will be warmed
   • So next time query is executed: pre cached
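Percolation turns search around: the queries are stored and the documents are passed by. A toy in-memory sketch of that idea, where a "query" is simply a set of words that must all occur in the document (the `Percolator` class is hypothetical and far simpler than the real feature):

```python
class Percolator:
    """Stores named queries and reports which ones match a given document."""

    def __init__(self):
        self.queries = {}

    def register(self, name, required_terms):
        """Store a query: every term must be present for a match."""
        self.queries[name] = {term.lower() for term in required_terms}

    def percolate(self, text):
        """Return the names of all stored queries that match the text."""
        words = set(text.lower().split())
        return sorted(name for name, terms in self.queries.items() if terms <= words)


if __name__ == "__main__":
    percolator = Percolator()
    percolator.register("hype-alert", ["planking"])
    percolator.register("gallon-watch", ["gallon", "smashing"])
    print(percolator.percolate("Gallon smashing is the new planking"))
```

This is the shape of alerting and "saved search" features: register what people are interested in once, then check each new document against those registrations as it arrives.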
WHAT ARE MY OTHER OPTIONS?
• RDBMS
  • MySQL, …
• NoSQL
  • MongoDB, …
• Search Engines
  • Solr
  • Sphinx
  • Xapian
  • Lucene itself
• SaaS
  • Amazon CloudSearch
… VS. SOLR
•+
  • Also built on Lucene
    • So similar feature set
    • Also exposes Lucene functionality, like Elastic Search, so
      easy to extend.
  • A part of Apache Lucene project
  • Perfect for Single Server search
•-
  • Clustering is there, but it's definitely not as simple as
    Elasticsearch's.
  • Fragmented code base. (Lots of branches.)


Engagor used to run on Solr.
… VS. SPHINX
•+
  • Great for single server full text searches;
  • Has graceful integration with SQL database;
    • (Eg. for indexing data)
  • Faster than the others for simple searches;
•-
  • No out of the box clustering;
  • Not built on Lucene; lacks some advanced features;


Netlog & Twoo use Sphinx.
WANT TO USE IT?
• In an existing project:
   • As an extra layer next to your data …
     • Send to both your database & elasticsearch;
     • Consistency problems?
   • Or as replacement for database
     • Elastic is as persistent as MySQL;
     • If you don't need RDBMS features;
        • @Engagor: Our social messages are only in Elastic
“Users are incredibly bad at
finding and researching things
on the web.”
                         Nielsen (March 2013)
       http://www.nngroup.com/articles/search-navigation/
“Pathetic and useless are
words that come to mind after
this year's user testing.”
                         Nielsen (March 2013)
       http://www.nngroup.com/articles/search-navigation/
“I'm searching for apples and
pears.”
“apples AND pears”
“apples OR pears”
“It's too young. Is it even stable
enough?”
          Your boss (Tomorrow Morning)
elasticsearch.org

irc.freenode.net
   #elasticsearch
elasticsearch
          node.js
         socket.io
  real time notifications
         rabbitmq
       backbone.js
         gearman

jobs@engagor.com
$ cd /usr/local/share/elasticsearch
$ sudo bin/service/elasticsearch stop
Sources include:
•    http://www.elasticsearch.org/videos/2010/02/07/es-introduction.html
•    http://www.elasticsearchtutorial.com/
•    http://www.slideshare.net/clintongormley/cool-bonsai-cool-an-introduction-to-elasticsearch
•    http://www.slideshare.net/medcl/elastic-search-quick-intro
•    http://www.slideshare.net/macrochen/elastic-search-apachesolr-10881377
•    http://www.slideshare.net/cyber_jso/elastic-search-introduction
•    http://www.slideshare.net/infochimps/elasticsearch-v4
•    http://engineering.foursquare.com/2012/08/09/foursquare-now-uses-elastic-search-and-on-a-related-note-slashem-also-works-with-elastic-search/
•    http://stackoverflow.com/questions/10213009/solr-vs-elasticsearch
•    http://stackoverflow.com/questions/11115523/how-does-amazon-cloudsearch-compares-to-elasticsearch-solr-or-sphinx-in-terms-o
•    http://blog.socialcast.com/realtime-search-solr-vs-elasticsearch/

An Introduction to Elastic Search.

  • 1.
  • 2.
  • 4.
  • 5.
  • 6.
  • 7. SELECT * FROM myauwesomewebsite WHERE `text` LIKE '%shizzle%'
  • 8.
  • 9.
  • 10. TEXT SEARCH WITH PHP/MYSQL • FULLTEXT index • Only for CHAR, VARCHAR & TEXT columns • For MyISAM & InnoDB tables • Configurable Stop Words • Types: • Natural Language • Natural Language with Query Expansion • Boolean Full Text
  • 11. MYSQL FULLTEXT BOOLEAN MODE Operators: • + AND • - NOT • (no operator) OR implied • () Nesting • * Wildcard • "" Literal matches
  • 12. TEXT SEARCH WITH PHP/MYSQL (CONT’D) • Typical columns for search table: • Type • Id • Text • Priority • Process: • Blog posts, comments, … • Save (filtered) duplicate of text in search table. • When searching … • Search table and translate to original data via type/id This is how most php/mysql sites implement their search, right?
  • 13. SELECT * FROM mysearchtable WHERE MATCH(text) AGAINST ('shizzle')
  • 14. SELECT * FROM mysearchtable WHERE MATCH(text) AGAINST ('+shizzle -"ma nizzle"' IN BOOLEAN MODE)
  • 15. SELECT * FROM jobs WHERE role = 'DEVELOPER' AND MATCH(job_description) AGAINST ('node.js')
  • 16. SELECT * FROM jobs j JOIN jobs_benefits jb ON j.id = jb.job_id WHERE j.role = 'DEVELOPER' AND MATCH(job_description) AGAINST ('node.js -asp' IN BOOLEAN MODE) AND jb.free_espresso = TRUE
  • 17.
  • 18. WHAT IS A SEARCH ENGINE? • Efficient indexing of data • On all fields / combination of fields • Analyzing data • Text Search • Tokenizing • Stemming • Filtering • Understanding locations • Date parsing • Relevance scoring
  • 19. TOKENIZING • Finding word boundaries • Not just explode(' ', $text); • Chinese has no spaces. (Not every single character is a word.) • Understand patterns: • URLs • Emails • #hashtags • Twitter @mentions • Currencies (EUR, €, …)
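To make the tokenizing slide concrete, here is a minimal sketch in Python of a pattern-aware tokenizer. The regular expression and the function name `tokenize` are illustrative assumptions, not how Lucene or Elastic Search tokenize internally, and it does nothing for languages without spaces such as Chinese.

```python
import re

# Order matters: the specific patterns come before the generic word pattern,
# so a URL or an email address is kept as a single token.
TOKEN_PATTERN = re.compile(
    r"""
    https?://\S+                    # URLs
  | [\w.+-]+@[\w-]+(?:\.[\w-]+)+    # email addresses
  | [#@]\w+                         # hashtags and Twitter mentions
  | \w+(?:'\w+)?                    # plain words, allowing one apostrophe
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Return the lowercased tokens found in text (hypothetical helper)."""
    return [match.group(0).lower() for match in TOKEN_PATTERN.finditer(text)]


tokens = tokenize("Mail me@example.com or see http://example.com/a #php @oemebamo")
```

Splitting the same sentence on spaces would happen to work here, but it breaks as soon as punctuation touches a word; the pattern list is where that knowledge lives.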
  • 20. STEMMING • “Stemming is the process for reducing inflected (or sometimes derived) words to their stem, base or root form.” • Conjugations • Plurals • Example: • Fishing, Fished, Fish, Fisher > Fish • Better > Good • Several ways to find the stem: • Lookup tables • Suffix-stripping • Lemmatization •… • Different stemmers for every language.
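As a rough illustration of two of the approaches named on the stemming slide (a lookup table and suffix-stripping), the Python sketch below combines them. The suffix list, the `IRREGULAR` table and the name `naive_stem` are assumptions made for this example; real stemmers are language-specific and far more careful.

```python
# Lookup table for irregular forms that no suffix rule can handle.
IRREGULAR = {"better": "good"}

# Suffixes are tried in this order; the first one that matches is stripped.
SUFFIXES = ("ing", "ed", "er", "es", "s")

MIN_STEM_LENGTH = 3  # avoid reducing short words to nothing


def naive_stem(word):
    """Very naive English stemmer: lookup table first, then suffix-stripping."""
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            return word[: -len(suffix)]
    return word


stems = [naive_stem(w) for w in ["Fishing", "Fished", "Fish", "Fisher"]]
```

A rule set this small over-stems plenty of words ("class" becomes "clas"), which is one reason every language needs its own stemmer.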
  • 21. FILTERING • Remove stop words • Different for every language • HTML • If you’re indexing web content, not every character is meaningful.
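The two filtering steps above can be sketched in Python with the standard library only. The stop word set is a tiny made-up English sample, and `strip_html` and `remove_stop_words` are hypothetical helper names; a search engine ships per-language stop word lists and more robust HTML handling.

```python
from html.parser import HTMLParser

# Tiny illustrative stop word list; real lists differ per language.
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to"}


class _TextExtractor(HTMLParser):
    """Collects only the text between tags and ignores the markup itself."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_html(markup):
    """Return the visible text of an HTML fragment with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens that carry little meaning for search."""
    return [token for token in tokens if token.lower() not in stop_words]


text = strip_html("<p>The <b>art</b> of search</p>")
kept = remove_stop_words(text.split())
```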
  • 22. UNDERSTANDING LOCATIONS • Geocoding of locations to longitude & latitude • Search on location: • Bounding box searches • Distance searches • Searching nearby • Geo Polygons • Searching a country (Note: MySQL also has geospatial indexes.)
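Once locations are stored as latitude and longitude, distance and bounding box searches come down to simple geometry. The sketch below uses the haversine formula on a spherical Earth; the function names and the approximate coordinates for Gent and Brussels are assumptions for the example, and the bounding box check ignores boxes that cross the date line.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def in_bounding_box(lat, lon, top, left, bottom, right):
    """True if the point lies inside the box given by its edges in degrees."""
    return bottom <= lat <= top and left <= lon <= right


# Approximate coordinates, for illustration only.
gent = (51.05, 3.72)
brussels = (50.85, 4.35)
distance = haversine_km(gent[0], gent[1], brussels[0], brussels[1])
```

A "searching nearby" query is then a distance search with a small radius, usually pre-filtered with a bounding box because that check is much cheaper.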
  • 23. RELEVANCE SCORING • From the matched documents, which ones do you show first? • Several strategies: • How many matches in document? • How many matches in document as percentage of length? • Custom scoring algorithms • At index time • At search time • … A combination Think of Google PageRank.
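The first two strategies on the relevance slide (number of matches, and matches as a percentage of document length) fit in a few lines of Python. This is only a sketch with hypothetical function names; Lucene's real scoring combines several more factors.

```python
def match_count(tokens, term):
    """How many times the term occurs in the document."""
    return sum(1 for token in tokens if token == term)


def length_normalized_score(tokens, term):
    """Matches as a fraction of document length, so long documents do not win by size."""
    return match_count(tokens, term) / len(tokens) if tokens else 0.0


def rank(documents, term):
    """documents maps an id to its token list; returns ids, best score first."""
    scores = {doc_id: length_normalized_score(tokens, term)
              for doc_id, tokens in documents.items()}
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


documents = {
    "short": ["search", "is", "fun"],
    "long": ["search", "search", "and", "more", "words", "about", "other", "things"],
}
ranking = rank(documents, "search")
```

Ranked on raw match count the long document would come first; normalizing by length flips the order, which shows why the choice of strategy matters.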
  • 24. “There’s software for that.”
  • 25.
  • 26. APACHE LUCENE • “Information retrieval software library” • Free/open source • Supported by Apache Foundation • Created by Doug Cutting • Written in 1999
  • 27. “There’s a Java library for that.”
  • 28.
  • 29.
  • 30. ELASTICSEARCH • “You know, for Search” • Also Free & Open Source • Built on top of Lucene • Created by Shay Banon @kimchy • Versions • First public release, v0.4 in February 2010 • A rewrite of earlier “Compass” project, now with scalability built-in from the very core • Now stable version at 0.20.6 • Beta branch at 0.90 (working towards 1.0 release) • In Java, so inherently cross-platform
  • 31. WHAT DOES IT ADD TO LUCENE? • RESTful Service • JSON API over HTTP • Want to use it from PHP? • cURL requests, just as you’d make requests to the Facebook Graph API. • High Availability & Performance • Clustering • Long Term Persistency • Write through to persistent storage system.
  • 32. $ cd ~/Downloads $ wget https://github.com/…/elasticsearch-0.20.5.tar.gz $ tar -xzf elasticsearch-0.20.5.tar.gz $ cd elasticsearch-0.20.5/ $ ./bin/elasticsearch
  • 33. $ cd ~/Downloads $ wget https://github.com/…/elasticsearch-0.20.5.tar.gz $ tar -xzf elasticsearch-0.20.5.tar.gz $ git clone https://github.com/elasticsearch/elasticsearch-servicewrapper.git elasticsearch-servicewrapper $ sudo mv elasticsearch-0.20.5 /usr/local/share $ cd elasticsearch-servicewrapper $ sudo mv service /usr/local/share/elasticsearch-0.20.5/bin $ cd /usr/local/share $ sudo ln -s elasticsearch-0.20.5 elasticsearch $ sudo chown -R root:wheel elasticsearch $ cd /usr/local/share/elasticsearch $ sudo bin/service/elasticsearch start
  • 34. $ sudo bin/service/elasticsearch start Starting ElasticSearch... Waiting for ElasticSearch... . . . running: PID:83071 $
  • 35.
  • 36. $ curl -XPUT http://localhost:9200/test/stupid-hypes/planking -d '{"name":"Planking", "stupidity_level":"5"}' {"ok":true,"_index":"test","_type":"stupid-hypes","_id":"planking","_version":1}
  • 37. $ curl -XPUT http://localhost:9200/test/stupid-hypes/gallon-smashing -d '{"name":"Gallon Smashing", "stupidity_level":"5"}' {"ok":true,"_index":"test","_type":"stupid-hypes","_id":"gallon-smashing","_version":1}
  • 38. $ curl -XPUT http://localhost:9200/test/stupid-hypes/gallon-smashing -d '{"name":"Gallon Smashing", "stupidity_level":"10"}' {"ok":true,"_index":"test","_type":"stupid-hypes","_id":"gallon-smashing","_version":2}
  • 39.
  • 40. $ curl -XPUT http://localhost:9200/test/stupid-hypes/gallon-smashing -d '{"name":"Gallon Smashing", "stupidity_level":"10", "lifetime":30}' {"ok":true,"_index":"test","_type":"stupid-hypes","_id":"gallon-smashing","_version":3}
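The curl calls above translate directly to any HTTP client. As a sketch, this Python snippet builds the same PUT request with the standard library; `build_index_request` is a hypothetical helper, the URL layout (index/type/id) is taken from the slides, and the request is only constructed here, not sent.

```python
import json
from urllib.request import Request


def build_index_request(base_url, index, doc_type, doc_id, document):
    """Build (but do not send) the HTTP PUT that stores one document."""
    url = "{}/{}/{}/{}".format(base_url.rstrip("/"), index, doc_type, doc_id)
    body = json.dumps(document).encode("utf-8")
    return Request(url, data=body, method="PUT",
                   headers={"Content-Type": "application/json"})


document = {"name": "Gallon Smashing", "stupidity_level": "10", "lifetime": 30}
request = build_index_request("http://localhost:9200", "test",
                              "stupid-hypes", "gallon-smashing", document)
```

Passing `request` to `urllib.request.urlopen` would send it, assuming a node is listening on localhost:9200 as in the slides.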
  • 41.
  • 42. SCHEMALESS, DOCUMENT ORIENTED • No need to configure schema upfront • No need for slow ALTER TABLE-like operations • You can define a mapping (schema) to customize the indexing process • Require fields to be of certain type • If you want text fields that should not be analyzed (stemming, …)
  • 43. “Ok, so it’s a NoSQL store?”
  • 44.
  • 45.
  • 46. TERMINOLOGY (MySQL → Elastic Search) • Database → Index • Table → Type • Row → Document • Column → Field • Schema → Mapping • Index → Everything is indexed • SQL → Query DSL • SELECT * FROM table … → GET http://… • UPDATE table SET … → PUT http://…
  • 47. DISTRIBUTED & HIGHLY AVAILABLE • Multiple servers (nodes) running in a cluster • Acting as single service • Nodes in the cluster either store data or just help in speeding up search queries. • Sharding • Indices are sharded (# shards is configurable) • Each shard can have zero or more replicas • Replicas on different servers (server pools) for failover • One server in the cluster goes down? No problem. • Master • Automatic Master detection + failover • Responsible for distribution/balancing of shards
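To give a feel for what "indices are sharded" means, here is a sketch of hash-based routing in Python: a document id is hashed and the result, modulo the number of shards, picks the shard. The function `shard_for` and the use of CRC32 are stand-ins chosen for this example; Elastic Search uses its own hash function internally.

```python
import zlib


def shard_for(doc_id, num_shards):
    """Pick a shard for a document id with a stable hash (CRC32 as a stand-in)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


NUM_SHARDS = 5  # assumed shard count for this example
placement = {doc_id: shard_for(doc_id, NUM_SHARDS)
             for doc_id in ["planking", "gallon-smashing"]}
```

Because the shard is derived from the id alone, any node can work out where a document lives without a central lookup table, which fits the "ask any server in the cluster" behaviour described on the scaling slide.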
  • 48.
  • 49. SCALING ISSUES? • No need for an external load balancer • Since the cluster does its own routing. • Ask any server in the cluster, it will delegate to the correct node. • What if … • More data > More shards. • More availability > More replicas per shard.
  • 50. PERFORMANCE TWEAKING • Bulk Indexing • Multi-Get • Avoids network latency (HTTP API) • API with administrative & monitoring interface • Cluster’s availability state • Health • Nodes’ memory footprint • Alternatives to the HTTP API? • Java library • PHP wrappers (Sherlock, Elastica, …) • But simplicity of HTTP API is brilliant to work with, latency is hardly an issue.
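Bulk indexing saves round trips by sending many documents in one HTTP request. The bulk endpoint expects newline-delimited JSON: one action line followed by one source line per document. The Python sketch below only builds that body; `build_bulk_body` is a hypothetical helper, and the exact metadata fields accepted may differ between Elastic Search versions.

```python
import json


def build_bulk_body(index, doc_type, docs):
    """Build a newline-delimited bulk body from a dict of id -> document."""
    lines = []
    for doc_id, source in docs.items():
        # Action line: what to do and where the document goes.
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        # Source line: the document itself.
        lines.append(json.dumps(source))
    # The body must end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body("test", "stupid-hypes", {
    "planking": {"name": "Planking", "stupidity_level": "5"},
    "gallon-smashing": {"name": "Gallon Smashing", "stupidity_level": "10"},
})
```

The resulting string would be POSTed to the `_bulk` endpoint of a node; with two documents that is one request instead of two.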
  • 53.
  • 54.
  • 55.
  • 56.
  • 57. Query DSL Example: (language:nl OR location.country:be OR location.country:aa) (tag:sentiment.negative) author.followers:[1000 TO *] (-sub_category:like) ((-status:857.assigned) (-status:857.done))
  • 58.
  • 59.
  • 60.
  • 61. FACETS • Instead of returning the matching documents … • … return data about the distribution of values in the set of matching documents • Or a subset of the matching documents • Possibilities: • Totals per unique value • Averages of values • Distributions of values •…
  • 62. TERMINOLOGY (CONT’D) (MySQL → Elastic Search) • SELECT field, COUNT(*) FROM table GROUP BY field → Facet
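What a facet computes can be shown on an in-memory list of documents. This Python sketch mimics "totals per unique value" and "averages of values" from the facets slide; the sample messages and the function names are made up for illustration, and a search engine computes the same figures over the matching documents inside the index.

```python
from collections import Counter, defaultdict


def terms_facet(documents, field):
    """Totals per unique value, like SELECT field, COUNT(*) ... GROUP BY field."""
    return Counter(doc[field] for doc in documents if field in doc)


def average_facet(documents, group_field, value_field):
    """Average of value_field for every unique value of group_field."""
    grouped = defaultdict(list)
    for doc in documents:
        grouped[doc[group_field]].append(doc[value_field])
    return {key: sum(values) / len(values) for key, values in grouped.items()}


# Hypothetical matched messages.
messages = [
    {"sentiment": "negative", "followers": 1000},
    {"sentiment": "positive", "followers": 10},
    {"sentiment": "negative", "followers": 3000},
]
totals = terms_facet(messages, "sentiment")
averages = average_facet(messages, "sentiment", "followers")
```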
  • 63.
  • 64.
  • 65. ADVANCED FEATURES • Nested documents (Child-Parent) • Like MySQL joins? • Percolation Index • Store queries in Elastic • Send it documents • Get returned which queries match • Index Warming • Register search queries that cause heavy load • New data added to index will be warmed • So next time query is executed: pre cached
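Percolation turns search around: queries are stored first, and each incoming document is checked against them. The toy Python class below shows the idea with plain functions standing in for stored queries; the class and its method names are assumptions for this sketch, whereas Elastic Search stores real queries and matches them inside the engine.

```python
class Percolator:
    """Toy percolator: registered queries are predicates over a document."""

    def __init__(self):
        self._queries = {}

    def register(self, name, predicate):
        """Store a query under a name; predicate takes a document dict."""
        self._queries[name] = predicate

    def percolate(self, document):
        """Return the sorted names of all stored queries the document matches."""
        return sorted(name for name, predicate in self._queries.items()
                      if predicate(document))


percolator = Percolator()
percolator.register("client-a", lambda doc: "belgacom" in doc["text"].lower())
percolator.register("client-b", lambda doc: doc.get("language") == "nl")

matches = percolator.percolate({"text": "Belgacom is down again", "language": "en"})
```

The list of matching query names is what lets a stream of incoming documents be routed to the right client.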
  • 66. WHAT ARE MY OTHER OPTIONS? • RDBMS • MySQL, … • NoSQL • MongoDB, … • Search Engines • Solr • Sphinx • Xapian • Lucene itself • SaaS • Amazon CloudSearch
  • 67. … VS. SOLR •+ • Also built on Lucene • So similar feature set • Also exposes Lucene functionality, like Elastic Search, so easy to extend. • A part of Apache Lucene project • Perfect for Single Server search •- • Clustering is there. But it’s definitely not as simple as ElasticSearch’s • Fragmented code base. (Lots of branches.) Engagor used to run on Solr.
  • 68. … VS. SPHINX •+ • Great for single server full text searches; • Has graceful integration with SQL database; • (Eg. for indexing data) • Faster than the others for simple searches; •- • No out of the box clustering; • Not built on Lucene; lacks some advanced features; Netlog & Twoo use Sphinx.
  • 69. WANT TO USE IT? • In an existing project: • As an extra layer next to your data … • Send to both your database & elasticsearch; • Consistency problems?; • Or as replacement for database • Elastic is as persistent as MySQL; • If you don’t need RDBMS features; • @Engagor: Our social messages are only in Elastic
  • 70. “Users are incredibly bad at finding and researching things on the web.” Nielsen (March 2013) http://www.nngroup.com/articles/search-navigation/
  • 71. “Pathetic and useless are words that come to mind after this year’s user testing.” Nielsen (March 2013) http://www.nngroup.com/articles/search-navigation/
  • 72. “I’m searching for apples and pears.”
  • 74.
  • 76. “It’s too young. Is it even stable enough?” Your boss (Tomorrow Morning)
  • 77.
  • 78.
  • 79.
  • 80.
  • 82. elasticsearch node.js socket.io real time notifications rabbitmq backbone.js gearman jobs@engagor.com
  • 83.
  • 84. $ cd /usr/local/share/elasticsearch $ sudo bin/service/elasticsearch stop
  • 85.
  • 86. Sources include: • http://www.elasticsearch.org/videos/2010/02/07/es-introduction.html • http://www.elasticsearchtutorial.com/ • http://www.slideshare.net/clintongormley/cool-bonsai-cool-an-introduction-to-elasticsearch • http://www.slideshare.net/medcl/elastic-search-quick-intro • http://www.slideshare.net/macrochen/elastic-search-apachesolr-10881377 • http://www.slideshare.net/cyber_jso/elastic-search-introduction • http://www.slideshare.net/infochimps/elasticsearch-v4 • http://engineering.foursquare.com/2012/08/09/foursquare-now-uses-elastic-search-and-on-a-related-note-slashem-also-works-with-elastic-search/ • http://stackoverflow.com/questions/10213009/solr-vs-elasticsearch • http://stackoverflow.com/questions/11115523/how-does-amazon-cloudsearch-compares-to-elasticsearch-solr-or-sphinx-in-terms-o • http://blog.socialcast.com/realtime-search-solr-vs-elasticsearch/

Editor’s notes

  1. This talk is about adding search to your own website: implementing a search engine for your own content. First, the bit about how it’s done in MySQL, and what problems that brings with it; then about what a search engine should do for you; then about how Elastic Search helps. And code of course, lots of code.
  2. I’m leading the development team at Engagor, a startup in the city centre of Gent, Belgium. We’re 2 years old. We build a social media monitoring & management product. Before that I worked for Massive Media for 5 years. As a lead developer I worked on the Netlog & Gatcha products.
  3. Search can be a “simple” text search. Here I’m searching Tumblr for funny gifs, because that’s what Tumblr is for.
  4. But search can go deeper and more into detail too. Here I’m using AND, OR and NOT, nesting, and restrictions on fields.
  5. Or very difficult … Searching in a mixed set of data: profiles, photos, friend connections. Searching in a graph …
  6. My first thought when I’d have to add search to a php/mysql site … It sort of works …
  7. Problems arise when you have lots of data … To speed things up you add indexes to your MySQL tables.
  8. And the library analogy for a MySQL index is this … An index card box.
  9. MySQL has an index type especially for full text search. Natural Language: case insensitive, accent insensitive. Query Expansion: a search for “database” also returns results that often contain words like “mysql”, “oracle”, … A second search with the extended query string happens to find related documents too. Boolean Mode: adds operators to the Natural Language type.
  10. Anyone using a similar system to this? Implemented yourself, or from a CMS? phpBB has/had a table like this.
  11. Example of FULLTEXT MySQL search query.
  12. Example of FULLTEXT MySQL search query in boolean mode. A bit more powerful.
  13. Now we add restrictions on certain fields. You need combined indexes to keep this fast.
  14. And even more restrictions. Indexes on all involved fields? In all combinations? In all orders? Lots of indexes make WRITE operations on your tables slower.
  15. MySQL just isn’t built for complex search … So let’s look at what a system built for search needs …
  16. So it’s old. But still active. Used a lot.
  17. It is however a Java library. It’s not a fully managed service.
  18. If you have lots of data; you need a search engine to be scalable & highly available.
  19. And that’s where ElasticSearch comes in.
  20. Download, unzip, start.
  21. This time we install it in the right place, and wrapped in a service.
  22. Now it runs on your localhost. As an HTTP service, so open your browser and surf to your Elastic Search server.
  23. HTTP access, it’s brilliant! Do you want to secure it? Add firewalls … Do you want to cache it? Add Varnish …
  24. Example of adding something to ElasticSearch from your command shell via an HTTP PUT request done by curl. You can do this right after installation. No need to create an index or configure anything, just add data right away.
  25. Adding a second record (document).
  26. Updating an existing record. (Mind the new version number.)
  27. Want to see the record? Surf to it. The URL consists of index, type & id.
  28. Want to add a new field (column)? Update your document with the new field added.
  29. It’s right there.
  30. So it’s schemaless. And actually we did ZERO CONFIGURATION. We didn’t have to create indexes or tell Elastic Search what type of data we’ll be adding. Actually, you can configure a mapping/schema: to require certain fields to be of a certain type, or to keep text fields from being analyzed (text analysis). Basically: to speed things up …
  31. What we’ve demoed so far is a NoSQL store. That’s cool. But not all.
  32. Here we do a GET request (in the browser) that searches our newly created index for the word “smashing”. It returns the 2nd document.
  33. Curl in PHP is simple. This is the simplest example of how to do a search in Elastic from PHP.
  34. I’ve mentioned a few Elastic Search specific terms. Here’s the full breakdown of the terminology and how it relates to MySQL concepts.
  35. Back to the clustering features Elastic Search adds …
  36. Here you have an Engagor specific dashboard that shows the 12 servers in the Engagor cluster. You see that: server12 is the master; every node knows each other; there are 11K shards, each with one replica. Health = green means every shard has a replica (on a different server). If one server goes down: no problem.
  37. Now let’s look at how Engagor uses Elastic Search. First, what do we do? From a technical point of view: Engagor = a huge database of social messages. Facebook, Twitter, Instagram, news sites … We save those that our clients are interested in to our application and offer: statistics about the tracked data, workflow tools, and automation tools.
  38. This is the time to show a slide I stole from a presentation from our CEO Folke. We started 2 years ago. I joined after 6 months. Now a team of 16 people, 7 of them technical profiles: web developers, data scientists, backend developers, frontend developers. Our customers include Mobistar, Telenet, Microsoft Europe, the European Commission, Alpro and several agencies. They use Engagor for customer support (“call center” software for social media), marketing insights, crisis detection, …
  39. Here you see an example of a search on our cluster for a certain Twitter user’s handle. You see it returns 260 social messages. Each message has data like the id, the service it’s from, content, date and author details.
  40. Here we’re searching in a topic about Belgacom & Mobile Vikings: “All messages from users with at least 1000 followers, that are negative and from Belgium or in Dutch.” On 4 different fields, and nested … It just works.
  41. This is the Query DSL we’re sending to Elastic for the previous search.
  42. Here we are in a topic about Coca Cola, thus high volume. About 50k messages per day, over 28 days. That’s 1.4M messages we’re searching in. This is a graph of messages per day.
  43. This is the inbox, showing the last 10 messages in that topic. Performance: about half a second.
  44. Same inbox, but now only showing messages with the word “thirsty”. Performance: again about half a second. (Only 1 sample, so this is not really a benchmark.)
  45. Now, there’s another feature of search engines you might not immediately think about.
  46. Think of it as an equivalent to a MySQL GROUP BY query.
  47. The pie chart: a facet on the sentiment field of a mention, which returns totals per value. Sentiment per day: a facet on combined fields, sentiment + day(dateadd), which returns the data used for the second chart. You can also use the filter like in the inbox, to see these facets for a filtered set of data. E.g. sentiment per day for mentions coming from Belgium only.
  48. For the Telenet Twitter profile: totals of messages per day, per type (retweets, mentions, own tweets). Every “segment” (color) is a facet with a custom filter/search query. A single ElasticSearch call gets all this information.
  49. Percolation example: use it to route documents … E.g. you have a stream of data coming in and need to decide what to do with those documents based on queries. Everything matching x is for client A; everything matching y is for client B.
  50. There’s a good competition going on. Looks like Elastic Search has made Solr alert again; since they’re also focusing on clustering features now.
  51. Having a great search engine doesn’t mean you have a great search experience on your site … Users aren’t very good at it … But searching for things is not that easy …
  52. Look at how natural language differs from search query language.
  53. This happens quite often at the Engagor office. Not that we blame our users … It’s just difficult to get it right.
  54. If you want to play with it, and are excited … But your boss is all #meh.
  55. ElasticSearch is a mature product. Soundcloud is using it.
  56. StumbleUpon is using it.
  57. Foursquare switched to it … When checking in somewhere, they show: locations close to you, locations you previously checked in, locations that are popular. That’s a pretty nifty search query right there.
  58. There’s also a real company behind Elastic Search, backing it with tutorials, training, support, SLAs and consulting.
  59. Recently redesigned site. Documentation is a bit frightening at first, since it’s hard to know what to search for, but I hope this presentation solves that.IRC channel is very active; quick answers.
  60. If you’re interested in working with Elastic Search, or in fact, any of these other cool technologies … We can always use good profiles: junior or senior, backend or frontend … Send us your resumes ;). March 27th: now we’re especially searching for someone who’s good at JavaScript.
  61. Already gave an overview of what it’s like to work with the technologies at Engagor. Go to http://startupshizzle.tumblr.com to find out what it’s like to work with the people of our team ;).
  62. So, that’s it.