SlideShare ist ein Scribd-Unternehmen logo
1 von 40
Downloaden Sie, um offline zu lesen
Tagging and Folksonomy
  Schema Design for Scalability and Performance
                                     Jay Pipes
                     Community Relations Manager, North America
                                 (jay@mysql.com)
                                    MySQL, Inc.
   TELECONFERENCE: please dial a number to hear the audio portion of this presentation.

   Toll-free US/Canada: 866-469-3239        Direct US/Canada: 650-429-3300 :
   0800-295-240 Austria                     0800-71083 Belgium
   80-884912 Denmark                        0-800-1-12585 Finland
   0800-90-5571 France                      0800-101-6943 Germany
   00800-12-6759 Greece                     1-800-882019 Ireland
   800-780-632 Italy                        0-800-9214652 Israel
   800-2498 Luxembourg                      0800-022-6826 Netherlands
   800-15888 Norway                         900-97-1417 Spain
   020-79-7251 Sweden                       0800-561-201 Switzerland
   0800-028-8023 UK                         1800-093-897 Australia
   800-90-3575 Hong Kong                    00531-12-1688 Japan

                                   EVENT NUMBER/ACCESS CODE:

Copyright MySQL AB                                            The World’s Most Popular Open Source Database   1
Communicate ...

Connect ...

Share ...

Play ...

Trade ...                                            craigslist


Search & Look Up



Copyright MySQL AB   The World’s Most Popular Open Source Database   2
Software Industry Evolution
                                   WEB 1.0                     WEB 2.0




                                                                FREE & OPEN
                                                                SOURCE
                                                                SOFTWARE


                                    CLOSED SOURCE SOFTWARE



                           1990          2000
                                  1995             2005

Copyright MySQL AB                              The World’s Most Popular Open Source Database   3
How MySQL Enables Web 2.0

                                   ubiquitous with developers
             lower tco
                                                           large objects
                                  tag databases
                        lamp                      XML
  full-text search                                                language support
                        ease of use
     world-wide
                                                  performance
     community
                                                                                       scale out
                           interoperable
                                                  connectors
      open source                                                         bundled

                                reliability          session management
             replication
                                                                      query cache
   partitioning (5.1)
                      pluggable storage engines




Copyright MySQL AB                                      The World’s Most Popular Open Source Database   4
Agenda
     •   Web 2.0: more than just rounded corners
     •   Tagging Concepts in SQL
     •   Folksonomy Concepts in SQL
     •   Scaling out sensibly




Copyright MySQL AB                        The World’s Most Popular Open Source Database   5
More than just rounded corners
   • What is Web 2.0?
          – Participation
          – Interaction
          – Connection
   • A changing of design patterns
          –   AJAX and XML-RPC are changing the way data is queried
          –   Rich web-based clients
          –   Everyone's got an API
          –   API leads to increased requests per second. Must deal with
              growth!




Copyright MySQL AB                                  The World’s Most Popular Open Source Database   6
AJAX and XML-RPC change the game
   • Traditional (Web 1.0) page request builds entire page
          – Lots of HTML, style and data in each page request
          – Lots of data processed or queried in each request
          – Complex page requests and application logic
   • AJAX/XML-RPC request returns small data set, or
     page fragment
          – Smaller amount of data being passed on each request
          – But, often many more requests compared to Web 1.0!
          – Exposed APIs mean data services distribute your data to
            syndicated sites
          – RSS feeds supply data, not full web page




Copyright MySQL AB                                The World’s Most Popular Open Source Database   7
flickr Tag Cloud(s)




Copyright MySQL AB                  The World’s Most Popular Open Source Database   8
Participation, Interaction and Connection
   • Multiple users editing the same content
          – Content explosion
          – Lots of textual data to manage. How do we organize it?
   • User-driven data
          – “Tracked” pages/items
          – Allows interlinking of content via the user to user relationship
            (folksonomy)
   • Tags becoming the new categorization/filing of this
     content
          – Anyone can tag the data
          – Tags connect one thing to another, similar to the way that the
            folksonomy relationships link users together
   • So, the common concept here is “linking”...

Copyright MySQL AB                                    The World’s Most Popular Open Source Database   9
What The User Dimension Gives You
   • Folksonomy adds the “user dimension”
   • Tags often provide the “glue” between user and item
     dimensions
   • Answers questions such as:
          – Who shares my interests
          – What are the other interests of users who share my interests
          – Someone tagged my bookmark (or any other item) with
            something. What other items did that person tag with the
            same thing?
          – What is the user's immediate “ecosystem”? What are the
            fringes of that ecosystem?
                • Ecosystem radius expansion (like zipcode radius expansion...)
   • Marks a new aspect of web applications into the world
     of data warehousing and analysis
Copyright MySQL AB                                       The World’s Most Popular Open Source Database   10
Linking Is the “R” in RDBMS
   • The schema becomes the absolute driving force behind
     Web 2.0 applications
   • Understanding of many-to-many relationships is critical
   • Take advantage of MySQL's architecture so that our
     schema is as efficient as possible
   • Ensure your data store is normalized, standardized,
     and consistent
   • So, let's digg in! (pun intended)




Copyright MySQL AB                      The World’s Most Popular Open Source Database   11
Comparison of Tag Schema Designs
   • “MySQLicious”
          –   Entirely denormalized
          –   One main fact table with one field storing delimited list of tags
          –   Unless using FULLTEXT indexing, does not scale well at all
          –   Very inflexible
   • “Scuttle” solution
          – Two tables. Lookup one-way from item table to tag table
          – Somewhat inflexible
          – Doesn't represent a many-to-many relationship correctly
   • “Toxi” solution
          – Almost gets it right, except uses surrogate keys in mapping
            tables ...
          – Flexible, normalized approach
          – Closest to our recommended architecture ...
Copyright MySQL AB                                     The World’s Most Popular Open Source Database   12
Tagging Concepts in SQL




Copyright MySQL AB                    The World’s Most Popular Open Source Database   13
The Tags Table
   • The “Tags” table is the foundational table upon which
     all tag links are built
   • Lean and mean
   • Make primary key an INT!

    CREATE TABLE Tags (
    tag_id INT UNSIGNED NOT NULL AUTO_INCREMENT
    , tag_text VARCHAR(50) NOT NULL
    , PRIMARY KEY pk_Tags (tag_id)
    , UNIQUE INDEX uix_TagText (tag_text)
    ) ENGINE=InnoDB;



Copyright MySQL AB                       The World’s Most Popular Open Source Database   14
Example Tag2Post Mapping Table
   • The mapping table creates the link between a tag and
     anything else
   • In other terms, it maps a many-to-many relationship
   • Important to index from both “sides”

    CREATE TABLE Tag2Post (
    tag_id INT UNSIGNED NOT NULL
    , post_id INT UNSIGNED NOT NULL
    , PRIMARY KEY pk_Tag2Post (tag_id, post_id)
    , INDEX (post_id)
    ) ENGINE=InnoDB;



Copyright MySQL AB                     The World’s Most Popular Open Source Database   15
The Tag Cloud
   • Tag density typically represented by larger fonts or
     different colors

      SELECT tag_text
      , COUNT(*) as num_tags
      FROM Tag2Post t2p
      INNER JOIN Tags t
      ON t2p.tag_id = t.tag_id
      GROUP BY tag_text;




Copyright MySQL AB                        The World’s Most Popular Open Source Database   16
Efficiency Issues with the Tag Cloud
   • With InnoDB tables, you don't want to be issuing
     COUNT() queries, even on an indexed field
       CREATE TABLE TagStat
       tag_id INT UNSIGNED NOT NULL
       , num_posts INT UNSIGNED NOT NULL
       // ... num_xxx INT UNSIGNED NOT NULL 
       , PRIMARY KEY (tag_id)
       ) ENGINE=InnoDB;

        SELECT tag_text, ts.num_posts
        FROM Tag2Post t2p
        INNER JOIN Tags t
        ON t2p.tag_id = t.tag_id
        INNER JOIN TagStat ts
        ON t.tag_id = ts.tag_id
        GROUP BY tag_text;

Copyright MySQL AB                      The World’s Most Popular Open Source Database   17
The Typical Related Items Query
   • Get all posts tagged with any tag attached to Post #6
   • In other words “Get me all posts related to post #6”


       SELECT p2.post_id 
       FROM Tag2Post p1
       INNER JOIN Tag2Post p2
       ON p1.tag_id = p2.tag_id
       WHERE p1.post_id = 6
       GROUP BY p2.post_id;




Copyright MySQL AB                       The World’s Most Popular Open Source Database   18
Problems With Related Items Query
   • Joining small to medium sized “tag sets” works great
   • But... when you've got a large tag set on either “side” of
     the join, problems can occur with scalablity
   • One way to solve is via derived tables

       SELECT p2.post_id 
       FROM (
       SELECT tag_id FROM Tag2Post
       WHERE post_id = 6 LIMIT 10
       ) AS p1
       INNER JOIN Tag2Post p2
       ON p1.tag_id = p2.tag_id
       GROUP BY p2.post_id LIMIT 10;

Copyright MySQL AB                        The World’s Most Popular Open Source Database   19
The Typical Related Tags Query
   • Get all tags related to a particular tag via an item
   • The “reverse” of the related items query; we want a set
     of related tags, not related posts

       SELECT t2p2.tag_id, t2.tag_text  
       FROM (SELECT post_id FROM Tags t1
       INNER JOIN Tag2Post 
       ON t1.tag_id = Tag2Post.tag_id
       WHERE t1.tag_text = 'beach' LIMIT 10
       ) AS t2p1
       INNER JOIN Tag2Post t2p2
       ON t2p1.post_id = t2p2.post_id
       INNER JOIN Tags t2
       ON t2p2.tag_id = t2.tag_id
       GROUP BY t2p2.tag_id LIMIT 10;


Copyright MySQL AB                       The World’s Most Popular Open Source Database   20
Dealing With More Than One Tag
   • What if we want only items related to each of a set of
     tags?
   • Here is the typical way of dealing with this problem

       SELECT t2p.post_id
       FROM Tags t1 
       INNER JOIN Tag2Post t2p
       ON t1.tag_id = t2p.tag_id
       WHERE t1.tag_text IN ('beach','cloud')
       GROUP BY t2p.post_id 
       HAVING COUNT(DISTINCT t2p.tag_id) = 2;




Copyright MySQL AB                       The World’s Most Popular Open Source Database   21
Dealing With More Than One Tag (cont'd)
   • The GROUP BY and the HAVING COUNT(DISTINCT
     ...) can be eliminated through joins
   • Thus, you eliminate the Using temporary, using filesort
     in the query execution

        SELECT t2p2.post_id
        FROM Tags t1 CROSS JOIN Tags t2
        INNER JOIN Tag2Post t2p1
        ON t1.tag_id = t2p1.tag_id
        INNER JOIN Tag2Post t2p2
        ON t2p1.post_id = t2p2.post_id
        AND t2p2.tag_id = t2.tag_id
        WHERE t1.tag_text = 'beach'
        AND t2.tag_text = 'cloud';


Copyright MySQL AB                        The World’s Most Popular Open Source Database   22
Excluding Tags From A Resultset
   • Here, we want have a search query like “Give me all
     posts tagged with “beach” and “cloud” but not tagged
     with “flower”. Typical solution you will see:

        SELECT t2p1.post_id
        FROM Tag2Post t2p1
        INNER JOIN Tags t1 ON t2p1.tag_id = t1.tag_id
        WHERE t1.tag_text IN ('beach','cloud')
        AND t2p1.post_id NOT IN
        (SELECT post_id FROM Tag2Post t2p2
        INNER JOIN Tags t2 
        ON t2p2.tag_id = t2.tag_id
        WHERE t2.tag_text = 'flower')
        GROUP BY t2p1.post_id 
        HAVING COUNT(DISTINCT t2p1.tag_id) = 2;


Copyright MySQL AB                      The World’s Most Popular Open Source Database   23
Excluding Tags From A Resultset (cont'd)
   • More efficient to use an outer join to filter out the
     “minus” operator, plus get rid of the GROUP BY...
       SELECT t2p2.post_id
       FROM Tags t1 CROSS JOIN Tags t2 CROSS JOIN Tags t3
       INNER JOIN Tag2Post t2p1
       ON t1.tag_id = t2p1.tag_id
       INNER JOIN Tag2Post t2p2
       ON t2p1.post_id = t2p2.post_id
       AND t2p2.tag_id = t2.tag_id
       LEFT JOIN Tag2Post t2p3
       ON t2p2.post_id = t2p3.post_id
       AND t2p3.tag_id = t3.tag_id
       WHERE t1.tag_text = 'beach'
       AND t2.tag_text = 'cloud'
       AND t3.tag_text = 'flower'
       AND t2p3.post_id IS NULL;

Copyright MySQL AB                         The World’s Most Popular Open Source Database   24
Summary of SQL Tips for Tagging
   • Index fields in mapping tables properly. Ensure that
     each GROUP BY can access an index from the left
     side of the index
   • Use summary or statistic tables to eliminate the use of
     COUNT(*) expressions in tag clouding
   • Get rid of GROUP BY and HAVING COUNT(...) by
     using standard join techniques
   • Get rid of NOT IN expressions via a standard outer join
   • Use derived tables with an internal LIMIT expression to
     prevent wild relation queries from breaking scalability




Copyright MySQL AB                       The World’s Most Popular Open Source Database   25
Folksonomy Concepts in SQL




Copyright MySQL AB                     The World’s Most Popular Open Source Database   26
Here we see
                        tagging and
                        folksonomy
                     together: the user
                        dimension...




Copyright MySQL AB                        The World’s Most Popular Open Source Database   27
Folksonomy Adds The User Dimension
   • Adding the user dimension to our schema
   • The tag is the relationship glue between the user and
     item dimensions

       CREATE TABLE UserTagPost (
       user_id INT UNSIGNED NOT NULL
       , tag_id INT UNSIGNED NOT NULL
       , post_id INT UNSIGNED NOT NULL
       , PRIMARY KEY (user_id, tag_id, post_id)
       , INDEX (tag_id)
       , INDEX (post_id)
       ) ENGINE=InnoDB;


Copyright MySQL AB                       The World’s Most Popular Open Source Database   28
Who Shares My Interest Directly?
   • Find out the users who have linked to the same item I
     have
   • Direct link; we don't go through the tag glue


    SELECT user_id 
    FROM UserTagPost
    WHERE post_id = @my_post_id;




Copyright MySQL AB                      The World’s Most Popular Open Source Database   29
Who Shares My Interests Indirectly?
   • Find out the users who have similar tag sets...
   • But, how much matching do we want to do? In other
     words, what radius do we want to match on...
   • The first step is to find my tags that are within the
     search radius... this yields my “top” or most popular
     tags

           SELECT tag_id 
           FROM UserTagPost
           WHERE user_id = @my_user_id
           GROUP BY tag_id
           HAVING COUNT(tag_id) >= @radius;


Copyright MySQL AB                       The World’s Most Popular Open Source Database   30
Who Shares My Interests (cont'd)
   • Now that we have our “top” tag set, we want to find
     users who match all of our top tags:


           SELECT others.user_id 
           FROM UserTagPost others 
           INNER JOIN (SELECT tag_id 
           FROM UserTagPost
           WHERE user_id = @my_user_id
           GROUP BY tag_id
           HAVING COUNT(tag_id) >= @radius) AS my_tags
           ON others.tag_id = my_tags.tag_id
           GROUP BY others.user_id;




Copyright MySQL AB                       The World’s Most Popular Open Source Database   31
Ranking Ecosystem Matches
   • What about finding our “closest” ecosystem matches
   • We can “rank” other users based on whether they have
     tagged items a number of times similar to ourselves
           SELECT others.user_id
           , (COUNT(*) ­ my_tags.num_tags) AS rank
           FROM UserTagPost others 
           INNER JOIN (
           SELECT tag_id, COUNT(*) AS num_tags 
           FROM UserTagPost
           WHERE user_id = @my_user_id
           GROUP BY tag_id
           HAVING COUNT(tag_id) >= @radius
           ) AS my_tags
           ON others.tag_id = my_tags.tag_id
           GROUP BY others.user_id
           ORDER BY rank DESC;

Copyright MySQL AB                       The World’s Most Popular Open Source Database   32
Ranking Ecosystem Matches Efficiently
   • But... we've still got our COUNT(*) problem
   • How about another summary table ...

                 CREATE TABLE UserTagStat (
                 user_id INT UNSIGNED NOT NULL
                 , tag_id INT UNSIGNED NOT NUL
                 , num_posts INT UNSIGNED NOT NULL
                 , PRIMARY KEY (user_id, tag_id)
                 , INDEX (tag_id)
                 ) ENGINE=InnoDB;




Copyright MySQL AB                          The World’s Most Popular Open Source Database   33
Ranking Ecosystem Matches Efficiently 2
   • Hey, we've eliminated the aggregation!


        SELECT others.user_id
        , (others.num_posts ­ my_tags.num_posts) AS rank
        FROM UserTagStat others 
        INNER JOIN (
        SELECT tag_id, num_posts
        FROM UserTagStat
        WHERE user_id = @my_user_id
        AND  num_posts >= @radius
        ) AS my_tags
        ON others.tag_id = my_tags.tag_id
        ORDER BY rank DESC;




Copyright MySQL AB                      The World’s Most Popular Open Source Database   34
Scaling Out Sensibly




Copyright MySQL AB                   The World’s Most Popular Open Source Database   35
MySQL Replication (Scale Out)
                                                             • Write to one master
                      Web/App       Web/App     Web/App
                                                             • Read from many slaves
                       Server        Server      Server
                                                             • Excellent for read
                                                               intensive apps
                            Load Balancer


                                        Reads             Reads
     Writes & Reads




                                                                         …
             Master                       Slave            Slave
             MySQL                        MySQL            MySQL
             Server                       Server           Server



                      Replication




Copyright MySQL AB                                         The World’s Most Popular Open Source Database   36
Scale Out Using Replication
     • Master DB stores all writes
     • Master has InnoDB tables
     • Slaves handle aggregate reads, non-realtime reads
     • Web servers can be load balanced (directed) to one or
       more slaves
     • Just plug in another slave to increase read performance
       (that's scaling out...)
     • Slave can provide hot standby as well as backup server




Copyright MySQL AB                       The World’s Most Popular Open Source Database   37
Scale Out Strategies
                     • Slave storage engine can differ from
                       Master!
                        – InnoDB on Master (great update/insert/delete
                          performance)
                        – MyISAM on Slave (fantastic read performance
                          and well as excellent concurrent insert
                          performance, plus can use FULLTEXT
                          indexing)
                     • Push aggregated summary data in
                       batches onto slaves for excellent read
                       performance of semi-static data
                        – Example: “this week's popular tags”
                           • Generate the data via cron job on each slave.
                             No need to burden the master server
                           • Truncate every week

Copyright MySQL AB                               The World’s Most Popular Open Source Database   38
Scale Out Strategies (cont'd)
                        • Offload FULLTEXT indexing onto a FT
                          indexer such as Apache Lucene,
                          Mnogosearch, Sphinx FT Engine, etc.
                        • Use Partitioning feature of 5.1 to
                          segment tag data across multiple
                          partitions, allowing you to spread disk
                          load sensibly based on your tag text
                          density
                        • Use the MySQL Query Cache effectively
                           – Use SQL_NO_CACHE when selecting from
                             frequently updated tables (ex: TagStat)
                           – Very effective for high-read environments:
                             can yield 200-250% performance
                             improvement


Copyright MySQL AB                                The World’s Most Popular Open Source Database   39
Questions?
 MySQL Forge
 http://forge.mysql.com/

 MySQL Forge Tag Schema Wiki pages
 http://forge.mysql.com/wiki/TagSchema

 PlanetMySQL
 http://www.planetmysql.org

                                   Jay Pipes
                               (jay@mysql.com)
                                  MySQL AB




Copyright MySQL AB                               The World’s Most Popular Open Source Database   40

Weitere ähnliche Inhalte

Andere mochten auch

Tagging and Folksonomies
Tagging and FolksonomiesTagging and Folksonomies
Tagging and Folksonomieshhunt75
 
Tagging & Folksonomies poster
Tagging & Folksonomies posterTagging & Folksonomies poster
Tagging & Folksonomies posterPrattSILS
 
Understanding Tagging and Folksonomy - SharePoint Saturday DC
Understanding Tagging and Folksonomy - SharePoint Saturday DCUnderstanding Tagging and Folksonomy - SharePoint Saturday DC
Understanding Tagging and Folksonomy - SharePoint Saturday DCThomas Vander Wal
 
Folksonomy Presentation
Folksonomy PresentationFolksonomy Presentation
Folksonomy Presentationamit_nathwani
 
Bridging Social Web and Sem Web : 2 application cases in the field of sustain...
Bridging Social Web and Sem Web : 2 application cases in the field of sustain...Bridging Social Web and Sem Web : 2 application cases in the field of sustain...
Bridging Social Web and Sem Web : 2 application cases in the field of sustain...Freddy Limpens
 
Folksonomy Presentation
Folksonomy PresentationFolksonomy Presentation
Folksonomy PresentationVanessa Yau
 
Next Generation Search: Social Bookmarking and Tagging
Next Generation Search: Social Bookmarking and TaggingNext Generation Search: Social Bookmarking and Tagging
Next Generation Search: Social Bookmarking and TaggingThomas Vander Wal
 
Taxonomies and Folksonomies
Taxonomies and FolksonomiesTaxonomies and Folksonomies
Taxonomies and FolksonomiesHeather Hedden
 
Introduction to jQuery Mobile - Web Deliver for All
Introduction to jQuery Mobile - Web Deliver for AllIntroduction to jQuery Mobile - Web Deliver for All
Introduction to jQuery Mobile - Web Deliver for AllMarc Grabanski
 
Social Web and Semantic Web: towards synergy between folksonomies and ontologies
Social Web and Semantic Web: towards synergy between folksonomies and ontologiesSocial Web and Semantic Web: towards synergy between folksonomies and ontologies
Social Web and Semantic Web: towards synergy between folksonomies and ontologiesFreddy Limpens
 

Andere mochten auch (11)

Tagging and Folksonomies
Tagging and FolksonomiesTagging and Folksonomies
Tagging and Folksonomies
 
Tagging & Folksonomies poster
Tagging & Folksonomies posterTagging & Folksonomies poster
Tagging & Folksonomies poster
 
Understanding Tagging and Folksonomy - SharePoint Saturday DC
Understanding Tagging and Folksonomy - SharePoint Saturday DCUnderstanding Tagging and Folksonomy - SharePoint Saturday DC
Understanding Tagging and Folksonomy - SharePoint Saturday DC
 
Folksonomy Presentation
Folksonomy PresentationFolksonomy Presentation
Folksonomy Presentation
 
Bridging Social Web and Sem Web : 2 application cases in the field of sustain...
Bridging Social Web and Sem Web : 2 application cases in the field of sustain...Bridging Social Web and Sem Web : 2 application cases in the field of sustain...
Bridging Social Web and Sem Web : 2 application cases in the field of sustain...
 
Folksonomy
FolksonomyFolksonomy
Folksonomy
 
Folksonomy Presentation
Folksonomy PresentationFolksonomy Presentation
Folksonomy Presentation
 
Next Generation Search: Social Bookmarking and Tagging
Next Generation Search: Social Bookmarking and TaggingNext Generation Search: Social Bookmarking and Tagging
Next Generation Search: Social Bookmarking and Tagging
 
Taxonomies and Folksonomies
Taxonomies and FolksonomiesTaxonomies and Folksonomies
Taxonomies and Folksonomies
 
Introduction to jQuery Mobile - Web Deliver for All
Introduction to jQuery Mobile - Web Deliver for AllIntroduction to jQuery Mobile - Web Deliver for All
Introduction to jQuery Mobile - Web Deliver for All
 
Social Web and Semantic Web: towards synergy between folksonomies and ontologies
Social Web and Semantic Web: towards synergy between folksonomies and ontologiesSocial Web and Semantic Web: towards synergy between folksonomies and ontologies
Social Web and Semantic Web: towards synergy between folksonomies and ontologies
 

Ähnlich wie Tagging and Folksonomy Schema Design for Scalability and Performance

Vote NO for MySQL
Vote NO for MySQLVote NO for MySQL
Vote NO for MySQLUlf Wendel
 
Why no sql_ibm_cloudant
Why no sql_ibm_cloudantWhy no sql_ibm_cloudant
Why no sql_ibm_cloudantPeter Tutty
 
Oracle no sql database bigdata
Oracle no sql database   bigdataOracle no sql database   bigdata
Oracle no sql database bigdataJoão Gabriel Lima
 
MySQL 8: Ready for Prime Time
MySQL 8: Ready for Prime TimeMySQL 8: Ready for Prime Time
MySQL 8: Ready for Prime TimeArnab Ray
 
Databases in the Cloud - DevDay Austin 2017 Day 2
Databases in the Cloud - DevDay Austin 2017 Day 2Databases in the Cloud - DevDay Austin 2017 Day 2
Databases in the Cloud - DevDay Austin 2017 Day 2Amazon Web Services
 
MySQL Ecosystem in 2020
MySQL Ecosystem in 2020MySQL Ecosystem in 2020
MySQL Ecosystem in 2020Alkin Tezuysal
 
MySQL Cluster Scaling to a Billion Queries
MySQL Cluster Scaling to a Billion QueriesMySQL Cluster Scaling to a Billion Queries
MySQL Cluster Scaling to a Billion QueriesBernd Ocklin
 
http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151xlight
 
My sql crashcourse_intro_kdl
My sql crashcourse_intro_kdlMy sql crashcourse_intro_kdl
My sql crashcourse_intro_kdlsqlhjalp
 
High-performance database technology for rock-solid IoT solutions
High-performance database technology for rock-solid IoT solutionsHigh-performance database technology for rock-solid IoT solutions
High-performance database technology for rock-solid IoT solutionsClusterpoint
 
MySQL Community Meetup in China : Innovation driven by the Community
MySQL Community Meetup in China : Innovation driven by the CommunityMySQL Community Meetup in China : Innovation driven by the Community
MySQL Community Meetup in China : Innovation driven by the CommunityFrederic Descamps
 
20090425mysqlslides 12593434194072-phpapp02
20090425mysqlslides 12593434194072-phpapp0220090425mysqlslides 12593434194072-phpapp02
20090425mysqlslides 12593434194072-phpapp02Vinamra Mittal
 
How to Radically Simplify Your Business Data Management
How to Radically Simplify Your Business Data ManagementHow to Radically Simplify Your Business Data Management
How to Radically Simplify Your Business Data ManagementClusterpoint
 

Ähnlich wie Tagging and Folksonomy Schema Design for Scalability and Performance (20)

Mysql
MysqlMysql
Mysql
 
Vote NO for MySQL
Vote NO for MySQLVote NO for MySQL
Vote NO for MySQL
 
MySQL Cluster
MySQL ClusterMySQL Cluster
MySQL Cluster
 
Brandon
BrandonBrandon
Brandon
 
Why no sql_ibm_cloudant
Why no sql_ibm_cloudantWhy no sql_ibm_cloudant
Why no sql_ibm_cloudant
 
Oracle no sql database bigdata
Oracle no sql database   bigdataOracle no sql database   bigdata
Oracle no sql database bigdata
 
MySQL 8: Ready for Prime Time
MySQL 8: Ready for Prime TimeMySQL 8: Ready for Prime Time
MySQL 8: Ready for Prime Time
 
Databases in the Cloud - DevDay Austin 2017 Day 2
Databases in the Cloud - DevDay Austin 2017 Day 2Databases in the Cloud - DevDay Austin 2017 Day 2
Databases in the Cloud - DevDay Austin 2017 Day 2
 
MySQL Ecosystem in 2020
MySQL Ecosystem in 2020MySQL Ecosystem in 2020
MySQL Ecosystem in 2020
 
The NoSQL Movement
The NoSQL MovementThe NoSQL Movement
The NoSQL Movement
 
MySQL Cluster Scaling to a Billion Queries
MySQL Cluster Scaling to a Billion QueriesMySQL Cluster Scaling to a Billion Queries
MySQL Cluster Scaling to a Billion Queries
 
http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151
 
Introduction to Mysql
Introduction to MysqlIntroduction to Mysql
Introduction to Mysql
 
My sql crashcourse_intro_kdl
My sql crashcourse_intro_kdlMy sql crashcourse_intro_kdl
My sql crashcourse_intro_kdl
 
Apache ignite v1.3
Apache ignite v1.3Apache ignite v1.3
Apache ignite v1.3
 
No sql3 rmoug
No sql3 rmougNo sql3 rmoug
No sql3 rmoug
 
High-performance database technology for rock-solid IoT solutions
High-performance database technology for rock-solid IoT solutionsHigh-performance database technology for rock-solid IoT solutions
High-performance database technology for rock-solid IoT solutions
 
MySQL Community Meetup in China : Innovation driven by the Community
MySQL Community Meetup in China : Innovation driven by the CommunityMySQL Community Meetup in China : Innovation driven by the Community
MySQL Community Meetup in China : Innovation driven by the Community
 
20090425mysqlslides 12593434194072-phpapp02
20090425mysqlslides 12593434194072-phpapp0220090425mysqlslides 12593434194072-phpapp02
20090425mysqlslides 12593434194072-phpapp02
 
How to Radically Simplify Your Business Data Management
How to Radically Simplify Your Business Data ManagementHow to Radically Simplify Your Business Data Management
How to Radically Simplify Your Business Data Management
 

Kürzlich hochgeladen

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 

Kürzlich hochgeladen (20)

DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 

Tagging and Folksonomy Schema Design for Scalability and Performance

  • 1. Tagging and Folksonomy Schema Design for Scalability and Performance Jay Pipes Community Relations Manager, North America (jay@mysql.com) MySQL, Inc. TELECONFERENCE: please dial a number to hear the audio portion of this presentation. Toll-free US/Canada: 866-469-3239 Direct US/Canada: 650-429-3300 : 0800-295-240 Austria 0800-71083 Belgium 80-884912 Denmark 0-800-1-12585 Finland 0800-90-5571 France 0800-101-6943 Germany 00800-12-6759 Greece 1-800-882019 Ireland 800-780-632 Italy 0-800-9214652 Israel 800-2498 Luxembourg 0800-022-6826 Netherlands 800-15888 Norway 900-97-1417 Spain 020-79-7251 Sweden 0800-561-201 Switzerland 0800-028-8023 UK 1800-093-897 Australia 800-90-3575 Hong Kong 00531-12-1688 Japan EVENT NUMBER/ACCESS CODE: Copyright MySQL AB The World’s Most Popular Open Source Database 1
  • 2. Communicate ... Connect ... Share ... Play ... Trade ... craigslist Search & Look Up Copyright MySQL AB The World’s Most Popular Open Source Database 2
  • 3. Software Industry Evolution WEB 1.0 WEB 2.0 FREE & OPEN SOURCE SOFTWARE CLOSED SOURCE SOFTWARE 1990 2000 1995 2005 Copyright MySQL AB The World’s Most Popular Open Source Database 3
  • 4. How MySQL Enables Web 2.0 ubiquitous with developers lower tco large objects tag databases lamp XML full-text search language support ease of use world-wide performance community scale out interoperable connectors open source bundled reliability session management replication query cache partitioning (5.1) pluggable storage engines Copyright MySQL AB The World’s Most Popular Open Source Database 4
  • 5. Agenda • Web 2.0: more than just rounded corners • Tagging Concepts in SQL • Folksonomy Concepts in SQL • Scaling out sensibly Copyright MySQL AB The World’s Most Popular Open Source Database 5
  • 6. More than just rounded corners • What is Web 2.0? – Participation – Interaction – Connection • A changing of design patterns – AJAX and XML-RPC are changing the way data is queried – Rich web-based clients – Everyone's got an API – API leads to increased requests per second. Must deal with growth! Copyright MySQL AB The World’s Most Popular Open Source Database 6
  • 7. AJAX and XML-RPC change the game • Traditional (Web 1.0) page request builds entire page – Lots of HTML, style and data in each page request – Lots of data processed or queried in each request – Complex page requests and application logic • AJAX/XML-RPC request returns small data set, or page fragment – Smaller amount of data being passed on each request – But, often many more requests compared to Web 1.0! – Exposed APIs mean data services distribute your data to syndicated sites – RSS feeds supply data, not full web page Copyright MySQL AB The World’s Most Popular Open Source Database 7
  • 8. flickr Tag Cloud(s) Copyright MySQL AB The World’s Most Popular Open Source Database 8
  • 9. Participation, Interaction and Connection • Multiple users editing the same content – Content explosion – Lots of textual data to manage. How do we organize it? • User-driven data – “Tracked” pages/items – Allows interlinking of content via the user to user relationship (folksonomy) • Tags becoming the new categorization/filing of this content – Anyone can tag the data – Tags connect one thing to another, similar to the way that the folksonomy relationships link users together • So, the common concept here is “linking”... Copyright MySQL AB The World’s Most Popular Open Source Database 9
  • 10. What The User Dimension Gives You • Folksonomy adds the “user dimension” • Tags often provide the “glue” between user and item dimensions • Answers questions such as: – Who shares my interests – What are the other interests of users who share my interests – Someone tagged my bookmark (or any other item) with something. What other items did that person tag with the same thing? – What is the user's immediate “ecosystem”? What are the fringes of that ecosystem? • Ecosystem radius expansion (like zipcode radius expansion...) • Marks a new aspect of web applications into the world of data warehousing and analysis Copyright MySQL AB The World’s Most Popular Open Source Database 10
  • 11. Linking Is the “R” in RDBMS • The schema becomes the absolute driving force behind Web 2.0 applications • Understanding of many-to-many relationships is critical • Take advantage of MySQL's architecture so that our schema is as efficient as possible • Ensure your data store is normalized, standardized, and consistent • So, let's digg in! (pun intended) Copyright MySQL AB The World’s Most Popular Open Source Database 11
  • 12. Comparison of Tag Schema Designs • “MySQLicious” – Entirely denormalized – One main fact table with one field storing delimited list of tags – Unless using FULLTEXT indexing, does not scale well at all – Very inflexible • “Scuttle” solution – Two tables. Lookup one-way from item table to tag table – Somewhat inflexible – Doesn't represent a many-to-many relationship correctly • “Toxi” solution – Almost gets it right, except uses surrogate keys in mapping tables ... – Flexible, normalized approach – Closest to our recommended architecture ... Copyright MySQL AB The World’s Most Popular Open Source Database 12
  • 13. Tagging Concepts in SQL Copyright MySQL AB The World’s Most Popular Open Source Database 13
  • 14. The Tags Table • The “Tags” table is the foundational table upon which all tag links are built • Lean and mean • Make primary key an INT! CREATE TABLE Tags ( tag_id INT UNSIGNED NOT NULL AUTO_INCREMENT , tag_text VARCHAR(50) NOT NULL , PRIMARY KEY pk_Tags (tag_id) , UNIQUE INDEX uix_TagText (tag_text) ) ENGINE=InnoDB; Copyright MySQL AB The World’s Most Popular Open Source Database 14
  • 15. Example Tag2Post Mapping Table • The mapping table creates the link between a tag and anything else • In other terms, it maps a many-to-many relationship • Important to index from both “sides” CREATE TABLE Tag2Post ( tag_id INT UNSIGNED NOT NULL , post_id INT UNSIGNED NOT NULL , PRIMARY KEY pk_Tag2Post (tag_id, post_id) , INDEX (post_id) ) ENGINE=InnoDB; Copyright MySQL AB The World’s Most Popular Open Source Database 15
  • 16. The Tag Cloud • Tag density typically represented by larger fonts or different colors SELECT tag_text , COUNT(*) as num_tags FROM Tag2Post t2p INNER JOIN Tags t ON t2p.tag_id = t.tag_id GROUP BY tag_text; Copyright MySQL AB The World’s Most Popular Open Source Database 16
  • 17. Efficiency Issues with the Tag Cloud • With InnoDB tables, you don't want to be issuing COUNT() queries, even on an indexed field CREATE TABLE TagStat tag_id INT UNSIGNED NOT NULL , num_posts INT UNSIGNED NOT NULL // ... num_xxx INT UNSIGNED NOT NULL  , PRIMARY KEY (tag_id) ) ENGINE=InnoDB; SELECT tag_text, ts.num_posts FROM Tag2Post t2p INNER JOIN Tags t ON t2p.tag_id = t.tag_id INNER JOIN TagStat ts ON t.tag_id = ts.tag_id GROUP BY tag_text; Copyright MySQL AB The World’s Most Popular Open Source Database 17
  • 18. The Typical Related Items Query • Get all posts tagged with any tag attached to Post #6 • In other words “Get me all posts related to post #6” SELECT p2.post_id  FROM Tag2Post p1 INNER JOIN Tag2Post p2 ON p1.tag_id = p2.tag_id WHERE p1.post_id = 6 GROUP BY p2.post_id; Copyright MySQL AB The World’s Most Popular Open Source Database 18
  • 19. Problems With Related Items Query • Joining small to medium sized “tag sets” works great • But... when you've got a large tag set on either “side” of the join, problems can occur with scalablity • One way to solve is via derived tables SELECT p2.post_id  FROM ( SELECT tag_id FROM Tag2Post WHERE post_id = 6 LIMIT 10 ) AS p1 INNER JOIN Tag2Post p2 ON p1.tag_id = p2.tag_id GROUP BY p2.post_id LIMIT 10; Copyright MySQL AB The World’s Most Popular Open Source Database 19
  • 20. The Typical Related Tags Query • Get all tags related to a particular tag via an item • The “reverse” of the related items query; we want a set of related tags, not related posts SELECT t2p2.tag_id, t2.tag_text   FROM (SELECT post_id FROM Tags t1 INNER JOIN Tag2Post  ON t1.tag_id = Tag2Post.tag_id WHERE t1.tag_text = 'beach' LIMIT 10 ) AS t2p1 INNER JOIN Tag2Post t2p2 ON t2p1.post_id = t2p2.post_id INNER JOIN Tags t2 ON t2p2.tag_id = t2.tag_id GROUP BY t2p2.tag_id LIMIT 10; Copyright MySQL AB The World’s Most Popular Open Source Database 20
  • 21. Dealing With More Than One Tag • What if we want only items related to each of a set of tags? • Here is the typical way of dealing with this problem SELECT t2p.post_id FROM Tags t1  INNER JOIN Tag2Post t2p ON t1.tag_id = t2p.tag_id WHERE t1.tag_text IN ('beach','cloud') GROUP BY t2p.post_id  HAVING COUNT(DISTINCT t2p.tag_id) = 2; Copyright MySQL AB The World’s Most Popular Open Source Database 21
  • 22. Dealing With More Than One Tag (cont'd) • The GROUP BY and the HAVING COUNT(DISTINCT ...) can be eliminated through joins • Thus, you eliminate the Using temporary, using filesort in the query execution SELECT t2p2.post_id FROM Tags t1 CROSS JOIN Tags t2 INNER JOIN Tag2Post t2p1 ON t1.tag_id = t2p1.tag_id INNER JOIN Tag2Post t2p2 ON t2p1.post_id = t2p2.post_id AND t2p2.tag_id = t2.tag_id WHERE t1.tag_text = 'beach' AND t2.tag_text = 'cloud'; Copyright MySQL AB The World’s Most Popular Open Source Database 22
  • 23. Excluding Tags From A Resultset • Here, we want have a search query like “Give me all posts tagged with “beach” and “cloud” but not tagged with “flower”. Typical solution you will see: SELECT t2p1.post_id FROM Tag2Post t2p1 INNER JOIN Tags t1 ON t2p1.tag_id = t1.tag_id WHERE t1.tag_text IN ('beach','cloud') AND t2p1.post_id NOT IN (SELECT post_id FROM Tag2Post t2p2 INNER JOIN Tags t2  ON t2p2.tag_id = t2.tag_id WHERE t2.tag_text = 'flower') GROUP BY t2p1.post_id  HAVING COUNT(DISTINCT t2p1.tag_id) = 2; Copyright MySQL AB The World’s Most Popular Open Source Database 23
  • 24. Excluding Tags From A Resultset (cont'd) • More efficient to use an outer join to filter out the “minus” operator, plus get rid of the GROUP BY... SELECT t2p2.post_id FROM Tags t1 CROSS JOIN Tags t2 CROSS JOIN Tags t3 INNER JOIN Tag2Post t2p1 ON t1.tag_id = t2p1.tag_id INNER JOIN Tag2Post t2p2 ON t2p1.post_id = t2p2.post_id AND t2p2.tag_id = t2.tag_id LEFT JOIN Tag2Post t2p3 ON t2p2.post_id = t2p3.post_id AND t2p3.tag_id = t3.tag_id WHERE t1.tag_text = 'beach' AND t2.tag_text = 'cloud' AND t3.tag_text = 'flower' AND t2p3.post_id IS NULL; Copyright MySQL AB The World’s Most Popular Open Source Database 24
  • 25. Summary of SQL Tips for Tagging • Index fields in mapping tables properly. Ensure that each GROUP BY can access an index from the left side of the index • Use summary or statistic tables to eliminate the use of COUNT(*) expressions in tag clouding • Get rid of GROUP BY and HAVING COUNT(...) by using standard join techniques • Get rid of NOT IN expressions via a standard outer join • Use derived tables with an internal LIMIT expression to prevent wild relation queries from breaking scalability Copyright MySQL AB The World’s Most Popular Open Source Database 25
  • 26. Folksonomy Concepts in SQL Copyright MySQL AB The World’s Most Popular Open Source Database 26
  • 27. Here we see tagging and folksonomy together: the user dimension... Copyright MySQL AB The World’s Most Popular Open Source Database 27
  • 28. Folksonomy Adds The User Dimension • Adding the user dimension to our schema • The tag is the relationship glue between the user and item dimensions CREATE TABLE UserTagPost ( user_id INT UNSIGNED NOT NULL , tag_id INT UNSIGNED NOT NULL , post_id INT UNSIGNED NOT NULL , PRIMARY KEY (user_id, tag_id, post_id) , INDEX (tag_id) , INDEX (post_id) ) ENGINE=InnoDB; Copyright MySQL AB The World’s Most Popular Open Source Database 28
  • 29. Who Shares My Interest Directly? • Find out the users who have linked to the same item I have • Direct link; we don't go through the tag glue SELECT user_id  FROM UserTagPost WHERE post_id = @my_post_id; Copyright MySQL AB The World’s Most Popular Open Source Database 29
  • 30. Who Shares My Interests Indirectly? • Find out the users who have similar tag sets... • But, how much matching do we want to do? In other words, what radius do we want to match on... • The first step is to find my tags that are within the search radius... this yields my “top” or most popular tags SELECT tag_id  FROM UserTagPost WHERE user_id = @my_user_id GROUP BY tag_id HAVING COUNT(tag_id) >= @radius; Copyright MySQL AB The World’s Most Popular Open Source Database 30
  • 31. Who Shares My Interests (cont'd) • Now that we have our “top” tag set, we want to find users who match all of our top tags: SELECT others.user_id  FROM UserTagPost others  INNER JOIN (SELECT tag_id  FROM UserTagPost WHERE user_id = @my_user_id GROUP BY tag_id HAVING COUNT(tag_id) >= @radius) AS my_tags ON others.tag_id = my_tags.tag_id GROUP BY others.user_id; Copyright MySQL AB The World’s Most Popular Open Source Database 31
  • 32. Ranking Ecosystem Matches • What about finding our “closest” ecosystem matches • We can “rank” other users based on whether they have tagged items a number of times similar to ourselves SELECT others.user_id , (COUNT(*) ­ my_tags.num_tags) AS rank FROM UserTagPost others  INNER JOIN ( SELECT tag_id, COUNT(*) AS num_tags  FROM UserTagPost WHERE user_id = @my_user_id GROUP BY tag_id HAVING COUNT(tag_id) >= @radius ) AS my_tags ON others.tag_id = my_tags.tag_id GROUP BY others.user_id ORDER BY rank DESC; Copyright MySQL AB The World’s Most Popular Open Source Database 32
  • 33. Ranking Ecosystem Matches Efficiently • But... we've still got our COUNT(*) problem • How about another summary table ... CREATE TABLE UserTagStat ( user_id INT UNSIGNED NOT NULL , tag_id INT UNSIGNED NOT NUL , num_posts INT UNSIGNED NOT NULL , PRIMARY KEY (user_id, tag_id) , INDEX (tag_id) ) ENGINE=InnoDB; Copyright MySQL AB The World’s Most Popular Open Source Database 33
  • 34. Ranking Ecosystem Matches Efficiently 2 • Hey, we've eliminated the aggregation! SELECT others.user_id , (others.num_posts ­ my_tags.num_posts) AS rank FROM UserTagStat others  INNER JOIN ( SELECT tag_id, num_posts FROM UserTagStat WHERE user_id = @my_user_id AND  num_posts >= @radius ) AS my_tags ON others.tag_id = my_tags.tag_id ORDER BY rank DESC; Copyright MySQL AB The World’s Most Popular Open Source Database 34
  • 35. Scaling Out Sensibly Copyright MySQL AB The World’s Most Popular Open Source Database 35
  • 36. MySQL Replication (Scale Out) • Write to one master Web/App Web/App Web/App • Read from many slaves Server Server Server • Excellent for read intensive apps Load Balancer Reads Reads Writes & Reads … Master Slave Slave MySQL MySQL MySQL Server Server Server Replication Copyright MySQL AB The World’s Most Popular Open Source Database 36
  • 37. Scale Out Using Replication • Master DB stores all writes • Master has InnoDB tables • Slaves handle aggregate reads, non-realtime reads • Web servers can be load balanced (directed) to one or more slaves • Just plug in another slave to increase read performance (that's scaling out...) • Slave can provide hot standby as well as backup server Copyright MySQL AB The World’s Most Popular Open Source Database 37
  • 38. Scale Out Strategies • Slave storage engine can differ from Master! – InnoDB on Master (great update/insert/delete performance) – MyISAM on Slave (fantastic read performance and well as excellent concurrent insert performance, plus can use FULLTEXT indexing) • Push aggregated summary data in batches onto slaves for excellent read performance of semi-static data – Example: “this week's popular tags” • Generate the data via cron job on each slave. No need to burden the master server • Truncate every week Copyright MySQL AB The World’s Most Popular Open Source Database 38
  • 39. Scale Out Strategies (cont'd) • Offload FULLTEXT indexing onto a FT indexer such as Apache Lucene, Mnogosearch, Sphinx FT Engine, etc. • Use Partitioning feature of 5.1 to segment tag data across multiple partitions, allowing you to spread disk load sensibly based on your tag text density • Use the MySQL Query Cache effectively – Use SQL_NO_CACHE when selecting from frequently updated tables (ex: TagStat) – Very effective for high-read environments: can yield 200-250% performance improvement Copyright MySQL AB The World’s Most Popular Open Source Database 39
  • 40. Questions? MySQL Forge http://forge.mysql.com/ MySQL Forge Tag Schema Wiki pages http://forge.mysql.com/wiki/TagSchema PlanetMySQL http://www.planetmysql.org Jay Pipes (jay@mysql.com) MySQL AB Copyright MySQL AB The World’s Most Popular Open Source Database 40