SlideShare ist ein Scribd-Unternehmen logo
1 von 33
Dremel: Interactive Analysis of Web-
Scale Datasets
 Sergey Melnik, Andrey Gubarev, Jing Jing Long, Geoffrey Romer,
 Shiva Shivakumar, Matt Tolton, Theo Vassilakis
 Google, Inc.
 VLDB, 2010

 Presentation by ensky (enskylin@gmail.com)
Outline
 Introduction
 Nested columnar storage
 Query processing
 Experiments
 Conclusion




                            2
Introduction
   Dremel is an query system
    For analysis of read-only nested data

   Use case

        Interactive                 Trends
           Tools                   Detection


             Web     Spam         Network
          Dashboards             Optimization
                                            3
DocId: 10
                                           Links


           Features
                                             Forward: 20
                                           Name
                                             Language
                                               Code: 'en-us'
                                               Country: 'us'
             Multi-level structure          Url: 'http://A'
                                           Name

             SQL-like query language
                                             Url: 'http://B'



             Fast: Trillion-row tables in second level
             Big scale: thousands of CPUs and
              petabytes of data
             Widely used by google: in production
              since 2006
SELECT A, COUNT(B) FROM T
GROUP BY A
T = {/gfs/1, /gfs/2, …, /gfs/100000}
                                                               4
Contribution
   This paper presented two major
    technique in Dremel:

   Nested columnar storage
    ◦ store / split into columns
    ◦ Assembly to record
   Query
    ◦ Language
    ◦ Execution
                                     5
Outline
 Introduction
 Nested columnar storage
 Query processing
 Experiments
 Conclusion




                            6
Why nested?
   Flexible
    ◦ Data in web and scientific is often non-
      relational
   Reduce cost
    ◦ Normalizing and recombining nested data
      is often prohibited in TB, PB scale of data.




                                                     7
Why column?
                  DocId: 10
                  Links
                                           column-oriented
                    Forward: 20
                  Name                              A
                                                  *    *
                                                    ...
                    Language
                      Code: 'en-us'
                      Country: 'us'             B          E
                    Url: 'http://A'                   *
                                               C           D
                                                               r1
                  Name
                    Url: 'http://B'

                                          r1
 r1                                                   r1
                                          r2                   r2
 r2                                   Read less,      r2
                                      cheaper
 Record-oriented                      decompression

Challenge: preserve structure, reconstruct from a subset of fields
                                                                 8
DocId: 10
                    Nested data model
Links
  Forward: 20        message Document {
  Forward: 40
  Forward: 60          required int64 DocId;           [1,1]
Name                   optional group Links {
  Language
    Code: 'en-us'
                         repeated int64 Backward;      [0,*]
    Country: 'us'        repeated int64 Forward;
  Language             }
    Code: 'en'
  Url: 'http://A'      repeated group Name {
Name                     repeated group Language {
  Url: 'http://B'
Name                       required string Code;
  Language                 optional string Country;    [0,1]
    Code: 'en-gb'
    Country: 'gb'
                         }
                         optional string Url;
DocId: 20              }
Links                }
  Backward: 10
  Backward: 30
  Forward: 80
Name
                     http://code.google.com/apis/protocolbuffers
  Url: 'http://C'                                                  9
Column-striped representation
       DocId            Name.Url                Links.Forward        Links.Backward
       value    r   d   value       r   d       value   r   d        value    r   d
        10     0    0   http://A    0   2        20     0   2        NULL     0   1
        20     0    0   http://B    1   2        40     1   2            10   0   2

DocId: 10
                         NULL       1   1        60     1   2            30   1   2
Links                   http://C    0   2        80     0   2
  Forward: 20
  Forward: 40
  Forward: 60
Name
  Language                 Name.Language.Code               Name.Language.Country
    Code: 'en-us'
    Country: 'us'           value       r   d               value    r    d
  Language                  en-us       0   2                   us   0    3
    Code: 'en'
  Url: 'http://A'            en         2   2                NULL    2    2
Name
  Url: 'http://B'           NULL        1   1                NULL    1    1
Name
  Language                  en-gb       1   2                   gb   1    3
    Code: 'en-gb'           NULL        0   1                NULL    0    1
    Country: 'gb'                                                                     10
Column-oriented problem
   When query sub-tree, there is no way for
    you to know the whole structure(even the
    position of yourself)             A
                                           *        *
                                       B        ...          E
   To solve this problem,                 *
    this paper presents            C            D
    repetition & definition
                                                        r1
                              r1
                                           r1
                              r2                        r2
           Where am I?
                                           r2
                                                                 11
Repetition and definition levels
                                                                        r
                                                        DocId: 10
                                                        Links             1
                                                          Forward: 20
                                                          Forward: 40
    Name.Language.Code                                    Forward: 60
     value    r   d                                     Name
                             First time, repeat=0         Language
     en-us    0   2                                         Code: 'en-us'
                          Language repeat, level = 2        Country: 'us'
       en     2   2                                       Language
                                                            Code: 'en'
     NULL     1   1                                       Url: 'http://A'
     en-gb    1   2                                     Name
                                                          Url: 'http://B'
     NULL     0   1                                     Name
                                                          Language
                                                            Code: 'en-gb'
                               None repeat, level = 0       Country: 'gb'


                                                        DocId: 20
                                                                         2  r
                                                        Links
                                                          Backward: 10
                                                          Backward: 30
    r: At what repeated field in the field's path         Forward: 80
       the value has repeated                           Name
                                                          Url: 'http://C'12
Repetition and definition levels
                                                                       r
                                                       DocId: 10
                                                       Links             1
                                                         Forward: 20
                                                         Forward: 40
    Name.Language.Country                                Forward: 60
     value    r   d                                    Name
                                                         Language
       us     0   3                                        Code: 'en-us'
                                                           Country: 'us'
      NULL    2   2                                      Language
                                                           Code: 'en'
      NULL    1   1                                      Url: 'http://A'
       gb     1   3                                    Name
                                                         Url: 'http://B'
      NULL    0   1                                    Name
                                                         Language
                                                           Code: 'en-gb'
                                                           Country: 'gb'


                                                       DocId: 20
                                                                        2  r
                                                       Links
                                                         Backward: 10
                                                         Backward: 30
    d: How many fields in paths that could be            Forward: 80
                                                       Name
       undefined (opt. or rep.) are actually present     Url: 'http://C'13
Column-oriented is all right,
but what if I still need record-
oriented query?

                  (e.g., MapReduce)


                                      14
Record assembly FSM
                                   Transitions labeled with
                                   repetition levels
                         DocId
                    0
                            0                     1
     1     Links.Backward        Links.Forward
                            0

                         0,1,2
    Name.Language.Code           Name.Language.Country
                            2
                         Name.Ur         0,1
                1
                             l
                           0



For record-oriented data processing (e.g., MapReduce)
                                                              15
Reading two fields

                                DocId: 10      s 1
                                Name
                                  Language
                DocId               Country: 'us'
                                  Language
                0               Name
                                Name
 1,2    Name.Language.Country     Language
                0                   Country: 'gb'

                                DocId: 20      s2
                                Name



Structure of parent fields is preserved.
Useful for queries like /Name[3]/Language[1]/Country

                                                       16
Outline
 Introduction
 Nested columnar storage
 Query processing
 Experiments
 Conclusion




                            17
Query language
 Based on SQL
 Performs
    ◦   Projection
    ◦   Selection
    ◦   Nested subqueries
    ◦   inner and intra-record aggregation
    ◦   Top-k
    ◦   Joins
    ◦   User-defined functions

                                             18
DocId: 10
                                                  Links
                                                    Forward: 20
                                                    Forward: 40

          Example usage                             Forward: 60
                                                  Name
                                                    Language
                                                      Code: 'en-us'
                                                      Country: 'us'
SELECT DocId AS Id,                                 Language
                                                      Code: 'en'
  COUNT(Name.Language.Code) WITHIN Name AS Cnt,     Url: 'http://A'
                                                  Name
  Name.Url + ',' + Name.Language.Code AS Str        Url: 'http://B'
                                                  Name
FROM t                                              Language
WHERE REGEXP(Name.Url, '^http') AND DocId < 20;       Code: 'en-gb'
                                                      Country: 'gb'



Output table                 Output schema
Id: 10                 t1    message QueryResult {
Name                           required int64 Id;
  Cnt: 2                       repeated group Name {
  Language                       optional uint64 Cnt;
    Str: 'http://A,en-us'        repeated group Language {
    Str: 'http://A,en'             optional string Str;
Name                             }
  Cnt: 0                       }
                             }
                                                                19
Query execution
                   client
                                     • Parallelizes scheduling
root server                            and aggregation
                                     • Fault tolerance
intermediate
                            ...      • query dispatcher can
servers
                                       dispatch servers

leaf servers       ...
(with local                  ...
 storage)


         storage layer (e.g., GFS)                               20
Example: count()
      SELECT A, COUNT(B) FROM T GROUP          SELECT A, SUM(c)
0     BY A                                     FROM (R11 UNION ALL R110)
      T = {/gfs/1, /gfs/2, …, /gfs/100000}     GROUP BY A




      R11                                R12
      SELECT A, COUNT(B) AS c            SELECT A, COUNT(B) AS c
1     FROM T11 GROUP BY A                FROM T12 GROUP BY A
      T11 = {/gfs/1, …, /gfs/10000}      T12 = {/gfs/10001, …, /gfs/20000}

...
      SELECT A, COUNT(B) AS c
3     FROM T31 GROUP BY A             ...
      T31 = {/gfs/1}

                Data access ops                                              21
Outline
 Introduction
 Nested columnar storage
 Query processing
 Experiments
 Conclusion




                            22
Experiment Data
• 1 PB of real data
  (uncompressed, non-replicated)
• 100K-800K tablets per table
• Experiments run during business hours

Table   Number of       Size (unrepl.,   Number       Data     Repl.
name    records         compressed)      of fields    center   factor
  T1      85 billion             87 TB          270     A        3×
  T2      24 billion             13 TB          530     A        3×
  T3        4 billion            70 TB         1200     A        3×
  T4      1+ trillion          105 TB            50     B        3×
  T5      1+ trillion            20 TB           30     B        2×
                                                                        23
Column v.s. Record"cold" time on local disk,
                                  time (sec)                            averaged over 30 runs
                                                                                 (e) parse as
                   from records                                                      C++ objects
10x speedup
using columnar                                       objects
storage                                                                          (d) read +
                                                                                     decompress
                                                               records
                                                                                 (c) parse as
                   from columns




                                           columns
                                                                                     C++ objects
                                                                                 (b) assemble
2-4x overhead of                                                                      records
using records                                                                    (a) read +
                                                                                     decompress

                                                     number of fields


            Table partition: 375 MB (compressed), 300K rows, 125 columns 24
MR and Dremel execution
 Avg # of terms in txtField in 85 billion record table T1
     execution time (sec) on 3000 nodes
                                             Sawzall program ran on MR:

                                             num_recs: table sum of int;
                                             num_words: table sum of int;
                                             emit num_recs <- 1;
                                             emit num_words <-

                                             count_words(input.txtField);

          87 TB        0.5 TB       0.5 TB



Q1: SELECT SUM(count_words(txtField)) / COUNT(*)
    FROM T1

MR overheads: launch jobs, schedule 0.5M tasks, assemble record
                                                           25
Impact of serving tree depth
      execution time (sec)




       (returns 100s of records)   (returns 1M records)


Q2:   SELECT country, SUM(item.amount) FROM T2
      GROUP BY country

Q3:   SELECT domain, SUM(item.amount) FROM T2
      WHERE domain CONTAINS ’.net’
      GROUP BY domain
                                   40 billion nested items26
Scalability
     execution time (sec)




                                        number of
                                        leaf servers




Q5 on a trillion-row table T4:
     SELECT TOP(aid, 20), COUNT(*) FROM T4
                                                 27
Outline
 Introduction
 Nested columnar storage
 Query processing
 Experiments
 Conclusion




                            28
Observation
                          Monthly query workload
                          of one 3000-node
percentage of queries     Dremel instance




                                          execution
                                          time (sec)




     Most queries complete under 10 sec                29
Conclusion
 Dremel is an query system
  For analysis of read-only nested data
 Main feature is fast(interactive response
  time), column-oriented, SQL-like Query
  language
 Introduced two major method:
    ◦ Nested columnar storage – in order to solve
      partial query problem.
    ◦ Query processing – parallel processing &
      distributing, decompressing queries


                                                    30
Comments
 Google is awesome!
 Pros
    ◦ Nested storage gives us more flexibility.
    ◦ repetition & definition is an novel idea
      and it can solve the locality problem
      easily.
    ◦ Distributed serving tree is awesome,
      faster than MR.



                                                  31
Comments
   Cons
    ◦ Read-only may not fit every requirement
    ◦ Dremel is not a database, so you’ll need
      to convert your real data into dremel
      when analyzing
      Converting may cost lost of time and space
      Google doesn’t care this problem, they have
       GFS and many servers.




                                                     32
Thank you for listening.




                           33

Weitere ähnliche Inhalte

Was ist angesagt?

Relational database
Relational database Relational database
Relational database Megha Sharma
 
Database : Relational Data Model
Database : Relational Data ModelDatabase : Relational Data Model
Database : Relational Data ModelSmriti Jain
 
NoSQL and MapReduce
NoSQL and MapReduceNoSQL and MapReduce
NoSQL and MapReduceJ Singh
 
Introduction to database & sql
Introduction to database & sqlIntroduction to database & sql
Introduction to database & sqlzahid6
 
Text Classification/Categorization
Text Classification/CategorizationText Classification/Categorization
Text Classification/CategorizationOswal Abhishek
 
Text summarization
Text summarizationText summarization
Text summarizationkareemhashem
 
Graph processing - Powergraph and GraphX
Graph processing - Powergraph and GraphXGraph processing - Powergraph and GraphX
Graph processing - Powergraph and GraphXAmir Payberah
 
Classification: Basic Concepts and Decision Trees
Classification: Basic Concepts and Decision TreesClassification: Basic Concepts and Decision Trees
Classification: Basic Concepts and Decision Treessathish sak
 
Decision trees in Machine Learning
Decision trees in Machine Learning Decision trees in Machine Learning
Decision trees in Machine Learning Mohammad Junaid Khan
 
Chapter 5. Data Cube Technology.ppt
Chapter 5. Data Cube Technology.pptChapter 5. Data Cube Technology.ppt
Chapter 5. Data Cube Technology.pptSubrata Kumer Paul
 
Scylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them All
Scylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them AllScylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them All
Scylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them AllScyllaDB
 
Data definition language
Data definition languageData definition language
Data definition languageVENNILAV6
 
Using Apache Spark as ETL engine. Pros and Cons
Using Apache Spark as ETL engine. Pros and Cons          Using Apache Spark as ETL engine. Pros and Cons
Using Apache Spark as ETL engine. Pros and Cons Provectus
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)Prashant Gupta
 

Was ist angesagt? (20)

Relational database
Relational database Relational database
Relational database
 
Database : Relational Data Model
Database : Relational Data ModelDatabase : Relational Data Model
Database : Relational Data Model
 
NoSQL and MapReduce
NoSQL and MapReduceNoSQL and MapReduce
NoSQL and MapReduce
 
RDD
RDDRDD
RDD
 
Introduction to database & sql
Introduction to database & sqlIntroduction to database & sql
Introduction to database & sql
 
Text Classification/Categorization
Text Classification/CategorizationText Classification/Categorization
Text Classification/Categorization
 
Text summarization
Text summarizationText summarization
Text summarization
 
Graph processing - Powergraph and GraphX
Graph processing - Powergraph and GraphXGraph processing - Powergraph and GraphX
Graph processing - Powergraph and GraphX
 
Nosql databases
Nosql databasesNosql databases
Nosql databases
 
Introduction to triggers
Introduction to triggersIntroduction to triggers
Introduction to triggers
 
Classification: Basic Concepts and Decision Trees
Classification: Basic Concepts and Decision TreesClassification: Basic Concepts and Decision Trees
Classification: Basic Concepts and Decision Trees
 
Apache spark
Apache sparkApache spark
Apache spark
 
Decision trees in Machine Learning
Decision trees in Machine Learning Decision trees in Machine Learning
Decision trees in Machine Learning
 
Chapter 5. Data Cube Technology.ppt
Chapter 5. Data Cube Technology.pptChapter 5. Data Cube Technology.ppt
Chapter 5. Data Cube Technology.ppt
 
Scylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them All
Scylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them AllScylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them All
Scylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them All
 
Data definition language
Data definition languageData definition language
Data definition language
 
Using Apache Spark as ETL engine. Pros and Cons
Using Apache Spark as ETL engine. Pros and Cons          Using Apache Spark as ETL engine. Pros and Cons
Using Apache Spark as ETL engine. Pros and Cons
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
 
DML, DDL, DCL ,DRL/DQL and TCL Statements in SQL with Examples
DML, DDL, DCL ,DRL/DQL and TCL Statements in SQL with ExamplesDML, DDL, DCL ,DRL/DQL and TCL Statements in SQL with Examples
DML, DDL, DCL ,DRL/DQL and TCL Statements in SQL with Examples
 
Database
DatabaseDatabase
Database
 

Ähnlich wie Dremel: interactive analysis of web-scale datasets

Apache parquet - Apache big data North America 2017
Apache parquet - Apache big data North America 2017Apache parquet - Apache big data North America 2017
Apache parquet - Apache big data North America 2017techmaddy
 
Code as Data workshop: Using source{d} Engine to extract insights from git re...
Code as Data workshop: Using source{d} Engine to extract insights from git re...Code as Data workshop: Using source{d} Engine to extract insights from git re...
Code as Data workshop: Using source{d} Engine to extract insights from git re...source{d}
 
HBaseCon 2015: S2Graph - A Large-scale Graph Database with HBase
HBaseCon 2015: S2Graph - A Large-scale Graph Database with HBaseHBaseCon 2015: S2Graph - A Large-scale Graph Database with HBase
HBaseCon 2015: S2Graph - A Large-scale Graph Database with HBaseHBaseCon
 
Polygot persistence for Java Developers - August 2011 / @Oakjug
Polygot persistence for Java Developers - August 2011 / @OakjugPolygot persistence for Java Developers - August 2011 / @Oakjug
Polygot persistence for Java Developers - August 2011 / @OakjugChris Richardson
 
Programming Design Guidelines
Programming Design GuidelinesProgramming Design Guidelines
Programming Design Guidelinesintuitiv.de
 
Using RDA for Archives and Manuscripts
Using RDA for Archives and ManuscriptsUsing RDA for Archives and Manuscripts
Using RDA for Archives and ManuscriptsAdrienne Pruitt
 
Practical Implementation of Space-Efficient Dynamic Keyword Dictionaries
Practical Implementation of Space-Efficient Dynamic Keyword DictionariesPractical Implementation of Space-Efficient Dynamic Keyword Dictionaries
Practical Implementation of Space-Efficient Dynamic Keyword DictionariesShunsuke Kanda
 
Apache thrift-RPC service cross languages
Apache thrift-RPC service cross languagesApache thrift-RPC service cross languages
Apache thrift-RPC service cross languagesJimmy Lai
 
Keg.io
Keg.ioKeg.io
Keg.ioHeroku
 
Polyglot persistence for Java developers - moving out of the relational comfo...
Polyglot persistence for Java developers - moving out of the relational comfo...Polyglot persistence for Java developers - moving out of the relational comfo...
Polyglot persistence for Java developers - moving out of the relational comfo...Chris Richardson
 
Hadoop architecture meetup
Hadoop architecture meetupHadoop architecture meetup
Hadoop architecture meetupvmoorthy
 

Ähnlich wie Dremel: interactive analysis of web-scale datasets (20)

Apache parquet - Apache big data North America 2017
Apache parquet - Apache big data North America 2017Apache parquet - Apache big data North America 2017
Apache parquet - Apache big data North America 2017
 
Understanding hdfs
Understanding hdfsUnderstanding hdfs
Understanding hdfs
 
Code as Data workshop: Using source{d} Engine to extract insights from git re...
Code as Data workshop: Using source{d} Engine to extract insights from git re...Code as Data workshop: Using source{d} Engine to extract insights from git re...
Code as Data workshop: Using source{d} Engine to extract insights from git re...
 
HBaseCon 2015: S2Graph - A Large-scale Graph Database with HBase
HBaseCon 2015: S2Graph - A Large-scale Graph Database with HBaseHBaseCon 2015: S2Graph - A Large-scale Graph Database with HBase
HBaseCon 2015: S2Graph - A Large-scale Graph Database with HBase
 
Seminar_Final
Seminar_FinalSeminar_Final
Seminar_Final
 
Polygot persistence for Java Developers - August 2011 / @Oakjug
Polygot persistence for Java Developers - August 2011 / @OakjugPolygot persistence for Java Developers - August 2011 / @Oakjug
Polygot persistence for Java Developers - August 2011 / @Oakjug
 
Programming Design Guidelines
Programming Design GuidelinesProgramming Design Guidelines
Programming Design Guidelines
 
Network and DNS Vulnerabilities
Network and DNS VulnerabilitiesNetwork and DNS Vulnerabilities
Network and DNS Vulnerabilities
 
Using RDA for Archives and Manuscripts
Using RDA for Archives and ManuscriptsUsing RDA for Archives and Manuscripts
Using RDA for Archives and Manuscripts
 
Practical Implementation of Space-Efficient Dynamic Keyword Dictionaries
Practical Implementation of Space-Efficient Dynamic Keyword DictionariesPractical Implementation of Space-Efficient Dynamic Keyword Dictionaries
Practical Implementation of Space-Efficient Dynamic Keyword Dictionaries
 
Apache thrift-RPC service cross languages
Apache thrift-RPC service cross languagesApache thrift-RPC service cross languages
Apache thrift-RPC service cross languages
 
Keg.io
Keg.ioKeg.io
Keg.io
 
Chado-XML
Chado-XMLChado-XML
Chado-XML
 
Polyglot persistence for Java developers - moving out of the relational comfo...
Polyglot persistence for Java developers - moving out of the relational comfo...Polyglot persistence for Java developers - moving out of the relational comfo...
Polyglot persistence for Java developers - moving out of the relational comfo...
 
Hadoop architecture meetup
Hadoop architecture meetupHadoop architecture meetup
Hadoop architecture meetup
 
C++primer
C++primerC++primer
C++primer
 
Avro
AvroAvro
Avro
 
2011 09-pdfjs
2011 09-pdfjs2011 09-pdfjs
2011 09-pdfjs
 
Ruby DSL
Ruby DSLRuby DSL
Ruby DSL
 
Drill dchug-29 nov2012
Drill dchug-29 nov2012Drill dchug-29 nov2012
Drill dchug-29 nov2012
 

Mehr von Hung-yu Lin

2014 database - course 2 - php
2014 database - course 2 - php2014 database - course 2 - php
2014 database - course 2 - phpHung-yu Lin
 
2014 database - course 3 - PHP and MySQL
2014 database - course 3 - PHP and MySQL2014 database - course 3 - PHP and MySQL
2014 database - course 3 - PHP and MySQLHung-yu Lin
 
2014 database - course 1 - www introduction
2014 database - course 1 - www introduction2014 database - course 1 - www introduction
2014 database - course 1 - www introductionHung-yu Lin
 
OpenWebSchool - 11 - CodeIgniter
OpenWebSchool - 11 - CodeIgniterOpenWebSchool - 11 - CodeIgniter
OpenWebSchool - 11 - CodeIgniterHung-yu Lin
 
OpenWebSchool - 06 - PHP + MySQL
OpenWebSchool - 06 - PHP + MySQLOpenWebSchool - 06 - PHP + MySQL
OpenWebSchool - 06 - PHP + MySQLHung-yu Lin
 
OpenWebSchool - 05 - MySQL
OpenWebSchool - 05 - MySQLOpenWebSchool - 05 - MySQL
OpenWebSchool - 05 - MySQLHung-yu Lin
 
OpenWebSchool - 02 - PHP Part I
OpenWebSchool - 02 - PHP Part IOpenWebSchool - 02 - PHP Part I
OpenWebSchool - 02 - PHP Part IHung-yu Lin
 
OpenWebSchool - 01 - WWW Intro
OpenWebSchool - 01 - WWW IntroOpenWebSchool - 01 - WWW Intro
OpenWebSchool - 01 - WWW IntroHung-yu Lin
 
OpenWebSchool - 03 - PHP Part II
OpenWebSchool - 03 - PHP Part IIOpenWebSchool - 03 - PHP Part II
OpenWebSchool - 03 - PHP Part IIHung-yu Lin
 
Google App Engine
Google App EngineGoogle App Engine
Google App EngineHung-yu Lin
 

Mehr von Hung-yu Lin (11)

2014 database - course 2 - php
2014 database - course 2 - php2014 database - course 2 - php
2014 database - course 2 - php
 
2014 database - course 3 - PHP and MySQL
2014 database - course 3 - PHP and MySQL2014 database - course 3 - PHP and MySQL
2014 database - course 3 - PHP and MySQL
 
2014 database - course 1 - www introduction
2014 database - course 1 - www introduction2014 database - course 1 - www introduction
2014 database - course 1 - www introduction
 
OpenWebSchool - 11 - CodeIgniter
OpenWebSchool - 11 - CodeIgniterOpenWebSchool - 11 - CodeIgniter
OpenWebSchool - 11 - CodeIgniter
 
OpenWebSchool - 06 - PHP + MySQL
OpenWebSchool - 06 - PHP + MySQLOpenWebSchool - 06 - PHP + MySQL
OpenWebSchool - 06 - PHP + MySQL
 
OpenWebSchool - 05 - MySQL
OpenWebSchool - 05 - MySQLOpenWebSchool - 05 - MySQL
OpenWebSchool - 05 - MySQL
 
OpenWebSchool - 02 - PHP Part I
OpenWebSchool - 02 - PHP Part IOpenWebSchool - 02 - PHP Part I
OpenWebSchool - 02 - PHP Part I
 
OpenWebSchool - 01 - WWW Intro
OpenWebSchool - 01 - WWW IntroOpenWebSchool - 01 - WWW Intro
OpenWebSchool - 01 - WWW Intro
 
OpenWebSchool - 03 - PHP Part II
OpenWebSchool - 03 - PHP Part IIOpenWebSchool - 03 - PHP Part II
OpenWebSchool - 03 - PHP Part II
 
Google App Engine
Google App EngineGoogle App Engine
Google App Engine
 
Redis
RedisRedis
Redis
 

Kürzlich hochgeladen

A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Kaya Weers
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 

Kürzlich hochgeladen (20)

A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 

Dremel: interactive analysis of web-scale datasets

  • 1. Dremel: Interactive Analysis of Web- Scale Datasets Sergey Melnik, Andrey Gubarev, Jing Jing Long, Geoffrey Romer, Shiva Shivakumar, Matt Tolton, Theo Vassilakis Google, Inc. VLDB, 2010 Presentation by ensky (enskylin@gmail.com)
  • 2. Outline  Introduction  Nested columnar storage  Query processing  Experiments  Conclusion 2
  • 3. Introduction  Dremel is an query system For analysis of read-only nested data  Use case Interactive Trends Tools Detection Web Spam Network Dashboards Optimization 3
  • 4. DocId: 10 Links Features Forward: 20 Name Language Code: 'en-us' Country: 'us'  Multi-level structure Url: 'http://A' Name  SQL-like query language Url: 'http://B'  Fast: Trillion-row tables in second level  Big scale: thousands of CPUs and petabytes of data  Widely used by google: in production since 2006 SELECT A, COUNT(B) FROM T GROUP BY A T = {/gfs/1, /gfs/2, …, /gfs/100000} 4
  • 5. Contribution  This paper presented two major technique in Dremel:  Nested columnar storage ◦ store / split into columns ◦ Assembly to record  Query ◦ Language ◦ Execution 5
  • 6. Outline  Introduction  Nested columnar storage  Query processing  Experiments  Conclusion 6
  • 7. Why nested?  Flexible ◦ Data in web and scientific is often non- relational  Reduce cost ◦ Normalizing and recombining nested data is often prohibited in TB, PB scale of data. 7
  • 8. Why column? DocId: 10 Links column-oriented Forward: 20 Name A * * ... Language Code: 'en-us' Country: 'us' B E Url: 'http://A' * C D r1 Name Url: 'http://B' r1 r1 r1 r2 r2 r2 Read less, r2 cheaper Record-oriented decompression Challenge: preserve structure, reconstruct from a subset of fields 8
  • 9. DocId: 10 Nested data model Links Forward: 20 message Document { Forward: 40 Forward: 60 required int64 DocId; [1,1] Name optional group Links { Language Code: 'en-us' repeated int64 Backward; [0,*] Country: 'us' repeated int64 Forward; Language } Code: 'en' Url: 'http://A' repeated group Name { Name repeated group Language { Url: 'http://B' Name required string Code; Language optional string Country; [0,1] Code: 'en-gb' Country: 'gb' } optional string Url; DocId: 20 } Links } Backward: 10 Backward: 30 Forward: 80 Name http://code.google.com/apis/protocolbuffers Url: 'http://C' 9
  • 10. Column-striped representation DocId Name.Url Links.Forward Links.Backward value r d value r d value r d value r d 10 0 0 http://A 0 2 20 0 2 NULL 0 1 20 0 0 http://B 1 2 40 1 2 10 0 2 DocId: 10 NULL 1 1 60 1 2 30 1 2 Links http://C 0 2 80 0 2 Forward: 20 Forward: 40 Forward: 60 Name Language Name.Language.Code Name.Language.Country Code: 'en-us' Country: 'us' value r d value r d Language en-us 0 2 us 0 3 Code: 'en' Url: 'http://A' en 2 2 NULL 2 2 Name Url: 'http://B' NULL 1 1 NULL 1 1 Name Language en-gb 1 2 gb 1 3 Code: 'en-gb' NULL 0 1 NULL 0 1 Country: 'gb' 10
  • 11. Column-oriented problem  When query sub-tree, there is no way for you to know the whole structure(even the position of yourself) A * * B ... E  To solve this problem, * this paper presents C D repetition & definition r1 r1 r1 r2 r2 Where am I? r2 11
  • 12. Repetition and definition levels r DocId: 10 Links 1 Forward: 20 Forward: 40 Name.Language.Code Forward: 60 value r d Name First time, repeat=0 Language en-us 0 2 Code: 'en-us' Language repeat, level = 2 Country: 'us' en 2 2 Language Code: 'en' NULL 1 1 Url: 'http://A' en-gb 1 2 Name Url: 'http://B' NULL 0 1 Name Language Code: 'en-gb' None repeat, level = 0 Country: 'gb' DocId: 20 2 r Links Backward: 10 Backward: 30 r: At what repeated field in the field's path Forward: 80 the value has repeated Name Url: 'http://C'12
  • 13. Repetition and definition levels r DocId: 10 Links 1 Forward: 20 Forward: 40 Name.Language.Country Forward: 60 value r d Name Language us 0 3 Code: 'en-us' Country: 'us' NULL 2 2 Language Code: 'en' NULL 1 1 Url: 'http://A' gb 1 3 Name Url: 'http://B' NULL 0 1 Name Language Code: 'en-gb' Country: 'gb' DocId: 20 2 r Links Backward: 10 Backward: 30 d: How many fields in paths that could be Forward: 80 Name undefined (opt. or rep.) are actually present Url: 'http://C'13
  • 14. Column-oriented is all right, but what if I still need record- oriented query? (e.g., MapReduce) 14
  • 15. Record assembly FSM Transitions labeled with repetition levels DocId 0 0 1 1 Links.Backward Links.Forward 0 0,1,2 Name.Language.Code Name.Language.Country 2 Name.Ur 0,1 1 l 0 For record-oriented data processing (e.g., MapReduce) 15
  • 16. Reading two fields DocId: 10 s 1 Name Language DocId Country: 'us' Language 0 Name Name 1,2 Name.Language.Country Language 0 Country: 'gb' DocId: 20 s2 Name Structure of parent fields is preserved. Useful for queries like /Name[3]/Language[1]/Country 16
  • 17. Outline  Introduction  Nested columnar storage  Query processing  Experiments  Conclusion 17
  • 18. Query language  Based on SQL  Performs ◦ Projection ◦ Selection ◦ Nested subqueries ◦ inner and intra-record aggregation ◦ Top-k ◦ Joins ◦ User-defined functions 18
  • 19. DocId: 10 Links Forward: 20 Forward: 40 Example usage Forward: 60 Name Language Code: 'en-us' Country: 'us' SELECT DocId AS Id, Language Code: 'en' COUNT(Name.Language.Code) WITHIN Name AS Cnt, Url: 'http://A' Name Name.Url + ',' + Name.Language.Code AS Str Url: 'http://B' Name FROM t Language WHERE REGEXP(Name.Url, '^http') AND DocId < 20; Code: 'en-gb' Country: 'gb' Output table Output schema Id: 10 t1 message QueryResult { Name required int64 Id; Cnt: 2 repeated group Name { Language optional uint64 Cnt; Str: 'http://A,en-us' repeated group Language { Str: 'http://A,en' optional string Str; Name } Cnt: 0 } } 19
  • 20. Query execution client • Parallelizes scheduling root server and aggregation • Fault tolerance intermediate ... • query dispatcher can servers dispatch servers leaf servers ... (with local ... storage) storage layer (e.g., GFS) 20
  • 21. Example: count() SELECT A, COUNT(B) FROM T GROUP SELECT A, SUM(c) 0 BY A FROM (R11 UNION ALL R110) T = {/gfs/1, /gfs/2, …, /gfs/100000} GROUP BY A R11 R12 SELECT A, COUNT(B) AS c SELECT A, COUNT(B) AS c 1 FROM T11 GROUP BY A FROM T12 GROUP BY A T11 = {/gfs/1, …, /gfs/10000} T12 = {/gfs/10001, …, /gfs/20000} ... SELECT A, COUNT(B) AS c 3 FROM T31 GROUP BY A ... T31 = {/gfs/1} Data access ops 21
  • 22. Outline  Introduction  Nested columnar storage  Query processing  Experiments  Conclusion 22
  • 23. Experiment Data • 1 PB of real data (uncompressed, non-replicated) • 100K-800K tablets per table • Experiments run during business hours Table Number of Size (unrepl., Number Data Repl. name records compressed) of fields center factor T1 85 billion 87 TB 270 A 3× T2 24 billion 13 TB 530 A 3× T3 4 billion 70 TB 1200 A 3× T4 1+ trillion 105 TB 50 B 3× T5 1+ trillion 20 TB 30 B 2× 23
  • 24. Column v.s. Record"cold" time on local disk, time (sec) averaged over 30 runs (e) parse as from records C++ objects 10x speedup using columnar objects storage (d) read + decompress records (c) parse as from columns columns C++ objects (b) assemble 2-4x overhead of records using records (a) read + decompress number of fields Table partition: 375 MB (compressed), 300K rows, 125 columns 24
  • 25. MR and Dremel execution Avg # of terms in txtField in 85 billion record table T1 execution time (sec) on 3000 nodes Sawzall program ran on MR: num_recs: table sum of int; num_words: table sum of int; emit num_recs <- 1; emit num_words <- count_words(input.txtField); 87 TB 0.5 TB 0.5 TB Q1: SELECT SUM(count_words(txtField)) / COUNT(*) FROM T1 MR overheads: launch jobs, schedule 0.5M tasks, assemble record 25
  • 26. Impact of serving tree depth execution time (sec) (returns 100s of records) (returns 1M records) Q2: SELECT country, SUM(item.amount) FROM T2 GROUP BY country Q3: SELECT domain, SUM(item.amount) FROM T2 WHERE domain CONTAINS ’.net’ GROUP BY domain 40 billion nested items26
  • 27. Scalability execution time (sec) number of leaf servers Q5 on a trillion-row table T4: SELECT TOP(aid, 20), COUNT(*) FROM T4 27
  • 28. Outline  Introduction  Nested columnar storage  Query processing  Experiments  Conclusion 28
  • 29. Observation Monthly query workload of one 3000-node percentage of queries Dremel instance execution time (sec) Most queries complete under 10 sec 29
  • 30. Conclusion  Dremel is an query system For analysis of read-only nested data  Main feature is fast(interactive response time), column-oriented, SQL-like Query language  Introduced two major method: ◦ Nested columnar storage – in order to solve partial query problem. ◦ Query processing – parallel processing & distributing, decompressing queries 30
  • 31. Comments  Google is awesome!  Pros ◦ Nested storage gives us more flexibility. ◦ repetition & definition is an novel idea and it can solve the locality problem easily. ◦ Distributed serving tree is awesome, faster than MR. 31
  • 32. Comments  Cons ◦ Read-only may not fit every requirement ◦ Dremel is not a database, so you’ll need to convert your real data into dremel when analyzing  Converting may cost lost of time and space  Google doesn’t care this problem, they have GFS and many servers. 32
  • 33. Thank you for listening. 33