SlideShare ist ein Scribd-Unternehmen logo
1 von 16
MongoDB’s New Aggregation Features Chris Westin © Copyright 2010 10gen Inc.
What problem are we solving? Map/Reduce can be used for aggregation… Currently being used for totaling, averaging, etc Map/Reduce is a big hammer Simpler tasks should be easier Shouldn’t need to write JavaScript Avoid the overhead of JavaScript engine We’re seeing requests for help in handling complex documents Select only subdocuments or arrays
How will we solve the problem? Our new aggregation framework Declarative framework No JavaScript required Describe a chain of operations to apply Expression evaluation Return computed values Framework:  we can add new operations easily C++ implementation Higher performance than JavaScript
Aggregation - Pipelines Aggregation requests specify a pipeline A pipeline is a series of operations Conceptually, the members of a collection are passed through a pipeline to produce a result Similar to a command-line pipe
Pipeline Operations $match Uses a query predicate (like .find({…})) as a filter $project Uses a sample document to determine the shape of the result (similar to .find()’s optional argument) This can include computed values $group Aggregates items into buckets defined by a key
Computed Expressions Available in $project operations Prefix expression language Add two fields:  $add:[“$field1”, “$field2”] Provide a value for a missing field: $ifnull:[“$field1”, “$field2”] Nesting:  $add:[“$field1”, $ifnull:[“$field2”, “$field3”]] Other functions…. And we can easily add more as required
Projections $project can reshape results $unwind expression doles out array values one at a time Pull fields from nested documents to the top Push fields from the top down into new virtual documents
Grouping $group aggregation expressions Total of column values:  $sum Average of column values: $avg Collect column values in an array:  $push
Demo (See script at https://gist.github.com/993733)
Usage Tips Use $match in a pipeline as early as possible The query optimizer can then be used to choose an index and avoid scanning the entire collection
Driver Support Initial version is a command For any language, build a JSON database object, and execute the command { aggregate : <collection>, pipeline : {…} } Beware of command result size limit
When is this being released? In final development now Expect to see this in the near future
Sharding support Initial release will support sharding Mongos analyzes pipeline, and forwards operations up to $group to shards; combines shard server results and continues
Pipeline Operations – Future Plans $sort Sorts the document stream according to a key $out Saves the document stream to a collection Similar to M/R $out, but with sharded output
Expressions – Future Plans Date field extraction Get year, month, day, hour, etc, from Date Date arithmetic
MongoDB Aggregation MongoSF May 2011

Weitere ähnliche Inhalte

Was ist angesagt?

Data Processing with Cascading Java API on Apache Hadoop
Data Processing with Cascading Java API on Apache HadoopData Processing with Cascading Java API on Apache Hadoop
Data Processing with Cascading Java API on Apache HadoopHikmat Dhamee
 
Machine Learning in a Twitter ETL using ELK
Machine Learning in a Twitter ETL using ELK Machine Learning in a Twitter ETL using ELK
Machine Learning in a Twitter ETL using ELK hypto
 
Updating materialized views and caches using kafka
Updating materialized views and caches using kafkaUpdating materialized views and caches using kafka
Updating materialized views and caches using kafkaZach Cox
 
RethinkDB - the open-source database for the realtime web
RethinkDB - the open-source database for the realtime webRethinkDB - the open-source database for the realtime web
RethinkDB - the open-source database for the realtime webAlex Ivanov
 
Building data flows with Celery and SQLAlchemy
Building data flows with Celery and SQLAlchemyBuilding data flows with Celery and SQLAlchemy
Building data flows with Celery and SQLAlchemyRoger Barnes
 
EG Reports - Delicious Data
EG Reports - Delicious DataEG Reports - Delicious Data
EG Reports - Delicious DataBenjamin Shum
 
Replicating application data into materialized views
Replicating application data into materialized viewsReplicating application data into materialized views
Replicating application data into materialized viewsZach Cox
 
Sphinx - High performance full-text search for MySQL
Sphinx - High performance full-text search for MySQLSphinx - High performance full-text search for MySQL
Sphinx - High performance full-text search for MySQLNguyen Van Vuong
 
Logstash + Elasticsearch + Kibana Presentation on Startit Tech Meetup
Logstash + Elasticsearch + Kibana Presentation on Startit Tech MeetupLogstash + Elasticsearch + Kibana Presentation on Startit Tech Meetup
Logstash + Elasticsearch + Kibana Presentation on Startit Tech MeetupStartit
 
Customer Intelligence: Using the ELK Stack to Analyze ForgeRock OpenAM Audit ...
Customer Intelligence: Using the ELK Stack to Analyze ForgeRock OpenAM Audit ...Customer Intelligence: Using the ELK Stack to Analyze ForgeRock OpenAM Audit ...
Customer Intelligence: Using the ELK Stack to Analyze ForgeRock OpenAM Audit ...ForgeRock
 
Get docs from sp doc library
Get docs from sp doc libraryGet docs from sp doc library
Get docs from sp doc librarySudip Sengupta
 
Mongo db admin_20110316
Mongo db admin_20110316Mongo db admin_20110316
Mongo db admin_20110316radiocats
 
A Beginner's Guide to Building Data Pipelines with Luigi
A Beginner's Guide to Building Data Pipelines with LuigiA Beginner's Guide to Building Data Pipelines with Luigi
A Beginner's Guide to Building Data Pipelines with LuigiGrowth Intelligence
 
Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0Cloudera, Inc.
 
MongoDB San Francisco DrupalCon 2010
MongoDB San Francisco DrupalCon 2010MongoDB San Francisco DrupalCon 2010
MongoDB San Francisco DrupalCon 2010Karoly Negyesi
 
Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...
Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...
Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...Modern Data Stack France
 

Was ist angesagt? (20)

Data Processing with Cascading Java API on Apache Hadoop
Data Processing with Cascading Java API on Apache HadoopData Processing with Cascading Java API on Apache Hadoop
Data Processing with Cascading Java API on Apache Hadoop
 
Introduction to ELK
Introduction to ELKIntroduction to ELK
Introduction to ELK
 
Machine Learning in a Twitter ETL using ELK
Machine Learning in a Twitter ETL using ELK Machine Learning in a Twitter ETL using ELK
Machine Learning in a Twitter ETL using ELK
 
Updating materialized views and caches using kafka
Updating materialized views and caches using kafkaUpdating materialized views and caches using kafka
Updating materialized views and caches using kafka
 
RethinkDB - the open-source database for the realtime web
RethinkDB - the open-source database for the realtime webRethinkDB - the open-source database for the realtime web
RethinkDB - the open-source database for the realtime web
 
Building data flows with Celery and SQLAlchemy
Building data flows with Celery and SQLAlchemyBuilding data flows with Celery and SQLAlchemy
Building data flows with Celery and SQLAlchemy
 
EG Reports - Delicious Data
EG Reports - Delicious DataEG Reports - Delicious Data
EG Reports - Delicious Data
 
Apache Spark - Aram Mkrtchyan
Apache Spark - Aram MkrtchyanApache Spark - Aram Mkrtchyan
Apache Spark - Aram Mkrtchyan
 
MongoDB
MongoDBMongoDB
MongoDB
 
Replicating application data into materialized views
Replicating application data into materialized viewsReplicating application data into materialized views
Replicating application data into materialized views
 
9.4json
9.4json9.4json
9.4json
 
Sphinx - High performance full-text search for MySQL
Sphinx - High performance full-text search for MySQLSphinx - High performance full-text search for MySQL
Sphinx - High performance full-text search for MySQL
 
Logstash + Elasticsearch + Kibana Presentation on Startit Tech Meetup
Logstash + Elasticsearch + Kibana Presentation on Startit Tech MeetupLogstash + Elasticsearch + Kibana Presentation on Startit Tech Meetup
Logstash + Elasticsearch + Kibana Presentation on Startit Tech Meetup
 
Customer Intelligence: Using the ELK Stack to Analyze ForgeRock OpenAM Audit ...
Customer Intelligence: Using the ELK Stack to Analyze ForgeRock OpenAM Audit ...Customer Intelligence: Using the ELK Stack to Analyze ForgeRock OpenAM Audit ...
Customer Intelligence: Using the ELK Stack to Analyze ForgeRock OpenAM Audit ...
 
Get docs from sp doc library
Get docs from sp doc libraryGet docs from sp doc library
Get docs from sp doc library
 
Mongo db admin_20110316
Mongo db admin_20110316Mongo db admin_20110316
Mongo db admin_20110316
 
A Beginner's Guide to Building Data Pipelines with Luigi
A Beginner's Guide to Building Data Pipelines with LuigiA Beginner's Guide to Building Data Pipelines with Luigi
A Beginner's Guide to Building Data Pipelines with Luigi
 
Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0
 
MongoDB San Francisco DrupalCon 2010
MongoDB San Francisco DrupalCon 2010MongoDB San Francisco DrupalCon 2010
MongoDB San Francisco DrupalCon 2010
 
Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...
Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...
Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...
 

Andere mochten auch

Практическое применение MongoDB Aggregation Framework
Практическое применение MongoDB Aggregation FrameworkПрактическое применение MongoDB Aggregation Framework
Практическое применение MongoDB Aggregation FrameworkДенис Кравченко
 
MongoDB's New Aggregation framework
MongoDB's New Aggregation frameworkMongoDB's New Aggregation framework
MongoDB's New Aggregation frameworkChris Westin
 
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...MongoDB
 
Sharding
ShardingSharding
ShardingMongoDB
 
How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...
How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...
How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...Gianfranco Palumbo
 
Agg framework selectgroup feb2015 v2
Agg framework selectgroup feb2015 v2Agg framework selectgroup feb2015 v2
Agg framework selectgroup feb2015 v2MongoDB
 
Optimizing MongoDB: Lessons Learned at Localytics
Optimizing MongoDB: Lessons Learned at LocalyticsOptimizing MongoDB: Lessons Learned at Localytics
Optimizing MongoDB: Lessons Learned at Localyticsandrew311
 
The Aggregation Framework
The Aggregation FrameworkThe Aggregation Framework
The Aggregation FrameworkMongoDB
 
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...MongoDB
 

Andere mochten auch (11)

Практическое применение MongoDB Aggregation Framework
Практическое применение MongoDB Aggregation FrameworkПрактическое применение MongoDB Aggregation Framework
Практическое применение MongoDB Aggregation Framework
 
MongoDB's New Aggregation framework
MongoDB's New Aggregation frameworkMongoDB's New Aggregation framework
MongoDB's New Aggregation framework
 
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
 
Web Design Trends 2011
Web Design Trends 2011Web Design Trends 2011
Web Design Trends 2011
 
Sharding
ShardingSharding
Sharding
 
How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...
How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...
How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...
 
Agg framework selectgroup feb2015 v2
Agg framework selectgroup feb2015 v2Agg framework selectgroup feb2015 v2
Agg framework selectgroup feb2015 v2
 
Optimizing MongoDB: Lessons Learned at Localytics
Optimizing MongoDB: Lessons Learned at LocalyticsOptimizing MongoDB: Lessons Learned at Localytics
Optimizing MongoDB: Lessons Learned at Localytics
 
The Aggregation Framework
The Aggregation FrameworkThe Aggregation Framework
The Aggregation Framework
 
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
 
Grid FS
Grid FSGrid FS
Grid FS
 

Ähnlich wie MongoDB Aggregation MongoSF May 2011

Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDBRaghunath A
 
Experiment no 05
Experiment no 05Experiment no 05
Experiment no 05Ankit Dubey
 
MongoDB Aggregation Framework
MongoDB Aggregation FrameworkMongoDB Aggregation Framework
MongoDB Aggregation FrameworkCaserta
 
Hands on Mahout!
Hands on Mahout!Hands on Mahout!
Hands on Mahout!OSCON Byrum
 
Cost-based Query Optimization in Apache Phoenix using Apache Calcite
Cost-based Query Optimization in Apache Phoenix using Apache CalciteCost-based Query Optimization in Apache Phoenix using Apache Calcite
Cost-based Query Optimization in Apache Phoenix using Apache CalciteJulian Hyde
 
phoenix-on-calcite-hadoop-summit-2016
phoenix-on-calcite-hadoop-summit-2016phoenix-on-calcite-hadoop-summit-2016
phoenix-on-calcite-hadoop-summit-2016Maryann Xue
 
Sedna XML Database: Query Parser & Optimizing Rewriter
Sedna XML Database: Query Parser & Optimizing RewriterSedna XML Database: Query Parser & Optimizing Rewriter
Sedna XML Database: Query Parser & Optimizing RewriterIvan Shcheklein
 
Software development - the java perspective
Software development - the java perspectiveSoftware development - the java perspective
Software development - the java perspectiveAlin Pandichi
 
MongoDB .local London 2019: Fast Machine Learning Development with MongoDB
MongoDB .local London 2019: Fast Machine Learning Development with MongoDBMongoDB .local London 2019: Fast Machine Learning Development with MongoDB
MongoDB .local London 2019: Fast Machine Learning Development with MongoDBLisa Roth, PMP
 
MongoDB .local London 2019: Fast Machine Learning Development with MongoDB
MongoDB .local London 2019: Fast Machine Learning Development with MongoDBMongoDB .local London 2019: Fast Machine Learning Development with MongoDB
MongoDB .local London 2019: Fast Machine Learning Development with MongoDBMongoDB
 
Query optimization to improve performance of the code execution
Query optimization to improve performance of the code executionQuery optimization to improve performance of the code execution
Query optimization to improve performance of the code executionAlexander Decker
 
11.query optimization to improve performance of the code execution
11.query optimization to improve performance of the code execution11.query optimization to improve performance of the code execution
11.query optimization to improve performance of the code executionAlexander Decker
 
MongoDB - Aggregation Pipeline
MongoDB - Aggregation PipelineMongoDB - Aggregation Pipeline
MongoDB - Aggregation PipelineJason Terpko
 
Practical catalyst
Practical catalystPractical catalyst
Practical catalystdwm042
 
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache BeamMalo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache BeamFlink Forward
 

Ähnlich wie MongoDB Aggregation MongoSF May 2011 (20)

Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
Experiment no 05
Experiment no 05Experiment no 05
Experiment no 05
 
MongoDB Aggregation Framework
MongoDB Aggregation FrameworkMongoDB Aggregation Framework
MongoDB Aggregation Framework
 
Hands on Mahout!
Hands on Mahout!Hands on Mahout!
Hands on Mahout!
 
Cost-based Query Optimization in Apache Phoenix using Apache Calcite
Cost-based Query Optimization in Apache Phoenix using Apache CalciteCost-based Query Optimization in Apache Phoenix using Apache Calcite
Cost-based Query Optimization in Apache Phoenix using Apache Calcite
 
Cost-Based query optimization
Cost-Based query optimizationCost-Based query optimization
Cost-Based query optimization
 
Cost-based Query Optimization
Cost-based Query Optimization Cost-based Query Optimization
Cost-based Query Optimization
 
phoenix-on-calcite-hadoop-summit-2016
phoenix-on-calcite-hadoop-summit-2016phoenix-on-calcite-hadoop-summit-2016
phoenix-on-calcite-hadoop-summit-2016
 
9-Query Processing-05-06-2023.PPT
9-Query Processing-05-06-2023.PPT9-Query Processing-05-06-2023.PPT
9-Query Processing-05-06-2023.PPT
 
Sedna XML Database: Query Parser & Optimizing Rewriter
Sedna XML Database: Query Parser & Optimizing RewriterSedna XML Database: Query Parser & Optimizing Rewriter
Sedna XML Database: Query Parser & Optimizing Rewriter
 
cheat-sheets.pdf
cheat-sheets.pdfcheat-sheets.pdf
cheat-sheets.pdf
 
Software development - the java perspective
Software development - the java perspectiveSoftware development - the java perspective
Software development - the java perspective
 
Einführung in MongoDB
Einführung in MongoDBEinführung in MongoDB
Einführung in MongoDB
 
MongoDB .local London 2019: Fast Machine Learning Development with MongoDB
MongoDB .local London 2019: Fast Machine Learning Development with MongoDBMongoDB .local London 2019: Fast Machine Learning Development with MongoDB
MongoDB .local London 2019: Fast Machine Learning Development with MongoDB
 
MongoDB .local London 2019: Fast Machine Learning Development with MongoDB
MongoDB .local London 2019: Fast Machine Learning Development with MongoDBMongoDB .local London 2019: Fast Machine Learning Development with MongoDB
MongoDB .local London 2019: Fast Machine Learning Development with MongoDB
 
Query optimization to improve performance of the code execution
Query optimization to improve performance of the code executionQuery optimization to improve performance of the code execution
Query optimization to improve performance of the code execution
 
11.query optimization to improve performance of the code execution
11.query optimization to improve performance of the code execution11.query optimization to improve performance of the code execution
11.query optimization to improve performance of the code execution
 
MongoDB - Aggregation Pipeline
MongoDB - Aggregation PipelineMongoDB - Aggregation Pipeline
MongoDB - Aggregation Pipeline
 
Practical catalyst
Practical catalystPractical catalyst
Practical catalyst
 
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache BeamMalo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
 

Mehr von Chris Westin

Data torrent meetup-productioneng
Data torrent meetup-productionengData torrent meetup-productioneng
Data torrent meetup-productionengChris Westin
 
Ambari hadoop-ops-meetup-2013-09-19.final
Ambari hadoop-ops-meetup-2013-09-19.finalAmbari hadoop-ops-meetup-2013-09-19.final
Ambari hadoop-ops-meetup-2013-09-19.finalChris Westin
 
Cluster management and automation with cloudera manager
Cluster management and automation with cloudera managerCluster management and automation with cloudera manager
Cluster management and automation with cloudera managerChris Westin
 
Building low latency java applications with ehcache
Building low latency java applications with ehcacheBuilding low latency java applications with ehcache
Building low latency java applications with ehcacheChris Westin
 
SDN/OpenFlow #lspe
SDN/OpenFlow #lspeSDN/OpenFlow #lspe
SDN/OpenFlow #lspeChris Westin
 
cfengine3 at #lspe
cfengine3 at #lspecfengine3 at #lspe
cfengine3 at #lspeChris Westin
 
Nimbula lspe-2012-04-19
Nimbula lspe-2012-04-19Nimbula lspe-2012-04-19
Nimbula lspe-2012-04-19Chris Westin
 
mongodb-brief-intro-february-2012
mongodb-brief-intro-february-2012mongodb-brief-intro-february-2012
mongodb-brief-intro-february-2012Chris Westin
 
Stingray - Riverbed Technology
Stingray - Riverbed TechnologyStingray - Riverbed Technology
Stingray - Riverbed TechnologyChris Westin
 
Replication and replica sets
Replication and replica setsReplication and replica sets
Replication and replica setsChris Westin
 
Architecting a Scale Out Cloud Storage Solution
Architecting a Scale Out Cloud Storage SolutionArchitecting a Scale Out Cloud Storage Solution
Architecting a Scale Out Cloud Storage SolutionChris Westin
 
MongoDB: An Introduction - July 2011
MongoDB:  An Introduction - July 2011MongoDB:  An Introduction - July 2011
MongoDB: An Introduction - July 2011Chris Westin
 
Practical Replication June-2011
Practical Replication June-2011Practical Replication June-2011
Practical Replication June-2011Chris Westin
 
MongoDB: An Introduction - june-2011
MongoDB:  An Introduction - june-2011MongoDB:  An Introduction - june-2011
MongoDB: An Introduction - june-2011Chris Westin
 
Ganglia Overview-v2
Ganglia Overview-v2Ganglia Overview-v2
Ganglia Overview-v2Chris Westin
 
Mysql Proxy Presentation Yahoo
Mysql Proxy Presentation YahooMysql Proxy Presentation Yahoo
Mysql Proxy Presentation YahooChris Westin
 
Mysql proxy presentation_yahoo
Mysql proxy presentation_yahooMysql proxy presentation_yahoo
Mysql proxy presentation_yahooChris Westin
 

Mehr von Chris Westin (20)

Data torrent meetup-productioneng
Data torrent meetup-productionengData torrent meetup-productioneng
Data torrent meetup-productioneng
 
Gripshort
GripshortGripshort
Gripshort
 
Ambari hadoop-ops-meetup-2013-09-19.final
Ambari hadoop-ops-meetup-2013-09-19.finalAmbari hadoop-ops-meetup-2013-09-19.final
Ambari hadoop-ops-meetup-2013-09-19.final
 
Cluster management and automation with cloudera manager
Cluster management and automation with cloudera managerCluster management and automation with cloudera manager
Cluster management and automation with cloudera manager
 
Building low latency java applications with ehcache
Building low latency java applications with ehcacheBuilding low latency java applications with ehcache
Building low latency java applications with ehcache
 
SDN/OpenFlow #lspe
SDN/OpenFlow #lspeSDN/OpenFlow #lspe
SDN/OpenFlow #lspe
 
cfengine3 at #lspe
cfengine3 at #lspecfengine3 at #lspe
cfengine3 at #lspe
 
Nimbula lspe-2012-04-19
Nimbula lspe-2012-04-19Nimbula lspe-2012-04-19
Nimbula lspe-2012-04-19
 
mongodb-brief-intro-february-2012
mongodb-brief-intro-february-2012mongodb-brief-intro-february-2012
mongodb-brief-intro-february-2012
 
Stingray - Riverbed Technology
Stingray - Riverbed TechnologyStingray - Riverbed Technology
Stingray - Riverbed Technology
 
Replication and replica sets
Replication and replica setsReplication and replica sets
Replication and replica sets
 
Architecting a Scale Out Cloud Storage Solution
Architecting a Scale Out Cloud Storage SolutionArchitecting a Scale Out Cloud Storage Solution
Architecting a Scale Out Cloud Storage Solution
 
FlashCache
FlashCacheFlashCache
FlashCache
 
Large Scale Cacti
Large Scale CactiLarge Scale Cacti
Large Scale Cacti
 
MongoDB: An Introduction - July 2011
MongoDB:  An Introduction - July 2011MongoDB:  An Introduction - July 2011
MongoDB: An Introduction - July 2011
 
Practical Replication June-2011
Practical Replication June-2011Practical Replication June-2011
Practical Replication June-2011
 
MongoDB: An Introduction - june-2011
MongoDB:  An Introduction - june-2011MongoDB:  An Introduction - june-2011
MongoDB: An Introduction - june-2011
 
Ganglia Overview-v2
Ganglia Overview-v2Ganglia Overview-v2
Ganglia Overview-v2
 
Mysql Proxy Presentation Yahoo
Mysql Proxy Presentation YahooMysql Proxy Presentation Yahoo
Mysql Proxy Presentation Yahoo
 
Mysql proxy presentation_yahoo
Mysql proxy presentation_yahooMysql proxy presentation_yahoo
Mysql proxy presentation_yahoo
 

Kürzlich hochgeladen

From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 

Kürzlich hochgeladen (20)

From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 

MongoDB Aggregation MongoSF May 2011

  • 1. MongoDB’s New Aggregation Features Chris Westin © Copyright 2010 10gen Inc.
  • 2. What problem are we solving? Map/Reduce can be used for aggregation… Currently being used for totaling, averaging, etc Map/Reduce is a big hammer Simpler tasks should be easier Shouldn’t need to write JavaScript Avoid the overhead of JavaScript engine We’re seeing requests for help in handling complex documents Select only subdocuments or arrays
  • 3. How will we solve the problem? Our new aggregation framework Declarative framework No JavaScript required Describe a chain of operations to apply Expression evaluation Return computed values Framework: we can add new operations easily C++ implementation Higher performance than JavaScript
  • 4. Aggregation - Pipelines Aggregation requests specify a pipeline A pipeline is a series of operations Conceptually, the members of a collection are passed through a pipeline to produce a result Similar to a command-line pipe
  • 5. Pipeline Operations $match Uses a query predicate (like .find({…})) as a filter $project Uses a sample document to determine the shape of the result (similar to .find()’s optional argument) This can include computed values $group Aggregates items into buckets defined by a key
  • 6. Computed Expressions Available in $project operations Prefix expression language Add two fields: $add:[“$field1”, “$field2”] Provide a value for a missing field: $ifnull:[“$field1”, “$field2”] Nesting: $add:[“$field1”, $ifnull:[“$field2”, “$field3”]] Other functions…. And we can easily add more as required
  • 7. Projections $project can reshape results $unwind expression doles out array values one at a time Pull fields from nested documents to the top Push fields from the top down into new virtual documents
  • 8. Grouping $group aggregation expressions Total of column values: $sum Average of column values: $avg Collect column values in an array: $push
  • 9. Demo (See script at https://gist.github.com/993733)
  • 10. Usage Tips Use $match in a pipeline as early as possible The query optimizer can then be used to choose an index and avoid scanning the entire collection
  • 11. Driver Support Initial version is a command For any language, build a JSON database object, and execute the command { aggregate : <collection>, pipeline : {…} } Beware of command result size limit
  • 12. When is this being released? In final development now Expect to see this in the near future
  • 13. Sharding support Initial release will support sharding Mongos analyzes pipeline, and forwards operations up to $group to shards; combines shard server results and continues
  • 14. Pipeline Operations – Future Plans $sort Sorts the document stream according to a key $out Saves the document stream to a collection Similar to M/R $out, but with sharded output
  • 15. Expressions – Future Plans Date field extraction Get year, month, day, hour, etc, from Date Date arithmetic