Scale out a mongod node
Senior Cyber Intelligence Analyst with:
Lockheed Martin Computer Incident Response Team
Network defense for Lockheed Martin
My Background
3 years working in NASA’s Mission Control Center in Houston, TX
Mission Focused
International Collaboration solving problems
3 years working for the Computer Incident Response Team
Mission Focused
International Collaboration
This is a story of lessons learned building a distributed enterprise monitoring framework, specifically the data storage and retrieval subsystem, and of beating up MongoDB until it gave us the performance we wanted. And we got it to work.
What IS:
Ability To Extract Information
Tap a network
Largely Technical
Beyond typical NIDS/deep packet inspection
Garner Intelligence from that Information
Query focused data sets
Influence
Starts with an Application to apply the intelligence to the information to extract and store metadata.
Make actionable decisions (block a malicious email)
The whole thing wrapped up is just a tool for a person. At the end of the day, its main job is to Enable Critical Thinking.
Do NOT want users simply following a standard process as that hides authority and accountability (the process said so)
The Primary focus of this capability is critical thinking enablement. Allow the user to flow uninterrupted thoughts.
Data is everywhere. Bandwidth is limited
We don’t have infinite storage, and even if we did, old data is better suited as a compressed blob on tape than as entries in a queryable data store
Which leads to Access.
Tools must support the user, and the user both wants and needs a simple way to access the data. I’ll make note that simple does not necessarily mean easy.
MongoDB scales out Extremely well inside a data center.
<click>
There’s endless resources available for building a mongodb cluster in a data center
One option would be to pull all your information back to the data center
<click>
However:
<next slide>
In our use case it’s simply unreasonable to pull all information back to the data center. The sheer volume of information would overwhelm the link back to the data center.
It’s sometimes unreasonable to pull even the METADATA back to the data center
Sometimes we have the case where metadata happens to contain more information than the raw data itself.
Move your technical influence to the data.
For each information source, ship a “pizza box”
Store your metadata Locally on that “pizza box” to minimize wan traffic.
Make the data available for querying from home base.
Don’t levy a requirement that Field Engineers be MongoDB certified DBAs
Continuing the background:
Each pizza box looks at information, applies intelligence from another field, and takes appropriate actions.
This could be active such as blocking certain network traffic
This could be passive such as alerting operators of fishy information transiting a gateway
Of course, the action could be generate and store the metadata for later analysis.
Inside each one of these pizza boxes is a fully contained system that can process data, drive actions, and most importantly (for this audience) store the schema-less metadata.
Let’s return to the single pizza box
WHAT volume of information can a single pizza box support?
More importantly, how many boxes do I need to buy to support X amount of data?
How fast can I pump the orange circles while still being able to take the minimal “log” action.
One way to measure throughput is to count the number of orange circles I can push through the pizza box every second.
To be consistent with Mongo terminology, I’ll refer to this as documents per second from here on out.
This number is highly dependent on a large number of configuration details. Deliberately, I’ll describe it in relative terms rather than in absolute terms.
Obviously, we want to maximize the documents/second the database can support.
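One way to sketch such a measurement: time a run of batch inserts and divide total documents by elapsed seconds. The snippet below uses an in-memory list as a stand-in for the insert call (the function names and batch sizes are illustrative); point it at a real collection's insert to benchmark an actual mongod.

```python
import time

def docs_per_second(insert_batch, batch, rounds=5):
    """Measure sustained insert throughput: total documents / elapsed seconds."""
    start = time.perf_counter()
    total = 0
    for _ in range(rounds):
        insert_batch(batch)      # swap in e.g. a real collection's insert call
        total += len(batch)
    elapsed = time.perf_counter() - start
    return total / elapsed

# In-memory stand-in so the sketch runs anywhere
store = []
rate = docs_per_second(store.extend, [{"n": i} for i in range(1000)])
```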
What is Mongo’s recommendation?
The consensus
–pause—
is that ALL DATA should fit into RAM.
--pause--
If you can’t fit the data in RAM, you should at least keep your indexes in RAM.
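That consensus can be written down as a simple decision rule. This is a sketch: the function name and return labels are my own, not a MongoDB API.

```python
GIB = 2**30

def working_set_advice(data_bytes, index_bytes, ram_bytes):
    """The consensus recommendation as a decision: best case, ALL data fits
    in RAM; failing that, at least keep the indexes in RAM."""
    if data_bytes + index_bytes <= ram_bytes:
        return "all data in RAM"
    if index_bytes <= ram_bytes:
        return "indexes in RAM"
    return "working set exceeds RAM: scale up or segment the data"
```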
Throw new boxes into the cloud
Self Managing:
Put each pizza box into the cloud
<click>
Should be as simple as copy/paste to add new nodes into the cloud
<click>
The cloud should also allow nodes to come online/offline at will
<click>
This could be for standard maintenance, or even part of the operational concept.
Perhaps there is a node that is intentionally available during sunny hours only
<click>
I found there was no feasible method to manage this cluster using ‘standard’ MongoDB methods.
Note ‘standard’: there were tons of interesting ideas floating around the community; this presentation covers one.
In a dream world, the rate at which I can throw documents at MongoDB would not be in any way related to total database size.
Because I enjoy realistic dreams, I’ll acknowledge that the throughput can’t be infinite.
As the total data size goes up
<click>
As time goes on
–point to x-axis—
It would be great if the data size
--point to y-axis–
grew nice and linearly.
It would be really nice if the documents per second would stay constant throughout
<click>
--point to other y axis--
It should be able to maintain a high insert rate regardless of disk size.
As I’m sure you expect, the reality isn’t as promising.
Again, read this as a behavior, not a benchmark
This curve is a general behavior
Think about it! This really isn't that bad, once you get past the initial dropoff, it becomes “roughly” linear. It isn’t exactly what we wanted, but we might be able to work with this.
Of course you can alter the data or indexes and change what is meaningful to you, but the general trend will still look like this.
Of course you can “lift up” the green line (the document throughput) by adding memory sufficient to fit all data and indexes into RAM. You can use faster spinning disks, splurge for solid state drives, or even look at more exotic options such as NAND flash.
At the end of the day, you are scaling !UP! In order to accomplish the performance increase.
You can scale out by making each pizza box itself a sharded cluster with a handful of mongods, a config server, and a mongos, but MongoDB won’t let you stack a mongos on top of another mongos. You’d have to write your “global” mongos from scratch anyway.
MongoDB scales well inside the data center, but not so well in the field.
The big deal here is the ever increasing disk size required. We need a simple way to manage actual disk utilization.
Again, In a perfect world, the rate at which I can throw documents at MongoDB would not be in any way related to total database size.
As the total data size goes up, and the capped collection kicks in:
<click>
Remember, this is real utilized disk space and we are running the mongod on real, physical hardware. You must be absolutely, 100% sure the database size always fits in the allocated disk space.
There is VERY little room to “accidentally” raise the blue line. That fact makes it very difficult to use TTL collections, as you may not know ahead of time the size of the data and when it is appropriate to expire/remove the data while optimizing data retention periods.
We’re engineers… we know things in life aren’t free, so we know something interesting is bound to happen here
<click>
The point where mongo transforms from growing with the data to overwriting itself.
But we aren’t sure what yet
<click>
It would be really nice if the documents per second would stay constant.
<click>
With a little bit of hackery, we got it.
BUT… remember, we just settled for the diminished throughput because we couldn’t afford to scale up to the point our data would fit in RAM, making it look more like this
<click>
It should be able to maintain a high insert rate regardless of disk size, and regardless of retention methods.
As I’m sure you expect, the reality isn’t as promising.
<click>
Remember, the Mongo database is still tied to actual physical disk space.
Let’s assume we have a constant stream of information.
It is happily filling up the database according to the graph on the previous page.
<click>
Eventually, the database is going to fill up. What should we do?
For lack of a better idea, we might as well grab the oldest data
<Click>
And throw it in the trash
<click>
to make room for the next document.
<click>
This particular process happily plays itself out over and over again, and is actually built right into Mongo as a “capped collection”.
The capped collection also has a cousin, the TTL index on documents, which likewise manages document retention.
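For reference, both retention mechanisms are a one-time setup. This sketch builds the server command documents you would hand to MongoDB's `runCommand`; the collection name, size cap, and TTL window here are placeholder values.

```python
def capped_create_cmd(name, size_bytes):
    """The MongoDB 'create' command for a capped collection: once size_bytes
    is full, the oldest documents are overwritten by new inserts."""
    return {"create": name, "capped": True, "size": size_bytes}

def ttl_index_cmd(coll, field="ts", seconds=7 * 24 * 3600):
    """The 'createIndexes' command for a TTL index: documents whose `field`
    timestamp is older than `seconds` get removed by a background task."""
    return {
        "createIndexes": coll,
        "indexes": [{
            "key": {field: 1},
            "name": f"{field}_ttl",
            "expireAfterSeconds": seconds,
        }],
    }
```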
Pause - Absorb
As mongod enters the phase where it is managing the roll-off of old data,
<click>
It experiences a huge penalty “going off the cliff” when capped collection kicks in
<click>
I don’t intend to deep dive into how MongoDB extents are laid out for the documents and “next” pointers as they exist on disk.
But at a high level, Remember for each and every new document or batch coming into the database, mongo may be responsible for overwriting one or more documents from the database.
More importantly to this cliff, each and every new document or batch of documents containing an indexed field must be built into Mongo’s B-TREE indexing.
EVEN more disastrous, each document or batch of document being trashed must be REMOVED from the B-TREE index.
How bad is it that you pay a performance penalty… to permanently REMOVE data from your index?
Of course you can “lift up” the green line (the document throughput) by scaling up.
Add memory to fit all data in RAM.
use faster spinning disks,
splurge for solid state,
or buy exotic NAND flash.
One of the great things about Mongo is that it scales out well in the data center, but it seems to require scaling UP within the node.
Querying the Cloud:
Still want this cloud to look and function like a query to MongoS.
<click>One query in,
<click>
one result set out.
Again, this cloud should scale-out like MongoS. However, it needs to perform the scale-out without the nodes talking to, or knowing about each other.
If a node is unreachable or unresponsive, the system should SIMPLY tell me about it. Let ME, the user decide what to do with that knowledge and whether to accept the data, or query again at a different time.
<click>
I envision a result set similar to this, a list of nodes that responded, a list of nodes that didn’t, and the documents matching the query.
This is an extremely simplified vision, but nevertheless provides the behavior we desire.
It would be even better if the cloud could tell me WHY a node didn’t respond.
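A minimal sketch of that result envelope. It runs sequentially here for clarity; a real implementation would dispatch to the nodes in parallel with per-node timeouts, and `run_query` is a hypothetical per-node query function, not a MongoDB API.

```python
def scatter_gather(nodes, query, run_query, timeout=2.0):
    """Fan a query out to independent nodes. One failed or slow node must not
    block the result: report it, return everything else, let the USER decide."""
    responded, failed, docs = [], {}, []
    for node in nodes:
        try:
            docs.extend(run_query(node, query, timeout))
            responded.append(node)
        except Exception as exc:          # unreachable, overloaded, timed out...
            failed[node] = str(exc)       # ...and WHY it didn't respond
    return {"responded": responded, "failed": failed, "documents": docs}
```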
Until ‘recently’, there wasn’t a good way within MongoS to allow queries to complete if one of the target shards is unresponsive.
One “slow” node should not hamper the entire cluster. If it is slow/overloaded, tell me, but don’t hold up results from the rest of the cloud.
If we can achieve this system, we will have successfully architected a scale-out cloud MongoDB cluster
Let’s go back to critical thinking
Tools MUST Support The Analyst
Let’s think about the standard information retrieval that you might expect out of an O-L-T-P data store in a security context.
You get an IP address to think about.
Dramatic pause… does everyone have it? Dot 2-4-7?
Your O-L-T-P gives you back some data
<click>
Pause a moment
Let that sink in: after 1.0 seconds, you are already beginning to forget WHY you were looking for this specific address.
You moved into turning-the-crank rather than critical thinking
Extending that for a moment
<click>
Pause for reading
10 seconds.
45 plus years ago we solved the riddle.
10 seconds waiting for an answer, and our ability to maintain critical thinking with a problem-solving mindset is lost.
10 seconds.
What does this tell us?
In the out-of-the-box mongo world, new data is always streaming into one giant database that holds all the data. As time goes on, mongo happily manages your extents, adding disk size as needed.
<click>
If you use the capped collection, it allocates all your disk space up front, and happily overwrites old data with new data when the time comes.
<click>
We are already assuming the data size is greater than RAM size. Probably close to disk size.
If you can make the assumption that you are –more-- interested in recent data –this one here at the end-- than you are in historical context, there isn’t a great reason to query the entire database,
<click>
just to get this data.
Remember there is some psychological value gained by ensuring the database can field queries within 10 seconds, or better yet, 1 second.
Rather than send one query to one database, if we can segment the data into buckets
Again, this hinges on the assumption that whatever you search for, you would want recent results –these up here– returned first.
Rather than dispatch the query to the entire database which is far outside ram at this point, you can dispatch the query to each segment.
<click>
In theory, this most recent bucket is “warmed”, so I’ve noted it in RED.
You can make the assumption that this bucket is taking a stream of insert operations. So… it’s fair to assume that the host OS allows it to remain in RAM.
The other buckets may not be so lucky and reside only in virtual memory.
I hope we agree, bucketing the data makes sense.
What's the best way to technically bucket the data inside mongo?
<click>
Make each bucket its own database!
Here comes the challenge. How do we manage databases as time goes on?
We need a way to generate new buckets as the previous bucket fills up.
<click>
As time goes on, the generator is responsible for creating new empty buckets
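A sketch of the generator's routing logic. The naming scheme and doc-count trigger are illustrative only; a real trigger would watch bytes on disk, and each name corresponds to a separate MongoDB database.

```python
class Generator:
    """Route inserts to the current bucket (database); when the bucket
    fills up, create a new, empty one and route there instead."""

    def __init__(self, max_docs, prefix="meta"):
        self.max_docs, self.count, self.seq = max_docs, 0, 0
        self.prefix = prefix

    @property
    def current(self):
        # Sequential names keep sort order == age order for the destructor
        return f"{self.prefix}_{self.seq:06d}"

    def route(self, doc):
        if self.count >= self.max_docs:   # previous bucket is full
            self.seq += 1                 # generator creates a new empty bucket
            self.count = 0
        self.count += 1
        return self.current               # the database to insert `doc` into
```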
<click>
One easy way to scale-out mongod is to tightly manage the working set and make sure it stays in RAM.
Since only one bucket is “warmed” with the constant stream of inserts, that entire database is the current working set.
This means that only a single database, the current “warm” one, must remain in RAM; others may go in and out with queries/map reduces at the host OS’s discretion.
Rather than one giant database, we focused on forcing the entire active bucket or database to stay in memory by making the bucket size smaller than RAM size
<click>
As each bucket fills up, the generator creates a new empty bucket and begins routing inserts to the new bucket.
<click>
But remember, we are still tied to physical disk space, and that space will fill up eventually.
We need to start trashing data
<click>
Similar to a capped collection, we should throw away the oldest data first, but rather than throw away 1 or 2 documents at a time, lets throw away the whole bucket to clean up space quickly.
<click>
We have successfully cleaned up disk space to allocate a new bucket.
We need a way to automate this process so we don’t manually curate the buckets.
<click>
Eventually, the field of buckets will fill, and the system will enter a stage where it needs to delete old data to make room for new data.
As each bucket gets close to full, a destructor will come by and delete the next oldest bucket to make room for new data.
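The destructor's core decision is simple: sort buckets by age and pick the oldest until we are back under the limit. A sketch (it leans on the generator's chronological naming, and each victim maps to a single dropDatabase call rather than per-document deletes):

```python
def destructor(buckets, keep):
    """Return the oldest buckets to drop so at most `keep` remain.
    Dropping a whole database frees space in one cheap operation instead
    of removing documents one at a time from a B-tree index."""
    return sorted(buckets)[: max(0, len(buckets) - keep)]
```

Each returned name would then be dropped, e.g. `client.drop_database(name)` in a driver.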
It looks something like this:
<click>?
It does rely on disk IO, as the database is not “clean” until the dropDatabase command is executed.
Generator creates new buckets which are big enough to hold a “substantial” amount of data.
Substantial is a subjective measure that will need to take into account the hardware Mongo is running on.
This bucket must be small enough so that the entire bucket, data and indexes and all, fits into RAM.
It should actually be small enough so that, in addition to completely fitting in RAM, at least one other bucket can be paged into RAM as well, so concurrent queries aren’t fighting for physical resources.
It should be big enough such that you get a marked improvement in your “capped” collection.
Remember, the “floor” of this concept is almost as inefficient as the capped collection. If you make each bucket only big enough to fit one document, you are really just trading the B-TREE index manipulation for destructor disk IO
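Putting the sizing rules into arithmetic (a sketch: the one-quarter fraction reflects where our system ended up, per the tuning notes later; your hardware will land somewhere different):

```python
def bucket_size_bytes(ram_bytes, fraction=0.25):
    """Size each bucket as a fraction of RAM: small enough that the warm
    bucket plus one paged-in historical bucket fit in RAM together, big
    enough that whole-bucket drops stay far cheaper than per-doc deletes."""
    return int(ram_bytes * fraction)

def bucket_count(disk_bytes, bucket_bytes):
    """How many buckets fit on the allocated disk (floor: never over-allocate,
    the database must ALWAYS fit in the physical disk space)."""
    return disk_bytes // bucket_bytes
```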
Working set
Only one database is “heated” with inserts
Only one database must be in RAM, others may go in and out with queries/map reduces.
Again, In a perfect world, the rate at which I can throw documents at MongoDB would not be in any way related to total database size.
As the total data size goes up, and the capped collection kicks in:
<click>
Remember, this is real disk backing this, so we must be absolutely 100% sure the database size always fits in the allocated disk space.
We don’t have room to “accidentally” raise the blue line.
Same from before. It would be really nice if the documents per second would stay constant.
<click>
Remember, we just settled for the diminished throughput because we couldn’t afford to scale up the node to the point our data would fit in RAM, making it look more like this
<click>
Then we agreed that we really like the capped collection behavior, which made our graph look more like this.
<click>
Apply a little bit of customization and we help mongo keep the most recent, dare I say the most important data in RAM.
We helped Mongo expire data, not on a per document level where it was forced to manage its giant B-TREE index, but on a “bucket” level, allowing Mongo to throw its data to the OS for removal.
It’s noisy, but in general it maintains performance regardless of data size and whether it is working to prune data.
With only a little bit of effort, we scaled-out a single mongod, without scaling-up any of the physical resources.
The system hums along happily when each bucket is sized somewhere between one-quarter to one-third RAM.
This is MongoR implementation specific!
Follow standard Mongo practices like fstab parameters and numa control first.
Assumes the client application itself isn’t a HEAVY user of RAM.
Our implementation has the client application poll MongoR, checking if there is a “new” database handle.
If there are many client applications on each box, their polling may not be aligned, and whenever that rotate happens, there may be 2 ‘active’ buckets for a period of minutes or hours, depending on how often the clients poll for a new database.
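A sketch of that client-side poll. The `get_active_bucket` callable stands in for however a client asks MongoR for the active database name; the real MongoR interface may differ.

```python
class MongoRClient:
    """Cache the active database handle; swap only when a poll reports
    that MongoR has rotated to a new bucket."""

    def __init__(self, get_active_bucket):
        self._get_active = get_active_bucket   # hypothetical call to MongoR
        self._handle = get_active_bucket()

    def db_for_insert(self):
        active = self._get_active()
        if active != self._handle:             # rotation happened since last poll
            self._handle = active
        return self._handle                    # database name to insert into
```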
There needs to be headroom for queries / aggregation / map reduce for the historical data to be pulled up into RAM without kicking the “warm” data out of RAM.
Remember the sawtooth pattern
We set the rotation to occur overnight to have minimal impact around the possibility of 2 active buckets.
MongoDB allocates space in 2GB increments, so if the system absorbs more than 2GB between rotation checks, it could go “over” the allocated space.
We found no limit to the number of buckets. We run about 100 30GB buckets per pizza-box.
What do we want?
I don’t think MongoR is the end-all solution to this problem. When I built this system, I got the feeling that a lot of people had this problem, but everyone dealt with it separately.
We formed a great relationship with our MongoDB contact who gave us enough hints that we should be concerned with our working set.
This behavior is valuable to us, I hope the behavior is valuable to others. The best case scenario is that we convince MongoDB to build a behavior set like this directly into MongoDB so I can abandon my implementation.