MongoDB as a Data Store for Security Data
Scaling out the mongod node
Daniel Bauman
Sr. Cyber Intelligence Analyst
LM-CIRT
© 2012 Lockheed Martin Corporation. All Rights Reserved.
Contexts
[Slide graphic: raw binary data refined upward through the three contexts of Information, Influence (Application), and Intelligence]
© 2014 Lockheed Martin Corporation. All Rights Reserved.
3 Key Brick Walls
1 • Isolation
2 • Retention
3 • Access
Isolated Information
[Slide graphic: four separate stores of raw binary data, each isolated from the others]
Pizza Boxes
[Slide graphic: a rack of commodity pizza-box servers ✔]
Single Pizza Box Throughput
[Slide graphic: a single pizza box keeping up with the ingest stream ✔]
Pizza Boxes
[Slide graphic: pizza boxes scaled out ✔]
2 • Retention
The Dream – MongoD Standard Install
[Chart: Data Size vs Documents/sec over time; the dream is an insert rate that holds steady as data size grows]
The Reality – MongoD Standard Install
[Chart: file size vs inserts over time; in reality the insert rate falls off as data size grows]
The Dream – Data Retention
[Slide graphic: Data Size vs Documents/sec chart; the Mongo database grows until the disk is full]
Single Pizza Box Data Retention
[Slide graphic: old data is thrown away to keep the single box's disk from filling]
The Reality – MongoD Capped Collection
[Chart: file size vs inserts over time with a capped collection]
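MongoDB's native answer to single-box retention is the capped collection: a fixed-size, insertion-ordered collection that overwrites its oldest documents once full. A minimal sketch with PyMongo follows; the database and collection names and the size are illustrative, and the import is deferred so the options helper works without a running server.

```python
def capped_options(size_bytes, max_docs=None):
    """Options for PyMongo's create_collection(): a fixed on-disk size,
    with the oldest documents overwritten once the cap is reached."""
    opts = {"capped": True, "size": int(size_bytes)}
    if max_docs is not None:
        opts["max"] = int(max_docs)  # optional cap on document count
    return opts

def create_capped_collection(uri, db_name, name, size_bytes):
    # Deferred import so the helper above is usable without pymongo installed.
    from pymongo import MongoClient
    client = MongoClient(uri)
    return client[db_name].create_collection(name, **capped_options(size_bytes))

# e.g. a 512 MiB capped collection for security events:
# create_capped_collection("mongodb://localhost:27017", "security", "events",
#                          512 * 1024 * 1024)
```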
3 • Access
The Dream – Querying the Cloud
[Slide graphic: a query posed to a cloud of raw binary data; the response comes straight back]
And now for something less technical
Information Retrieval
[Slide graphic: a wall of roughly one hundred random IP addresses; the task is to spot 172.100.178.247 among them]
“1.0 second is about the limit for the user’s flow of thought to stay uninterrupted” – Nielsen (1993)
J. Nielsen, “Response Times: The Three Important Limits,” 1993
Information Retrieval – 10 seconds
“response delays of a standard ten seconds will not permit the kind of thinking continuity essential to sustained problem solving” – R. Miller (1968)
R. Miller, “Response Time in Man-Computer Conversational Transactions,” 1968
Diving Back In
Random Data Access
[Slide graphic: documents laid out from past to recent; access patterns jump randomly across that timeline]
Python-MongoR (R for Retention)
A distributed-database extension to MongoDB, designed for scale-out, write-intensive document storage
Data Buckets
[Slide graphic: the past-to-recent document timeline divided into discrete buckets]
MongoR Buckets
[Slide graphic: one database per bucket, DB | DB | DB | DB | DB | DB, ordered past to recent]
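With one database per time bucket, a lookup becomes a fan-out: run the query against each bucket, newest first, and stop once enough documents are found. A hedged sketch of that pattern (not the actual python-mongor API; `run_query` stands in for a per-bucket Mongo find):

```python
def query_recent_first(buckets, run_query, limit):
    """Fan a query out across bucket databases, newest bucket first,
    stopping as soon as `limit` matching documents are collected."""
    hits = []
    for bucket in reversed(buckets):   # buckets are ordered past -> recent
        hits.extend(run_query(bucket))
        if len(hits) >= limit:
            break                      # recent data usually answers first
    return hits[:limit]
```

Because security analysts most often query recent activity, this ordering keeps the common case fast even when dozens of older buckets exist.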
MongoR Automated Segmenting
[Slide graphic: a Generator creates a fresh bucket database once the active Mongo instance reports its disk is full; writes move to the new bucket]
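The generator's job reduces to one rule: all writes go to the newest bucket, and when that bucket hits its size limit a fresh database is created. A simplified sketch of the rotation logic, under assumed interfaces (`size_of` would consult Mongo's dbStats in practice, and the naming scheme is illustrative):

```python
def bucket_for_write(buckets, size_of, size_limit, new_name):
    """Return the bucket database that should receive the next insert,
    rotating to a freshly named bucket when the current one is full."""
    if not buckets or size_of(buckets[-1]) >= size_limit:
        buckets.append(new_name(len(buckets)))  # generator opens a new bucket
    return buckets[-1]
```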
MongoR Retention
[Slide graphic: MongoR drops the oldest bucket's Mongo database to trash while the remaining Mongo instances keep serving]
MongoR “Capped Collection”
[Slide graphic: the set of bucket databases behaves, in aggregate, like one giant capped collection]
MongoR Destructor
[Slide graphic: the Generator appends new buckets at the recent end while the Destructor removes the oldest bucket at the past end]
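The destructor closes the loop: whenever the drive crosses its usage limit, the oldest bucket database is dropped wholesale, which is far cheaper than deleting documents one by one. A sketch under assumed interfaces (`disk_used` returns a 0-to-1 fraction of drive usage; `drop_db` would issue Mongo's dropDatabase):

```python
def enforce_retention(buckets, disk_used, limit, drop_db):
    """Drop whole bucket databases, oldest first, until disk usage
    falls back under the configured limit."""
    while buckets and disk_used() >= limit:
        drop_db(buckets.pop(0))   # index 0 is the oldest bucket
```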
The Reality – MongoR
[Chart: Data Size vs Documents/sec over time; with MongoR the insert rate stays level as total data size grows]
MongoR Production Behavior
Best Practices – Bucket Size
Bucket size = ¼ RAM size
[Slide graphic: four Mongo bucket databases fitting together inside system RAM]
Best Practices – Bucket Limit
Bucket Limit = 85-90% Capacity
[Slide graphic: buckets filling the system drive up to the 85-90% mark]
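Both rules of thumb reduce to simple arithmetic; a sketch (the 0.85 default reflects the low end of the slide's 85-90% range):

```python
def bucket_size_bytes(ram_bytes):
    # Rule of thumb from the slides: bucket size = 1/4 of system RAM,
    # so the active bucket (data plus indexes) can stay memory-resident.
    return ram_bytes // 4

def bucket_limit_bytes(drive_bytes, fraction=0.85):
    # Stop allocating new buckets at 85-90% of drive capacity,
    # leaving headroom before the destructor must run.
    return int(drive_bytes * fraction)

# e.g. 64 GiB of RAM -> 16 GiB buckets
```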
Python-MongoR in Production
• MIT Licensed
– https://github.com/lmco/python-mongor
Questions
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 

Leveraging MongoDB as a Data Store for Security Data

  • 1. MongoDB as a Data Store for Security Data Scaling out the mongod node Daniel Bauman Sr. Cyber Intelligence Analyst LM-CIRT © 2012 Lockheed Martin Corporation. All Rights Reserved.
  • 3. 3 Key Brick Walls: 1 • Isolation, 2 • Retention, 3 • Access. © 2014 Lockheed Martin Corporation. All Rights Reserved.
  • 4. Isolated Information [grid of binary data illustrating isolated information stores]
  • 5. Isolated Information [the same grid of binary data, repeated]
  • 6. Pizza Boxes ✔
  • 7. Single Pizza Box Throughput ✔
  • 8. Pizza Boxes ✔
  • 9. 2 • Retention
  • 10. The Dream – MongoD Standard Install [chart: data size vs. documents/sec over time]
  • 11. The Reality – MongoD Standard Install [chart: file size vs. inserts over time]
  • 12. The Dream – Data Retention [chart: data size vs. documents/sec over time]
  • 13. Single Pizza Box Data Retention [diagram: Mongo database, disk is full, oldest data to trash]
  • 14. The Reality – MongoD Capped Collection [chart: file size vs. inserts over time]
  • 15. 3 • Access
  • 16. The Dream – Querying the Cloud [diagram: one query into the cloud, one response out]
  • 17. And now for something less technical
  • 18. Information Retrieval [grid of ~100 172.100.x.x addresses] "1.0 second is about the limit for the user's flow of thought to stay uninterrupted" – Nielsen (1993). J. Nielsen, "Response times: the three important limits," 1993.
  • 19. Information Retrieval – 10 seconds. "response delays of a standard ten seconds will not permit the kind of thinking continuity essential to sustained problem solving" – R. Miller (1968). R. Miller, "Response time in man-computer conversational transactions," 1968.
  • 20. Diving Back In
  • 21. Random Data Access [timeline: documents, past to recent]
  • 22. Python-MongoR (R for Retention): a distributed-database expansion to MongoDB designed to optimize scale-out, write-intensive document storage
  • 23. Data Buckets [timeline: documents, past to recent]
  • 24. MongoR Buckets [timeline: DB buckets, past to recent]
  • 25. MongoR Automated Segmenting [timeline: DB buckets plus a Generator]
  • 26. MongoR Retention [diagram: Mongo when the disk is full vs. MongoR; oldest bucket to trash]
  • 27. MongoR "Capped Collection" [diagram: Mongo vs. MongoR]
  • 28. MongoR Destructor [timeline: DB buckets with Generator and Destructor]
  • 29. MongoR Destructor [timeline: many DB buckets with Generator]
  • 30. The Real [chart: data size vs. documents/sec over time]
  • 31. MongoR Production Behavior
  • 32. Best Practices – Bucket Size: bucket size = ¼ RAM size [diagram: Mongo buckets within system RAM]
  • 33. Best Practices – Bucket Limit: bucket limit = 85-90% of system drive capacity
  • 34. Python-mongor In Production • MIT Licensed – https://github.com/lmco/python-mongor
  • 35. Questions

Editor's Notes

  1. Scale out a mongod node. Senior Cyber Intelligence Analyst with the Lockheed Martin Computer Incident Response Team, which handles network defense for Lockheed Martin. My background: 3 years working in NASA's Mission Control Center in Houston, TX (mission focused, international collaboration, solving problems) and 3 years working for the Computer Incident Response Team (mission focused, international collaboration). This is a story of lessons learned building a distributed enterprise monitoring framework, specifically the data storage and retrieval subsystem, and beating up MongoDB until it gave us the performance we wanted. And we got it to work.
  2. What IS: the ability to extract information (tap a network; largely technical; beyond typical NIDS/deep packet inspection), garner intelligence from that information (query-focused data sets), and influence. It starts with an application that applies the intelligence to the information to extract and store metadata, and makes actionable decisions (block a malicious email). The whole thing wrapped up is just a tool for a person. At the end of the day, its main job is to enable critical thinking. We do NOT want users simply following a standard process, as that hides authority and accountability ("the process said so"). The primary focus of this capability is critical-thinking enablement: allow the user's thoughts to flow uninterrupted.
  3. Data is everywhere. Bandwidth is limited. We don't have infinite storage, and even if we did, old data is better suited as a compressed blob on tape than in a queryable data store. Which leads to access: tools must support the user, and the user both wants and needs a simple way to access the data. I'll note that simple does not necessarily mean easy.
  4. MongoDB scales out extremely well inside a data center. <click> There are endless resources available for building a MongoDB cluster in a data center. One option would be to pull all your information back to the data center. <click> However: <next slide>
  5. In our use case it's simply unreasonable to pull all information back to the data center. The sheer volume of information would overwhelm the link back to the data center. It's sometimes unreasonable to pull even the METADATA back to the data center; sometimes the metadata happens to contain more information than the raw data itself. Move your technical influence to the data: for each information source, ship a "pizza box." Store your metadata locally on that "pizza box" to minimize WAN traffic. Make the data available for querying from home base. Don't levy a requirement that field engineers be MongoDB-certified DBAs.
  6. Continuing the background: each pizza box looks at information, applies intelligence from another field, and takes appropriate actions. This could be active, such as blocking certain network traffic, or passive, such as alerting operators to fishy information transiting a gateway. Of course, the action could also be to generate and store the metadata for later analysis. Inside each one of these pizza boxes is a fully contained system that can process data, drive actions, and, most importantly for this audience, store the schema-less metadata.
  7. Let's return to the single pizza box. WHAT volume of information can a single pizza box support? More importantly, how many boxes do I need to buy to support X amount of data? How fast can I pump the orange circles while still being able to take the minimal "log" action? One way to measure throughput is to count the number of orange circles I can push through the pizza box every second. To be consistent with Mongo terminology, I'll refer to this as documents per second from here on out. This –number– is highly dependent on an excessive number of configuration details; on purpose, I'll describe these in relative terms rather than in absolute terms. Obviously, we want to maximize the documents/second the database can support. What is Mongo's recommendation? The consensus –pause– is that ALL DATA should fit into RAM. –pause– If you can't fit the data in RAM, you should at least keep your indexes in RAM.
  8. Throw new boxes into the cloud. Self-managing: put each pizza box into the cloud. <click> It should be as simple as copy/paste to add new nodes into the cloud. <click> The cloud should also allow nodes to come online/offline at will. <click> This could be for standard maintenance, or even part of the operational concept; perhaps there is a node that is intentionally available during sunny hours only. <click> I found there was no feasible method to manage this cluster using "standard" MongoDB methods. Note "standard": there were tons of interesting ideas floating around the community, and this presentation covers one.
  9. In a dream world, the rate at which I can throw documents at MongoDB would not be in any way related to total database size. Because I enjoy realistic dreams, I'll acknowledge that the throughput can't be infinite. As the total data size goes up <click> as time goes on –point to x-axis– it would be great if the data size –point to y-axis– grew nicely and linearly, and it would be really nice if the documents per second stayed constant throughout <click> –point to other y-axis–. It should be able to maintain a high insert rate regardless of disk size. As I'm sure you expect, the reality isn't as promising.
  10. Again, read this as a behavior, not a benchmark; this curve is a general behavior. Think about it! This really isn't that bad: once you get past the initial drop-off, it becomes "roughly" linear. It isn't exactly what we wanted, but we might be able to work with this. Of course you can alter the data or indexes and change what is meaningful to you, but the general trend will still look like this. Of course you can lift up the green line (the document throughput) by adding memory sufficient to fit all data and indexes into RAM. You can use faster spinning disks, splurge for solid state drives, or even look at more exotic options such as NAND flash. At the end of the day, you are scaling !UP! in order to accomplish the performance increase. You can scale out by making each pizza box itself a shard cluster with a handful of MongoDs, a config server, and a MongoS, but MongoDB won't let you stack a MongoS on top of another MongoS; you'd have to write your "global" MongoS from scratch anyway. MongoDB scales well inside the data center, but not so well in the field. The big deal here is the ever-increasing disk size required. We need a simple way to manage actual disk utilization.
  11. Again, in a perfect world, the rate at which I can throw documents at MongoDB would not be in any way related to total database size. As the total data size goes up, and the capped collection kicks in: <click> Remember, this is real utilized disk space and we are running the mongod on real, physical hardware. You must be absolutely, 100% sure the database size always fits in the allocated disk space; there is VERY little room to "accidentally" raise the blue line. That fact makes it very difficult to use the TTL collections, as you may not know ahead of time the size of the data and when it is appropriate to expire/remove the data while optimizing data retention periods. We're engineers… we know things in life aren't free, so we know something interesting is bound to happen here <click> at the point where Mongo transforms from growing with the data to overwriting itself, but we aren't sure what yet. <click> It would be really nice if the documents per second would stay constant. <click> With a little bit of hackery, we got it. BUT… remember, we just settled for the diminished throughput because we couldn't afford to scale up to the point our data would fit in RAM, making it look more like this. <click> It should be able to maintain a high insert rate regardless of disk size, and regardless of retention methods. As I'm sure you expect, the reality isn't as promising. <click>
  12. Remember, the Mongo database is still tied to actual physical disk space. Let's assume we have a constant stream of information; it is happily filling up the database according to the graph on the previous page. <click> Eventually, the database is going to fill up. What should we do? For lack of a better idea, we might as well grab the oldest documents <click> and throw them in the trash <click> to make room for the next document. <click> This particular process happily plays itself out over and over again and is actually built right into Mongo as a "capped collection." The capped collection also has a cousin, the TTL index on documents, which also manages document retention.
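The overwrite-oldest behavior described above can be illustrated with a fixed-size ring buffer. This is a stdlib sketch of the behavior only: a `deque` stands in for the capped collection and says nothing about how Mongo actually lays out extents on disk.

```python
from collections import deque

# Model a capped collection as a fixed-size ring buffer: once full,
# every new insert silently evicts the oldest document.
capped = deque(maxlen=4)

for doc_id in range(6):
    capped.append({"_id": doc_id})

# The two oldest documents (0 and 1) have been overwritten.
remaining = [d["_id"] for d in capped]
print(remaining)  # [2, 3, 4, 5]
```

The model also makes the hidden cost visible: in real MongoDB, each evicted document must additionally be removed from every B-tree index, which is where the performance cliff in the slides comes from.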
  13. Pause - absorb. As mongod enters the phase where it is managing the roll-off of old data, <click> it experiences a huge penalty "going off the cliff" when the capped collection kicks in. <click> I don't intend to deep-dive into how MongoDB extents are laid out for the documents and "next" pointers as they exist on disk. But at a high level: remember, for each and every new document or batch coming into the database, Mongo may be responsible for overwriting one or more documents from the database. More importantly to this cliff, each and every new document or batch of documents containing an indexed field must be built into Mongo's B-TREE indexing. EVEN more disastrous, each document or batch of documents being trashed must be REMOVED from the B-TREE index. How bad is it that you pay a performance penalty… to permanently REMOVE data from your index? Of course you can lift up the green line (the document throughput) by scaling up: add memory to fit all data in RAM, use faster spinning disks, splurge for solid state, or buy exotic NAND flash. One of the great things about Mongo is that it scales out well in the data center, but it seems to require scaling UP within the node.
  14. Querying the cloud: we still want this cloud to look and function like a query to MongoS. <click> One query in, <click> one result set out. Again, this cloud should scale out like MongoS. However, it needs to perform the scale-out without the nodes talking to, or knowing about, each other. If a node is unreachable or unresponsive, the system should SIMPLY tell me about it. Let ME, the user, decide what to do with that knowledge and whether to accept the data, or query again at a different time. <click> I envision a result set similar to this: a list of nodes that responded, a list of nodes that didn't, and the documents matching the query. This is an extremely simplified vision, but nevertheless provides the behavior we desire. It would be even better if the cloud could tell me WHY a node didn't respond. Until "recently," there wasn't a good way within MongoS to allow queries to complete if one of the target shards is unresponsive. One "slow" node should not hamper the entire cluster; if it is slow/overloaded, tell me, but don't hold up results from the rest of the cloud. If we can achieve this system, we have successfully architected a scale-out cloud MongoDB cluster.
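The result-set shape sketched in that note (responders, non-responders, merged documents) can be prototyped without any MongoDB machinery. A minimal stdlib sketch, in which the node names and the per-node `run_query` callable are hypothetical stand-ins rather than anything from python-mongor:

```python
from concurrent.futures import ThreadPoolExecutor

def query_cloud(nodes, run_query, timeout=1.0):
    """Fan a query out to every node; never let one slow node block the rest.

    `nodes` maps a node name to whatever handle `run_query` needs.
    Returns responders, non-responders, and the merged documents.
    """
    result = {"responded": [], "failed": [], "documents": []}
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        futures = {pool.submit(run_query, h): name for name, h in nodes.items()}
        for fut, name in futures.items():
            try:
                docs = fut.result(timeout=timeout)
                result["responded"].append(name)
                result["documents"].extend(docs)
            except Exception:
                # Unreachable or slow node: report it, keep the rest of the answer.
                result["failed"].append(name)
    return result

# Stub per-node queries: one healthy node, one that raises.
def ok(_handle):
    return [{"src": "172.100.178.247"}]

def down(_handle):
    raise ConnectionError("node unreachable")

out = query_cloud({"node-a": ok, "node-b": down}, run_query=lambda f: f(None))
print(sorted(out["responded"]), sorted(out["failed"]))  # ['node-a'] ['node-b']
```

The design choice matches the note: a failure is data in the result set, not an exception for the whole query, so the user decides what to do with a partial answer.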
  15. Let's go back to critical thinking. Tools MUST support the analyst. Let's think about a standard information retrieval that you might expect out of an O-L-T-P data store in a security context. You get an IP address to think about. Dramatic pause… does everyone have it? Dot 2-4-7? Your O-L-T-P gives you back some data. <click> Pause a moment. Let that sink in: after 1.0 seconds, you are already beginning to forget WHY you were looking for this specific address. You moved into turning-the-crank rather than critical thinking. Extending that for a moment <click>
  16. Pause for reading: 10 seconds. 45-plus years ago we solved the riddle. 10 seconds waiting for an answer, and our ability to maintain critical thinking with a problem-solving mindset is lost. 10 seconds. What does this tell us?
  17. In the out-of-the-box Mongo world, new data is always streaming into one giant database that holds all the data. As time goes on, Mongo happily manages your extents, adding disk size as needed. <click> If you use the capped collection, it allocates all your disk space up front, and happily overwrites old data with new data when the time comes. <click> We are already assuming the data size is greater than RAM size, probably close to disk size. If you can make the assumption that you are –more– interested in recent data –this one here at the end– than you are in historical context, there isn't a great reason to query the entire database <click> just to get this data. Remember there is some psychological value gained by ensuring the database can field queries within 10 seconds, or better yet, 1 second. So rather than send one query to one database, we can segment the data into buckets.
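Under that recent-data-first assumption, the dispatch can walk the buckets newest to oldest and stop once enough matches are in hand, so most queries never touch cold data at all. A minimal sketch; the bucket layout and the match predicate here are illustrative, not the python-mongor interface:

```python
def query_buckets(buckets, predicate, limit):
    """Search the newest bucket first; stop as soon as `limit` hits are found.

    `buckets` is ordered oldest -> newest, as the generator creates them.
    """
    hits = []
    for bucket in reversed(buckets):          # newest first
        for doc in bucket:
            if predicate(doc):
                hits.append(doc)
                if len(hits) >= limit:
                    return hits               # skip the colder buckets entirely
    return hits

buckets = [
    [{"ip": "172.100.0.1", "day": 1}],        # oldest
    [{"ip": "172.100.0.2", "day": 2}],
    [{"ip": "172.100.0.1", "day": 3}],        # newest
]
recent = query_buckets(buckets, lambda d: d["ip"] == "172.100.0.1", limit=1)
print(recent)  # [{'ip': '172.100.0.1', 'day': 3}]
```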
  18. Again, this hinges on the assumption that whatever you search for, you would want recent results –these up here– returned first. Rather than dispatch the query to the entire database, which is far outside RAM at this point, you can dispatch the query to each segment. <click> In theory, this most recent bucket is "warmed," so I've noted it in RED. You can make the assumption that this bucket is taking a stream of insert operations, so it's fair to assume that the host OS allows it to remain in RAM. The other buckets may not be so lucky and reside only in virtual memory.
  19. I hope we agree, bucketing the data makes sense. What's the best way to technically bucket the data inside Mongo? <click> Make each bucket its own database! Here comes the challenge: how do we manage databases as time goes on? We need a way to generate new buckets as the previous bucket fills up. <click>
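A generator along these lines only needs a naming convention and a fullness check. A minimal sketch, assuming a simple monotonic naming scheme and a byte-count threshold; the names and the interface are hypothetical, not python-mongor's actual API:

```python
import itertools

class BucketGenerator:
    """Create a fresh, empty bucket (database) whenever the active one fills."""

    def __init__(self, max_bucket_bytes):
        self.max_bucket_bytes = max_bucket_bytes
        self._seq = itertools.count()
        self.active = self._new_bucket()

    def _new_bucket(self):
        # Monotonic names keep the oldest bucket trivially identifiable later,
        # which is exactly what the destructor needs.
        return {"name": "bucket_%06d" % next(self._seq), "bytes": 0}

    def route_insert(self, doc_bytes):
        """Return the bucket the next insert should go to, rotating if full."""
        if self.active["bytes"] + doc_bytes > self.max_bucket_bytes:
            self.active = self._new_bucket()
        self.active["bytes"] += doc_bytes
        return self.active["name"]

gen = BucketGenerator(max_bucket_bytes=100)
names = [gen.route_insert(40) for _ in range(5)]
print(names)  # two inserts per bucket before each rotation
```

In a real deployment `_new_bucket` would create a new MongoDB database and `route_insert` would hand back a live database handle rather than a name.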
  20. As time goes on, the generator is responsible for creating new empty buckets. <click> One easy way to scale out mongod is to tightly manage the working set and make sure it stays in RAM. Since only one bucket is "warmed" with the constant stream of inserts, that entire database is the current working set. This means that only a single database, the current "warm" database, must remain in RAM; others may go in and out with queries/map-reduces at the host OS's discretion.
  21. Rather than one giant database, we focused on forcing the entire active bucket, or database, to stay in memory by making the bucket size smaller than RAM size. <click> As each bucket fills up, the generator creates a new empty bucket and begins routing inserts to the new bucket. <click> But remember, we are still tied to physical disk space, and that space will fill up eventually. We need to start trashing data. <click> Similar to a capped collection, we should throw away the oldest data first, but rather than throw away 1 or 2 documents at a time, let's throw away the whole bucket to clean up space quickly. <click> We have successfully cleaned up disk space to allocate a new bucket. We need a way to automate this process so we don't manually curate the buckets. <click>
  22. Eventually, the field of buckets will fill, and the system will enter a stage where it needs to delete old data to make room for new data. As each bucket gets close to full, a destructor will come by and delete the next-oldest bucket to make room for new data. It looks something like this: <click> It does rely on disk IO, as the database is not "clean" until the dropDatabase command is executed.
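The destructor side is symmetric: when free space runs low, drop the whole oldest bucket rather than trimming documents one by one out of a B-tree. A stdlib sketch with a hypothetical `drop` callback standing in for dropDatabase:

```python
def reclaim_space(buckets, free_bytes, needed_bytes, drop):
    """Drop whole buckets, oldest first, until `needed_bytes` are free.

    `buckets` is ordered oldest -> newest; `drop(name)` would issue
    dropDatabase in a real deployment and returns the bytes reclaimed.
    """
    while free_bytes < needed_bytes and buckets:
        oldest = buckets.pop(0)    # whole-bucket delete: no per-document B-tree surgery
        free_bytes += drop(oldest)
    return free_bytes

sizes = {"bucket_0": 30, "bucket_1": 30, "bucket_2": 30}
dropped = []

def fake_drop(name):
    dropped.append(name)
    return sizes[name]

free = reclaim_space(list(sizes), free_bytes=10, needed_bytes=50, drop=fake_drop)
print(dropped, free)  # ['bucket_0', 'bucket_1'] 70
```

Dropping a database hands whole files back to the OS, which is the point the slides make: the expiry cost becomes bulk disk IO instead of index maintenance on every insert.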
  23. The generator creates new buckets which are big enough to hold a "substantial" amount of data. Substantial is a relative measure that will need to take into account the hardware Mongo is running on. The bucket must be small enough so that the entire bucket, data and indexes and all, fits into RAM. It should actually be small enough so that, in addition to completely fitting in RAM, at least one other bucket can be paged into RAM as well, so concurrent queries aren't fighting for physical resources. It should be big enough that you get a marked improvement over your "capped" collection. Remember, the "floor" of this concept is almost as inefficient as the capped collection: if you make each bucket just big enough to fit one document, you are really only trading the B-TREE index manipulation for destructor disk IO.
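The sizing guidance from the best-practices slides (bucket ≈ ¼ of RAM so a second bucket can page in beside the warm one; total footprint capped at 85-90% of the drive) reduces to two divisions. The figures below are worked examples, not recommendations:

```python
def plan_buckets(ram_bytes, disk_bytes, disk_headroom=0.85):
    """Size buckets so the active one (data + indexes) fits in RAM with room
    for at least one cold bucket beside it, and cap total disk usage."""
    bucket_bytes = ram_bytes // 4                 # slides: bucket size = 1/4 RAM
    budget = int(disk_bytes * disk_headroom)      # slides: 85-90% of drive capacity
    return bucket_bytes, budget // bucket_bytes   # size and how many fit

GiB = 1 << 30
bucket, count = plan_buckets(ram_bytes=128 * GiB, disk_bytes=4096 * GiB)
print(bucket // GiB, count)  # 32 108
```

With a hypothetical 128 GiB-RAM, 4 TiB-disk pizza box this lands near the "about 100 buckets per box" figure the speaker notes mention.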
  24. Working set: only one database is "heated" with inserts, and only that one database must be in RAM; others may go in and out with queries/map-reduces.
  25. Again, in a perfect world, the rate at which I can throw documents at MongoDB would not be in any way related to total database size. As the total data size goes up, and the capped collection kicks in: <click> Remember, this is real disk backing this, so we must be absolutely 100% sure the database size always fits in the allocated disk space. We don't have room to "accidentally" raise the blue line. Same as before, it would be really nice if the documents per second would stay constant. <click> Remember, we just settled for the diminished throughput because we couldn't afford to scale up the node to the point our data would fit in RAM, making it look more like this. <click> Then we agreed that we really like the capped-collection behavior, which made our graph look more like this. <click> Apply a little bit of customization and we help Mongo keep the most recent, dare I say the most important, data in RAM. We helped Mongo expire data, not on a per-document level where it was forced to manage its giant B-TREE index, but on a "bucket" level, allowing Mongo to throw its data to the OS for removal.
  26. It's noisy, but in general it maintains performance regardless of data size and whether it is working to prune data. With only a little bit of effort, we scaled out a single mongod without scaling up any of the physical resources.
  27. The system hums along happily when each bucket is sized somewhere between one-quarter and one-third of RAM. This is MongoR-implementation specific! Follow standard Mongo practices like fstab parameters and NUMA control first. This assumes the client application itself isn't a HEAVY user of RAM. Our implementation has the client application poll MongoR, checking if there is a "new" database handle. If there are many client applications on each box, their polling may not be aligned, and whenever a rotate happens, there may be 2 "active" buckets for a period of minutes or hours, depending on how often the client polls for a new database. There needs to be headroom for queries / aggregation / map-reduce so the historical data can be pulled up into RAM without kicking the "warm" data out of RAM.
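The client-side polling described above (check whether MongoR has rotated to a new active database, refresh the cached handle only then) can be sketched as follows. `current_active` is a hypothetical stand-in for whatever lookup MongoR actually exposes, and the bucket names are invented:

```python
class RotatingHandle:
    """Cache the active-bucket handle; refresh only when a poll shows rotation."""

    def __init__(self, current_active):
        self._current_active = current_active   # callable: () -> active bucket name
        self._name = current_active()

    def handle(self):
        """Poll for rotation. Between polls, writes keep hitting the old bucket,
        which is why two buckets can briefly be 'active' at once."""
        latest = self._current_active()
        if latest != self._name:
            self._name = latest                 # a real client reconnects here
        return self._name

# Simulate MongoR rotating the active database between two client polls.
active = ["bucket_000041"]
client = RotatingHandle(lambda: active[0])
before = client.handle()
active[0] = "bucket_000042"                     # overnight rotation
after = client.handle()
print(before, after)  # bucket_000041 bucket_000042
```

This also illustrates the caveat in the note: the window of double-active buckets is bounded by the polling interval, which is why the rotation is scheduled overnight.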
  28. Remember the sawtooth pattern. We set the rotation to occur overnight to have minimal impact around the possibility of 2 active buckets. MongoDB allocates 2GB at a time, so if the system absorbs more than 2GB between rotation checks, it could go "over" the allocated space. We found no limit to the number of buckets; we run about 100 30GB buckets per pizza box.
  29. What do we want? I don't think MongoR is the end-all solution to this problem. When I built this system, I got the feeling that a lot of people had this problem, but everyone dealt with it separately. We formed a great relationship with our MongoDB contact, who gave us enough hints that we should be concerned with our working set. This behavior is valuable to us; I hope it is valuable to others. The best-case scenario is that we convince MongoDB to build a behavior set like this directly into MongoDB so I can abandon my implementation.