
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB


Are you new to schema design for MongoDB, or are you looking for a more complete or agile process than what you are following currently? In this talk, we will guide you through the phases of a flexible methodology that you can apply to projects ranging from small to large with very demanding requirements.


MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB

  1. 1. #MDBlocal A Complete Methodology of Data Modeling for MongoDB Daniel Coupal Education, MongoDB SOCAL
  2. 2. #MDBlocal SOCAL Daniel Coupal, Senior Curriculum Engineer, Education, MongoDB (@danielcoupal)
  7. 7. Goals of the Presentation • Introduction • Document vs Tabular: recognize the differences • Methodology: summarize the steps when modeling for MongoDB • Patterns: recognize when to apply them • Use Case: franchise of coffee shops • Conclusion and Questions
  8. 8. Document versus Tabular Recognize the differences when modeling for a Document Database versus a Relational/Tabular Database
  9. 9. #MDBLocal Document Model A. Fields/Attributes B. Arrays C. Sub-documents
  10. 10. #MDBLocal A. Fields/Attributes in the Document Model Explicit column names for defined values
  11. 11. #MDBLocal A. Fields/Attributes in the Document Model { 007, "Daniel", "Ferrari", "GTS", 1982 } Explicit column names for defined values
  12. 12. #MDBLocal A. Fields/Attributes in the Document Model { "_id": 007, "owner": "Daniel", "make": "Ferrari", "model": "GTS", "year": 1982 } Explicit column names for defined values
  13. 13. #MDBLocal B. Arrays in the Document Model Used to represent One-to-Many relationships
  14. 14. #MDBLocal B. Arrays in the Document Model { owner: "Daniel", make: "Ferrari", wheels: [ { partNo: 234819 }, { partNo: 281928 }, { partNo: 392838 }, { partNo: 928038 } ], ... } Used to represent One-to-Many relationships
  15. 15. #MDBLocal C. Sub-documents in the Document Model Used to represent One-to-One relationships
  16. 16. #MDBLocal C. Sub-documents in the Document Model { owner: "Daniel", make: "Ferrari", power: "660hp", consumption: "10mpg" … } Used to represent One-to-One relationships
  17. 17. #MDBLocal C. Sub-documents in the Document Model { owner: "Daniel", make: "Ferrari", engine: { power: "660hp", consumption: "10mpg" } … } Used to represent One-to-One relationships
  18. 18. #MDBLocal C. Sub-documents in the Document Model { owner: "Daniel", make: "Ferrari", engine: { power: "660hp", consumption: "10mpg" } … } Used to represent One-to-One relationships. Projection: db.cars.find( {"owner":"Daniel"}, {"engine":1} )
  19. 19. #MDBLocal Car Stored in a Tabular/Relational Database SELECT * FROM Cars INNER JOIN Wheels ON Cars.id = Wheels.car_id INNER JOIN Seats ON Cars.id = Seats.car_id INNER JOIN Brakes ON Cars.id = Brakes.car_id ... WHERE Cars.owner = 'Daniel'
  20. 20. #MDBLocal Car Stored in a Document Database db.cars.find( {"owner":"Daniel"} ) What goes together is stored together
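
The three building blocks above (fields, arrays, sub-documents) can be sketched with a plain JavaScript object. This is only an in-memory illustration: the `project` helper is hypothetical and loosely imitates an inclusion projection such as `{"engine": 1}`; it is not the MongoDB driver API, and the numeric `_id` is an assumption.

```javascript
// A car document combining fields, an array, and a sub-document.
const car = {
  _id: 7,
  owner: "Daniel",
  make: "Ferrari",
  model: "GTS",
  year: 1982,
  // Array: one-to-many relationship (one car, many wheels)
  wheels: [
    { partNo: 234819 },
    { partNo: 281928 },
    { partNo: 392838 },
    { partNo: 928038 },
  ],
  // Sub-document: one-to-one relationship (one car, one engine)
  engine: { power: "660hp", consumption: "10mpg" },
};

// Hypothetical helper: keep _id plus the requested top-level fields.
function project(doc, fields) {
  const out = { _id: doc._id };
  for (const name of fields) {
    if (name in doc) out[name] = doc[name];
  }
  return out;
}

// In-memory stand-in for: find cars owned by Daniel, return only the engine.
const cars = [car];
const result = cars
  .filter((doc) => doc.owner === "Daniel")
  .map((doc) => project(doc, ["engine"]));

console.log(JSON.stringify(result));
```

Because the wheels and the engine live inside the car document, a single lookup by owner returns everything that "goes together" without any join.
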
  21. 21. #MDBLocal Example 1: Modeling a blog
  22. 22. #MDBLocal CRDs: A few Collection-Relationship-Diagrams Solutions Solution A Queries by users Simple
  23. 23. #MDBLocal CRDs: A few Collection-Relationship-Diagrams Solutions Solution A Queries by articles Queries by users Duplication of users information Simple Solution B
  24. 24. #MDBLocal CRDs: A few Collection-Relationship-Diagrams Solutions Solution A Solution C Queries by articles Queries by users Duplication of users information Simple Solution B
  25. 25. #MDBLocal Example 2: Modeling a Social Network
  26. 26. #MDBLocal Example 2: Modeling a Social Network Solution A writes reads Images Collection CC: Joanna Penn
  27. 27. #MDBLocal Example 2: Modeling a Social Network Solution B writes reads Submitter Profiles CC: Joanna Penn
  28. 28. #MDBLocal Example 2: Modeling a Social Network Solution C writes reads Follower Profiles
  29. 29. #MDBLocal Example 2: Modeling a Social Network Solution C writes reads ✓ Slower writes ✓ More storage space ✓ Duplication ✓ Faster reads Pre-aggregated Data Follower Profiles
  34. 34. #MDBLocal Differences: Tabular vs Document
      • Steps to create the model. Tabular: 1 – define schema, 2 – develop app and queries. MongoDB: 1 – identify the queries, 2 – define schema.
      • Initial schema. Tabular: 3rd normal form, one possible solution. MongoDB: many possible solutions.
      • Final schema. Tabular: likely denormalized. MongoDB: few changes.
      • Schema evolution. Tabular: difficult and not optimal, likely downtime. MongoDB: easy, no downtime.
      • Performance. Tabular: mediocre. MongoDB: optimized.
  35. 35. Methodology Summarize the steps of a methodology when modeling for MongoDB
  36. 36. #MDBLocal Main Tradeoff in Modeling
  37. 37. #MDBLocal Methodology
  38. 38. Methodology 1. Describe the Workload
  39. 39. Methodology 1. Describe the Workload 2. Identify and Model the Relationships
  40. 40. #MDBLocal Actors, Movies and Reviews actor_name date_of_birth movie_title revenues reviewer_name rating
  41. 41. #MDBLocal Actors, Movies and Reviews actor_name date_of_birth movie_title revenues reviewer rating
  43. 43. Methodology 1. Describe the Workload 2. Identify and Model the Relationships 3. Apply Patterns
  44. 44. #MDBLocal Flexible Methodology
  45. 45. Use Case Let's start a franchise of coffee shops…
  50. 50. #MDBLocal Case Study: Coffee Shop Franchises Name: Beyond the Stars Coffee Objective: • 10 000 stores in North America • … then we expand to the rest of the World Keys to success: 1. Best coffee in the world 2. Best Technology
  51. 51. #MDBLocal First Key to Success: Make the Best Coffee in the World 23g of ground coffee in, 20g of extracted coffee out, in approximately 20 seconds 1. Fill a small or regular cup with 80% hot water (not boiling but pretty hot). Your cup should be 150ml to 200ml in total volume, 80% of which will be hot water. 2. Grind 23g of coffee into your portafilter using the double basket. We use a scale that you can get here. 3. Draw 20g of coffee over the hot water by placing your cup on a scale, press tare and extract your shot.
  52. 52. #MDBLocal Second Key to Success: Use the Best Technology a) Intelligent Coffee Machines • Weighings, temperature, time to produce, … • Coffee perfection
  54. 54. #MDBLocal Key to Success 2: Best Technology a) Intelligent Coffee Machines • Weighings, temperature, time to produce, … • Coffee perfection b) Intelligent Shelves • Measure inventory in real time c) Intelligent Data Storage • MongoDB
  55. 55. Methodology 1. Describe the Workload 2. Identify and Model the Relationships 3. Apply Patterns
  61. 61. #MDBLocal 1 – Workload: List Queries (Query, Operation, Description)
      • Query 1, Coffee weight on the shelves (write): a shelf sends information when coffee bags are added or removed
      • Query 2, Coffee to deliver to stores (read): how much coffee do we have to ship to the store in the next days
      • Query 3, Anomalies in the inventory (read): analytics
      • Query 4, Making a cup of coffee (write): a coffee machine reporting on the production of a coffee cup
      • Query 5, Analysis of cups of coffee (read): analytics
      • Query 6, Technical Support (read): helping our franchisees
  62. 62. #MDBLocal 1 – Workload: quantify/qualify the queries (Query, Quantification, Qualification)
      • Query 1, Coffee weight on the shelves: 10/day*shelf*store => 1/sec; <1s, critical write
      • Query 2, Coffee to deliver to stores: 1/day*store => 0.1/sec; <60s
      • Query 3, Anomalies in the inventory: 24 reads/day; <5mins, "collection scan"
      • Query 4, Making a cup of coffee: 10 000 000 writes/day, 115 writes/sec; <100ms, non-critical write
      • … cups of coffee at rush hour: 3 000 000 writes/hr, 833 writes/sec; <100ms, non-critical write
      • Query 5, Analysis of cups of coffee: 24 reads/day; stale data is fine, "collection scan"
      • Query 6, Technical Support: 1000 reads/day; <1s
  64. 64. #MDBLocal 1 – Workload: details of the most important queries (Attribute: Value) • Description: Making a cup of coffee at rush hour • Type: Write • Frequency: 3 000 000 writes/hr (833 writes/sec) • Size: 100 bytes • Consistency/Integrity: weak • Latency: < 10 sec • Durability: weak • Life/Duration: 1 year • Security: None
  65. 65. #MDBLocal Disk Space
      • Cups of coffee: one year of data; 10000 x 1000/day x 365 = 3.7 billion/year; 370 GB (100 bytes/cup of coffee)
      • Weighings: one year of data; 10000 x 10/day x 365 = 36.5 million/year; 3.7 GB (100 bytes/weighing)
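
The workload and disk-space figures above are simple arithmetic, and a short sketch makes them easy to re-check. It assumes the numbers stated on the slides (10 000 stores, 1 000 cups and 10 weighings per store per day, 100 bytes per document) and, as an additional assumption, one shelf per store. The function names are illustrative only. Note that 10 000 x 10 x 365 works out to 36.5 million weighings per year, which matches the 3.7 GB estimate.

```javascript
// Sizing assumptions taken from the slides; one shelf per store is assumed.
const STORES = 10000;
const SECONDS_PER_DAY = 24 * 60 * 60;
const BYTES_PER_DOC = 100;

// Average rate per second for a daily event count.
function perSecond(eventsPerDay) {
  return eventsPerDay / SECONDS_PER_DAY;
}

// Documents and gigabytes (10^9 bytes) accumulated over one year.
function yearlyVolume(eventsPerStorePerDay) {
  const docs = STORES * eventsPerStorePerDay * 365;
  return { docs, gigabytes: (docs * BYTES_PER_DOC) / 1e9 };
}

const cupsPerDay = STORES * 1000; // 10 000 000 writes/day
const weighingsPerDay = STORES * 10; // 100 000 writes/day
const rushHourPerSecond = 3000000 / 3600; // 3 000 000 writes/hr at rush hour

const cups = yearlyVolume(1000);
const weighings = yearlyVolume(10);

console.log(`cups: ${perSecond(cupsPerDay).toFixed(1)} writes/sec on average`);
console.log(`cups at rush hour: ${rushHourPerSecond.toFixed(1)} writes/sec`);
console.log(`weighings: ${perSecond(weighingsPerDay).toFixed(2)} writes/sec`);
console.log(`cups per year: ${cups.docs} docs, ${cups.gigabytes} GB`);
console.log(`weighings per year: ${weighings.docs} docs, ${weighings.gigabytes} GB`);
```
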
  66. 66. Methodology 1. Describe the Workload 2. Identify and Model the Relationships 3. Apply Patterns
  67. 67. #MDBLocal 2 - Relations are still important (by type of relation: one-to-one/1-1, one-to-many/1-N, many-to-many/N-N)
      • Document embedded in the parent document. 1-1: one read, no joins. 1-N: one read, no joins. N-N: one read, no joins, duplication of information.
      • Document referenced in the parent document. 1-1, 1-N and N-N alike: smaller reads, many reads.
  68. 68. #MDBLocal 2 - Entities for Beyond the Stars Coffee Entities: • Coffee cups • Stores • Coffee machines • Shelves • Weighings • Coffee bags
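
The embed-versus-reference trade-off can be illustrated with two of the entities listed above, stores and coffee machines. The sketch below is an in-memory model, not MongoDB code: `makeCollection`, `findById`, and the sample ids are hypothetical, and each lookup is counted as one read purely to make the difference visible.

```javascript
// Hypothetical in-memory "collection" that counts how many reads it serves.
function makeCollection(docs) {
  const byId = new Map(docs.map((doc) => [doc._id, doc]));
  let reads = 0;
  return {
    findById(id) {
      reads += 1;
      return byId.get(id);
    },
    get reads() {
      return reads;
    },
  };
}

// Embedded: the machines live inside the store document.
const storesEmbedded = makeCollection([
  { _id: "store-1", machines: [{ _id: "M-100" }, { _id: "M-101" }] },
]);
const embeddedMachines = storesEmbedded.findById("store-1").machines;

// Referenced: the store only keeps the ids of its machines.
const storesReferenced = makeCollection([
  { _id: "store-1", machineIds: ["M-100", "M-101"] },
]);
const machinesCollection = makeCollection([{ _id: "M-100" }, { _id: "M-101" }]);
const referencedMachines = storesReferenced
  .findById("store-1")
  .machineIds.map((id) => machinesCollection.findById(id));

const referencedReads = storesReferenced.reads + machinesCollection.reads;
console.log(`embedded: ${storesEmbedded.reads} read(s)`);
console.log(`referenced: ${referencedReads} read(s)`);
```

In a real deployment the referenced documents could likely be fetched with one additional query instead of one per id; the sketch counts one read per document only to mirror the "one read" versus "many reads" wording of the slide.
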
  69. 69. Methodology 1. Describe the Workload 2. Identify and Model the Relationships 3. Apply Patterns
  70. 70. Patterns Recognize the need and when to apply Schema Design Patterns
  71. 71. #MDBLocal Schema Design Patterns Resources A. Advanced Schema Design Patterns, Daniel Coupal • MongoDB World 2017 B. Blogs on Patterns, Ken Alger & Daniel Coupal • https://www.mongodb.com/blog/post/building-with-patterns-a-summary C. MongoDB University: M320 – Data Modeling • https://university.mongodb.com/courses/M320/about
  72. 72. #MDBLocal Schema Versioning
  74. 74. #MDBLocal Computed Pattern
  76. 76. #MDBLocal Subset Pattern
  79. 79. #MDBLocal Bucket Pattern
      • Bucket per Day: { "device_id": 000123456, "type": "2A", "date": ISODate("2018-03-02"), "temp": [ [ 20.0, 20.1, 20.2, ... ], [ 22.1, 22.1, 22.0, ... ], ... ] } { "device_id": 000123456, "type": "2A", "date": ISODate("2018-03-03"), "temp": [ [ 20.1, 20.2, 20.3, ... ], [ 22.4, 22.4, 22.3, ... ], ... ] }
      • Bucket per Hour: { "device_id": 000123456, "type": "2A", "date": ISODate("2018-03-02T13"), "temp": { 1: 20.0, 2: 20.1, 3: 20.2, ... } } { "device_id": 000123456, "type": "2A", "date": ISODate("2018-03-02T14"), "temp": { 1: 22.1, 2: 22.1, 3: 22.0, ... } }
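
A minimal sketch of how individual readings might be grouped into per-hour buckets shaped like the ones above. The raw readings, the `ts` field, and the `bucketPerHour` function are assumptions made for illustration; the slides do not show how the buckets are produced, and dates are kept as plain strings here rather than `ISODate` values.

```javascript
// Hypothetical raw readings, one per measurement (timestamps in ISO 8601, UTC).
const readings = [
  { device_id: 123456, type: "2A", ts: "2018-03-02T13:01:00Z", temp: 20.0 },
  { device_id: 123456, type: "2A", ts: "2018-03-02T13:02:00Z", temp: 20.1 },
  { device_id: 123456, type: "2A", ts: "2018-03-02T13:03:00Z", temp: 20.2 },
  { device_id: 123456, type: "2A", ts: "2018-03-02T14:01:00Z", temp: 22.1 },
  { device_id: 123456, type: "2A", ts: "2018-03-02T14:02:00Z", temp: 22.1 },
  { device_id: 123456, type: "2A", ts: "2018-03-02T14:03:00Z", temp: 22.0 },
];

// Group readings into one document per device and hour, keyed by minute.
function bucketPerHour(items) {
  const buckets = new Map();
  for (const r of items) {
    const hour = r.ts.slice(0, 13); // e.g. "2018-03-02T13"
    const key = `${r.device_id}|${hour}`;
    if (!buckets.has(key)) {
      buckets.set(key, {
        device_id: r.device_id,
        type: r.type,
        date: hour,
        temp: {},
      });
    }
    const minute = Number(r.ts.slice(14, 16));
    buckets.get(key).temp[minute] = r.temp;
  }
  return [...buckets.values()];
}

const hourlyBuckets = bucketPerHour(readings);
console.log(JSON.stringify(hourlyBuckets, null, 2));
```

Six raw readings collapse into two documents here; with one reading per minute, an hourly bucket would replace up to sixty separate documents.
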
  80. 80. #MDBLocal Solution with Patterns • Schema Versioning • Computed • Subset • Bucket
  81. 81. #MDBLocal https://university.mongodb.com/courses/M320/about Data Modeling Patterns Use Cases
  82. 82. Conclusion
  83. 83. Takeaways from the Presentation • Document vs Tabular: recognize the differences • Methodology: summarize the steps when modeling for MongoDB • Patterns: recognize when to apply
  86. 86. Thank you for taking our FREE MongoDB classes at university.mongodb.com
  87. 87. Register Now! https://university.mongodb.com/courses/M320/about
  88. 88. #MDBlocal Every session you rate enters you into a drawing for a gift card and TWO passes to MongoDB World 2020! A Complete Methodology of Data Modeling with MongoDB https://www.surveymonkey.com/r/W8N6DLY
  89. 89. Appendix A Schema Versioning Pattern
  90. 90. #MDBLocal Nightmare: Alter Table
  91. 91. #MDBLocal This is what your dreams should be when thinking about a schema upgrade!
  92. 92. #MDBLocal Schema Revision (Relational vs MongoDB) • Versioned Unit: Schema vs Document • Migration Procedure: Difficult vs Easy • Service Uptime: Interrupted vs No interruption • Rollback: Difficult to nightmare-ish vs Easy
  95. 95. #MDBLocal Application Lifecycle
      • Modify the application: it can read/process all versions of documents, has a different handler per version, and reshapes the document before processing it
      • Update all application servers: install the updated application, remove old processes
      • Once the migration is completed: remove the code that processes old versions
  96. 96. #MDBLocal Document Lifecycle
      • New documents: the application writes them in the latest version
      • Existing documents: A) use updates to documents to transform them to the latest version, and keep forever the documents that never need an update; B) or transform all documents in a batch, with no worry even if the process takes days
  97. 97. #MDBLocal Timeline of the migration
  98. 98. #MDBLocal Schema Versioning Pattern
      • Problem: avoid downtime while doing schema upgrades; upgrading all documents can take hours, days or even weeks when dealing with big data; don't want to update all documents
      • Solution: each document gets a "schema_version" field; the application can handle all versions; choose your strategy to migrate the documents
      • Use Cases Examples: every application that uses a database, deployed in production and heavily used; system with a lot of legacy data
      • Benefits: no downtime needed; feel in control of the migration; less future technical debt
      • Trade-Offs: may need 2 indexes for the same field during the migration period
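
One way to picture "a different handler per version" is a small dispatch table keyed on `schema_version`. Everything except the `schema_version` field name is an assumption in this sketch: the sample documents, the `phone` and `contacts` shapes, and the rule that a document without the field counts as version 1.

```javascript
// Hypothetical documents in two shapes.
const v1Doc = { _id: 1, name: "Ada", phone: "555-0100" }; // no schema_version yet
const v2Doc = {
  _id: 2,
  schema_version: 2,
  name: "Grace",
  contacts: [{ type: "phone", value: "555-0199" }],
};

// One handler per version; each reshapes a document into the latest shape.
const handlers = {
  1: (doc) => ({
    _id: doc._id,
    schema_version: 2,
    name: doc.name,
    contacts: doc.phone ? [{ type: "phone", value: doc.phone }] : [],
  }),
  2: (doc) => doc, // already the latest version
};

// Assumption: documents written before versioning are treated as version 1.
function toLatest(doc) {
  const version = doc.schema_version ?? 1;
  const handler = handlers[version];
  if (!handler) {
    throw new Error(`Unsupported schema_version: ${version}`);
  }
  return handler(doc);
}

console.log(JSON.stringify(toLatest(v1Doc)));
console.log(JSON.stringify(toLatest(v2Doc)));
```

The application can call `toLatest` on every read and either write the reshaped document back lazily or leave that to a batch job, which corresponds to the two migration strategies for existing documents. Once no old documents remain, the version 1 handler can be deleted.
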
  99. 99. Appendix B Computed Pattern
  100. 100. #MDBLocal Mathematical Operations
  102. 102. #MDBLocal "Fan Out" Operations
  103. 103. #MDBLocal "Roll Up" Operations
  104. 104. #MDBLocal Computed Pattern
      • Problem: costly computation or manipulation of data; executed frequently on the same data, producing the same result
      • Solution: perform the operation and store the result in the appropriate document and collection; if the operations may need to be redone, keep their source data
      • Use Cases Examples: Internet of Things (IoT); Event Sourcing; Time Series Data; frequent Aggregation Framework queries
      • Benefits: read queries are faster; saving on resources like CPU and disk
      • Trade-Offs: may be difficult to identify the need; avoid applying or overusing it unless needed
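
As a rough sketch of the Computed Pattern applied to the coffee use case, the code below updates a per-store summary on every write so that reads never have to re-scan the raw cups. The field names (`store_id`, `seconds`), the `recordCup` function, and the sample values are hypothetical; the raw cups are kept so the summary could be recomputed if needed.

```javascript
// Source data, kept so the computation can be redone if needed.
const rawCups = [];

// Pre-computed summary per store: store_id -> { cups, totalSeconds, avgSeconds }
const storeStats = new Map();

// On each write, store the cup and update the computed values once.
function recordCup(cup) {
  rawCups.push(cup);
  const stats =
    storeStats.get(cup.store_id) ?? { cups: 0, totalSeconds: 0, avgSeconds: 0 };
  stats.cups += 1;
  stats.totalSeconds += cup.seconds;
  stats.avgSeconds = stats.totalSeconds / stats.cups;
  storeStats.set(cup.store_id, stats);
}

[
  { store_id: "s1", seconds: 20 },
  { store_id: "s1", seconds: 22 },
  { store_id: "s2", seconds: 19 },
].forEach(recordCup);

// A read is now a single lookup instead of an aggregation over all cups.
console.log(storeStats.get("s1"));
```

The cost moves to the write path, which fits a workload where the same result is read far more often than the underlying data changes.
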
  105. 105. THANK YOU
