Ho Nguyen
• Senior Software Engineer
• Technical Interests:
• Solution & code design
• Distributed systems
• Video/Image encoding
• Hobbies
• Movies & music
• Manga & anime (One Piece, Dragon Ball...)
• Coffee lover
Data-intensive problem
Ho Nguyen
Senior Software Engineer
Outline
• Simple problem
• When the data is big
• More problems
• Approaches
Simple problem
Program diagram
Complete code
Face Detection
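In the slides, the single-process program appears only as a diagram and a code image. The Python sketch below is a rough stand-in for that loop. It assumes the ID/URL/Done table shown later in the talk. `download`, `resize` and `detect_faces` are hypothetical callables that take the place of the real HTTP client, image library and face detector.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class UrlRecord:
    # Mirrors the ID / URL / Done columns of the data schema in the talk.
    id: int
    url: str
    done: bool = False


def run_simple_pipeline(
    records: List[UrlRecord],
    download: Callable[[str], bytes],    # hypothetical: fetch raw image bytes
    resize: Callable[[bytes], Any],      # hypothetical: shrink image before detection
    detect_faces: Callable[[Any], Any],  # hypothetical: run the face detector
) -> Dict[int, Any]:
    """Process every record that is not done yet, one URL at a time."""
    results: Dict[int, Any] = {}
    for record in records:
        if record.done:
            continue  # already processed in an earlier run
        image = download(record.url)
        small = resize(image)
        results[record.id] = detect_faces(small)
        record.done = True  # mark as done only after the result is stored
    return results
```

At roughly 2 seconds per URL, the following slides work out how far this sequential loop is from the target.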
When the data is big
How big is the data?
• A data set of 2 billion records of unique URLs
• Assuming the previous program needs 2 seconds per URL => Concurrency number = 0.5 URL/s
$$\frac{2 \times 2 \times 10^{9}}{3600 \times 24} \approx 46296\ \text{days} \approx 127\ \text{years}$$
What is the concurrency number we need to
complete the dataset in X days?
What is the concurrency number we need?
• Goal: X = 7 days
• 2 billion URLs
• Current concurrency: 0.5 URL/s
$$\frac{2 \times 10^{9}}{X \times 3600 \times 24} = \frac{2 \times 10^{9}}{7 \times 3600 \times 24} \approx 3307\ \text{URLs/s}$$
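A few lines of Python reproduce both estimates. The helper names are only illustrative. `workers_needed` assumes that throughput grows linearly with the number of identical workers, and real systems only approximate that.

```python
import math

SECONDS_PER_DAY = 3600 * 24


def days_to_finish(total_urls: int, seconds_per_url: float, workers: int = 1) -> float:
    """Wall-clock days to process every URL with perfectly parallel workers."""
    return total_urls * seconds_per_url / workers / SECONDS_PER_DAY


def required_throughput(total_urls: int, days: float) -> float:
    """URLs per second needed to finish within the given number of days."""
    return total_urls / (days * SECONDS_PER_DAY)


def workers_needed(total_urls: int, days: float, urls_per_second_per_worker: float) -> int:
    """Identical workers needed, assuming linear scaling (an idealization)."""
    return math.ceil(required_throughput(total_urls, days) / urls_per_second_per_worker)


if __name__ == "__main__":
    n = 2_000_000_000
    print(round(days_to_finish(n, 2)))       # 46296
    print(round(required_throughput(n, 7)))  # 3307
    print(workers_needed(n, 7, 0.5))         # 6614
```

At 0.5 URL/s per worker, hitting the 7-day goal takes about 6,614 workers running concurrently. That number is the scale the rest of the talk has to reach.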
How to increase concurrency?
• Optimize code performance
• Increase hardware resources (CPU, RAM, disk, network…), aka scale-up
• Scale-out
• Cloning to multiple processes (X-axis)
• Splitting by functions (Y-axis)
• Data partitioning (Z-axis)
Optimize code
• Pros
• Most effective if we find a bottleneck whose fix increases performance by 661,300% (the ≈6,614× speedup needed to go from 0.5 to 3,307 URLs/s)
• Saves infrastructure cost
• Cons
• Time-consuming and uncertain
Scale-up
• Pros
• Easy to apply
• Cons
• Takes time to find a suitable hardware configuration
• Expensive and limited
• Once we can no longer scale up, we still need to optimize and redesign the code to take advantage of the hardware resources
Scale-out by cloning (X-Axis)
• Pros
• Can use all hardware resources
• Not limited by a single machine's hardware
• Cons
• More complex than scale-up
• Concurrency problems
[Diagram: Node 1, Node 2, Node 3]
Scale-out by Splitting (Y-Axis)
Review the workflow
Scale-out by Splitting (Y-Axis)
• Download and resize images on the CPU
• Face detection is faster on the GPU
Reference: https://sites.google.com/site/facedetectionongpu/
Scale-out by Splitting (Y-Axis)
[Diagram: X-axis (cloning): three "Download and Process Image" instances; Y-axis (splitting): two "Face Detection" instances]
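The diagram's X/Y split can be approximated in a single process, as in the minimal sketch below. Several downloader threads act as the X-axis clones and fill one bounded queue. Separate face-detection workers, the Y-axis split, drain that queue. Threads take the place of separate services here, and `download_and_resize` / `detect_faces` are hypothetical callables. A production system would run each stage as its own process or machine and would put the detectors on GPU hosts.

```python
import queue
import threading
from typing import Any, Callable, Dict, List

_STOP = object()  # sentinel: tells a detector there is no more work


def _download_stage(urls: List[str], out_q: "queue.Queue", download_and_resize: Callable[[str], Any]) -> None:
    for url in urls:
        out_q.put((url, download_and_resize(url)))  # blocks while the queue is full


def _detect_stage(in_q: "queue.Queue", results: Dict[str, Any], lock: threading.Lock,
                  detect_faces: Callable[[Any], Any]) -> None:
    while True:
        item = in_q.get()
        if item is _STOP:
            return
        url, image = item
        faces = detect_faces(image)
        with lock:
            results[url] = faces


def run_split_pipeline(url_slices: List[List[str]], download_and_resize: Callable[[str], Any],
                       detect_faces: Callable[[Any], Any], n_detectors: int = 2) -> Dict[str, Any]:
    """url_slices: one disjoint list of URLs per downloader clone."""
    q: "queue.Queue" = queue.Queue(maxsize=100)  # bounded: back-pressure on downloaders
    results: Dict[str, Any] = {}
    lock = threading.Lock()
    downloaders = [threading.Thread(target=_download_stage, args=(s, q, download_and_resize))
                   for s in url_slices]
    detectors = [threading.Thread(target=_detect_stage, args=(q, results, lock, detect_faces))
                 for _ in range(n_detectors)]
    for t in downloaders + detectors:
        t.start()
    for t in downloaders:
        t.join()  # every image is queued once all downloaders have finished
    for _ in detectors:
        q.put(_STOP)  # one sentinel per detector, queued after all real work
    for t in detectors:
        t.join()
    return results
```

The slices have to be disjoint. If two clones received the same URL, you would get the duplicate processing covered under "Race condition".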
Scale-out by Splitting (Y-Axis)
• Pros
• Each function runs on the hardware that suits it best (CPU for downloading, GPU for face detection)
• Cons
• More complex
• Concurrency problems
Scale-out by data-partitioning (Z-Axis)
Data schema
ID URL Done
1 https://abc.com/image1.jpg 1
2 https://abc.com/image2.jpg 0
3 https://abc.com/image3.jpg 0
4 https://abc.com/image4.jpg 0
Scale-out by data-partitioning (Z-Axis)
ID URL Done
1 https://abc.com/image1.jpg 0
3 https://abc.com/image3.jpg 0
ID URL Done
2 https://abc.com/image2.jpg 0
4 https://abc.com/image4.jpg 0
Key hashing
Scale-out by data-partitioning (Z-Axis)
ID URL Done
1 https://abc.com/image1.jpg 0
2 https://abc.com/image2.jpg 0
ID URL Done
3 https://abc.com/image3.jpg 0
4 https://abc.com/image4.jpg 0
Range based
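Both partitioning schemes from the slides can be written as small assignment functions. The sketch assumes integer IDs that start at 1 and that n ≥ m. For the key hash it uses the simplest possible choice, `(id - 1) mod m`, which reproduces the example split; a real system would normally run the key through a proper hash function first. In the range-based version, any remainder goes to the last partition.

```python
from typing import Callable, Dict, Iterable, List


def hash_partition(record_id: int, m: int) -> int:
    # Simplest "hash" for integer keys; a real system would hash the key first.
    return (record_id - 1) % m


def range_partition(record_id: int, n: int, m: int) -> int:
    # Contiguous blocks of k = n // m ids; the last partition absorbs the remainder.
    k = n // m
    return min((record_id - 1) // k, m - 1)


def group(ids: Iterable[int], assign: Callable[[int], int]) -> Dict[int, List[int]]:
    parts: Dict[int, List[int]] = {}
    for record_id in ids:
        parts.setdefault(assign(record_id), []).append(record_id)
    return parts


if __name__ == "__main__":
    ids = [1, 2, 3, 4]
    print(group(ids, lambda i: hash_partition(i, 2)))      # {0: [1, 3], 1: [2, 4]}
    print(group(ids, lambda i: range_partition(i, 4, 2)))  # {0: [1, 2], 1: [3, 4]}
```

With hashing, new IDs spread evenly over the partitions. With ranges, each partition covers one contiguous block, so a single partition can be scanned with one range query.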
Scale-out by data-partitioning (Z-Axis)
• Pros
• Increases database performance
• Reduces locking, or removes it entirely
• Cons
• Increases maintenance and infrastructure cost
• Hard to scale automatically
Summary
• Skip the code optimization approach
• Skip the scale-up approach
• Focus on scale-out approaches
• We can increase the number of processes/machines to increase the concurrency number
• We can split into 2 services: Downloader and Face Detection
• We may need data partitioning to optimize database performance
Current approach
High Concurrency
Problems
Race condition
• Cause
• The same URL is processed two or more times
• Impact
• Waste of resources
• Data corruption
• Fake concurrency: duplicated work inflates the measured throughput
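The race can be reproduced deterministically in a few lines. This hypothetical Python sketch assumes every worker runs a non-atomic "read the first row that is not done, then mark it done" loop. A `threading.Barrier` makes all workers finish reading before any of them writes. That is the unlucky interleaving that occurs at random in production.

```python
import threading
from typing import Dict, List


def make_table(n: int) -> List[Dict]:
    # Same shape as the ID / URL / Done schema from the talk.
    return [{"id": i, "url": f"https://abc.com/image{i}.jpg", "done": False} for i in range(1, n + 1)]


def racy_claim(table: List[Dict], barrier: threading.Barrier, claimed: List[int]) -> None:
    # Step 1 (read): pick the first URL that is not done yet.
    row = next(r for r in table if not r["done"])
    # Make every worker finish reading before anyone writes.
    barrier.wait()
    # Step 2 (write): mark it done; too late, the other workers picked it as well.
    row["done"] = True
    claimed.append(row["id"])


def run_racy_workers(workers: int = 2) -> List[int]:
    table = make_table(4)
    barrier = threading.Barrier(workers)
    claimed: List[int] = []
    threads = [threading.Thread(target=racy_claim, args=(table, barrier, claimed))
               for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return claimed  # every worker claimed URL 1
```

Every worker downloads URL 1. That is wasted work, and it also counts toward throughput as if it were real concurrency.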
Race condition: How to solve?
• Distributed locks
• Pros
• N/A
• Cons
• Pessimistic locking hurts performance
• Hard to apply because we need to synchronize multiple nodes
• Poor fault tolerance
• Data sharding
• Pro
• High performance because the load is shared across physical shards
• Cons
• Hard to scale
• Increases maintenance & infrastructure cost
• Queue/Worker
• Pros
• Easy to implement
• Easy to scale
• Good fault tolerance
• Reusable communication
• Con
• The load concentrates on the queue, so it can become a bottleneck
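Here is a minimal sketch of the Queue/Worker idea. A producer puts each URL on the queue once, and each `get` returns it to exactly one worker, so no two workers ever process the same URL. In this sketch, Python's in-process `queue.Queue` plays the role of a message broker. The talk does not name a specific product.

```python
import queue
import threading
from typing import List


def worker(q: "queue.Queue[str]", processed: List[str], lock: threading.Lock) -> None:
    while True:
        try:
            url = q.get_nowait()  # each item is delivered to exactly one caller
        except queue.Empty:
            return  # queue drained: no more work
        with lock:
            processed.append(url)  # stand-in for download + face detection
        q.task_done()


def run_queue_workers(urls: List[str], n_workers: int = 4) -> List[str]:
    q: "queue.Queue[str]" = queue.Queue()
    for url in urls:
        q.put(url)  # producer side: enqueue every URL exactly once
    processed: List[str] = []
    lock = threading.Lock()
    threads = [threading.Thread(target=worker, args=(q, processed, lock)) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return processed
```

Since the queue is filled up front, `queue.Empty` means every item has already been handed out. Against a real broker, a long-running worker would block on `get` instead.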
Race condition: root cause
Race conditions only occur between Downloaders.
=> If we find a way to distribute unique URLs to each downloader, we solve the race condition for the whole system.
Fault Tolerance
• Faults
• Network fault
• Network interruption
• IP Blocking
• Service crash
• Problems
• Can data be lost?
• Can the service restart and
continue to work on remaining
tasks?
Fault Tolerance criteria
Given | When | Then
--- | --- | ---
A service crashed | It restarted | No rework (continue on the remaining items only)
Downloader service is running | It crashed | No downloaded image should be lost
FaceDetector service is running | It crashed | No detection result should be lost
Downloader is downloading an image | A network error happens | Retry
Downloader retries downloading an image | The network error is IP blocking | Rotate the proxy to change the IP
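The last two rows of the criteria can be sketched as a retry loop. In this hypothetical Python sketch, `fetch(url, proxy)`, `NetworkError` and `IpBlockedError` are illustrative names and do not come from any library. The caller decides how an IP block is detected (for example, from a particular HTTP status); that part is assumed.

```python
import itertools
import time
from typing import Callable, Sequence


class NetworkError(Exception):
    """Hypothetical: a transient failure such as a timeout or connection reset."""


class IpBlockedError(NetworkError):
    """Hypothetical: the remote server is refusing requests from our IP."""


def download_with_retry(
    url: str,
    fetch: Callable[[str, str], bytes],
    proxies: Sequence[str],
    max_attempts: int = 5,
    backoff_seconds: float = 0.0,
) -> bytes:
    if max_attempts < 1 or not proxies:
        raise ValueError("need at least one attempt and one proxy")
    proxy_cycle = itertools.cycle(proxies)
    proxy = next(proxy_cycle)
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url, proxy)
        except IpBlockedError:
            if attempt == max_attempts:
                raise
            proxy = next(proxy_cycle)  # rotate proxy to change the outgoing IP
        except NetworkError:
            if attempt == max_attempts:
                raise
            # plain network error: retry through the same proxy
        time.sleep(backoff_seconds * 2 ** (attempt - 1))  # exponential backoff
    raise AssertionError("unreachable")
```

`IpBlockedError` subclasses `NetworkError`, so it is caught first and handled separately. Only IP blocks change the proxy; every other network error retries through the same proxy.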
Service communication
• How do the services communicate?
• Do we need a load balancer?
Service communication methods
Type | Method | Pros | Cons
--- | --- | --- | ---
Synchronous | HTTP | Familiar and simple to use | Needs a load balancer; tight coupling; the thread blocks while waiting for the response
Synchronous | RPC | Higher performance than HTTP |
Asynchronous | Queue messaging (one-to-one) | High performance; failure isolation; acts as a load balancer; reduced coupling | Extra maintenance cost; the queue may become a bottleneck
Asynchronous | Publish/Subscribe (one-to-many) | | We only need one-to-one communication
Summary
• Find an approach that distributes unique URLs to the downloaders.
• The approach should pass the fault tolerance criteria.
• Use the communication methods table to choose the final solution.
High Concurrency
Approaches
Approach 1: Range based physical shard
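$$
\begin{aligned}
n &: \text{total URLs} \\
m &: \text{number of partitions} \\
i &\in [0, m-1] : \text{partition number} \\
k &= \left\lfloor \frac{n}{m} \right\rfloor : \text{number of URLs in a partition} \\
\mathrm{start}_i &= k \cdot i \\
\mathrm{end}_i &= \begin{cases} \mathrm{start}_i + k, & 0 \le i < m-1 \\ \mathrm{start}_i + k + (n \bmod m), & i = m-1 \end{cases}
\end{aligned}
$$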
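The range formula above maps directly onto code. This minimal sketch assumes 0-based URL IDs and treats `end` as an exclusive bound. `build_physical_shards` is a hypothetical helper in which Python lists take the place of the separate databases in a physical shard.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def partition_ranges(n: int, m: int) -> List[Tuple[int, int]]:
    """Return [start_i, end_i) for i in 0..m-1, following the range formula."""
    if m < 1 or n < m:
        raise ValueError("need m >= 1 and n >= m")
    k = n // m  # URLs per partition
    ranges = []
    for i in range(m):
        start = k * i
        end = start + k if i < m - 1 else start + k + n % m  # last one takes the remainder
        ranges.append((start, end))
    return ranges


def build_physical_shards(urls: Sequence[T], m: int) -> List[List[T]]:
    # Each inner list stands in for one physical database shard.
    return [list(urls[start:end]) for start, end in partition_ranges(len(urls), m)]
```

As an example, `partition_ranges(10, 3)` returns `[(0, 3), (3, 6), (6, 10)]`: the final partition takes the one leftover URL. The shards have to be built before any download starts, which is the "preparation" cost listed in this approach's notes.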
Approach 1: Range based physical shard
Solves race condition: Solved
Fault tolerance:
+ No rework
+ Images must be downloaded again if a crash happens during face detection
+ A partition can be abandoned
Communication types: HTTP/gRPC
Notes:
• Pros
• No locking at the database level
• Cons
• Takes time to prepare
• Hard to scale out/adjust
• Needs a load balancer
Approach 2: Logical shard
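$$
\begin{aligned}
n &: \text{total URLs} \\
m &: \text{number of processes} \\
\mathit{id} &\in [0, m-1] : \text{process id} \\
k &= \left\lfloor \frac{n}{m} \right\rfloor \\
i &: \text{number of the URL being processed within a process} \\
i &\in \begin{cases} [0, k-1], & \mathit{id} < m-1 \\ [0, k-1+(n \bmod m)], & \mathit{id} = m-1 \end{cases} \\
f(i) &: \text{the id of the URL we need to pick} \\
\Rightarrow f(i) &= \mathit{id} \cdot k + i
\end{aligned}
$$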
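The logical-shard formula can be sketched as a generator that each process runs against the shared table. `logical_shard_ids` and its `start_i` parameter are illustrative. `start_i` stands for the persisted "current URL id" state listed in this approach's notes, so a restarted process continues where it stopped instead of repeating work.

```python
from typing import Iterator


def logical_shard_ids(process_id: int, n: int, m: int, start_i: int = 0) -> Iterator[int]:
    """Yield f(i) = id * k + i for every URL number i this process owns."""
    if not 0 <= process_id < m or n < m:
        raise ValueError("need 0 <= process_id < m and n >= m")
    k = n // m
    # The last process also covers the n mod m leftover URLs.
    count = k + (n % m if process_id == m - 1 else 0)
    for i in range(start_i, count):
        yield process_id * k + i
```

For n = 10 and m = 3, the three processes own ids 0..2, 3..5 and 6..9. No row moves; each process only has to know its own `id`, plus `n` and `m`.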
Approach 2: Logical shard
Solves race condition: Solved
Fault tolerance:
+ No rework
+ Images must be downloaded again if a crash happens during face detection
+ A partition can be abandoned
Communication types: HTTP/gRPC
Notes:
• Pros
• No locking at the database level
• Simple implementation
• Cons
• Hard to scale out/adjust
• High database throughput
• Extra state to maintain: total URLs, current URL id,…
Approach 3: Queue/Worker x Logical Sharding
Same logical sharding as in Approach 2: with $k = \lfloor n/m \rfloor$, process $\mathit{id} \in [0, m-1]$ picks the URL ids $f(i) = \mathit{id} \cdot k + i$ for $i \in [0, k-1]$, and the last process ($\mathit{id} = m-1$) also takes the remaining $n \bmod m$ URLs.
Approach 3: Queue/Worker x Logical Sharding
Solves race condition: Solved
Fault tolerance:
+ No rework
+ Failure isolation
+ Nodes are replaceable
Communication types: Messaging
Notes:
• Pros
• Easy to scale
• Easy fault tolerance
• Failure isolation
• Asynchronous
• Cons
• Extra infrastructure
• High throughput on the queue
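"No rework" and "Nodes are replaceable" both depend on acknowledgements. A worker must confirm a message before it is removed, and when a worker dies, its unacknowledged messages go back on the queue. `AckQueue` below is a toy, hypothetical in-memory model of that behavior. It is not the API of any real broker.

```python
import collections
from typing import Any, Deque, Dict, List, Optional, Tuple


class AckQueue:
    """Toy in-memory queue with explicit acknowledgements (hypothetical model)."""

    def __init__(self, items: List[Any]) -> None:
        self._ready: Deque[Any] = collections.deque(items)
        self._in_flight: Dict[int, Tuple[str, Any]] = {}  # tag -> (consumer, item)
        self._next_tag = 0

    def get(self, consumer: str) -> Optional[Tuple[int, Any]]:
        if not self._ready:
            return None
        item = self._ready.popleft()
        tag = self._next_tag
        self._next_tag += 1
        self._in_flight[tag] = (consumer, item)  # delivered but not yet confirmed
        return tag, item

    def ack(self, tag: int) -> None:
        del self._in_flight[tag]  # work confirmed: forget the message

    def consumer_died(self, consumer: str) -> None:
        # Return the dead consumer's unacknowledged items to the front of the queue.
        lost = [tag for tag, (owner, _) in self._in_flight.items() if owner == consumer]
        for tag in reversed(lost):
            self._ready.appendleft(self._in_flight.pop(tag)[1])

    @property
    def pending(self) -> int:
        return len(self._ready) + len(self._in_flight)


def simulate_crash_and_replace() -> List[int]:
    q = AckQueue([1, 2, 3, 4])
    done: List[int] = []
    q.get("A")            # worker A takes URL 1 ...
    q.consumer_died("A")  # ... and crashes before acknowledging it
    # A replacement worker B drains the queue, including URL 1.
    while (msg := q.get("B")) is not None:
        tag, url_id = msg
        done.append(url_id)
        q.ack(tag)
    assert q.pending == 0
    return done
```

Only the message that was in flight gets redelivered. Anything already acknowledged stays done, and that is what lets a replacement node continue without rework.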
END
Questions
• How do we measure and debug the services?
• What is the deployment process?
Q&A
THANK YOU FOR YOUR ATTENTION

Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort servicejennyeacort
 
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catcherssdickerson1
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfROCENODodongVILLACER
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleAlluxio, Inc.
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfAsst.prof M.Gokilavani
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidNikhilNagaraju
 
An introduction to Semiconductor and its types.pptx
An introduction to Semiconductor and its types.pptxAn introduction to Semiconductor and its types.pptx
An introduction to Semiconductor and its types.pptxPurva Nikam
 
Comparative Analysis of Text Summarization Techniques
Comparative Analysis of Text Summarization TechniquesComparative Analysis of Text Summarization Techniques
Comparative Analysis of Text Summarization Techniquesugginaramesh
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...srsj9000
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfme23b1001
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...VICTOR MAESTRE RAMIREZ
 
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)dollysharma2066
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfAsst.prof M.Gokilavani
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxKartikeyaDwivedi3
 

Kürzlich hochgeladen (20)

Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
 
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdf
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at Scale
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfid
 
An introduction to Semiconductor and its types.pptx
An introduction to Semiconductor and its types.pptxAn introduction to Semiconductor and its types.pptx
An introduction to Semiconductor and its types.pptx
 
Comparative Analysis of Text Summarization Techniques
Comparative Analysis of Text Summarization TechniquesComparative Analysis of Text Summarization Techniques
Comparative Analysis of Text Summarization Techniques
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdf
 
young call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Serviceyoung call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Service
 
POWER SYSTEMS-1 Complete notes examples
POWER SYSTEMS-1 Complete notes  examplesPOWER SYSTEMS-1 Complete notes  examples
POWER SYSTEMS-1 Complete notes examples
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...
 
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptx
 

Grokking Techtalk #37: Data intensive problem

  • 1. Ho Nguyen • Senior Software Engineer • Technical Interests: • Solution & code design • Distributed systems • Video/Image encoding • Hobbies • Movies & music • Manga & anime (One Piece, Dragon Ball...) • Coffee lover
  • 3. Outline • Simple problem • When the data is big • More problems • Approaches
  • 8.
  • 9. When the data is big
  • 10. How big is the data? • A data set of 2 billion records of unique URLs • Assuming the previous program needs 2 seconds per URL, the concurrency number is 0.5 URL/s, so the whole data set takes $\frac{2 \times 2 \times 10^{9}}{3600 \times 24} \approx 46296$ days $\approx 127$ years
  • 11. What is the concurrency number we need to complete the dataset in X days?
  • 12. What is the concurrency number we need? • Goal: X = 7 days • 2 billion URLs • Current concurrency: 0.5 URL/s • Required: $\frac{2 \times 10^{9}}{X \times 3600 \times 24} = \frac{2 \times 10^{9}}{7 \times 3600 \times 24} \approx 3307$ URLs/s
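The two estimates on slides 10 and 12 are T = S/V and V = S/T with S = 2 × 10^9 URLs. A small Go sketch reproduces both numbers; the function names are illustrative and not from the talk.

```go
package main

import "fmt"

const secondsPerDay = 3600 * 24

// daysToFinish applies T = S / V: total items divided by throughput
// (items per second), converted from seconds to days.
func daysToFinish(total, throughput float64) float64 {
	return total / throughput / secondsPerDay
}

// requiredThroughput applies V = S / T: the items per second needed to
// finish total items within the given number of days.
func requiredThroughput(total, days float64) float64 {
	return total / (days * secondsPerDay)
}

func main() {
	const urls = 2e9    // 2 billion unique URLs
	const current = 0.5 // one URL every 2 seconds

	days := daysToFinish(urls, current)
	fmt.Printf("%.0f days (about %.0f years)\n", days, days/365.25) // 46296 days (about 127 years)

	need := requiredThroughput(urls, 7)
	fmt.Printf("%.0f URLs/s needed, %.0fx the current rate\n", need, need/current) // 3307 URLs/s needed, 6614x the current rate
}
```

Going from 0.5 to about 3307 URLs/s is roughly a 6614x rate, i.e. an increase of about 661,300%, the figure used on slide 14.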
  • 13. How to increase concurrency? • Optimize code performance • Increase hardware resources (CPU, RAM, disk, network…), aka scale-up • Scale-out • Cloning to multiple processes (X-Axis) • Splitting by function (Y-Axis) • Data partitioning (Z-Axis)
  • 14. Optimize code • Pros • Most effective if we find a bottleneck whose fix increases performance by the required ~661,300% (from 0.5 to 3307 URLs/s) • Saves infrastructure cost • Cons • Time-consuming and uncertain
  • 15. Scale-up • Pros • Easy to apply • Cons • Takes time to find a suitable hardware configuration • Expensive and limited • Once we can no longer scale up, we still need to optimize and redesign the code to take advantage of the hardware resources
  • 16. Scale-out by cloning (X-Axis) • Pros • Can use all hardware resources • Not limited by the hardware of a single machine • Cons • More complex than scale-up • Concurrency problems • (Diagram: Node 1, Node 2, Node 3)
  • 17. Scale-out by Splitting (Y-Axis) Review the workflow
  • 18. Scale-out by Splitting (Y-Axis) • Downloading and resizing images runs on the CPU • Face detection is faster on the GPU Reference: https://sites.google.com/site/facedetectionongpu/
  • 19. Scale-out by Splitting (Y-Axis) • (Diagram: along the Y-axis, the program is split into "Download and Process Image" nodes and "Face Detection" nodes; along the X-axis, each group is cloned: three "Download and Process Image" nodes and two "Face Detection" nodes)
  • 20. Scale-out by Splitting (Y-Axis) • Pros • Takes advantage of the hardware best suited to each function (CPU for downloading/resizing, GPU for face detection) • Cons • Complex • Concurrency problems
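Splitting along the Y-axis and cloning along the X-axis combine naturally as a pipeline: each function becomes a stage with its own pool of workers, connected by a queue. The sketch below uses goroutines and channels inside one process as a stand-in for separate services; `downloadAndResize` and `detectFaces` are hypothetical placeholders, not real CPU or GPU code.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical stand-ins for the real work of each service.
func downloadAndResize(url string) string { return "img:" + url }   // CPU-bound in the real system
func detectFaces(img string) string       { return "faces:" + img } // GPU-bound in the real system

// stage starts `workers` goroutines (X-axis clones) that apply fn to every
// item from in; the returned channel closes once in is drained.
func stage(in <-chan string, workers int, fn func(string) string) <-chan string {
	out := make(chan string)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for v := range in {
				out <- fn(v)
			}
		}()
	}
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

func main() {
	urls := make(chan string)
	go func() {
		defer close(urls)
		for _, u := range []string{"image1.jpg", "image2.jpg", "image3.jpg"} {
			urls <- u
		}
	}()
	// Y-axis split: three downloader clones feed two face-detector clones,
	// mirroring the slide diagram. Output order is not deterministic.
	images := stage(urls, 3, downloadAndResize)
	for result := range stage(images, 2, detectFaces) {
		fmt.Println(result)
	}
}
```

Because each stage has its own worker count, the downloader pool and the face-detector pool can be sized independently, which is the hardware-matching benefit listed under Pros. Across real machines, the channels would be replaced by a network queue.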
  • 21. Scale-out by data-partitioning (Z-Axis): data schema
    | ID | URL | Done |
    |----|-----|------|
    | 1 | https://abc.com/image1.jpg | 1 |
    | 2 | https://abc.com/image2.jpg | 0 |
    | 3 | https://abc.com/image3.jpg | 0 |
    | 4 | https://abc.com/image4.jpg | 0 |
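  • 22. Scale-out by data-partitioning (Z-Axis): key hashing
    Shard 1:
    | ID | URL | Done |
    |----|-----|------|
    | 1 | https://abc.com/image1.jpg | 0 |
    | 3 | https://abc.com/image3.jpg | 0 |
    Shard 2:
    | ID | URL | Done |
    |----|-----|------|
    | 2 | https://abc.com/image2.jpg | 0 |
    | 4 | https://abc.com/image4.jpg | 0 |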
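  • 23. Scale-out by data-partitioning (Z-Axis): range based
    Shard 1:
    | ID | URL | Done |
    |----|-----|------|
    | 1 | https://abc.com/image1.jpg | 0 |
    | 2 | https://abc.com/image2.jpg | 0 |
    Shard 2:
    | ID | URL | Done |
    |----|-----|------|
    | 3 | https://abc.com/image3.jpg | 0 |
    | 4 | https://abc.com/image4.jpg | 0 |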
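Both partitioning schemes reduce to a function from a row's key to a shard number. In this sketch the "hash" is simply `ID mod shards`, which reproduces the key-hashing slide; a real system would hash arbitrary keys with a proper hash function (for example Go's `hash/fnv`). The function names are illustrative.

```go
package main

import "fmt"

// hashShard assigns a row to a shard from its key; here the "hash" is
// simply id mod shards, so odd and even IDs land on different shards.
func hashShard(id, shards int) int {
	return id % shards
}

// rangeShard assigns contiguous ID ranges to shards: IDs 1..perShard go to
// shard 0, the next perShard IDs to shard 1, and so on (IDs start at 1).
func rangeShard(id, perShard int) int {
	return (id - 1) / perShard
}

func main() {
	for id := 1; id <= 4; id++ {
		fmt.Printf("ID %d: hash shard %d, range shard %d\n", id, hashShard(id, 2), rangeShard(id, 2))
	}
	// ID 1: hash shard 1, range shard 0
	// ID 2: hash shard 0, range shard 0
	// ID 3: hash shard 1, range shard 1
	// ID 4: hash shard 0, range shard 1
}
```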
  • 24. Scale-out by data-partitioning (Z-Axis) • Pros • Increases database performance • Reduces locking (less contention per partition) • Cons • Increases maintenance and infrastructure cost • Hard to auto-scale
  • 25. Summary • Skip the code optimization approach • Skip the scale-up approach • Focus on scale-out approaches • We can increase the number of processes/machines to increase the concurrency number • We can split the program into 2 services: Downloader and Face Detection • We may need data partitioning to optimize database performance
  • 28. Race condition • Cause • The same URL is processed twice or more • Impact • Wasted resources • Data corruption • Fake concurrency (duplicate work inflates the concurrency number without finishing more URLs)
  • 29. Race condition: How to solve? • Distributed locks • Pros • N/A • Cons • Pessimistic locking hurts performance • Hard to apply because multiple nodes must be synchronized • Poor fault tolerance • Data sharding • Pros • High performance because the load is shared (physical shards) • Cons • Hard to scale • Increases maintenance & infrastructure cost • Queue/Worker • Pros • Easy to implement • Easy to scale • Good fault tolerance • Reusable communication • Cons • The load concentrates on the queue, so it can become a bottleneck
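The speaker notes suggest implementing distributed locks with the Redis `SET` command and its `NX` option, which only sets a key that does not exist yet. The sketch below is an in-memory stand-in with the same set-if-absent semantics, not a Redis client; `claimStore` and `SetNX` are hypothetical names. It shows how such a lock stops two Downloaders from taking the same URL.

```go
package main

import (
	"fmt"
	"sync"
)

// claimStore is an in-memory stand-in for a shared key-value store.
type claimStore struct {
	mu   sync.Mutex
	keys map[string]string
}

func newClaimStore() *claimStore { return &claimStore{keys: map[string]string{}} }

// SetNX mimics set-if-absent semantics: it stores value only if key does
// not exist yet and reports whether this caller won the claim.
func (s *claimStore) SetNX(key, value string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, exists := s.keys[key]; exists {
		return false
	}
	s.keys[key] = value
	return true
}

func main() {
	store := newClaimStore()
	var wg sync.WaitGroup
	var mu sync.Mutex
	winners := 0
	// Five downloaders race for the same URL; only one claim succeeds.
	for w := 0; w < 5; w++ {
		wg.Add(1)
		go func(worker int) {
			defer wg.Done()
			if store.SetNX("lock:https://abc.com/image1.jpg", fmt.Sprint("downloader-", worker)) {
				mu.Lock()
				winners++
				mu.Unlock()
			}
		}(w)
	}
	wg.Wait()
	fmt.Println("winners:", winners) // winners: 1
}
```

In a real deployment the lock would also need an expiry (Redis `SET key value NX PX <milliseconds>`) so that a crashed Downloader does not hold its URL forever; this is one of the fault-tolerance difficulties listed above.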
  • 30. Race condition: root cause • Race conditions only occur between Downloaders => If we find a way to distribute unique URLs to each Downloader, we solve the race condition for the whole system.
  • 31. Fault Tolerance • Faults • Network fault • Network interruption • IP Blocking • Service crash • Problems • Can data be lost? • Can the service restart and continue to work on remaining tasks?
  • 32. Fault Tolerance criteria
    | Given | When | Then |
    |-------|------|------|
    | A service crashed | It restarted | No rework (continue on remaining items only) |
    | Downloader service is running | It crashed | No downloaded image is lost |
    | FaceDetector service is running | It crashed | No detection result is lost |
    | Downloader is downloading an image | A network error happens | Downloader retries downloading the image |
    | Downloader is downloading an image | The network error is IP blocking | Downloader rotates its proxy to change the IP |
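The last two rows of the criteria (retry on network errors, rotate the proxy on IP blocking) can be sketched as a retry loop. Everything here is an assumption for illustration: `errIPBlocked`, the `fetchFunc` signature and the proxy names are hypothetical, and the fetcher is injected so the policy can be exercised without a network. It also assumes `proxies` is non-empty.

```go
package main

import (
	"errors"
	"fmt"
)

// errIPBlocked is a hypothetical error a fetcher returns when the remote
// host blocks our IP (for example with an HTTP 403 or 429 response).
var errIPBlocked = errors.New("ip blocked")

// fetchFunc downloads url through the given proxy (hypothetical signature).
type fetchFunc func(url, proxy string) ([]byte, error)

// downloadWithRetry retries failed downloads up to maxAttempts times and
// switches to the next proxy whenever the error indicates IP blocking.
func downloadWithRetry(url string, proxies []string, maxAttempts int, fetch fetchFunc) ([]byte, error) {
	p := 0
	var lastErr error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		data, err := fetch(url, proxies[p])
		if err == nil {
			return data, nil
		}
		lastErr = err
		if errors.Is(err, errIPBlocked) {
			p = (p + 1) % len(proxies) // rotate proxy to change the outgoing IP
		}
		// A real downloader would also back off before the next attempt.
	}
	return nil, fmt.Errorf("giving up on %s after %d attempts: %w", url, maxAttempts, lastErr)
}

func main() {
	// Fake fetcher: proxy-a is blocked, proxy-b works.
	fetch := func(url, proxy string) ([]byte, error) {
		if proxy == "proxy-a" {
			return nil, errIPBlocked
		}
		return []byte("image bytes"), nil
	}
	data, err := downloadWithRetry("https://abc.com/image1.jpg", []string{"proxy-a", "proxy-b"}, 3, fetch)
	fmt.Println(string(data), err) // image bytes <nil>
}
```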
  • 33. Service communication • How do the services communicate? • Do we need a load balancer?
  • 34. Service communication methods
    | Type | Method | Pros | Cons |
    |------|--------|------|------|
    | Synchronous | HTTP | Familiar and simple to use | Needs a load balancer; tight coupling; threads block waiting for the response |
    | Synchronous | RPC | Higher performance than HTTP | |
    | Asynchronous | Queue messaging (one-to-one) | High performance; failure isolation; acts as a load balancer; reduced coupling | Extra maintenance cost; the queue may become a bottleneck |
    | Asynchronous | Publish/Subscribe (one-to-many) | | We only need one-to-one communication |
  • 35. Summary • Find an approach to distribute unique URLs to the Downloaders • The approach should pass the fault tolerance criteria • Use the communication methods table to choose the final solution
  • 37. Approach 1: Range-based physical shard
    • $n$: total URLs; $m$: number of partitions; $i \in [0, m-1]$: partition number
    • $k = \lfloor n/m \rfloor$: number of URLs in a partition
    • $start_i = k \cdot i$
    • $end_i = start_i + k$ for $0 \le i < m-1$, and $end_i = start_i + k + (n \bmod m)$ for $i = m-1$
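The partition bounds on slide 37 can be computed directly. This sketch treats each range as half-open, [start_i, end_i), which is an assumption; with that reading the last partition ends exactly at n. The function name is illustrative.

```go
package main

import "fmt"

// partitionRange returns the half-open range [start, end) of URL indices
// for partition i out of m: each partition gets k = n / m URLs and the
// last one also takes the n mod m remainder.
func partitionRange(n, m, i int) (start, end int) {
	k := n / m
	start = k * i
	end = start + k
	if i == m-1 {
		end += n % m
	}
	return start, end
}

func main() {
	for i := 0; i < 3; i++ {
		s, e := partitionRange(10, 3, i)
		fmt.Printf("partition %d: [%d, %d)\n", i, s, e)
	}
	// partition 0: [0, 3)
	// partition 1: [3, 6)
	// partition 2: [6, 10)
}
```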
  • 38. Approach 1: Range-based physical shard
    | Solves race condition | Fault tolerance | Communication types | Notes |
    |------|------|------|------|
    | Solved | No rework; the image must be downloaded again if a crash happens during face detection; a partition can be abandoned | HTTP/gRPC | Pros: non-locking at the DB level. Cons: takes time to prepare; hard to scale out/adjust; needs a load balancer |
  • 39. Approach 2: Logical shard
    • $n$: total URLs; $m$: number of processes; $id \in [0, m-1]$: process ID
    • $k = \lfloor n/m \rfloor$
    • $i$: index of the URL being processed within a process, with $i \in [0, k-1]$ if $id < m-1$ and $i \in [0, k-1+(n \bmod m)]$ if $id = m-1$
    • $f(i)$: the ID of the URL we need to pick $\Rightarrow f(i) = id \cdot k + i$
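With logical sharding, no data is moved between databases: each process derives the URL IDs it owns from its own `id` and a local counter `i`. A sketch of the slide's formula, with illustrative function names:

```go
package main

import "fmt"

// urlsForProcess returns how many URLs process id handles: k = n / m,
// with the last process also taking the n mod m remainder.
func urlsForProcess(n, m, id int) int {
	k := n / m
	if id == m-1 {
		return k + n%m
	}
	return k
}

// urlID maps the i-th URL handled by process id to a global URL ID:
// f(i) = id*k + i.
func urlID(n, m, id, i int) int {
	return id*(n/m) + i
}

func main() {
	n, m := 10, 3
	for id := 0; id < m; id++ {
		var ids []int
		for i := 0; i < urlsForProcess(n, m, id); i++ {
			ids = append(ids, urlID(n, m, id, i))
		}
		fmt.Println("process", id, "->", ids)
	}
	// process 0 -> [0 1 2]
	// process 1 -> [3 4 5]
	// process 2 -> [6 7 8 9]
}
```

Since f(i) depends only on id, k and i, a restarted process can resume if it persists its current i, which matches the extra state (current URL ID) listed under Cons on slide 40.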
  • 40. Approach 2: Logical shard
    | Solves race condition | Fault tolerance | Communication types | Notes |
    |------|------|------|------|
    | Solved | No rework; the image must be downloaded again if a crash happens during face detection; a partition can be abandoned | HTTP/gRPC | Pros: non-locking at the DB level; simple implementation. Cons: hard to scale out/adjust; high database throughput; extra state to maintain (total URLs, current URL ID, …) |
  • 41. Approach 3: Queue/Worker x Logical Sharding
    • $n$: total URLs; $m$: number of processes; $id \in [0, m-1]$: process ID
    • $k = \lfloor n/m \rfloor$
    • $i$: index of the URL being processed within a process, with $i \in [0, k-1]$ if $id < m-1$ and $i \in [0, k-1+(n \bmod m)]$ if $id = m-1$
    • $f(i)$: the ID of the URL we need to pick $\Rightarrow f(i) = id \cdot k + i$
  • 42. Approach 3: Queue/Worker x Logical Sharding
    | Solves race condition | Fault tolerance | Communication types | Notes |
    |------|------|------|------|
    | Solved | No rework; failure isolation; nodes are replaceable | Messaging | Pros: easy to scale; easy fault tolerance; failure isolation; asynchronous. Cons: extra infrastructure; high throughput on the queue |
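The Delegator/Queue/Worker flow described in the speaker notes can be sketched with Go channels standing in for the queues. This is only an in-process illustration under stated assumptions: `task`, `delegator`, `downloadWorker` and `run` are hypothetical names, the Delegator simply enumerates IDs instead of reading the URL database, and the download step itself is omitted.

```go
package main

import (
	"fmt"
	"sync"
)

// task is a hypothetical queue item carrying a URL ID.
type task struct{ id int }

// delegator stands in for the Delegator/Load Balancer: it pushes URL IDs
// (here just 0..n-1) onto the download queue, then closes it.
func delegator(n int, queue chan<- task) {
	for id := 0; id < n; id++ {
		queue <- task{id: id}
	}
	close(queue)
}

// downloadWorker consumes the download queue and, after the (omitted)
// download-and-resize step, pushes an item onto the face-detection queue.
func downloadWorker(in <-chan task, faceQueue chan<- task, wg *sync.WaitGroup) {
	defer wg.Done()
	for t := range in {
		faceQueue <- t
	}
}

// run wires the delegator, `workers` download workers and a single
// face-detection consumer, and counts how often each URL ID arrives.
func run(n, workers int) map[int]int {
	downloadQueue := make(chan task, 16)
	faceQueue := make(chan task, 16)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go downloadWorker(downloadQueue, faceQueue, &wg)
	}
	go delegator(n, downloadQueue)
	go func() {
		wg.Wait()
		close(faceQueue)
	}()
	counts := map[int]int{}
	for t := range faceQueue {
		counts[t.id]++
	}
	return counts
}

func main() {
	counts := run(100, 4)
	fmt.Println(len(counts)) // 100: every URL reached face detection
}
```

Each item is received by exactly one worker, so no URL is processed twice. An in-memory channel, however, loses its items if the process crashes; the fault-tolerance properties in the table come from a durable broker such as RabbitMQ or Kafka, where unacknowledged items can be consumed again by another worker.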
  • 43. END
  • 44. Questions • How do we measure and debug the services? • What is the deployment process?
  • 45. Q&A THANK YOU FOR YOUR ATTENTION

Speaker notes

  1. Hi everyone, shall we get started? Let me introduce myself: I'm Hồ, a Senior Software Engineer at AXON. At Axon, we "Write Code, Save Lives". On the technical side, I like solution design and code design, and I also enjoy researching video and image encoding. Outside of code, I'm a "normal" person too: I like listening to music, watching movies, reading manga and watching anime.
  2. May I ask: has anyone here ever had to think about optimizing code to make a program run faster? What made you optimize it?
  3. Today I want to share a problem I once faced: a very simple problem that becomes quite interesting when the amount of data to process is huge. The goal is to give you more perspectives for solving problems at work ;) I believe the right technology choices make a solution better, but this talk will not focus on choosing technologies. I'm not claiming that the solutions I present are the best ones.
  4. I received a requirement to write a program that does the following: take the URL of an image, download the image, process it, and find the positions of the faces in it. => To make it easier to follow, please look at the program diagram.
  5. I believe everyone here could write this program. => And here is the program's code.
  6. But some of you may say that the Face Detection part is quite complex without machine learning knowledge. That's true, but fortunately face detection is a very common problem and you can use an existing library such as OpenCV or TensorFlow… => And here is the code for the Face Detection part.
  7. Use existing face detection code.
  8. And here is the result.
  9. You can see the original problem is quite simple, right? But that is the problem for a single URL. What if we have 2 billion URLs? How big are 2 billion URLs? How long would it take to process them all? Those are the questions I asked myself when my boss gave me the requirement: use the original program to process all 2 billion image URLs in an existing dataset. => So let's analyze it together.
  10. There are 2 billion images to download. If the earlier program needs 2 seconds per image, I would need 127 years to process the whole set of 2 billion images. If I went back and told my boss that processing his dataset would take 127 years, what do you think would happen? (T = S/V; 1 year = 365.25 days)
  11. The question is: what concurrency number do we need to finish the 2-billion-image set in the number of days we want?
  12. We need to raise the concurrency number from 0.5 to 3307 URLs/s, an increase of about 661,300%. (V = S/T)
  13. In my experience, although there are many ways to increase the concurrency number, they generally fall into 3 main groups. Optimizing code can increase concurrency. Adding hardware, e.g. a faster CPU, faster disk reads/writes, or more RAM… In the past I often nudged customers to increase the IOPS of their database server to speed things up. Scale-out, which has 3 methods: cloning into multiple nodes/processes, splitting by function, and partitioning the data to be processed. We will analyze each approach one by one.
  14. Optimizing code can give extremely good results if we find an algorithm that is many times faster. But we need to spend a lot of time and effort to find what needs optimizing.
  15. In the cloud era, you can get a very powerful server with just a few clicks. But it is very expensive (Azure Calculator) https://st.ht/M6rGb, and you will soon hit the limit. I didn't find this approach very interesting.
  16. Horizontal scale-out (X-axis), as I said, means cloning an existing node into many nodes that process data at the same time. With this approach you are not limited by hardware, and fault tolerance is high. But it is harder to implement, for example because of race conditions. In the current problem, how could a race condition occur?
  17. Scaling out along the Y-axis means splitting one node that contains all the functions into many nodes, one function per node. The microservices architecture is based on this kind of split. We need to review the program's functions to understand the main ones and the benefits of splitting them apart. Luckily my program has very simple functions, so I quickly split it into 2 parts. Can anyone help me split it?
  18. By splitting into 2 groups, my system can take advantage of the strengths of each kind of hardware.
  19. This is the diagram of how the nodes are split. By now you can see that a simple initial problem has led us to more complex problems. But it doesn't stop here: there are still some problems to solve.
  20. => We have a basic approach, scale-out, but there are still some problems to take care of.
  21. I call them "High Concurrency Problems" because the goal is to solve the problems that stand in the way of increasing the concurrency number.
  22. Multiple nodes may process the same URL, and the stored results can get mixed up. So how do we solve the race condition? How do we get the best performance? These are the issues we need to consider when looking for a solution.
  23. To avoid handing out duplicate URLs, we have 3 solutions. Distributed locks: can be implemented with the Redis SET command and its NX option. Data sharding: a scale-out method; it requires finding an effective way to split the data, and it costs preparation effort and infrastructure. Queue: easy to implement and easy to scale. To save time, I'll skip the Distributed Locks part.
  24. Fault tolerance is something I always think about when designing a system. Considering the failure cases that can occur helps you reduce the risks your system may face, such as data loss or having to start over from the beginning.
  25. How do the services communicate with each other? Do we need a load balancer?
  26. Next, we will look for ways to solve the problems above and find an approach.
  27. To approach the solution, I will analyze each problem from the previous part and how to solve it. First: the race condition.
  28. The cost of preparation and deployment is high. Hard to scale. Adding a node: we can't assign a new node to an existing partition, so we need to re-shard the data and start again with a new number of partitions. Removing a node (or a node crash): the node's partition can be abandoned.
  30. The load is concentrated on a single database, so the database can become the bottleneck (we can mitigate this by giving the database node more hardware resources). Hard to scale. Removing a node (or a node crash): we need to recover the node (or add a new one with the same ID) if we don't want its partition to be abandoned. Adding a node: restart all nodes in the system with the new "Number Of Processes" value. We need to maintain the number of remaining URLs (or of processed URLs).
  32. The Delegator/Load Balancer is a service that fetches URLs from the URL database and pushes them into the Queue, which is consumed by Workers (download & process image). Queue: an abstraction; we can build this queue inside the Delegator/Load Balancer or use an open-source project like RabbitMQ/Kafka. Process (1..N): a Worker that consumes URLs from the Queue for processing, then pushes another queue item for the Face Detection phase. This approach has several advantages. We achieve fault tolerance and scalability naturally: when a worker fails, its URLs are handled by other workers, and when we add a worker, it consumes the outstanding URLs in the queue. We can reuse the Queue system for Face Detection so that the result of "Download & Process image" is reusable: we save the downloaded and processed image to storage before putting the task item into the Face Detection queue. The implementation is the same whether a node runs a single goroutine or multiple goroutines, which gives us flexibility.