Gossip protocol and applications
Tu Nguyen
Staff Software Engineer - Axon
Gossip protocol
Gossip in computer science
● A peer-to-peer communication protocol
● Inspired by epidemics, human gossip, and social networks (spreading rumors)
  ■ Also called an epidemic protocol (synonym)
  ■ Why?
    ● Rumors and epidemics travel through a society at great speed and reach almost every member of the community without needing a central coordinator.
● Gossip was originally devised to solve the multicast problem
● Multicast
  ■ We want to communicate a message to all the nodes in the network
  ■ Each node sends the message to only a few of the nodes
● Multicast problems
  ○ Fault tolerance: nodes might crash, packets might be dropped, etc.
  ○ Scalability: millions or hundreds of millions of nodes
  ○ Centralized: a single sender multicasts TCP/UDP packets to all the others
  ○ Tree-based multicast: too much redundancy from ACK/NACK messages
  ○ Multicast was originally used heavily in network devices (e.g. routers); how can it be leveraged at the application layer?
Gossip basics
A node wants to share some information with the other nodes in the network. Periodically, it
selects a node at random from the set of nodes and exchanges the information with it. The node
that receives the information then does exactly the same thing.
● Cycle
  ■ Number of rounds needed to spread the information
● Fanout
  ■ Number of nodes that a node gossips with in each cycle
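The cycle and fanout definitions above can be tried out with a small simulation. This is a minimal sketch, assuming a push-only variant in which every informed node contacts `fanout` random peers per cycle; the function name `simulate_push_gossip` and the `seed` parameter are illustrative and not from the slides.

```python
import random


def simulate_push_gossip(num_nodes, fanout, seed=0):
    """Return the number of cycles until every node has the message.

    Assumption of this sketch: node 0 starts with the message and every
    informed node pushes to `fanout` random peers in each cycle.
    """
    rng = random.Random(seed)
    infected = {0}
    cycles = 0
    while len(infected) < num_nodes:
        newly_infected = set()
        for node in infected:
            # Pick `fanout` distinct peers at random (may include the node itself).
            newly_infected.update(rng.sample(range(num_nodes), fanout))
        infected |= newly_infected
        cycles += 1
    return cycles
```

Calling `simulate_push_gossip(10_000, fanout=3)` for a few network sizes is a quick way to see that the number of cycles grows much more slowly than the number of nodes.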
Gossip properties
● Node selection must be random (or at least guarantee enough peer diversity).
● Each node stores only local information; there is no shared global state.
● Communication is round-based (periodic).
● Transmission and processing capacity per round is limited.
● All nodes run the same protocol.
● Not deterministic (because peer sampling is random).
Advantages of Gossip
● Scalable
● Fault-tolerant
● Robust
● Decentralized
● Convergent consistency
Gossip modeling
Consider a distributed network in which nodes communicate with each other by
message passing.
● State of a node
  ■ Susceptible (S): the node has not received the update yet (it is not infected).
  ■ Infected (I): the node has the update and is willing to share it.
  ■ Removed (R): the node has received the update but is no longer willing to share it.
● Two basic models
  ■ SI (anti-entropy)
  ■ SIR (rumor-mongering)
When does a node enter the R state?
👉 There are many algorithms. One of them counts redundant messages.
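The redundant-message rule for entering the R state can be sketched as follows. This is an illustrative model, not the slides' algorithm: it assumes each infected node contacts one random peer per round and stops gossiping after `max_redundant` contacts with peers that already knew the rumor. The names `rumor_mongering` and `max_redundant` are hypothetical.

```python
import random


def rumor_mongering(num_nodes, max_redundant=2, seed=0):
    """Simulate SIR gossip; return (rounds, nodes still susceptible at the end)."""
    rng = random.Random(seed)
    state = ["S"] * num_nodes
    redundant = [0] * num_nodes
    state[0] = "I"  # node 0 starts the rumor
    rounds = 0
    while "I" in state:
        # Snapshot the infected nodes so nodes infected now start next round.
        for node in [n for n in range(num_nodes) if state[n] == "I"]:
            peer = rng.randrange(num_nodes)
            if state[peer] == "S":
                state[peer] = "I"
            else:
                # The peer already knew the rumor: count a redundant message.
                redundant[node] += 1
                if redundant[node] >= max_redundant:
                    state[node] = "R"
        rounds += 1
    return rounds, state.count("S")
```

Unlike SI, this process always terminates, but it may leave a few nodes susceptible; raising `max_redundant` trades more messages for fewer missed nodes.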
Gossip modeling
● Push / Pull / Push-Pull
  ■ Push
    ● Infected (I) nodes send the update to, and so infect, susceptible (S) nodes
    ● Efficient when there are few updates
  ■ Pull
    ● All nodes actively poll their peers for updates
    ● Efficient when there are many updates
  ■ Push-Pull
    ● A node pushes when it has updates and also pulls for new ones
    ● The node and the selected peer exchange information in both directions
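The three exchange styles differ only in which side's state gets merged. A minimal sketch, assuming each node keeps a dictionary of `key -> (version, value)` and that the higher version wins; the versioning scheme and the function names are assumptions for illustration, not something the slides specify.

```python
def merge(local, remote):
    """Copy into `local` every entry of `remote` that is newer or missing."""
    for key, (version, value) in remote.items():
        if key not in local or local[key][0] < version:
            local[key] = (version, value)


def push(sender, receiver):
    # The sender transmits its state; only the receiver learns something.
    merge(receiver, sender)


def pull(requester, responder):
    # The requester asks for state; only the requester learns something.
    merge(requester, responder)


def push_pull(a, b):
    # Both sides end up with the union of the newest entries.
    push(a, b)
    pull(a, b)
```

For example, with `a = {"x": (2, "new")}` and `b = {"x": (1, "old"), "y": (1, "only-b")}`, calling `push_pull(a, b)` leaves both dictionaries holding the newer `x` and the entry `y`.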
Gossip modeling
https://flopezluis.github.io/gossip-simulator/
Gossip Applications
Applications
● Cluster membership
● Information dissemination
● Failure detection
● Database replication
● Overlay networks
● Aggregation
Cluster Membership
Who are my live peers?
Desired properties
● Connectedness
● Balance
● Short path length
● Reduced redundancy
● Scalability
● Accuracy
Property             Full      Partial
Connectedness        👍        👌
Short path length    👍        👌
Accuracy             👌        👌
Balance              👌        👌
Redundancy           👎 high   👍 low
Scalability          👎 low    👍 high
Cluster Membership
● SWIM - Cornell University, 2002 ✅
● SCAMP - Microsoft Research, 2003
● CYCLON - Vrije Universiteit, The Netherlands, 2005
● HYPARVIEW - University of Lisbon, 2007
Cluster Membership
SWIM - Cornell University (2002)
Scalable Weakly-consistent Infection-style Process Group Membership
https://www.cs.cornell.edu/projects/Quicksilver/public_pdfs/SWIM.pdf
Properties
● Scalable
● Weakly consistent
● Infection-style
● Membership protocol
SWIM
● Motivated by traditional heartbeating
  ■ Every interval T, a node notifies its peers that it is alive
  ■ If no update has been received from peer P after T * limit, mark P as dead
  ■ Heartbeat = membership + failure detection
● Heartbeating does well at:
  ■ Completeness: yes!
    ● Strong completeness: every crashed node is eventually detected by all correct nodes.
  ■ Accuracy: high!
● Heartbeating problems
  ■ Network load: N^2 (every node sends heartbeats to every other node)
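The heartbeat rule above (mark P as dead after T * limit of silence) and its N^2 cost can be sketched like this. The class and function names are hypothetical, and timestamps are passed in explicitly so the example does not depend on a real clock.

```python
class HeartbeatDetector:
    """Tracks the last heartbeat per peer and reports peers that timed out."""

    def __init__(self, interval, limit):
        # A peer is declared dead after `interval * limit` without a heartbeat.
        self.timeout = interval * limit
        self.last_seen = {}

    def on_heartbeat(self, peer, now):
        self.last_seen[peer] = now

    def dead_peers(self, now):
        return sorted(
            peer for peer, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


def heartbeats_per_interval(n):
    """All-to-all heartbeating: each of n nodes notifies the other n - 1."""
    return n * (n - 1)
```

With 100 nodes, `heartbeats_per_interval(100)` is 9900 messages per interval, which is the quadratic load that SWIM sets out to avoid.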
SWIM is trying to ...
● Separate the two problems and solve them one by one
  ○ Failure detection (👉 “live” peers)
  ○ Membership protocol (👉 list of peers)
● Optimization
  ○ Reduce network load
  ○ Failure detection
    ● Decrease processing time
    ● Increase accuracy
Failure Detection properties
● One step back...
  □ The two properties of a distributed system
    ○ Safety: nothing bad ever happens
    ○ Liveness: something good eventually happens
● Failure detection properties
  □ Completeness (a liveness property): every node that crashes is eventually detected by the failure detector.
  □ Accuracy (a safety property): the failure detector does not suspect nodes that have not crashed.
Failure Detection properties
● Degree of completeness
  □ Depends on how many crashed nodes the failure detector suspects within a certain period
    ○ Strong completeness: every faulty node is eventually permanently suspected by every non-faulty node
    ○ Weak completeness: every faulty node is eventually permanently suspected by some non-faulty node
● Degree of accuracy
  □ Depends on how many mistakes the failure detector makes within a certain period
    ○ Strong accuracy: no node is suspected (by any node) before it crashes
    ○ Weak accuracy: some non-faulty node is never suspected
    ○ Eventual strong accuracy: after some time, the system satisfies strong accuracy
    ○ Eventual weak accuracy: after some time, the system satisfies weak accuracy
SWIM Failure Detection
● Each node in a set of N nodes
  ○ Chooses a random peer
    □ Ping - ACK
  ○ Indirect ping (only if no ACK arrives)
    □ Chooses k random peers
      ○ Each of them pings the target on the node's behalf (indirect ping)
Evaluation:
● Completeness: every node will be pinged!
● Accuracy: “high” (🔍)
● Speed of detection: 1 * interval
● Network load: (4*k + 2) * N ~ O(N)
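The direct and indirect ping steps above can be sketched as a single probe function. This is a simplified model, not the memberlist or paper implementation: the network is represented by a hypothetical `can_reach(src, dst)` callback, and the names `swim_probe` and `worst_case_messages` are made up for illustration.

```python
import random


def swim_probe(prober, target, members, can_reach, k=3, rng=random):
    """Return True if `prober` obtains an ACK from `target`, directly or indirectly."""
    # Step 1: direct ping; the ACK needs the path to work in both directions.
    if can_reach(prober, target) and can_reach(target, prober):
        return True
    # Step 2: no ACK, so ask up to k other random members to ping the target.
    helpers = [m for m in members if m not in (prober, target)]
    for helper in rng.sample(helpers, min(k, len(helpers))):
        request_ok = can_reach(prober, helper) and can_reach(helper, target)
        reply_ok = can_reach(target, helper) and can_reach(helper, prober)
        if request_ok and reply_ok:
            return True
    return False


def worst_case_messages(n, k):
    """Per interval: ping + ACK (2), plus 4 messages per helper, for each of n nodes."""
    return (4 * k + 2) * n
```

The indirect step is what keeps a single lossy link between two healthy nodes from producing a false positive: a target that is unreachable from the prober but reachable through a helper is still reported as alive.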
SWIM Membership Protocol
● Aware of nodes joining and leaving
● Motivated by gossip
  ■ Piggyback approach
    ○ Infection-style
      □ Each ping is sent to a random peer
      □ Eventually (weakly) consistent
      □ Updates are sent peer-to-peer
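The piggyback approach above means membership updates ride along on the ping and ACK messages the failure detector already sends. A minimal sketch of the sender-side buffer, assuming each update is attached a bounded number of times and that less-gossiped updates go first; the class name, the cap, and the string format of the updates are assumptions of this example.

```python
class PiggybackBuffer:
    """Holds membership updates waiting to be attached to outgoing pings/ACKs."""

    def __init__(self, max_transmissions):
        self.max_transmissions = max_transmissions
        self.pending = {}  # update -> number of times it has been attached

    def add(self, update):
        self.pending.setdefault(update, 0)

    def take(self, limit):
        """Pick up to `limit` updates for one outgoing message."""
        # Prefer the updates that have been gossiped the fewest times so far.
        chosen = sorted(self.pending, key=self.pending.get)[:limit]
        for update in chosen:
            self.pending[update] += 1
            if self.pending[update] >= self.max_transmissions:
                del self.pending[update]  # gossiped enough; stop spreading it
        return chosen
```

Because no extra messages are sent, dissemination adds no packets beyond what failure detection already costs; the price is that updates arrive eventually rather than immediately.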
SWIM - Optimization
Suspicion state - to improve accuracy
● Trade-off between failure detection time and false positives.
● Introduce a suspicion state.
  ■ A 👉 B: Ping! I suspect C has failed
  ■ B 👉 A: ACK!
  ■ A few moments later
    □ A, B 👉 C: Ping! Are you dead?
    □ C 👉 A, B: ACK! (I’m not 😋)
State FSM
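The suspicion mechanism can be written down as a small state machine. This is a sketch of one plausible reading, since the original diagram is not reproduced here: the event names `probe_failed`, `ack_received`, and `suspicion_timeout` are hypothetical, and a node declared dead stays dead in this simplified model.

```python
from enum import Enum


class State(Enum):
    ALIVE = "alive"
    SUSPECT = "suspect"
    DEAD = "dead"


# (current state, event) -> next state; anything not listed leaves the state unchanged.
TRANSITIONS = {
    (State.ALIVE, "probe_failed"): State.SUSPECT,
    (State.SUSPECT, "ack_received"): State.ALIVE,
    (State.SUSPECT, "suspicion_timeout"): State.DEAD,
}


def next_state(state, event):
    """Apply one event to a peer's state and return the resulting state."""
    return TRANSITIONS.get((state, event), state)
```

The extra SUSPECT step is where the trade-off lives: a longer suspicion timeout gives a slow node more time to refute the suspicion, at the cost of slower detection of real failures.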
SWIM - Optimization
Round-robin probe peer selection
  ■ Randomly shuffle the peer set
  ■ Ping peers in round-robin order
Evaluation:
  ○ Completeness: improved, because the time to detect a failure becomes bounded
State FSM
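Round-robin probe selection can be sketched as a shuffled list that is walked in order and reshuffled after every full pass. The class name `RoundRobinProber` and the `seed` parameter are illustrative, and the sketch assumes a non-empty, fixed peer set.

```python
import random


class RoundRobinProber:
    """Yields probe targets so that every peer is pinged once per pass."""

    def __init__(self, peers, seed=None):
        self.rng = random.Random(seed)
        self.peers = list(peers)
        self.rng.shuffle(self.peers)
        self.index = 0

    def next_target(self):
        if self.index == len(self.peers):
            # A full pass is done: reshuffle so the order differs next time.
            self.rng.shuffle(self.peers)
            self.index = 0
        target = self.peers[self.index]
        self.index += 1
        return target
```

With purely random selection a given peer might go unprobed for a long time; here each peer is probed exactly once in every pass over the peer list, which is what makes the detection time bounded.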
SWIM - Limitations
● Node leave vs. node failure
● Re-joining
● Event ordering
● Message encryption
● Peer metadata
● Custom payload
● Network participants
More details:  https://www.cs.cornell.edu/projects/Quicksilver/public_pdfs/SWIM.pdf
SWIM - Implementation
● memberlist: https://github.com/hashicorp/memberlist
● Serf and Consul rely on the SWIM-based memberlist for failure detection and group membership.
Other “announced” applications
● Cassandra internals - understanding gossip: https://www.youtube.com/watch?v=FuP1Fvrv6ZQ
● AWS S3 gossip: http://status.aws.amazon.com/s3-20080720.html
Slicing structured overlay network
● T-MAN: https://www.researchgate.net/publication/225403352_T-Man_Gossip-Based_Overlay_Topology_Management
● https://managementfromscratch.wordpress.com/2016/04/01/introduction-to-gossip

Weitere ähnliche Inhalte

Was ist angesagt?

Introduction to Apache ZooKeeper
Introduction to Apache ZooKeeperIntroduction to Apache ZooKeeper
Introduction to Apache ZooKeeperSaurav Haloi
 
Event Sourcing & CQRS, Kafka, Rabbit MQ
Event Sourcing & CQRS, Kafka, Rabbit MQEvent Sourcing & CQRS, Kafka, Rabbit MQ
Event Sourcing & CQRS, Kafka, Rabbit MQAraf Karsh Hamid
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm Chandler Huang
 
Monitoring using Prometheus and Grafana
Monitoring using Prometheus and GrafanaMonitoring using Prometheus and Grafana
Monitoring using Prometheus and GrafanaArvind Kumar G.S
 
Grokking Techtalk #40: Consistency and Availability tradeoff in database cluster
Grokking Techtalk #40: Consistency and Availability tradeoff in database clusterGrokking Techtalk #40: Consistency and Availability tradeoff in database cluster
Grokking Techtalk #40: Consistency and Availability tradeoff in database clusterGrokking VN
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingTill Rohrmann
 
From Mainframe to Microservice: An Introduction to Distributed Systems
From Mainframe to Microservice: An Introduction to Distributed SystemsFrom Mainframe to Microservice: An Introduction to Distributed Systems
From Mainframe to Microservice: An Introduction to Distributed SystemsTyler Treat
 
OpenTelemetry For Operators
OpenTelemetry For OperatorsOpenTelemetry For Operators
OpenTelemetry For OperatorsKevin Brockhoff
 
[Outdated] Secrets of Performance Tuning Java on Kubernetes
[Outdated] Secrets of Performance Tuning Java on Kubernetes[Outdated] Secrets of Performance Tuning Java on Kubernetes
[Outdated] Secrets of Performance Tuning Java on KubernetesBruno Borges
 
Gossip-based algorithms
Gossip-based algorithmsGossip-based algorithms
Gossip-based algorithmsAmir Payberah
 
Distributed Transaction in Microservice
Distributed Transaction in MicroserviceDistributed Transaction in Microservice
Distributed Transaction in MicroserviceNghia Minh
 
Grokking Techtalk #37: Data intensive problem
 Grokking Techtalk #37: Data intensive problem Grokking Techtalk #37: Data intensive problem
Grokking Techtalk #37: Data intensive problemGrokking VN
 
Introduction and Deep Dive Into Containerd
Introduction and Deep Dive Into ContainerdIntroduction and Deep Dive Into Containerd
Introduction and Deep Dive Into ContainerdKohei Tokunaga
 
Storing 16 Bytes at Scale
Storing 16 Bytes at ScaleStoring 16 Bytes at Scale
Storing 16 Bytes at ScaleFabian Reinartz
 
Opentelemetry - From frontend to backend
Opentelemetry - From frontend to backendOpentelemetry - From frontend to backend
Opentelemetry - From frontend to backendSebastian Poxhofer
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013mumrah
 
What is gRPC introduction gRPC Explained
What is gRPC introduction gRPC ExplainedWhat is gRPC introduction gRPC Explained
What is gRPC introduction gRPC Explainedjeetendra mandal
 

Was ist angesagt? (20)

Introduction to Apache ZooKeeper
Introduction to Apache ZooKeeperIntroduction to Apache ZooKeeper
Introduction to Apache ZooKeeper
 
Event Sourcing & CQRS, Kafka, Rabbit MQ
Event Sourcing & CQRS, Kafka, Rabbit MQEvent Sourcing & CQRS, Kafka, Rabbit MQ
Event Sourcing & CQRS, Kafka, Rabbit MQ
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm
 
Monitoring using Prometheus and Grafana
Monitoring using Prometheus and GrafanaMonitoring using Prometheus and Grafana
Monitoring using Prometheus and Grafana
 
Grokking Techtalk #40: Consistency and Availability tradeoff in database cluster
Grokking Techtalk #40: Consistency and Availability tradeoff in database clusterGrokking Techtalk #40: Consistency and Availability tradeoff in database cluster
Grokking Techtalk #40: Consistency and Availability tradeoff in database cluster
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processing
 
gRPC - RPC rebirth?
gRPC - RPC rebirth?gRPC - RPC rebirth?
gRPC - RPC rebirth?
 
From Mainframe to Microservice: An Introduction to Distributed Systems
From Mainframe to Microservice: An Introduction to Distributed SystemsFrom Mainframe to Microservice: An Introduction to Distributed Systems
From Mainframe to Microservice: An Introduction to Distributed Systems
 
OpenTelemetry For Operators
OpenTelemetry For OperatorsOpenTelemetry For Operators
OpenTelemetry For Operators
 
[Outdated] Secrets of Performance Tuning Java on Kubernetes
[Outdated] Secrets of Performance Tuning Java on Kubernetes[Outdated] Secrets of Performance Tuning Java on Kubernetes
[Outdated] Secrets of Performance Tuning Java on Kubernetes
 
Gossip-based algorithms
Gossip-based algorithmsGossip-based algorithms
Gossip-based algorithms
 
Jenkins-CI
Jenkins-CIJenkins-CI
Jenkins-CI
 
Distributed Transaction in Microservice
Distributed Transaction in MicroserviceDistributed Transaction in Microservice
Distributed Transaction in Microservice
 
Grokking Techtalk #37: Data intensive problem
 Grokking Techtalk #37: Data intensive problem Grokking Techtalk #37: Data intensive problem
Grokking Techtalk #37: Data intensive problem
 
Introduction and Deep Dive Into Containerd
Introduction and Deep Dive Into ContainerdIntroduction and Deep Dive Into Containerd
Introduction and Deep Dive Into Containerd
 
Storing 16 Bytes at Scale
Storing 16 Bytes at ScaleStoring 16 Bytes at Scale
Storing 16 Bytes at Scale
 
Opentelemetry - From frontend to backend
Opentelemetry - From frontend to backendOpentelemetry - From frontend to backend
Opentelemetry - From frontend to backend
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
 
What is gRPC introduction gRPC Explained
What is gRPC introduction gRPC ExplainedWhat is gRPC introduction gRPC Explained
What is gRPC introduction gRPC Explained
 

Ähnlich wie Grokking Techtalk #39: Gossip protocol and applications

Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonDataStax Academy
 
Module: drand - the Distributed Randomness Beacon
Module: drand - the Distributed Randomness BeaconModule: drand - the Distributed Randomness Beacon
Module: drand - the Distributed Randomness BeaconIoannis Psaras
 
BSIT3CD_Continuation of Cyber incident response (1).pdf
BSIT3CD_Continuation of Cyber incident response (1).pdfBSIT3CD_Continuation of Cyber incident response (1).pdf
BSIT3CD_Continuation of Cyber incident response (1).pdfStevenJoeBiago
 
Ake hedman why we need to unite and why vscp is a solution to a problem
Ake hedman  why we need to unite and why vscp is a solution to a problemAke hedman  why we need to unite and why vscp is a solution to a problem
Ake hedman why we need to unite and why vscp is a solution to a problemWithTheBest
 
Iot with-the-best & VSCP
Iot with-the-best & VSCPIot with-the-best & VSCP
Iot with-the-best & VSCPAke Hedman
 
IoT Malware: Comprehensive Survey, Analysis Framework and Case Studies
IoT Malware: Comprehensive Survey, Analysis Framework and Case StudiesIoT Malware: Comprehensive Survey, Analysis Framework and Case Studies
IoT Malware: Comprehensive Survey, Analysis Framework and Case StudiesDefCamp
 
IoT Malware: Comprehensive Survey, Analysis Framework and Case Studies
IoT Malware: Comprehensive Survey, Analysis Framework and Case StudiesIoT Malware: Comprehensive Survey, Analysis Framework and Case Studies
IoT Malware: Comprehensive Survey, Analysis Framework and Case StudiesPriyanka Aash
 
Making Sites Reliable (как сделать систему надежной) (Павел Уваров, Андрей Та...
Making Sites Reliable (как сделать систему надежной) (Павел Уваров, Андрей Та...Making Sites Reliable (как сделать систему надежной) (Павел Уваров, Андрей Та...
Making Sites Reliable (как сделать систему надежной) (Павел Уваров, Андрей Та...Ontico
 
Overcoming Variable Payloads to Optimize for Performance
Overcoming Variable Payloads to Optimize for PerformanceOvercoming Variable Payloads to Optimize for Performance
Overcoming Variable Payloads to Optimize for PerformanceScyllaDB
 
Computer network (7)
Computer network (7)Computer network (7)
Computer network (7)NYversity
 
Apache cassandra an introduction
Apache cassandra  an introductionApache cassandra  an introduction
Apache cassandra an introductionShehaaz Saif
 
Ple18 web-security-david-busby
Ple18 web-security-david-busbyPle18 web-security-david-busby
Ple18 web-security-david-busbyDavid Busby, CISSP
 
Monitoring - deeper dive
Monitoring  - deeper diveMonitoring  - deeper dive
Monitoring - deeper diveRobert Kubiś
 
Storing the real world data
Storing the real world dataStoring the real world data
Storing the real world dataAthira Mukundan
 
introduction to advanced distributed system
introduction to advanced distributed systemintroduction to advanced distributed system
introduction to advanced distributed systemmilkesa13
 
Computer network (8)
Computer network (8)Computer network (8)
Computer network (8)NYversity
 
Cassandra - A Decentralized Structured Storage System
Cassandra - A Decentralized Structured Storage SystemCassandra - A Decentralized Structured Storage System
Cassandra - A Decentralized Structured Storage SystemVarad Meru
 
IT Security Basics For Managers
IT Security Basics For ManagersIT Security Basics For Managers
IT Security Basics For ManagersDaniel Owens
 
Everything You wanted to Know About Distributed Tracing
Everything You wanted to Know About Distributed TracingEverything You wanted to Know About Distributed Tracing
Everything You wanted to Know About Distributed TracingAmuhinda Hungai
 
SNMP(Simple Network Management Protocol)
SNMP(Simple Network Management Protocol)SNMP(Simple Network Management Protocol)
SNMP(Simple Network Management Protocol)Mohammad Awais Javaid
 

Ähnlich wie Grokking Techtalk #39: Gossip protocol and applications (20)

Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
 
Module: drand - the Distributed Randomness Beacon
Module: drand - the Distributed Randomness BeaconModule: drand - the Distributed Randomness Beacon
Module: drand - the Distributed Randomness Beacon
 
BSIT3CD_Continuation of Cyber incident response (1).pdf
BSIT3CD_Continuation of Cyber incident response (1).pdfBSIT3CD_Continuation of Cyber incident response (1).pdf
BSIT3CD_Continuation of Cyber incident response (1).pdf
 
Ake hedman why we need to unite and why vscp is a solution to a problem
Ake hedman  why we need to unite and why vscp is a solution to a problemAke hedman  why we need to unite and why vscp is a solution to a problem
Ake hedman why we need to unite and why vscp is a solution to a problem
 
Iot with-the-best & VSCP
Iot with-the-best & VSCPIot with-the-best & VSCP
Iot with-the-best & VSCP
 
IoT Malware: Comprehensive Survey, Analysis Framework and Case Studies
IoT Malware: Comprehensive Survey, Analysis Framework and Case StudiesIoT Malware: Comprehensive Survey, Analysis Framework and Case Studies
IoT Malware: Comprehensive Survey, Analysis Framework and Case Studies
 
IoT Malware: Comprehensive Survey, Analysis Framework and Case Studies
IoT Malware: Comprehensive Survey, Analysis Framework and Case StudiesIoT Malware: Comprehensive Survey, Analysis Framework and Case Studies
IoT Malware: Comprehensive Survey, Analysis Framework and Case Studies
 
Making Sites Reliable (как сделать систему надежной) (Павел Уваров, Андрей Та...
Making Sites Reliable (как сделать систему надежной) (Павел Уваров, Андрей Та...Making Sites Reliable (как сделать систему надежной) (Павел Уваров, Андрей Та...
Making Sites Reliable (как сделать систему надежной) (Павел Уваров, Андрей Та...
 
Overcoming Variable Payloads to Optimize for Performance
Overcoming Variable Payloads to Optimize for PerformanceOvercoming Variable Payloads to Optimize for Performance
Overcoming Variable Payloads to Optimize for Performance
 
Computer network (7)
Computer network (7)Computer network (7)
Computer network (7)
 
Apache cassandra an introduction
Apache cassandra  an introductionApache cassandra  an introduction
Apache cassandra an introduction
 
Ple18 web-security-david-busby
Ple18 web-security-david-busbyPle18 web-security-david-busby
Ple18 web-security-david-busby
 
Monitoring - deeper dive
Monitoring  - deeper diveMonitoring  - deeper dive
Monitoring - deeper dive
 
Storing the real world data
Storing the real world dataStoring the real world data
Storing the real world data
 
introduction to advanced distributed system
introduction to advanced distributed systemintroduction to advanced distributed system
introduction to advanced distributed system
 
Computer network (8)
Computer network (8)Computer network (8)
Computer network (8)
 
Cassandra - A Decentralized Structured Storage System
Cassandra - A Decentralized Structured Storage SystemCassandra - A Decentralized Structured Storage System
Cassandra - A Decentralized Structured Storage System
 
IT Security Basics For Managers
IT Security Basics For ManagersIT Security Basics For Managers
IT Security Basics For Managers
 
Everything You wanted to Know About Distributed Tracing
Everything You wanted to Know About Distributed TracingEverything You wanted to Know About Distributed Tracing
Everything You wanted to Know About Distributed Tracing
 
SNMP(Simple Network Management Protocol)
SNMP(Simple Network Management Protocol)SNMP(Simple Network Management Protocol)
SNMP(Simple Network Management Protocol)
 

Mehr von Grokking VN

Grokking Techtalk #46: Lessons from years hacking and defending Vietnamese banks
Grokking Techtalk #46: Lessons from years hacking and defending Vietnamese banksGrokking Techtalk #46: Lessons from years hacking and defending Vietnamese banks
Grokking Techtalk #46: Lessons from years hacking and defending Vietnamese banksGrokking VN
 
Grokking Techtalk #45: First Principles Thinking
Grokking Techtalk #45: First Principles ThinkingGrokking Techtalk #45: First Principles Thinking
Grokking Techtalk #45: First Principles ThinkingGrokking VN
 
Grokking Techtalk #42: Engineering challenges on building data platform for M...
Grokking Techtalk #42: Engineering challenges on building data platform for M...Grokking Techtalk #42: Engineering challenges on building data platform for M...
Grokking Techtalk #42: Engineering challenges on building data platform for M...Grokking VN
 
Grokking Techtalk #43: Payment gateway demystified
Grokking Techtalk #43: Payment gateway demystifiedGrokking Techtalk #43: Payment gateway demystified
Grokking Techtalk #43: Payment gateway demystifiedGrokking VN
 
Grokking Techtalk #40: AWS’s philosophy on designing MLOps platform
Grokking Techtalk #40: AWS’s philosophy on designing MLOps platformGrokking Techtalk #40: AWS’s philosophy on designing MLOps platform
Grokking Techtalk #40: AWS’s philosophy on designing MLOps platformGrokking VN
 
Grokking TechTalk #35: Efficient spellchecking
Grokking TechTalk #35: Efficient spellcheckingGrokking TechTalk #35: Efficient spellchecking
Grokking TechTalk #35: Efficient spellcheckingGrokking VN
 
Grokking Techtalk #34: K8S On-premise: Incident & Lesson Learned ZaloPay Mer...
 Grokking Techtalk #34: K8S On-premise: Incident & Lesson Learned ZaloPay Mer... Grokking Techtalk #34: K8S On-premise: Incident & Lesson Learned ZaloPay Mer...
Grokking Techtalk #34: K8S On-premise: Incident & Lesson Learned ZaloPay Mer...Grokking VN
 
Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big...
Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big...Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big...
Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big...Grokking VN
 
Grokking TechTalk #31: Asynchronous Communications
Grokking TechTalk #31: Asynchronous CommunicationsGrokking TechTalk #31: Asynchronous Communications
Grokking TechTalk #31: Asynchronous CommunicationsGrokking VN
 
Grokking TechTalk #30: From App to Ecosystem: Lessons Learned at Scale
Grokking TechTalk #30: From App to Ecosystem: Lessons Learned at ScaleGrokking TechTalk #30: From App to Ecosystem: Lessons Learned at Scale
Grokking TechTalk #30: From App to Ecosystem: Lessons Learned at ScaleGrokking VN
 
Grokking TechTalk #29: Building Realtime Metrics Platform at LinkedIn
Grokking TechTalk #29: Building Realtime Metrics Platform at LinkedInGrokking TechTalk #29: Building Realtime Metrics Platform at LinkedIn
Grokking TechTalk #29: Building Realtime Metrics Platform at LinkedInGrokking VN
 
Grokking TechTalk #27: Optimal Binary Search Tree
Grokking TechTalk #27: Optimal Binary Search TreeGrokking TechTalk #27: Optimal Binary Search Tree
Grokking TechTalk #27: Optimal Binary Search TreeGrokking VN
 
Grokking TechTalk #26: Kotlin, Understand the Magic
Grokking TechTalk #26: Kotlin, Understand the MagicGrokking TechTalk #26: Kotlin, Understand the Magic
Grokking TechTalk #26: Kotlin, Understand the MagicGrokking VN
 
Grokking TechTalk #26: Compare ios and android platform
Grokking TechTalk #26: Compare ios and android platformGrokking TechTalk #26: Compare ios and android platform
Grokking TechTalk #26: Compare ios and android platformGrokking VN
 
Grokking TechTalk #24: Thiết kế hệ thống Background Job Queue bằng Ruby & Pos...
Grokking TechTalk #24: Thiết kế hệ thống Background Job Queue bằng Ruby & Pos...Grokking TechTalk #24: Thiết kế hệ thống Background Job Queue bằng Ruby & Pos...
Grokking TechTalk #24: Thiết kế hệ thống Background Job Queue bằng Ruby & Pos...Grokking VN
 
Grokking TechTalk #24: Kafka's principles and protocols
Grokking TechTalk #24: Kafka's principles and protocolsGrokking TechTalk #24: Kafka's principles and protocols
Grokking TechTalk #24: Kafka's principles and protocolsGrokking VN
 
Grokking TechTalk #21: Deep Learning in Computer Vision
Grokking TechTalk #21: Deep Learning in Computer VisionGrokking TechTalk #21: Deep Learning in Computer Vision
Grokking TechTalk #21: Deep Learning in Computer VisionGrokking VN
 
Grokking TechTalk #20: PostgreSQL Internals 101
Grokking TechTalk #20: PostgreSQL Internals 101Grokking TechTalk #20: PostgreSQL Internals 101
Grokking TechTalk #20: PostgreSQL Internals 101Grokking VN
 
Grokking TechTalk #19: Software Development Cycle In The International Moneta...
Grokking TechTalk #19: Software Development Cycle In The International Moneta...Grokking TechTalk #19: Software Development Cycle In The International Moneta...
Grokking TechTalk #19: Software Development Cycle In The International Moneta...Grokking VN
 
Grokking TechTalk #18B: Giới thiệu về Viễn thông Di động
Grokking TechTalk #18B:  Giới thiệu về Viễn thông Di độngGrokking TechTalk #18B:  Giới thiệu về Viễn thông Di động
Grokking TechTalk #18B: Giới thiệu về Viễn thông Di độngGrokking VN
 

Mehr von Grokking VN (20)

Grokking Techtalk #46: Lessons from years hacking and defending Vietnamese banks
Grokking Techtalk #46: Lessons from years hacking and defending Vietnamese banksGrokking Techtalk #46: Lessons from years hacking and defending Vietnamese banks
Grokking Techtalk #46: Lessons from years hacking and defending Vietnamese banks
 
Grokking Techtalk #45: First Principles Thinking
Grokking Techtalk #45: First Principles ThinkingGrokking Techtalk #45: First Principles Thinking
Grokking Techtalk #45: First Principles Thinking
 
Grokking Techtalk #42: Engineering challenges on building data platform for M...
Grokking Techtalk #42: Engineering challenges on building data platform for M...Grokking Techtalk #42: Engineering challenges on building data platform for M...
Grokking Techtalk #42: Engineering challenges on building data platform for M...
 
Grokking Techtalk #43: Payment gateway demystified
Grokking Techtalk #43: Payment gateway demystifiedGrokking Techtalk #43: Payment gateway demystified
Grokking Techtalk #43: Payment gateway demystified
 
Grokking Techtalk #40: AWS’s philosophy on designing MLOps platform
Grokking Techtalk #40: AWS’s philosophy on designing MLOps platformGrokking Techtalk #40: AWS’s philosophy on designing MLOps platform
Grokking Techtalk #40: AWS’s philosophy on designing MLOps platform
 
Grokking TechTalk #35: Efficient spellchecking
Grokking TechTalk #35: Efficient spellcheckingGrokking TechTalk #35: Efficient spellchecking
Grokking TechTalk #35: Efficient spellchecking
 
Grokking Techtalk #34: K8S On-premise: Incident & Lesson Learned ZaloPay Mer...
 Grokking Techtalk #34: K8S On-premise: Incident & Lesson Learned ZaloPay Mer... Grokking Techtalk #34: K8S On-premise: Incident & Lesson Learned ZaloPay Mer...
Grokking Techtalk #34: K8S On-premise: Incident & Lesson Learned ZaloPay Mer...
 
Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big...
Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big...Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big...
Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big...
 
Grokking TechTalk #31: Asynchronous Communications
Grokking TechTalk #31: Asynchronous CommunicationsGrokking TechTalk #31: Asynchronous Communications
Grokking TechTalk #31: Asynchronous Communications
 
Grokking TechTalk #30: From App to Ecosystem: Lessons Learned at Scale
Grokking TechTalk #30: From App to Ecosystem: Lessons Learned at ScaleGrokking TechTalk #30: From App to Ecosystem: Lessons Learned at Scale
Grokking TechTalk #30: From App to Ecosystem: Lessons Learned at Scale
 
Grokking TechTalk #29: Building Realtime Metrics Platform at LinkedIn
Grokking Techtalk #39: Gossip protocol and applications

  • 1. Gossip protocol and applications (Tu Nguyen, Staff Software Engineer, Axon)
  • 4. Gossip in computer science
      ◦ A peer-to-peer communication protocol.
      ◦ Inspired by epidemics, human gossip, and social networks (spreading rumors).
          ▪ Also called an epidemic protocol.
          ▪ Why? Rumors and epidemics travel through a society at great speed and reach almost every member of the community without needing a central coordinator.
      ◦ Gossip was originally devised to solve the multicast problem.
      ◦ Multicast
          ▪ We want to communicate a message to all the nodes in the network.
          ▪ Each node sends the message to only a few of the nodes.
      ◦ Multicast problems
          ▪ Fault tolerance: nodes might crash, packets might be dropped, etc.
          ▪ Scalability: millions or hundreds of millions of nodes.
          ▪ Centralized: a single sender "multicasts" TCP/UDP packets to the others.
          ▪ Tree-based multicast: too much redundancy from ACK/NACK messages.
          ▪ Multicast was originally used heavily in network devices (e.g. routers); how can it be leveraged at the application layer?
  • 5. Gossip basic
      A node wants to share some information with the other nodes in the network. Periodically, it selects a node at random from the set of nodes and exchanges the information with it. The node that receives the information then does exactly the same thing.
      ◦ Cycle: the number of rounds needed to spread the information.
      ◦ Fanout: the number of nodes that a node gossips to within each cycle.
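The cycle and fanout definitions on the slide above can be illustrated with a small simulation. This is only a sketch under stated assumptions: nodes are plain integers, node 0 starts with the information, every informed node pushes to `fanout` random peers per cycle, and no messages are lost. The function name `gossip_cycles` and its parameters are hypothetical, not something from the talk.

```python
import random


def gossip_cycles(num_nodes, fanout, seed=0):
    """Count the cycles until every node has the information (push only)."""
    rng = random.Random(seed)
    informed = {0}  # assumption: node 0 is the original source
    cycles = 0
    while len(informed) < num_nodes:
        newly_informed = set()
        for node in informed:
            others = [p for p in range(num_nodes) if p != node]
            # Each informed node gossips to `fanout` randomly chosen peers.
            newly_informed.update(rng.sample(others, min(fanout, len(others))))
        informed |= newly_informed
        cycles += 1
    return cycles


if __name__ == "__main__":
    for fanout in (1, 2, 3):
        print(fanout, gossip_cycles(200, fanout, seed=1))
```

Because the informed set can grow by at most a factor of `fanout + 1` per cycle, the number of cycles is at least logarithmic in the number of nodes; the simulation lets you see how close a random run comes to that bound.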
  • 6. Gossip properties
      ◦ Node selection must be random (or guarantee enough peer diversity).
      ◦ A node stores only local information; there is no shared global state.
      ◦ Communication is round-based (periodic).
      ◦ Transmission and processing capacity per round is limited.
      ◦ All nodes run the same protocol.
      ◦ Not deterministic (because of random peer sampling).
  • 8. Gossip modeling
      Consider a distributed network in which nodes pass messages to each other.
      ◦ State of a node
          ▪ Susceptible: the node has not received the update yet (it is not infected).
          ▪ Infected: the node has an update that it is willing to share.
          ▪ Removed: the node has received the update but is no longer willing to share it.
      ◦ Two basic models
          ▪ SI (anti-entropy)
          ▪ SIR (rumor-mongering)
      ◦ When does the R state happen? 👉 There are many algorithms; one of them counts redundant messages.
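The SIR model with a redundant-message counter can be sketched as follows. The assumptions here are mine, not the talk's: one random peer is contacted per infected node per round, and a node moves to Removed after `stop_after` contacts with peers that already knew the rumor. The names `State`, `rumor_mongering`, and `stop_after` are hypothetical, and the sketch needs at least two nodes.

```python
import random
from enum import Enum


class State(Enum):
    SUSCEPTIBLE = "S"
    INFECTED = "I"
    REMOVED = "R"


def rumor_mongering(num_nodes, stop_after, seed=0):
    """Run SIR gossip until no node is infected; return the final states."""
    rng = random.Random(seed)
    state = [State.SUSCEPTIBLE] * num_nodes
    redundant = [0] * num_nodes  # redundant contacts counted per node
    state[0] = State.INFECTED    # assumption: node 0 starts the rumor
    while State.INFECTED in state:
        infected_now = [n for n in range(num_nodes) if state[n] is State.INFECTED]
        for node in infected_now:
            peer = rng.choice([p for p in range(num_nodes) if p != node])
            if state[peer] is State.SUSCEPTIBLE:
                state[peer] = State.INFECTED
            else:
                # The peer already knew the rumor: a redundant message.
                redundant[node] += 1
                if redundant[node] >= stop_after:
                    state[node] = State.REMOVED
    return state


if __name__ == "__main__":
    final = rumor_mongering(100, stop_after=2, seed=7)
    print(final.count(State.SUSCEPTIBLE), "nodes never heard the rumor")
```

Unlike SI (anti-entropy), this model can stop while some nodes are still susceptible, which is the price paid for letting nodes stop sharing.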
  • 9. Gossip modeling: Push / Pull / Push-Pull
      ◦ Push
          ▪ Infected (I) nodes are the ones sending to, and infecting, susceptible (S) nodes.
          ▪ Efficient when there are few updates.
      ◦ Pull
          ▪ All nodes actively pull for updates.
          ▪ Efficient when there are many updates.
      ◦ Push-Pull
          ▪ A node pushes when it has updates and also pulls for new updates.
          ▪ The node and the selected peer exchange information.
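The three exchange styles can be compared on a toy state representation. The sketch below assumes each node keeps a dictionary mapping a key to a `(version, value)` pair and that a higher version always wins; this representation and the function names are illustrative assumptions, not part of the talk.

```python
def push(sender, receiver):
    """Sender copies every entry that is newer than what the receiver holds."""
    for key, (version, value) in sender.items():
        if key not in receiver or receiver[key][0] < version:
            receiver[key] = (version, value)


def pull(requester, responder):
    """Requester asks the responder for entries; data flows the other way."""
    push(responder, requester)


def push_pull(node, peer):
    """Both directions in one exchange, so both sides end up identical."""
    push(node, peer)
    pull(node, peer)


if __name__ == "__main__":
    a = {"x": (2, "new"), "y": (1, "old")}
    b = {"y": (3, "newer"), "z": (1, "only-b")}
    push_pull(a, b)
    print(a == b, a)
```

After a push-pull exchange the two nodes agree on every key, whereas push or pull alone only brings one side up to date.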
  • 13. Applications
      ◦ Cluster membership
      ◦ Information dissemination
      ◦ Failure detection
      ◦ Database replication
      ◦ Overlay network
      ◦ Aggregations
  • 14. Cluster Membership
      Who are my live peers?
      ◦ Desired properties
          ▪ Connectedness
          ▪ Balance
          ▪ Short path length
          ▪ Reducing redundancy
          ▪ Scalability
          ▪ Accuracy
      ◦ Full / Partial
  • 15. Cluster Membership ✅
      Property            Full        Partial
      Connectedness       👍          👌
      Short path length   👍          👌
      Accuracy            👌          👌
      Balance             👌          👌
      Redundancy          👎 High     👍 Low
      Scalability         👎 Low      👍 High
  • 16. Cluster Membership
      ◦ SWIM - Cornell University, 2002
      ◦ SCAMP - Microsoft Research, 2003
      ◦ CYCLON - Vrije University, The Netherlands, 2005
      ◦ HYPARVIEW - University of Lisbon, 2007
  • 17. SWIM - Cornell University (2002)
      Scalable Weakly-consistent Infection-style Process Group Membership
      https://www.cs.cornell.edu/projects/Quicksilver/public_pdfs/SWIM.pdf
      ◦ Properties
          ▪ Scalable
          ▪ Weakly consistent
          ▪ Infection-style
          ▪ Membership protocol
  • 18. SWIM
      ◦ Motivated by traditional heartbeating
          ▪ Every interval T, notify peers of liveness.
          ▪ If no update is received from peer P after T * limit, mark P as dead.
          ▪ Heartbeat = membership + failure detection.
      ◦ What heartbeating does well
          ▪ Completeness: yes! Strong completeness: every crashed node is eventually detected by all correct nodes.
          ▪ Accuracy: high!
      ◦ Heartbeat problems
          ▪ Network load: N^2
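A minimal sketch of the heartbeat rule on the slide above, together with the arithmetic behind the N^2 load. It assumes all-to-all heartbeating and that timestamps are passed in explicitly (so no real clock or network is involved); `HeartbeatDetector` and `heartbeats_per_interval` are hypothetical names.

```python
class HeartbeatDetector:
    """Marks a peer dead when no heartbeat arrived for interval * limit."""

    def __init__(self, interval, limit):
        self.timeout = interval * limit
        self.last_seen = {}  # peer -> time of the most recent heartbeat

    def heartbeat(self, peer, now):
        self.last_seen[peer] = now

    def dead_peers(self, now):
        return sorted(p for p, seen in self.last_seen.items()
                      if now - seen > self.timeout)


def heartbeats_per_interval(num_nodes):
    """All-to-all heartbeating: each node notifies every other node."""
    return num_nodes * (num_nodes - 1)


if __name__ == "__main__":
    detector = HeartbeatDetector(interval=1.0, limit=3)
    detector.heartbeat("a", now=0.0)
    detector.heartbeat("b", now=0.0)
    detector.heartbeat("a", now=2.5)
    print(detector.dead_peers(now=3.5))
    print(heartbeats_per_interval(1000))
```

With 1,000 nodes, all-to-all heartbeating already costs 999,000 messages per interval, which is the quadratic growth the slide refers to.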
  • 19. SWIM is trying to ...
      ◦ Separate two problems and solve them one by one
          ▪ Failure detection (👉 "live" peers)
          ▪ Membership protocol (👉 list of peers)
      ◦ Optimization
          ▪ Reduce network load
          ▪ Failure detection: decrease processing time, increase accuracy
  • 20. Failure Detection properties
      ◦ One step back: the two properties of a distributed system
          ▪ Safety: nothing bad ever happens.
          ▪ Liveness: something good eventually happens.
      ◦ Failure detection properties
          ▪ Completeness (a liveness property): the failure detector eventually finds the node(s) that have crashed in the system.
          ▪ Accuracy (a safety property): the decisions the failure detector makes about a node are correct.
  • 21. Failure Detection properties
      ◦ Degree of completeness: depends on how many crashed nodes are suspected by a failure detector within a certain period.
          ▪ Strong completeness: every faulty node is eventually permanently suspected by every non-faulty node.
          ▪ Weak completeness: every faulty node is eventually permanently suspected by some non-faulty node.
      ◦ Degree of accuracy: depends on how many mistakes a failure detector makes within a certain period.
          ▪ Strong accuracy: no node is suspected (by any node) before it crashes.
          ▪ Weak accuracy: some non-faulty node is never suspected.
          ▪ Eventual strong accuracy: after some time, the system satisfies strong accuracy.
          ▪ Eventual weak accuracy: after some time, the system satisfies weak accuracy.
  • 22. SWIM Failure Detection
      ◦ Each node in a set of N nodes:
          ▪ Chooses a random peer and pings it (Ping - ACK).
          ▪ Falls back to an indirect ping (if and only if there is no ACK): chooses k random peers and asks them to ping the target.
      ◦ Evaluation
          ▪ Completeness: every node will be pinged!
          ▪ Accuracy: "high" (🔍)
          ▪ Speed of detection: 1 * interval
          ▪ Network load: (4*k + 2) * N ~ O(N)
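One probe round from the slide above can be sketched with a simulated network, which also shows where the (4*k + 2) bound per node comes from: 2 messages for the direct ping and ACK, plus up to 4 per helper (request, ping, ACK, forwarded ACK). Everything here is an assumption made for illustration: `link_ok(src, dst)` is a hypothetical callback saying whether a message gets through, and `probe` is not the API of any real library.

```python
import random


def probe(prober, target, members, k, link_ok, rng):
    """Return (target_seems_alive, messages_sent) for one probe round."""
    sent = 0

    def send(src, dst):
        # Count every message that is put on the wire, delivered or not.
        nonlocal sent
        sent += 1
        return link_ok(src, dst)

    # Direct probe: ping, then ACK if the ping arrived.
    if send(prober, target) and send(target, prober):
        return True, sent

    # Indirect probe through up to k random helpers.
    others = [m for m in members if m not in (prober, target)]
    alive = False
    for helper in rng.sample(others, min(k, len(others))):
        if (send(prober, helper) and send(helper, target)
                and send(target, helper) and send(helper, prober)):
            alive = True
    return alive, sent


if __name__ == "__main__":
    members = ["a", "b", "c", "d", "e"]
    lost_ack = lambda src, dst: (src, dst) != ("b", "a")
    print(probe("a", "b", members, 3, lost_ack, random.Random(0)))
```

In the usage example only the direct ACK from `b` to `a` is lost, so the helpers still confirm that `b` is alive and the round uses the worst-case 4*k + 2 = 14 messages.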
  • 23. SWIM Membership Protocol
      ◦ Aware of joining / leaving nodes.
      ◦ Motivated by gossip
          ▪ Piggyback approach
          ▪ Infection-style
              A ping is sent to a random peer.
              Membership is eventually (weakly) consistent.
              Updates are sent peer-to-peer.
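The piggyback approach can be pictured as a small buffer of membership updates that ride along on the pings and ACKs the failure detector sends anyway. The sketch below is an assumption-laden illustration: updates are plain strings, the least-transmitted updates are attached first, and each update is dropped after `max_transmits` sends. `PiggybackBuffer` and its methods are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class PiggybackBuffer:
    max_transmits: int
    pending: dict = field(default_factory=dict)  # update -> times sent so far

    def add(self, update):
        """Queue a membership update such as 'join:n4' or 'fail:n2'."""
        self.pending[update] = 0

    def attach(self, limit):
        """Pick up to `limit` updates to attach to an outgoing ping or ACK."""
        # Prefer updates that have been gossiped the fewest times.
        chosen = sorted(self.pending, key=self.pending.get)[:limit]
        for update in chosen:
            self.pending[update] += 1
            if self.pending[update] >= self.max_transmits:
                del self.pending[update]  # gossiped often enough
        return chosen


if __name__ == "__main__":
    buf = PiggybackBuffer(max_transmits=2)
    buf.add("join:n4")
    buf.add("fail:n2")
    print(buf.attach(1), buf.attach(1), buf.attach(2), buf.attach(2))
```

No extra messages are sent for dissemination in this scheme; the cost is only a few extra bytes on messages that already exist.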
  • 24. SWIM - Optimization: suspicion state (to improve accuracy)
      ◦ Trade-off between failure detection time and false positives.
      ◦ Introduce a suspicion state.
          ▪ A 👉 B: Ping! I suspect C has failed.
          ▪ B 👉 A: ACK!
          ▪ A few moments later:
              A, B 👉 C: Ping! Are you dead?
              C 👉 A, B: ACK! (I'm not 😋)
      ◦ State FSM (diagram on the slide)
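The suspicion state can be read as a three-state machine per peer: alive, suspect, dead. The sketch below is a simplified assumption of how such a machine might look; it uses explicit timestamps and a single `suspicion_timeout`, and it leaves out the incarnation numbers that SWIM uses to order refutations. `Status` and `PeerState` are hypothetical names.

```python
from enum import Enum


class Status(Enum):
    ALIVE = "alive"
    SUSPECT = "suspect"
    DEAD = "dead"


class PeerState:
    def __init__(self, suspicion_timeout):
        self.suspicion_timeout = suspicion_timeout
        self.status = Status.ALIVE
        self.suspected_at = None

    def probe_failed(self, now):
        """No ACK, directly or indirectly: suspect instead of declaring dead."""
        if self.status is Status.ALIVE:
            self.status = Status.SUSPECT
            self.suspected_at = now

    def ack_received(self):
        """The suspected peer answered in time: clear the suspicion."""
        if self.status is Status.SUSPECT:
            self.status = Status.ALIVE
            self.suspected_at = None

    def tick(self, now):
        """Confirm the failure once the suspicion has lasted long enough."""
        if (self.status is Status.SUSPECT
                and now - self.suspected_at >= self.suspicion_timeout):
            self.status = Status.DEAD


if __name__ == "__main__":
    c = PeerState(suspicion_timeout=5.0)
    c.probe_failed(now=0.0)
    c.ack_received()
    print(c.status)
```

A longer `suspicion_timeout` gives a slow but healthy peer more time to refute the suspicion, at the cost of slower detection of real failures, which is the trade-off named on the slide.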
  • 25. SWIM - Optimization: round-robin probe peer selection
      ◦ Randomly sort the peer set.
      ◦ Ping in round-robin order.
      ◦ Evaluation
          ▪ Completeness: increased, time-bounded
      ◦ State FSM (diagram on the slide)
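Round-robin probe selection can be sketched as a generator that shuffles the peer list, walks through it once, and then reshuffles for the next pass. Reshuffling after every pass is an assumption of this sketch, and `probe_order` is a hypothetical name.

```python
import random
from itertools import islice


def probe_order(peers, rng):
    """Yield probe targets: a fresh random permutation for every pass."""
    if not peers:
        return  # nothing to probe
    while True:
        order = list(peers)
        rng.shuffle(order)
        yield from order


if __name__ == "__main__":
    targets = probe_order(["n1", "n2", "n3", "n4"], random.Random(42))
    print(list(islice(targets, 8)))
```

Every peer is probed exactly once per pass, so a crashed peer is probed within a bounded number of intervals instead of only with high probability, which is the time-bounded completeness noted on the slide.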
  • 26. SWIM - Limitations
      ◦ Node leave vs. fail
      ◦ Re-joining
      ◦ Event ordering
      ◦ Message encryption
      ◦ Peer metadata
      ◦ Custom payload
      ◦ Network participants
      More details: https://www.cs.cornell.edu/projects/Quicksilver/public_pdfs/SWIM.pdf
  • 27. SWIM - Implementation
      ◦ memberlist: https://github.com/hashicorp/memberlist
      ◦ Serf and Consul rely on the SWIM-based memberlist for failure detection and group membership.
  • 28. Other "announced" applications
      ◦ Cassandra internals, understanding gossip: https://www.youtube.com/watch?v=FuP1Fvrv6ZQ
      ◦ AWS S3 gossip: http://status.aws.amazon.com/s3-20080720.html
      ◦ Slicing structured overlay network T-MAN: https://www.researchgate.net/publication/225403352_T-Man_Gossip-Based_Overlay_Topology_Management
      ◦ https://managementfromscratch.wordpress.com/2016/04/01/introduction-to-gossip