SlideShare ist ein Scribd-Unternehmen logo
1 von 47
Flipkart Website Architecture

      Mistakes & Learnings

          Siddhartha Reddy
          Architect, Flipkart
June 2007
November 2007
December 2012
www.flipkart.com
• Started in 2007
• Current Architecture from mid 2010
• Evolution of the architecture presented as…

       Issue[1]             RCA[2]   Actions   Learnings




•   *1+ Issue: Website is “slow”
•   [2] RCA = Root Cause Analysis
Surviving & reacting to the environment

INFANCY (2007 – MID-2010)
Website is “slow”!
RCA
• Why?
  – MySQL queries taking too long
• Why?
  – Too many queries
  – Many slow queries
  – Queries locking tables
• Why?
  – Capacity
• Hmm…
Fixing it
• Get beefier servers (the obvious)
• Separate master_db, slave_db
  – Writes go to master_db
  – Reads from slave_db
  – Critical reads from master_db
                              Writes                 Reads
   Reads           Writes

           MySQL              MySQL                  MySQL
                                       Replication   Slave
                              Master
Learning from it
• Scale-out databases reads by distributing load
  across systems
• Isolate database writes from reads
  – Writes are (usually) more critical
Website is “slow”!
    (Again)
RCA
• Why?
  – MySQL queries taking too long (on slave_db)
• Why?
  – Too many queries
  – Many slow queries
• Why?
  – Queries from analytics / reporting and other
    backend jobs
• Urm…
Fixing it
• Analytics / reporting DB (archival_db)
    – Use MyISAM — optimized for reads
    – Additional indexes for quicker reporting
                                           Website                  Website
                                           Writes                    Reads
Website                 Website
Writes                   Reads

                                           MySQL                    MySQL
                                                      Replication   Slave 1
                                           Master
MySQL                   MySQL
          Replication   Slave
Master                                          Replication

                        Analytics           MySQL                   Analytics
                         Reads              Slave 2                  Reads
Learning from it
• Isolate the databases being used for serving
  website traffic from those being used for
  analytical/reporting
• Isolate systems being used by production
  website from those being used for background
  processing
Learning the basics

BABY (2010 – 2011)
Website is “slow”!
RCA
• Why?
• How?
  – Instrumentation
RCA - 1
• Why?
     – Logging a lot
     – PHP processes blocking on writing logs
               Request2
              -> Process2




                                                                                      Writing
                                          Waiting




                                                                Waiting
Request1                    Request3                Request2              Request2              Request3
-> Process1                 -> Process3             :Process1             :Process2             :Process3

              Log file
RCA - 2
• Why?
  – Service Oriented Architecture (SOA)
  – Too many calls to remote services per request
     • Creating fresh connection for each call
     • All the calls are made in serial order


                     Connect to   Request    Connect    Request      Send
   Receive request
                      Service1    Service1   Service2   Service2   response
RCA - 3
• Why?
  – Configurability
  – Fetch a lot of “config” from database for serving
    each request
     Receive    Fetch     Fetch     Fetch     Fetch      Send
     request   Config1   Config2   Config3   Config4   response
RCA – 1,2,3
• Why?
  – Logging a lot
  – SOA
  – Configurability
• Why?
  – PHP’s process model
• Argh!
Fixing it
• fk-w3-agent
  – Simple Java “middleware” daemon
  – Deployed on each web server
  – PHP communicates to it through local socket
  – Hosts pluggable “handlers”
fk-w3-agent: LoggingHandler

               Request2                                 Request2
              -> Process2                               -> Process2
Request1                    Request3      Request1                     Request3
-> Process1                 -> Process3   -> Process1                 -> Process3


                                                         fk-w3-
              Log file                                    agent

                                                                 Async / buffered




                                                        Log file
fk-w3-agent: ServiceHandler(s)
                  Connect to     Request           Connect         Request       Send
Receive request
                   Service1      Service1          Service2        Service2    response




                                            Call
         Receive request                                             Send response
                                      fk-w3-agent


                                        fk-w3-
                                        agent

                      Service1                                Service2
fk-w3-agent: ConfigHandler
Receive      Fetch     Fetch        Fetch          Fetch      Send
request     Config1   Config2      Config3        Config4   response




                             Database

                       Fetch all config from
    Receive request                                Send response
                           fk-w3-agent

                           fk-w3-
                            agent
                                 Poll and cache



                          Database
Learning from it
• PHP — good for frontend and templating
  – Gives a lot of agility
  – Limiting process model
     • Hurdle for high performance
• Java — stability and performance
• Horses for courses
Website is “slow”!
    (Again)
RCA
• Why?
  – PHP processes taking up too much time
  – PHP processes taking up too much CPU
• Why?
  – Product info deserialization taking up time/CPU
  – View construction taking up time/CPU
Fixing it
• Caching!
• Cache fully constructed pages
  – For a few minutes
  – Only for highly trafficked pages (Homepage)
• Cache PHP serialized Product objects
  – ~20 million objects
  – Memcache
• Yeah! But…
  – Add caching => add complexity
Caching: Complications (1)
• “Caching fully constructed pages”
• But parts of pages still need to be dynamic
     • Example: Logged-in user’s name
• Impossible to do effective bucket testing
     • Or at least makes it prohibitively complex
Caching: Complications (2)
• “Caching PHP serialized Product objects”
• Without caching:
              getProductInfo()            Fetch from CMS

• With caching, cache hit:
              getProductInfo()           Fetch from Cache

• With caching, cache miss:
                         Fetch from   Fetch from
      getProductInfo()                             Set in Cache
                           Cache         CMS
Caching: Complications (3)
• TTL: ∞ (i.e. no invalidation)
• Pro-actively repopulate products in the cache
  – Receive “notifications” about product updates
     • Notification Server — pushes notifications raised by
       CMS
• Use a persistent, distributed cache
  – Memcache => Membase, Couchbase
Learning from it
• Caching is a powerful tool for performance
  optimization
• Caching adds complexities
  – Reduced by keeping cache close to data source
  – Think deeply about TTL, invalidation
• Use caching to go from “acceptable
  performance” to “awesome performance”
  – Don’t rely on it to get to “acceptable
    performance”
Growing up

KID (2012)
Website is “slow”!
RCA
• Why?
  – Search-service is slow (or Reviews-service is slow
    or Recommendations-service is slow)
• But why is rest of website slow?
  – Requests to the slow service are blocking
    processing threads
• Eh?!
Let’s do some math
• Let’s say
   – Mean (or median) response time: 100 ms
   – 8-core server
   – All requests are CPU bound
• Throughput: 80 requests per second (rps)
• Let’s also say
   – 95th Percentile response time: 1000 ms
       • Call them “bad requests”
• 4 bad requests in a second
   – Throughput down to 44 rps
• 8 bad requests in a second?
   – Throughput down to 8 rps
Fixing it
• Aggressive timeouts for all service calls
  – Isolate impact of a slow service
     • only to pages that depend on it
• Very aggressive timeouts for non-critical
  services
  – Example: Recommendations
     • On a Product page, Search results page etc.
     • Not on My Recommendations page
• Load non-critical parts of pages through AJAX
Learning from it
• Isolate the impact of a poorly performing
  services / systems
• Isolate the required from the good-to-have
Website is “slow”!
    (Again)
RCA
• Why?
  – Load average of web servers has spiked
• Why?
  – Requests per second has spiked
     • From 1000 rps to 1500 rps
• Why?
  – Large number of notifications of product
    information updates
Fixing it
• Separate cluster for receiving product info
  update notifications from the cluster that
  serves users
• Admission control: Don’t let a system receive
  more requests than it can handle
  – Throttling
• Batch the notifications
Learning from it
• Isolate the systems serving internal requests
  from those serving production traffic
• Admission control to ensure that a system is
  isolated from the over-enthusiasm of a client
• Look at the granularity at which we’re working
Increasing complexity

TEENAGER
THANK YOU
Mistake?
• Sub-optimal decision
  – Not all information/scenarios considered
  – Insufficient information
  – Built for a different scenario
• Due to focus on “functional” aspects
• A mistake is a mistake
  – … in retrospect

Weitere ähnliche Inhalte

Was ist angesagt?

需求分析及相关技术
需求分析及相关技术需求分析及相关技术
需求分析及相关技术Weijun Zhong
 
Scala lens: An introduction
Scala lens: An introductionScala lens: An introduction
Scala lens: An introductionKnoldus Inc.
 
JUnit & Mockito, first steps
JUnit & Mockito, first stepsJUnit & Mockito, first steps
JUnit & Mockito, first stepsRenato Primavera
 
REST API Development with Spring
REST API Development with SpringREST API Development with Spring
REST API Development with SpringKeesun Baik
 
스프링캠프 2016 발표 - Deep dive into spring boot autoconfiguration
스프링캠프 2016 발표 - Deep dive into spring boot autoconfiguration스프링캠프 2016 발표 - Deep dive into spring boot autoconfiguration
스프링캠프 2016 발표 - Deep dive into spring boot autoconfiguration수홍 이
 
A New CLI for Spring Developer Productivity
A New CLI for Spring Developer ProductivityA New CLI for Spring Developer Productivity
A New CLI for Spring Developer ProductivityVMware Tanzu
 
Launch safely with Feature Flags
Launch safely with Feature FlagsLaunch safely with Feature Flags
Launch safely with Feature FlagsWise Engineering
 
Managing an OSGi Framework with Apache Felix Web Console
Managing an OSGi Framework with  Apache Felix Web ConsoleManaging an OSGi Framework with  Apache Felix Web Console
Managing an OSGi Framework with Apache Felix Web ConsoleFelix Meschberger
 
H/W 규모산정기준
H/W 규모산정기준H/W 규모산정기준
H/W 규모산정기준sam Cyberspace
 
[TDC2016] Ruby in Tests: Automatizando testes de Unidade, API e GUI (Web)
[TDC2016] Ruby in Tests: Automatizando testes de Unidade, API e GUI (Web)[TDC2016] Ruby in Tests: Automatizando testes de Unidade, API e GUI (Web)
[TDC2016] Ruby in Tests: Automatizando testes de Unidade, API e GUI (Web)Júlio de Lima
 
Shootout! Template engines for the JVM
Shootout! Template engines for the JVMShootout! Template engines for the JVM
Shootout! Template engines for the JVMJeroen Reijn
 
Format string Attack
Format string AttackFormat string Attack
Format string Attackicchy
 
Application Re-Architecture Technology ~ StrutsからSpring MVCへ ~
Application Re-Architecture Technology ~ StrutsからSpring MVCへ ~Application Re-Architecture Technology ~ StrutsからSpring MVCへ ~
Application Re-Architecture Technology ~ StrutsからSpring MVCへ ~Yuichi Hasegawa
 
Projects In Laravel : Learn Laravel Building 10 Projects
Projects In Laravel : Learn Laravel Building 10 ProjectsProjects In Laravel : Learn Laravel Building 10 Projects
Projects In Laravel : Learn Laravel Building 10 ProjectsSam Dias
 

Was ist angesagt? (20)

Diving into Java Class Loader
Diving into Java Class LoaderDiving into Java Class Loader
Diving into Java Class Loader
 
需求分析及相关技术
需求分析及相关技术需求分析及相关技术
需求分析及相关技术
 
Scala lens: An introduction
Scala lens: An introductionScala lens: An introduction
Scala lens: An introduction
 
JUnit & Mockito, first steps
JUnit & Mockito, first stepsJUnit & Mockito, first steps
JUnit & Mockito, first steps
 
REST API Development with Spring
REST API Development with SpringREST API Development with Spring
REST API Development with Spring
 
스프링캠프 2016 발표 - Deep dive into spring boot autoconfiguration
스프링캠프 2016 발표 - Deep dive into spring boot autoconfiguration스프링캠프 2016 발표 - Deep dive into spring boot autoconfiguration
스프링캠프 2016 발표 - Deep dive into spring boot autoconfiguration
 
Log4j slideshare
Log4j slideshareLog4j slideshare
Log4j slideshare
 
A New CLI for Spring Developer Productivity
A New CLI for Spring Developer ProductivityA New CLI for Spring Developer Productivity
A New CLI for Spring Developer Productivity
 
Launch safely with Feature Flags
Launch safely with Feature FlagsLaunch safely with Feature Flags
Launch safely with Feature Flags
 
Managing an OSGi Framework with Apache Felix Web Console
Managing an OSGi Framework with  Apache Felix Web ConsoleManaging an OSGi Framework with  Apache Felix Web Console
Managing an OSGi Framework with Apache Felix Web Console
 
H/W 규모산정기준
H/W 규모산정기준H/W 규모산정기준
H/W 규모산정기준
 
[TDC2016] Ruby in Tests: Automatizando testes de Unidade, API e GUI (Web)
[TDC2016] Ruby in Tests: Automatizando testes de Unidade, API e GUI (Web)[TDC2016] Ruby in Tests: Automatizando testes de Unidade, API e GUI (Web)
[TDC2016] Ruby in Tests: Automatizando testes de Unidade, API e GUI (Web)
 
Log4j in 8 slides
Log4j in 8 slidesLog4j in 8 slides
Log4j in 8 slides
 
Shootout! Template engines for the JVM
Shootout! Template engines for the JVMShootout! Template engines for the JVM
Shootout! Template engines for the JVM
 
Spring aop
Spring aopSpring aop
Spring aop
 
Format string Attack
Format string AttackFormat string Attack
Format string Attack
 
Alfresco紹介
Alfresco紹介Alfresco紹介
Alfresco紹介
 
Application Re-Architecture Technology ~ StrutsからSpring MVCへ ~
Application Re-Architecture Technology ~ StrutsからSpring MVCへ ~Application Re-Architecture Technology ~ StrutsからSpring MVCへ ~
Application Re-Architecture Technology ~ StrutsからSpring MVCへ ~
 
Maven基礎
Maven基礎Maven基礎
Maven基礎
 
Projects In Laravel : Learn Laravel Building 10 Projects
Projects In Laravel : Learn Laravel Building 10 ProjectsProjects In Laravel : Learn Laravel Building 10 Projects
Projects In Laravel : Learn Laravel Building 10 Projects
 

Andere mochten auch

Building a Scalable Architecture for web apps
Building a Scalable Architecture for web appsBuilding a Scalable Architecture for web apps
Building a Scalable Architecture for web appsDirecti Group
 
Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,...
Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,...Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,...
Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,...slashn
 
Fungus on White Bread
Fungus on White BreadFungus on White Bread
Fungus on White BreadGaurav Lochan
 
Continuous deployment-at-flipkart
Continuous deployment-at-flipkartContinuous deployment-at-flipkart
Continuous deployment-at-flipkartPankaj Kaushal
 
facebook architecture for 600M users
facebook architecture for 600M usersfacebook architecture for 600M users
facebook architecture for 600M usersJongyoon Choi
 
Architecture of a Modern Web App
Architecture of a Modern Web AppArchitecture of a Modern Web App
Architecture of a Modern Web Appscothis
 
Slash n: Tech Talk Track 1 – Experimentation Platform - Ashok Banerjee
Slash n: Tech Talk Track 1 – Experimentation Platform - Ashok BanerjeeSlash n: Tech Talk Track 1 – Experimentation Platform - Ashok Banerjee
Slash n: Tech Talk Track 1 – Experimentation Platform - Ashok Banerjeeslashn
 
Slash n: Technical Session 2 - Messaging as a Platform - Shashwat Agarwal, V...
Slash n: Technical Session 2 - Messaging as a Platform - Shashwat Agarwal,  V...Slash n: Technical Session 2 - Messaging as a Platform - Shashwat Agarwal,  V...
Slash n: Technical Session 2 - Messaging as a Platform - Shashwat Agarwal, V...slashn
 
Slash n: Technical Session 6 - Keeping a commercial site secure – A case stud...
Slash n: Technical Session 6 - Keeping a commercial site secure – A case stud...Slash n: Technical Session 6 - Keeping a commercial site secure – A case stud...
Slash n: Technical Session 6 - Keeping a commercial site secure – A case stud...slashn
 
Slash n: Technical Session 7 - Fraudsters are smart, Frank is smarter - Vivek...
Slash n: Technical Session 7 - Fraudsters are smart, Frank is smarter - Vivek...Slash n: Technical Session 7 - Fraudsters are smart, Frank is smarter - Vivek...
Slash n: Technical Session 7 - Fraudsters are smart, Frank is smarter - Vivek...slashn
 
Driving User Growth Through Online Marketing
Driving User Growth Through Online MarketingDriving User Growth Through Online Marketing
Driving User Growth Through Online Marketingslashn
 
Introduction to NoSQL db and mongoDB
Introduction to NoSQL db and mongoDBIntroduction to NoSQL db and mongoDB
Introduction to NoSQL db and mongoDBbackslash451
 
Slash n: Technical Session 8 - Making Time - minute by minute - Janmejay Singh
Slash n: Technical Session 8 - Making Time - minute by minute - Janmejay SinghSlash n: Technical Session 8 - Making Time - minute by minute - Janmejay Singh
Slash n: Technical Session 8 - Making Time - minute by minute - Janmejay Singhslashn
 
Soa design pattern
Soa design patternSoa design pattern
Soa design patternLap Doan
 
INFORMATION SYSTEM FOR MANAGER CONCEPTS RELATED TO FLIPKART.COM
INFORMATION SYSTEM FOR MANAGER CONCEPTS RELATED TO FLIPKART.COMINFORMATION SYSTEM FOR MANAGER CONCEPTS RELATED TO FLIPKART.COM
INFORMATION SYSTEM FOR MANAGER CONCEPTS RELATED TO FLIPKART.COMMilan49
 
FlipkartFLIPKART USE IT AND INFORMATION SYSTEM
FlipkartFLIPKART USE IT AND INFORMATION SYSTEMFlipkartFLIPKART USE IT AND INFORMATION SYSTEM
FlipkartFLIPKART USE IT AND INFORMATION SYSTEMtigerjayadev
 
High Scalability by Example – How can Web-Architecture scale like Facebook, T...
High Scalability by Example – How can Web-Architecture scale like Facebook, T...High Scalability by Example – How can Web-Architecture scale like Facebook, T...
High Scalability by Example – How can Web-Architecture scale like Facebook, T...Robert Mederer
 
Aadhaar at 5th_elephant_v3
Aadhaar at 5th_elephant_v3Aadhaar at 5th_elephant_v3
Aadhaar at 5th_elephant_v3Regunath B
 

Andere mochten auch (20)

How Flipkart scales PHP
How Flipkart scales PHPHow Flipkart scales PHP
How Flipkart scales PHP
 
Building a Scalable Architecture for web apps
Building a Scalable Architecture for web appsBuilding a Scalable Architecture for web apps
Building a Scalable Architecture for web apps
 
Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,...
Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,...Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,...
Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,...
 
Fungus on White Bread
Fungus on White BreadFungus on White Bread
Fungus on White Bread
 
Continuous deployment-at-flipkart
Continuous deployment-at-flipkartContinuous deployment-at-flipkart
Continuous deployment-at-flipkart
 
facebook architecture for 600M users
facebook architecture for 600M usersfacebook architecture for 600M users
facebook architecture for 600M users
 
Flipkart
FlipkartFlipkart
Flipkart
 
Architecture of a Modern Web App
Architecture of a Modern Web AppArchitecture of a Modern Web App
Architecture of a Modern Web App
 
Slash n: Tech Talk Track 1 – Experimentation Platform - Ashok Banerjee
Slash n: Tech Talk Track 1 – Experimentation Platform - Ashok BanerjeeSlash n: Tech Talk Track 1 – Experimentation Platform - Ashok Banerjee
Slash n: Tech Talk Track 1 – Experimentation Platform - Ashok Banerjee
 
Slash n: Technical Session 2 - Messaging as a Platform - Shashwat Agarwal, V...
Slash n: Technical Session 2 - Messaging as a Platform - Shashwat Agarwal,  V...Slash n: Technical Session 2 - Messaging as a Platform - Shashwat Agarwal,  V...
Slash n: Technical Session 2 - Messaging as a Platform - Shashwat Agarwal, V...
 
Slash n: Technical Session 6 - Keeping a commercial site secure – A case stud...
Slash n: Technical Session 6 - Keeping a commercial site secure – A case stud...Slash n: Technical Session 6 - Keeping a commercial site secure – A case stud...
Slash n: Technical Session 6 - Keeping a commercial site secure – A case stud...
 
Slash n: Technical Session 7 - Fraudsters are smart, Frank is smarter - Vivek...
Slash n: Technical Session 7 - Fraudsters are smart, Frank is smarter - Vivek...Slash n: Technical Session 7 - Fraudsters are smart, Frank is smarter - Vivek...
Slash n: Technical Session 7 - Fraudsters are smart, Frank is smarter - Vivek...
 
Driving User Growth Through Online Marketing
Driving User Growth Through Online MarketingDriving User Growth Through Online Marketing
Driving User Growth Through Online Marketing
 
Introduction to NoSQL db and mongoDB
Introduction to NoSQL db and mongoDBIntroduction to NoSQL db and mongoDB
Introduction to NoSQL db and mongoDB
 
Slash n: Technical Session 8 - Making Time - minute by minute - Janmejay Singh
Slash n: Technical Session 8 - Making Time - minute by minute - Janmejay SinghSlash n: Technical Session 8 - Making Time - minute by minute - Janmejay Singh
Slash n: Technical Session 8 - Making Time - minute by minute - Janmejay Singh
 
Soa design pattern
Soa design patternSoa design pattern
Soa design pattern
 
INFORMATION SYSTEM FOR MANAGER CONCEPTS RELATED TO FLIPKART.COM
INFORMATION SYSTEM FOR MANAGER CONCEPTS RELATED TO FLIPKART.COMINFORMATION SYSTEM FOR MANAGER CONCEPTS RELATED TO FLIPKART.COM
INFORMATION SYSTEM FOR MANAGER CONCEPTS RELATED TO FLIPKART.COM
 
FlipkartFLIPKART USE IT AND INFORMATION SYSTEM
FlipkartFLIPKART USE IT AND INFORMATION SYSTEMFlipkartFLIPKART USE IT AND INFORMATION SYSTEM
FlipkartFLIPKART USE IT AND INFORMATION SYSTEM
 
High Scalability by Example – How can Web-Architecture scale like Facebook, T...
High Scalability by Example – How can Web-Architecture scale like Facebook, T...High Scalability by Example – How can Web-Architecture scale like Facebook, T...
High Scalability by Example – How can Web-Architecture scale like Facebook, T...
 
Aadhaar at 5th_elephant_v3
Aadhaar at 5th_elephant_v3Aadhaar at 5th_elephant_v3
Aadhaar at 5th_elephant_v3
 

Ähnlich wie Slash n: Tech Talk Track 2 – Website Architecture-Mistakes & Learnings - Siddhartha Reddy

Streaming in Practice - Putting Apache Kafka in Production
Streaming in Practice - Putting Apache Kafka in ProductionStreaming in Practice - Putting Apache Kafka in Production
Streaming in Practice - Putting Apache Kafka in Productionconfluent
 
Architectures with Windows Azure
Architectures with Windows AzureArchitectures with Windows Azure
Architectures with Windows AzureDamir Dobric
 
Always On - Wydajność i bezpieczeństwo naszych danych - High Availability SQL...
Always On - Wydajność i bezpieczeństwo naszych danych - High Availability SQL...Always On - Wydajność i bezpieczeństwo naszych danych - High Availability SQL...
Always On - Wydajność i bezpieczeństwo naszych danych - High Availability SQL...SQLExpert.pl
 
Perfomance tuning on Go 2.0
Perfomance tuning on Go 2.0Perfomance tuning on Go 2.0
Perfomance tuning on Go 2.0Yogi Kulkarni
 
Apache Performance Tuning: Scaling Out
Apache Performance Tuning: Scaling OutApache Performance Tuning: Scaling Out
Apache Performance Tuning: Scaling OutSander Temme
 
ApacheCon BigData - What it takes to process a trillion events a day?
ApacheCon BigData - What it takes to process a trillion events a day?ApacheCon BigData - What it takes to process a trillion events a day?
ApacheCon BigData - What it takes to process a trillion events a day?Jagadish Venkatraman
 
Understanding Kafka Produce and Fetch api calls for high throughtput applicat...
Understanding Kafka Produce and Fetch api calls for high throughtput applicat...Understanding Kafka Produce and Fetch api calls for high throughtput applicat...
Understanding Kafka Produce and Fetch api calls for high throughtput applicat...HostedbyConfluent
 
Solr Compute Cloud - An Elastic SolrCloud Infrastructure
Solr Compute Cloud - An Elastic SolrCloud Infrastructure Solr Compute Cloud - An Elastic SolrCloud Infrastructure
Solr Compute Cloud - An Elastic SolrCloud Infrastructure Nitin S
 
Solr Lucene Conference 2014 - Nitin Presentation
Solr Lucene Conference 2014 - Nitin PresentationSolr Lucene Conference 2014 - Nitin Presentation
Solr Lucene Conference 2014 - Nitin PresentationNitin Sharma
 
Sql server 2012 - always on deep dive - bob duffy
Sql server 2012 - always on deep dive - bob duffySql server 2012 - always on deep dive - bob duffy
Sql server 2012 - always on deep dive - bob duffyAnuradha
 
Kafka replication apachecon_2013
Kafka replication apachecon_2013Kafka replication apachecon_2013
Kafka replication apachecon_2013Jun Rao
 
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...Lucidworks
 
Infinispan from POC to Production
Infinispan from POC to ProductionInfinispan from POC to Production
Infinispan from POC to ProductionJBUG London
 
Infinispan from POC to Production
Infinispan from POC to ProductionInfinispan from POC to Production
Infinispan from POC to ProductionC2B2 Consulting
 
Solr Lucene Revolution 2014 - Solr Compute Cloud - Nitin
Solr Lucene Revolution 2014 - Solr Compute Cloud - NitinSolr Lucene Revolution 2014 - Solr Compute Cloud - Nitin
Solr Lucene Revolution 2014 - Solr Compute Cloud - Nitinbloomreacheng
 
Scaling habits of ASP.NET
Scaling habits of ASP.NETScaling habits of ASP.NET
Scaling habits of ASP.NETDavid Giard
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large ScaleVerverica
 
NoSQL afternoon in Japan Kumofs & MessagePack
NoSQL afternoon in Japan Kumofs & MessagePackNoSQL afternoon in Japan Kumofs & MessagePack
NoSQL afternoon in Japan Kumofs & MessagePackSadayuki Furuhashi
 

Ähnlich wie Slash n: Tech Talk Track 2 – Website Architecture-Mistakes & Learnings - Siddhartha Reddy (20)

Streaming in Practice - Putting Apache Kafka in Production
Streaming in Practice - Putting Apache Kafka in ProductionStreaming in Practice - Putting Apache Kafka in Production
Streaming in Practice - Putting Apache Kafka in Production
 
Architectures with Windows Azure
Architectures with Windows AzureArchitectures with Windows Azure
Architectures with Windows Azure
 
Always On - Wydajność i bezpieczeństwo naszych danych - High Availability SQL...
Always On - Wydajność i bezpieczeństwo naszych danych - High Availability SQL...Always On - Wydajność i bezpieczeństwo naszych danych - High Availability SQL...
Always On - Wydajność i bezpieczeństwo naszych danych - High Availability SQL...
 
Perfomance tuning on Go 2.0
Perfomance tuning on Go 2.0Perfomance tuning on Go 2.0
Perfomance tuning on Go 2.0
 
Apache Performance Tuning: Scaling Out
Apache Performance Tuning: Scaling OutApache Performance Tuning: Scaling Out
Apache Performance Tuning: Scaling Out
 
ApacheCon BigData - What it takes to process a trillion events a day?
ApacheCon BigData - What it takes to process a trillion events a day?ApacheCon BigData - What it takes to process a trillion events a day?
ApacheCon BigData - What it takes to process a trillion events a day?
 
Understanding Kafka Produce and Fetch api calls for high throughtput applicat...
Understanding Kafka Produce and Fetch api calls for high throughtput applicat...Understanding Kafka Produce and Fetch api calls for high throughtput applicat...
Understanding Kafka Produce and Fetch api calls for high throughtput applicat...
 
Solr Compute Cloud - An Elastic SolrCloud Infrastructure
Solr Compute Cloud - An Elastic SolrCloud Infrastructure Solr Compute Cloud - An Elastic SolrCloud Infrastructure
Solr Compute Cloud - An Elastic SolrCloud Infrastructure
 
Solr Lucene Conference 2014 - Nitin Presentation
Solr Lucene Conference 2014 - Nitin PresentationSolr Lucene Conference 2014 - Nitin Presentation
Solr Lucene Conference 2014 - Nitin Presentation
 
Sql server 2012 - always on deep dive - bob duffy
Sql server 2012 - always on deep dive - bob duffySql server 2012 - always on deep dive - bob duffy
Sql server 2012 - always on deep dive - bob duffy
 
Kafka replication apachecon_2013
Kafka replication apachecon_2013Kafka replication apachecon_2013
Kafka replication apachecon_2013
 
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...
 
Internals of Presto Service
Internals of Presto ServiceInternals of Presto Service
Internals of Presto Service
 
Infinispan from POC to Production
Infinispan from POC to ProductionInfinispan from POC to Production
Infinispan from POC to Production
 
Infinispan from POC to Production
Infinispan from POC to ProductionInfinispan from POC to Production
Infinispan from POC to Production
 
Solr Lucene Revolution 2014 - Solr Compute Cloud - Nitin
Solr Lucene Revolution 2014 - Solr Compute Cloud - NitinSolr Lucene Revolution 2014 - Solr Compute Cloud - Nitin
Solr Lucene Revolution 2014 - Solr Compute Cloud - Nitin
 
Cdn cs6740
Cdn cs6740Cdn cs6740
Cdn cs6740
 
Scaling habits of ASP.NET
Scaling habits of ASP.NETScaling habits of ASP.NET
Scaling habits of ASP.NET
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large Scale
 
NoSQL afternoon in Japan Kumofs & MessagePack
NoSQL afternoon in Japan Kumofs & MessagePackNoSQL afternoon in Japan Kumofs & MessagePack
NoSQL afternoon in Japan Kumofs & MessagePack
 

Kürzlich hochgeladen

How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????blackmambaettijean
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 

Kürzlich hochgeladen (20)

How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 

Slash n: Tech Talk Track 2 – Website Architecture-Mistakes & Learnings - Siddhartha Reddy

  • 1. Flipkart Website Architecture Mistakes & Learnings Siddhartha Reddy Architect, Flipkart
  • 5. www.flipkart.com • Started in 2007 • Current Architecture from mid 2010 • Evolution of the architecture presented as… Issue[1] RCA[2] Actions Learnings • *1+ Issue: Website is “slow” • [2] RCA = Root Cause Analysis
  • 6. Surviving & reacting to the environment INFANCY (2007 – MID-2010)
  • 8. RCA • Why? – MySQL queries taking too long • Why? – Too many queries – Many slow queries – Queries locking tables • Why? – Capacity • Hmm…
  • 9. Fixing it • Get beefier servers (the obvious) • Separate master_db, slave_db – Writes go to master_db – Reads from slave_db – Critical reads from master_db Writes Reads Reads Writes MySQL MySQL MySQL Replication Slave Master
  • 10. Learning from it • Scale-out databases reads by distributing load across systems • Isolate database writes from reads – Writes are (usually) more critical
  • 12. RCA • Why? – MySQL queries taking too long (on slave_db) • Why? – Too many queries – Many slow queries • Why? – Queries from analytics / reporting and other backend jobs • Urm…
  • 13. Fixing it • Analytics / reporting DB (archival_db) – Use MyISAM — optimized for reads – Additional indexes for quicker reporting Website Website Writes Reads Website Website Writes Reads MySQL MySQL Replication Slave 1 Master MySQL MySQL Replication Slave Master Replication Analytics MySQL Analytics Reads Slave 2 Reads
  • 14. Learning from it • Isolate the databases being used for serving website traffic from those being used for analytical/reporting • Isolate systems being used by production website from those being used for background processing
  • 15. Learning the basics BABY (2010 – 2011)
  • 17. RCA • Why? • How? – Instrumentation
  • 18. RCA - 1 • Why? – Logging a lot – PHP processes blocking on writing logs Request2 -> Process2 Writing Waiting Waiting Request1 Request3 Request2 Request2 Request3 -> Process1 -> Process3 :Process1 :Process2 :Process3 Log file
  • 19. RCA - 2 • Why? – Service Oriented Architecture (SOA) – Too many calls to remote services per request • Creating fresh connection for each call • All the calls are made in serial order Connect to Request Connect Request Send Receive request Service1 Service1 Service2 Service2 response
  • 20. RCA - 3 • Why? – Configurability – Fetch a lot of “config” from database for serving each request Receive Fetch Fetch Fetch Fetch Send request Config1 Config2 Config3 Config4 response
  • 21. RCA – 1,2,3 • Why? – Logging a lot – SOA – Configurability • Why? – PHP’s process model • Argh!
  • 22. Fixing it • fk-w3-agent – Simple Java “middleware” daemon – Deployed on each web server – PHP communicates to it through local socket – Hosts pluggable “handlers”
  • 23. fk-w3-agent: LoggingHandler Request2 Request2 -> Process2 -> Process2 Request1 Request3 Request1 Request3 -> Process1 -> Process3 -> Process1 -> Process3 fk-w3- Log file agent Async / buffered Log file
  • 24. fk-w3-agent: ServiceHandler(s) Connect to Request Connect Request Send Receive request Service1 Service1 Service2 Service2 response Call Receive request Send response fk-w3-agent fk-w3- agent Service1 Service2
  • 25. fk-w3-agent: ConfigHandler Receive Fetch Fetch Fetch Fetch Send request Config1 Config2 Config3 Config4 response Database Fetch all config from Receive request Send response fk-w3-agent fk-w3- agent Poll and cache Database
  • 26. Learning from it • PHP — good for frontend and templating – Gives a lot of agility – Limiting process model • Hurdle for high performance • Java — stability and performance • Horses for courses
  • 28. RCA • Why? – PHP processes taking up too much time – PHP processes taking up too much CPU • Why? – Product info deserialization taking up time/CPU – View construction taking up time/CPU
  • 29. Fixing it • Caching! • Cache fully constructed pages – For a few minutes – Only for highly trafficked pages (Homepage) • Cache PHP serialized Product objects – ~20 million objects – Memcache • Yeah! But… – Add caching => add complexity
  • 30. Caching: Complications (1) • “Caching fully constructed pages” • But parts of pages still need to be dynamic • Example: Logged-in user’s name • Impossible to do effective bucket testing • Or at least makes it prohibitively complex
  • 31. Caching: Complications (2) • “Caching PHP serialized Product objects” • Without caching: getProductInfo() Fetch from CMS • With caching, cache hit: getProductInfo() Fetch from Cache • With caching, cache miss: Fetch from Fetch from getProductInfo() Set in Cache Cache CMS
  • 32. Caching: Complications (3) • TTL: ∞ (i.e. no invalidation) • Pro-actively repopulate products in the cache – Receive “notifications” about product updates • Notification Server — pushes notifications raised by CMS • Use a persistent, distributed cache – Memcache => Membase, Couchbase
  • 33. Learning from it • Caching is a powerful tool for performance optimization • Caching adds complexities – Reduced by keeping cache close to data source – Think deeply about TTL, invalidation • Use caching to go from “acceptable performance” to “awesome performance” – Don’t rely on it to get to “acceptable performance”
  • 36. RCA • Why? – Search-service is slow (or Reviews-service is slow or Recommendations-service is slow) • But why is rest of website slow? – Requests to the slow service are blocking processing threads • Eh?!
  • 37. Let’s do some math • Let’s say – Mean (or median) response time: 100 ms – 8-core server – All requests are CPU bound • Throughput: 80 requests per second (rps) • Let’s also say – 95th Percentile response time: 1000 ms • Call them “bad requests” • 4 bad requests in a second – Throughput down to 44 rps • 8 bad requests in a second? – Throughput down to 8 rps
  • 38. Fixing it • Aggressive timeouts for all service calls – Isolate impact of a slow service • only to pages that depend on it • Very aggressive timeouts for non-critical services – Example: Recommendations • On a Product page, Search results page etc. • Not on My Recommendations page • Load non-critical parts of pages through AJAX
  • 39. Learning from it • Isolate the impact of a poorly performing services / systems • Isolate the required from the good-to-have
  • 41. RCA • Why? – Load average of web servers has spiked • Why? – Requests per second has spiked • From 1000 rps to 1500 rps • Why? – Large number of notifications of product information updates
  • 42. Fixing it • Separate cluster for receiving product info update notifications from the cluster that serves users • Admission control: Don’t let a system receive more requests than it can handle – Throttling • Batch the notifications
  • 43. Learning from it • Isolate the systems serving internal requests from those serving production traffic • Admission control to ensure that a system is isolated from the over-enthusiasm of a client • Look at the granularity at which we’re working
  • 45.
  • 47. Mistake? • Sub-optimal decision – Not all information/scenarios considered – Insufficient information – Built for a different scenario • Due to focus on “functional” aspects • A mistake is a mistake – … in retrospect

Hinweis der Redaktion

  1. “This has basically given us lots of opportunities to make mistakes. And make mistakes we did.”
  2. Website Architecture diagram goes here
  3. No