Scaling LinkedIn - A Brief History


Visual story of how LinkedIn Engineering scaled its architecture, infrastructure, and operations to support its 400+ million members.

Blog version of slide deck:

Learn about LinkedIn's early history, its technology, and lessons on how to scale web architectures.


Scaling LinkedIn - A Brief History

  2. “Scaling = replacing all the components of a car while driving it at 100mph.” — Via Mike Krieger, “Scaling Instagram”
  3. LinkedIn started back in 2003 to “connect to your network for better job opportunities.” It had 2700 members in its first week.
  4. First week growth guesses from founding team
  5. [Growth chart: members from 2003 to 2015, climbing from 32M to over 400M] Fast forward to today...
  6. LINKEDIN SCALE TODAY ● LinkedIn is a global site with over 400 million members ● Web pages and mobile traffic are served at tens of thousands of queries per second ● Backend systems serve millions of queries per second
  7. How did we get there?
  8. Let’s start from the beginning
  9. LEO DB
  10. LINKEDIN’S ORIGINAL ARCHITECTURE (Circa 2003) ● Huge monolithic app called Leo ● Java, JSP, Servlets, JDBC ● Served every page, same SQL database [Diagram: the LEO app backed by a single DB]
  11. So far so good, but two areas to improve: 1. The growing member to member connection graph 2. The ability to search those members
  12. ● Needed to live in-memory for top performance ● Used graph traversal queries not suitable for the shared SQL database. ● Different usage profile than other parts of site MEMBER CONNECTION GRAPH
  13. MEMBER CONNECTION GRAPH So, a dedicated service was created. LinkedIn’s first service. ● Needed to live in-memory for top performance ● Used graph traversal queries not suitable for the shared SQL database. ● Different usage profile than other parts of site
  14. ● Social networks need powerful search ● Lucene was used on top of our member graph MEMBER SEARCH
  15. ● Social networks need powerful search ● Lucene was used on top of our member graph MEMBER SEARCH LinkedIn’s second service.
  16. LINKEDIN WITH CONNECTION GRAPH AND SEARCH (Circa 2004) [Diagram: LEO calls the Member Graph service over RPC and is backed by the DB; Lucene-based search receives connection / profile updates]
  17. Getting better, but the single database was under heavy load. Vertically scaling helped, but we needed to offload the read traffic...
  18. REPLICA DBs ● Master/slave concept ● Read-only traffic from replica ● Writes go to main DB ● Early version of Databus kept DBs in sync [Diagram: Main DB → Databus relay → Replica DBs]
  19. REPLICA DBs TAKEAWAYS ● Good medium term solution ● We could vertically scale servers for a while ● Master DBs have finite scaling limits ● These days, LinkedIn DBs use partitioning [Diagram: Main DB → Databus relay → Replica DBs]
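The master/slave read offloading described here can be sketched as a simple router. This is an illustrative toy, not LinkedIn's code: in-memory dicts stand in for the databases, and an inline copy stands in for the asynchronous Databus relay.

```python
import itertools

class ReplicatedStore:
    """Toy master/slave router: writes go to the primary,
    reads are spread round-robin across read-only replicas."""

    def __init__(self, num_replicas=2):
        self.primary = {}                      # stands in for the main DB
        self.replicas = [{} for _ in range(num_replicas)]
        self._next = itertools.cycle(range(num_replicas))

    def write(self, key, value):
        self.primary[key] = value
        self._replicate(key, value)            # Databus played this role at LinkedIn

    def _replicate(self, key, value):
        # A real relay ships the change log asynchronously; this copies inline.
        for replica in self.replicas:
            replica[key] = value

    def read(self, key):
        # Read traffic never touches the primary.
        return self.replicas[next(self._next)].get(key)

store = ReplicatedStore()
store.write("member:1", "Reid")
print(store.read("member:1"))  # → Reid
```

The key property is on the last line: every `read` lands on a replica, so the primary's capacity is spent on writes alone.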
  20. LINKEDIN WITH REPLICA DBs (Circa 2006) [Diagram: LEO reads/writes the Main DB (R/W) and reads from replica DBs (R/O); a Databus relay syncs connection updates to the replicas; the Member Graph is reached over RPC; Search receives profile updates]
  21. As LinkedIn continued to grow, the monolithic application Leo was becoming problematic. Leo was difficult to release and debug, and the site kept going down...
  22. IT WAS TIME TO... Kill Leo
  23. SERVICE ORIENTED ARCHITECTURE (Circa 2008 on) Extracting services (Java Spring MVC) from the legacy Leo monolithic application [Diagram: Public Profile Web App, Recruiter Web App, Profile Service, and yet more services split out of LEO]
  24. SERVICE ORIENTED ARCHITECTURE ● Goal - create vertical stack of stateless services ● Frontend servers fetch data from many domains, build HTML or JSON response ● Mid-tier services host APIs, business logic ● Data-tier or back-tier services encapsulate data domains [Diagram: Profile Web App → Profile Service → Profile DB]
  25. EXAMPLE MULTI-TIER ARCHITECTURE AT LINKEDIN [Diagram: Browser / App → Frontend Web App → mid-tier services (Profile, Connections, and Groups content services) → data services backed by DB, Voldemort, Hadoop, Edu data, and Kafka]
  26. PROS ● Stateless services easily scale ● Decoupled domains ● Build and deploy independently CONS ● Ops overhead ● Introduces backwards compatibility issues ● Leads to complex call graphs and fanout SERVICE ORIENTED ARCHITECTURE COMPARISON
  27. bash$ eh -e %%prod | awk -F. '{ print $2 }' | sort | uniq | wc -l 756 ● In 2003, LinkedIn had one service (Leo) ● By 2010, LinkedIn had over 150 services ● Today in 2015, LinkedIn has over 750 services SERVICES AT LINKEDIN
  28. Getting better, but LinkedIn was experiencing hypergrowth...
  29. CACHING ● Simple way to reduce load on servers and speed up responses ● Mid-tier caches store derived objects from different domains, reduce fanout ● Caches in the data layer ● We use memcache, couchbase, even Voldemort [Diagram: Frontend Web App → Mid-tier Service (with cache) → DB (with cache)]
  30. “There are only two hard problems in Computer Science: cache invalidation, naming things, and off-by-one errors.” — Via Twitter by Kellan Elliott-McCrea and later Jonathan Feinberg
  31. CACHING TAKEAWAYS ● Caches are easy to add in the beginning, but complexity adds up over time. ● Over time LinkedIn removed many mid-tier caches because of the complexity around invalidation ● We kept caches closer to data layer
  32. CACHING TAKEAWAYS (cont.) ● Services must handle full load - caches improve speed, but they are not a permanent load-bearing solution ● We’ll use a low latency solution like Voldemort when appropriate and precompute results
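The caching trade-off above can be illustrated with a minimal cache-aside sketch (names like `ProfileStore` are hypothetical, not LinkedIn code): reads fall back to the database and populate the cache, while every write path must remember to invalidate — exactly the complexity that led LinkedIn to remove many mid-tier caches.

```python
class ProfileStore:
    """Cache-aside sketch: read through the cache, invalidate on write."""

    def __init__(self):
        self.db = {}      # stands in for the source-of-truth database
        self.cache = {}   # stands in for memcache/couchbase

    def get(self, member_id):
        if member_id in self.cache:
            return self.cache[member_id]       # cache hit
        profile = self.db.get(member_id)       # cache miss: hit the DB
        self.cache[member_id] = profile        # populate for next time
        return profile

    def update(self, member_id, profile):
        self.db[member_id] = profile
        # Forgetting this line serves stale data indefinitely -- the
        # invalidation hazard that multiplies with every extra cache tier.
        self.cache.pop(member_id, None)
```

Note that `get` must also work when the cache is empty; that is the "services must handle full load" takeaway in code.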
  33. LinkedIn’s hypergrowth was extending to the vast amounts of data it collected. Individual pipelines to route that data weren’t scaling. A better solution was needed...
  34. KAFKA MOTIVATIONS ● LinkedIn generates a ton of data ○ Pageviews ○ Edits on profile, companies, schools ○ Logging, timing ○ Invites, messaging ○ Tracking ● Billions of events every day ● Separate and independently created pipelines routed this data
  36. A WHOLE LOT OF CUSTOM PIPELINES... As LinkedIn needed to scale, each pipeline needed to scale.
  37. KAFKA Distributed pub-sub messaging platform as LinkedIn’s universal data pipeline [Diagram: frontend and backend services publish to Kafka; DWH, monitoring, analytics, Hadoop, and Oracle consume from it]
  38. BENEFITS ● Enabled near realtime access to any data source ● Empowered Hadoop jobs ● Allowed LinkedIn to build realtime analytics ● Vastly improved site monitoring capability ● Enabled devs to visualize and track call graphs ● Over 1 trillion messages published per day, 10 million messages per second KAFKA AT LINKEDIN
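The decoupling Kafka provides can be shown with a toy commit log (this is not the real Kafka API, just the pattern): producers append a message once, and each consumer reads the same topic at its own offset, so adding a new consumer — monitoring, analytics, Hadoop — never touches the producer.

```python
from collections import defaultdict

class ToyLog:
    """Minimal pub-sub commit log in the spirit of Kafka."""

    def __init__(self):
        self.topics = defaultdict(list)   # topic -> append-only message list
        self.offsets = defaultdict(int)   # (consumer, topic) -> next unread offset

    def publish(self, topic, message):
        self.topics[topic].append(message)

    def poll(self, consumer, topic):
        """Return messages this consumer hasn't seen yet and advance its offset."""
        offset = self.offsets[(consumer, topic)]
        messages = self.topics[topic][offset:]
        self.offsets[(consumer, topic)] = len(self.topics[topic])
        return messages

log = ToyLog()
log.publish("pageviews", {"member": 1, "page": "/feed"})
# Monitoring and analytics consume independently from the same pipeline.
print(log.poll("monitoring", "pageviews"))
print(log.poll("analytics", "pageviews"))
```

Because each consumer tracks only an offset into a shared append-only log, one pipeline replaces the many custom point-to-point pipelines described above.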
  40. Let’s end with the modern years
  41. REST.LI ● Services extracted from Leo or created new were inconsistent and often tightly coupled ● Rest.li was our move to a data model centric architecture ● It ensured a consistent stateless Restful API model across the company.
  42. ● By using JSON over HTTP, our new APIs supported non-Java-based clients. ● By using Dynamic Discovery (D2), we got load balancing, discovery, and scalability of each service API. ● Today, LinkedIn has 1130+ resources and over 100 billion calls per day REST.LI (cont.)
  43. Automatic API-documentation REST.LI (cont.)
  44. R2/D2 tech stack REST.LI (cont.)
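The uniform JSON-over-HTTP model these slides describe can be sketched with a tiny resource handler. This is not the real Rest.li framework — the `handle` function and `PROFILES` data are invented for illustration — but it shows why any non-Java client that speaks HTTP and JSON can consume such an API.

```python
import json

# Hypothetical in-memory resource data, standing in for a backing service.
PROFILES = {1: {"id": 1, "name": "Reid", "headline": "Entrepreneur"}}

def handle(method, path):
    """Dispatch 'GET /profiles/{id}' to a (status, JSON body) response."""
    parts = path.strip("/").split("/")
    if method == "GET" and len(parts) == 2 and parts[0] == "profiles" and parts[1].isdigit():
        profile = PROFILES.get(int(parts[1]))
        if profile is None:
            return 404, json.dumps({"error": "not found"})
        return 200, json.dumps(profile)
    return 405, json.dumps({"error": "unsupported"})

status, body = handle("GET", "/profiles/1")
print(status, body)
```

In the real stack, a framework generates these routes from the data model and D2 handles discovery and load balancing; the wire contract, however, is just this: stateless HTTP methods on resource paths returning JSON.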
  45. LinkedIn’s success with Data infrastructure like Kafka and Databus led to the development of more and more scalable Data infrastructure solutions...
  46. ● It was clear LinkedIn could build data infrastructure that enables long term growth ● LinkedIn doubled down on infra solutions like: ○ Storage solutions ■ Espresso, Voldemort, Ambry (media) ○ Analytics solutions like Pinot ○ Streaming solutions ■ Kafka, Databus, and Samza ○ Cloud solutions like Helix and Nuage DATA INFRASTRUCTURE
  48. LinkedIn is a global company and was continuing to see large growth. How else to scale?
  49. ● Natural progression of horizontally scaling ● Replicate data across many data centers using storage technology like Espresso ● Pin users to geographically close data center ● Difficult but necessary MULTIPLE DATA CENTERS
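Pinning a user to the geographically closest data center can be sketched as a nearest-site lookup. The coordinates and site names below are invented for illustration, not LinkedIn's real topology.

```python
import math

# Illustrative data center coordinates (lat, lon); hypothetical sites.
DATA_CENTERS = {
    "us-west": (37.4, -121.9),
    "us-east": (38.9, -77.0),
    "eu":      (50.1, 8.7),
}

def _distance(a, b):
    """Great-circle distance (km) via the haversine formula."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 6371 * 2 * math.asin(math.sqrt(h))

def pin_user(user_location):
    """Pin a member to the geographically closest data center."""
    return min(DATA_CENTERS, key=lambda dc: _distance(user_location, DATA_CENTERS[dc]))

print(pin_user((48.8, 2.3)))   # Paris → "eu"
```

The hard part the slide alludes to is not this lookup but keeping the pinned user's data replicated (e.g. via Espresso) so a failover to another site is possible.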
  50. ● Multiple data centers are imperative to maintain high availability. ● You need to avoid any single point of failure not just for each service, but the entire site. ● LinkedIn runs out of three main data centers, additional PoPs around the globe, and more coming online every day... MULTIPLE DATA CENTERS
  51. MULTIPLE DATA CENTERS LinkedIn's operational setup as of 2015 (circles represent data centers, diamonds represent PoPs)
  52. Of course LinkedIn’s scaling story is never this simple, so what else have we done?
  53. WHAT ELSE HAVE WE DONE? ● Each of LinkedIn’s critical systems has undergone its own rich history of scale (graph, search, analytics, profile backend, comms, feed) ● LinkedIn uses Hadoop / Voldemort for insights like People You May Know, Similar profiles, Notable Alumni, and profile browse maps.
  54. ● Re-architected frontend approach using ○ Client templates ○ BigPipe ○ Play Framework ● LinkedIn added multiple tiers of proxies using Apache Traffic Server and HAProxy ● We improved the performance of servers with new hardware, advanced system tuning, and newer Java runtimes. WHAT ELSE HAVE WE DONE? (cont.)
  55. Scaling sounds easy and quick to do, right?
  56. Hofstadter's Law: It always takes longer than you expect, even when you take into account Hofstadter's Law. “ Via  Douglas Hofstadter, Gödel, Escher, Bach: An Eternal Golden Braid
  57. Josh Clemm THANKS!
  58. ● Blog version of this slide deck ● Visual story of LinkedIn’s history ● LinkedIn Engineering blog ● LinkedIn Open-Source ● LinkedIn’s communication system slides which include earliest LinkedIn architecture http://www.slideshare. net/linkedin/linkedins-communication-architecture ● Slides which include earliest LinkedIn data infra work LEARN MORE
  59. ● Project Inversion - internal project to enable developer productivity (trunk based model), faster deploys, unified services freeze-that-saved-linkedin ● LinkedIn’s use of Apache Traffic server server ● Multi Data Center - testing fail overs angel-au-yeung LEARN MORE (cont.)
  60. ● History and motivation around Kafka ● Thinking about streaming solutions as a commit log should-know-about-real-time-datas-unifying ● Kafka enabling monitoring and alerting ● Kafka enabling real-time analytics (Pinot) ● Kafka’s current use and future at LinkedIn ● Kafka processing 1 trillion events per day kafka-linkedin LEARN MORE - KAFKA
  61. ● Open sourcing Databus latency-change-data-capture-system ● Samza streams to help LinkedIn view call graphs apache-samza ● Real-time analytics (Pinot) ● Introducing Espresso data store distributed-document-store LEARN MORE - DATA INFRASTRUCTURE
  62. ● LinkedIn’s use of client templates ○ Dust.js ○ Profile ● Big Pipe on LinkedIn’s homepage ● Play Framework ○ Introduction at LinkedIn https://engineering.linkedin.com/play/composable-and-streamable-play-apps ○ Switching to non-blocking asynchronous model and-callback-hell LEARN MORE - FRONTEND TECH
  63. ● Introduction to Rest.li and how it helps LinkedIn scale ● How Rest.li expanded across the company LEARN MORE - REST.LI
  64. ● JVM memory tuning throughput-and-low-latency-java-applications ● System tuning low-latency-high-throughput-databases ● Optimizing JVM tuning automatically difficulties-and-using-jtune-solution LEARN MORE - SYSTEM TUNING
  65. LinkedIn continues to grow quickly and there’s still a ton of work we can do to improve. We’re working on problems that very few ever get to solve - come join us! WE’RE HIRING
