In 6 years, Wix grew from a small startup with traditional system architecture (based on a monolithic server running on Tomcat, Hibernate, and MySQL) to a company that serves 60 million users. To keep up with this tremendous growth, Wix’s architecture had to evolve from a monolithic system to microservices, using some interesting patterns like CQRS to achieve our goal of building a blazing fast highly scalable and highly available system.
Scaling Wix with microservices architecture and multi-cloud platforms - Reversim Summit 2015
1. Aviran Mordo
Head of Back-end Engineering
@aviranm
linkedin.com/in/aviran
aviransplace.com
Scaling Wix with
Microservices Architecture
2.
3. Wix in Numbers
Over 61M users
1.5M new users/month
Static storage is >2PB of data
1.5TB new files/day
3 data centers + 3 clouds (Google, Amazon,Azure)
1.5B HTTP requests/day
900 people work atWix, of which ~ 300 in R&D
4. Initial Architecture
Built for fast development
Stateful login (Tomcat session), Ehcache, file uploads
No consideration for performance, scalability and testing
Intended for short-term use
Tomcat, Hibernate, custom web framework
Lighttpd
(file serving) MySQL
DB
Wix
(Tomcat)
5. The Monolithic Giant
One monolithic server that handled everything
Dependency between features
Changes in unrelated areas of the system caused deployment of
the whole system
Failure in unrelated areas will cause system wide downtime
7. Concerns and SLA
DataValidation
Security / Authentication
Data consistency
Lots of data
Edit websites
High availability
High performance
Lots of static files
Very high traffic volume
Viewport optimization
Long tail (immutable)
Serving Media
High availability
High performance
High traffic volume
Long tail (mutable)
View sites, created by
Wix editor
9. HTML
Editor
Flash
Editor
MSM
Private
Media
Public
Media
Editor Segment Public Segment
Premium
Services
eCommerse
List DB
App
Builder
App
Store
App
Market
Dashboard
Statics/me
dia
Mailer
TimeZone
Public
HTML API
Public API
(Flash)
MSP
Public
Server
HTML
Renderer
HTML SEO
Renderer
Flash
Renderer
Flash SEO
Renderer
Sitemap
Renderer
Robots.txt
Renderer
User
Server
Template
Viewer
ContactsHUB
Activit
y
Site
Members
Provided
Mailing
Service
Comments
Snapshoter
User Pref
Feed Me
Shout-out Hotels
PETRI
Site Pref
Dist LoggerSlicer
eCom
Renderer
eCom Cart
eCom
Checkout
eCom
Catalog
eCom
Orders
Payment
Facade
Account
Info
HTML API
HTML
Embeder
BlogMobile
10. Microservices Guidelines
Each service has its own database (if one is needed)
Only one service can write to a specific DB table(s)
There may be additional read-only services that directly
accesses the DB (for performance reasons)
Services are stateless
No DB transactions
Cache is not a building block, but an optimization
11.
12. Microservices Tradeoffs
Each service has its own database (if one is needed)
Easy to scale microservices based on SLA concerns
Tradeoff – system complexity, performance
Only one service can write to a specific DB table(s)
De-coupling architecture – faster development
Tradeoff – system complexity / performance
May be additional read-only services that accesses the DB
Performance gain
Tradeoff coupling
Services are stateless
Easy to scale out (just add more servers)
Tradeoff performance / consistency
No DB transactions
Better DB performance, easier to scale
Tradeoff system complexity
14. Editor Server
Immutable JSON pages (~2.5M / day)
Site revisions
Active – standby MySQL cross datacenters
Editor Server
MySQL
Active
Sites
MySQL
Archive
15.
16. Protect The Data
DB outage with fast recovery = replication
Data poisoning/corruption = revisions / backup
Make the data available at all times = data distribution to multiple
locations / providers
19. No DB Transactions
Save each page (JSON) as an atomic operation
Page ID is a content based hash (immutable/idempotent)
Finalize transaction by sending site header (list of pages)
Can generate orphaned pages, not a problem in practice
21. Prospero – Wix Media Storage
2PB user media files
3M files uploaded daily
800M metadata records
Dynamic media processing
• Picture resize, crop and sharpen “on the fly”
• Watermark
• Audio format conversion
22. T
Google
Cloud
Prospero – Wix Media Manager
get image.jpg
First
fallback
Second
fallback
If not in
CDN
Amazon
x36
T
x36
T
x32
Austin
CDN
24. Public Segment Roles
Routing (resolve URLs)
Dispatching (to a renderer)
Rendering (HTML,XML,TXT)
Public
Server
HTML
Renderer
HTML SEO
Renderer
Flash
Renderer
Sitemap
Renderer
Robots.txt
Renderer
www.example.com
Flash SEO
Renderer
26. Publish Site
Publish site header (a map of pages for a site)
Publish routing table
Publish site header / routes (CQRS)
Editor Segment Public Segment
27. Built For Speed
Minimize out-of-process hops (2 DB, 1 RPC)
Lookup tables are cached in memory, updated every few minutes
Denormalized data – optimize for read by primary key (MySQL)
Minimize business logic
28. How a Page Gets Rendered
Bootstrap HTML template that contains only data
Only JavaScript imports
JSON data (site-header + dynamic data)
No “real” HTML view
30. The average Intel Core
i750 can push up to 7
GFLOPS without
overclocking
31. Why JSON?
Easy to parse in JavaScript and Java/Scala
Fairly compact text format
Highly compressible (5:1 even for small payloads)
Easy to fix rendering bugs (just deploy a new code)
34. Serving a Site – Sunny Day
Archive
CDN Statics
Browser
http://example.wix.com
Store HTML
to cache
HTTP
Request
Notify
site view
LB
Public
Renderer
HTML
Resources / Media
HTTP
Request
35. Serving a Site – DC Lost
Archive
CDN Statics
Browser
http://example.wix.com
LB
Public
Renderer
LB
Public
Renderer
Change DNS
HTTP
Request
36. Serving a Site – Public Lost
Archive
Browser
http://example.wix.com
LB
Public
Renderer
Get
Cached HTML
Version
HTML
HTTP
Request
LB
Public
Renderer
Fallback to 2nd
DC
37. Living in the Browser
Archive
CDN Statics
Browser
http://example.wix.com
LB
Public
Renderer
Editor Pages
Fallback
JSON /
Media
HTML
HTTP
Request
Fallback
38. Summary
Identify your critical path and concerns
Build redundancy in critical path (for availability)
De-normalize data (for performance)
Minimize out-of-process hops (for performance)
Take advantage of client’s CPU power
39.
40. Aviran Mordo
Head of Back-end Engineering
Q&A
@aviranm
linkedin.com/in/aviran
aviransplace.com
http://engineering.wix.com
http://goo.gl/wlq9Ih
@WixEng
Editor's Notes
How many built a website?
Editor – Read immediately after write – Small working set
Viewer optimize for reads
We fight for every ms. Page view = many resource downloading
Read-only services only if it is part of the same business functionality
Read-only services only if it is part of the same business functionality
Immutable data helps handle eventual consistency
MySql is a great key-value store
Not all data is equal (only 6% of websites are edited 3 months after creation)
Revision keep data safe from poisoning Pay in storage and management
Highly optimized code – every ms count
We can change the arrows as we want
Tech vendor lock is a myth, easy to change the api (small dev effort).
Invest in data distribution.
Evaluation of new platform starts by putting the data.
Save pages on JSON
Upload to static storage
Explain what is JSON and what is HTML
UPS dies, secondary power source connected to the same UPS