3. Wix In Numbers
• Over 45,000,000 users
– >1M new users/month
• Static storage is >800TB of data
– >1.5TB new files/day
• 3 Data centers + 2 Clouds (Google, Amazon)
– ~300 servers
• >700M HTTP requests/day
• ~600 people work at Wix
– Of which ~200 are in R&D
13. Where We Were
• We were working in traditional waterfall
• With fear of change
– It is working, why touch it?
– Uploading a release means downtime and bugs!
• With low product quality
• With slow development velocity
• With a traditional enterprise development lifecycle
– Three months of a “VERSION” development and QA
– Six months of crisis mode cleaning bugs and stabilizing system
18. Lean Product development
“Top 5 Most-Used Commands in Microsoft Word
• Paste
• Save
• Copy
• Undo
• Bold
These five commands account for around 32% of
the total command use in Word.
Paste itself accounts for more than 11% of all commands
used, and has more than twice as much usage as the #2 entry on the list, Save.
Beyond the top 10 commands, the curve flattens
out considerably.
The percentage difference in usage between the #100 command ("Accept Change") and the
#400 command ("Reset Picture") is about the same as the difference between #1 and #11 ("Change
Font Size")”
19. Scaling challenges – Product
• Minimum Viable Product (MVP)
– Does the MVP meet your product standards?
• What about tooltips, help, first-time UX, etc.?
– How do you define a product that can be developed in a day?
– And that can win an A/B test…
20. Get out of thought land
• The law of failure
– Most new “its” will fail even if they are flawlessly executed
• Invest less, get less attached, and it becomes easier to admit failure
– Data beats opinions: let the customer decide
Make sure you are building the right “it” before you build it right
22. Risk
• Waterfall - minimize number of deployments
• CD - minimize number of changes and impact in $$
Risk = #deployments * chance of something going wrong (~ number of changes) * impact of something going wrong in $$
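As a rough illustration of the formula above, the sketch below plugs in made-up numbers. None of them come from Wix; the function name `release_risk` and every value are assumptions chosen only to show how many small deployments can carry less total risk than one large one.

```python
def release_risk(deployments: int, p_failure: float, impact_usd: float) -> float:
    """Risk = #deployments * chance of something going wrong * impact in $$."""
    return deployments * p_failure * impact_usd


# Hypothetical numbers, for illustration only.
# Waterfall: one big release, many changes, so a high chance and cost of failure.
waterfall = release_risk(deployments=1, p_failure=0.5, impact_usd=100_000)

# CD: many releases, each with few changes and a small blast radius.
continuous = release_risk(deployments=50, p_failure=0.01, impact_usd=1_000)

print(f"waterfall: {waterfall:.0f}, continuous delivery: {continuous:.0f}")
```

With these assumed inputs the product is much smaller for the continuous-delivery case, because the chance and impact terms shrink faster than the deployment count grows.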
23. Small Development Iterations
• No Waterfall
• No Scrum
• No Iterations
• No long documents
• Build something small
• When it is ready, deploy it
– Measure it
– Then fix it
– Again
– And again, until Dev, Product and Customers are
happy
• Then start changing it
– Again, as a small change
25. What Is The Common Denominator?
• Product manager
• Project manager
• QA
• Operations
• DBA
26. CD is culture & mindset
• Trust the developers
– Empower developers to change production
– Developers know their systems best
• Automation as a default choice
– No more “is it worth automating?”
– Everything should be automated
• Welcome to the twilight zone
– Product/Dev/QA boundaries are going down
– Everyone needs to care about everything
– Less formality: corridor in, meeting room out
27. Dev Centric Culture – Involve The Developer
• Product definition (with product)
• Development (with architect)
• Testing (with QA developers)
• Deployment / Rollback (with operations)
• Monitoring / BI (with BI team)
• DevOps – to enable deployment
and rollback, fully automated
28. Continuous Delivery – Key points
• Abandon the “VERSION” paradigm – move to a
feature centric methodology
• Make small, frequent releases as soon as possible
• Automate everything – TDD/CI/CD
• Measure everything
– A/B test every new feature
– Monitor real KPIs (business, not CPU)
• Deploy without downtime
29. Test Driven Development
• No new code is pushed to Git without being fully tested
– We currently have around 10,000 automated tests
• Before fixing a bug first write a test to reproduce the bug
• Cover legacy (untested) systems with Integration tests
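A minimal sketch of the "write a test that reproduces the bug first" rule, using Python's `unittest`. The helper `page_count` and the bug it once had are hypothetical and only serve as an example.

```python
import unittest


def page_count(total_items: int, page_size: int) -> int:
    """Number of pages needed to show total_items (hypothetical helper).

    The assumed buggy version was `total_items // page_size + 1`, which
    returned one page too many for exact multiples of page_size.
    """
    return -(-total_items // page_size)  # ceiling division


class PageCountRegressionTest(unittest.TestCase):
    def test_exact_multiple_does_not_add_empty_page(self):
        # Written first: fails against the buggy version, passes after the fix.
        self.assertEqual(page_count(20, 10), 2)

    def test_partial_last_page_is_still_counted(self):
        # Guards the behavior that already worked before the fix.
        self.assertEqual(page_count(21, 10), 3)
```

Run it with `python -m unittest`. The first test stays in the suite afterwards, so the same bug cannot return unnoticed.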
30. What people think of TDD
• TDD slows down development
• With TDD we write more code (product + test code).
• TDD has no significant impact on quality
32. TDD Actual impact on development
• We develop products faster
• Removes fear of change
• Easier to enter someone else’s project
• Do we still need QA? (Yes, they code automation tests)
– We don’t have QA for back-end applications
• Writing a feature is 10-30% slower, but with 45-90% fewer bugs
• 50% faster to reach production.
• Considerably less time to fix bugs
34. Is Refactoring Rework?
Absolutely NOT !
• Refactoring is the outcome of learning
• Refactoring is the cornerstone of improvement
• Refactoring builds the capacity to change
• Refactoring doesn’t cost, it pays
35. Refactoring
• Refactor from inside out
– Small iterations with tests
– Refactor small methods -
make sure the tests don’t
break
– Deploy often
• Re-write from the outside in
– Write from scratch (one piece
at a time)
– Code duplication sometimes
needed (temporary)
– Protected by Feature Toggle
Before refactoring, make sure everything is covered by tests
– Legacy code is usually covered by integration tests
39. Feature Toggles
• Everyone develops on the Trunk
• Every piece of code can get to production at any time
40. Feature Toggle to the rescue
• Unused new code can go to production – no harm done
• Operational new code goes with a guard – use new or old code by feature toggle
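A guard of this kind could look roughly like the sketch below. The `FeatureToggles` class, the toggle name `new_menu` and both render functions are hypothetical; the deck does not show Wix's real toggle API.

```python
from typing import Dict, List


class FeatureToggles:
    """In-memory toggle store; a real system would load this from configuration."""

    def __init__(self, states: Dict[str, bool]):
        self._states = dict(states)

    def is_on(self, name: str) -> bool:
        # Unknown toggles default to off, so newly shipped code stays dormant.
        return self._states.get(name, False)


def render_menu_old(items: List[str]) -> str:
    return " | ".join(items)


def render_menu_new(items: List[str]) -> str:
    return "\n".join(f"- {item}" for item in items)


def render_menu(items: List[str], toggles: FeatureToggles) -> str:
    # The guard: both code paths are deployed, the toggle picks one at runtime.
    if toggles.is_on("new_menu"):
        return render_menu_new(items)
    return render_menu_old(items)
```

Both implementations ship together; flipping the toggle back is the rollback, with no redeploy.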
42. DB Schema Changes Without Downtime
• Adding columns
– Use another table linked by primary key
– Use blob field for schema flexibility
• Removing fields
– Stop using them. Do not make any DB schema changes
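One way to read the "blob field" idea is sketched below with the standard `sqlite3` and `json` modules. The table name `sites`, the column `extra` and the field `theme` are hypothetical; the point is only that a new attribute can be stored without an `ALTER TABLE`.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Fixed columns plus one free-form text column that holds a JSON document.
conn.execute("CREATE TABLE sites (id INTEGER PRIMARY KEY, name TEXT, extra TEXT)")


def save_site(site_id: int, name: str, **extra) -> None:
    """Store fixed fields in columns and everything else in the JSON blob."""
    conn.execute(
        "INSERT OR REPLACE INTO sites (id, name, extra) VALUES (?, ?, ?)",
        (site_id, name, json.dumps(extra)),
    )


def load_site(site_id: int):
    row = conn.execute(
        "SELECT id, name, extra FROM sites WHERE id = ?", (site_id,)
    ).fetchone()
    if row is None:
        return None
    site = json.loads(row[2] or "{}")
    site.update(id=row[0], name=row[1])  # fixed columns win over blob keys
    return site


save_site(1, "bakery")                # row written before the new field existed
save_site(2, "studio", theme="dark")  # new field "theme", no schema change
```

Readers must tolerate rows where the new key is missing, for example `load_site(1).get("theme")`.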
43. New DB schema with data migration
• Plan a lazy migration path controlled by feature toggle
1. Write to old / Read from old
2. Write to both / Read from old
3. Write to both / Read from new, fallback to old
4. Write to new / Read from new, fallback to old
5. Eagerly migrate data in the background
6. Write to new / Read from new
• Backward compatibility is a must
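The migration phases above can be sketched as a small state machine. This is an illustration under assumptions: two dictionaries stand in for the old and new schemas, and the `Phase` and `MigratingStore` names are hypothetical. In practice the current phase would be driven by a feature toggle.

```python
from enum import IntEnum


class Phase(IntEnum):
    OLD_ONLY = 1             # write old / read old
    DUAL_WRITE_READ_OLD = 2  # write both / read old
    DUAL_WRITE_READ_NEW = 3  # write both / read new, fallback to old
    WRITE_NEW_FALLBACK = 4   # write new / read new, fallback to old
    NEW_ONLY = 6             # write new / read new (after step 5 has finished)


class MigratingStore:
    def __init__(self, phase: Phase):
        self.phase = phase
        self.old = {}  # stands in for the old schema
        self.new = {}  # stands in for the new schema

    def write(self, key, value) -> None:
        if self.phase <= Phase.DUAL_WRITE_READ_NEW:
            self.old[key] = value
        if self.phase >= Phase.DUAL_WRITE_READ_OLD:
            self.new[key] = value

    def read(self, key):
        if self.phase <= Phase.DUAL_WRITE_READ_OLD:
            return self.old.get(key)
        if key in self.new:
            return self.new[key]
        if self.phase <= Phase.WRITE_NEW_FALLBACK:
            return self.old.get(key)  # lazy fallback for rows not migrated yet
        return None

    def migrate_all(self) -> None:
        """Step 5: copy rows that still live only in the old store."""
        for key, value in self.old.items():
            self.new.setdefault(key, value)
```

Moving the phase back by one step is always safe up to step 4, because the old store keeps receiving writes until then or is still consulted on reads.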
44. Feature Toggle Strategies (gradual exposure to users)
• Company employees
• Specific users or group of users
• Percentage of traffic
• By GEO
• By Language
• By user-agent
• User Profile based
• By context (site id or some kind of hash on site id)
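A few of the strategies listed above could be evaluated as in the sketch below. All names (`RequestContext`, `ToggleRule`, `bucket`, `is_enabled`) are hypothetical, and the rule that any matching criterion opens the feature is an assumption. The percentage strategy hashes the site id, so a given site always lands in the same bucket.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class RequestContext:
    user_id: Optional[str] = None
    is_employee: bool = False
    country: Optional[str] = None
    language: Optional[str] = None
    site_id: Optional[str] = None


@dataclass
class ToggleRule:
    open_to_employees: bool = False
    user_ids: Set[str] = field(default_factory=set)
    countries: Set[str] = field(default_factory=set)
    languages: Set[str] = field(default_factory=set)
    percent_of_sites: int = 0  # 0..100


def bucket(site_id: str) -> int:
    """Stable bucket in [0, 100) derived from a hash of the site id."""
    digest = hashlib.sha256(site_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(rule: ToggleRule, ctx: RequestContext) -> bool:
    # Assumption: the feature is open when any configured criterion matches.
    if rule.open_to_employees and ctx.is_employee:
        return True
    if ctx.user_id is not None and ctx.user_id in rule.user_ids:
        return True
    if ctx.country is not None and ctx.country in rule.countries:
        return True
    if ctx.language is not None and ctx.language in rule.languages:
        return True
    if ctx.site_id is not None and bucket(ctx.site_id) < rule.percent_of_sites:
        return True
    return False
```

Raising `percent_of_sites` step by step widens exposure without moving sites that are already in.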
45. Feature Toggle Override
• By specific server
– Used to test system load
– New database flows/migration
– Refactoring that may affect performance and memory usage
• By URL parameter
– Enable internal testing
– Product acceptance
– Faking GEO
• By FT cookie value
– Testing
– When working with API on a single page application
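The URL-parameter and cookie overrides might be resolved as in this sketch. The `ft_<name>` naming and the precedence (URL first, then cookie, then the regular toggle value) are assumptions, not something the deck specifies.

```python
from typing import Mapping, Optional
from urllib.parse import parse_qs, urlsplit


def parse_flag(value: Optional[str]) -> Optional[bool]:
    """Turn 'true'/'false'-style strings into a bool; anything else is no override."""
    if value is None:
        return None
    normalized = value.strip().lower()
    if normalized in ("true", "1", "on"):
        return True
    if normalized in ("false", "0", "off"):
        return False
    return None


def resolve_toggle(name: str, url: str, cookies: Mapping[str, str], default: bool) -> bool:
    key = f"ft_{name}"  # hypothetical naming convention

    # 1. URL parameter: handy for internal testing and product acceptance.
    query = parse_qs(urlsplit(url).query)
    from_url = parse_flag(query.get(key, [None])[0])
    if from_url is not None:
        return from_url

    # 2. Cookie: survives API calls made by a single-page application.
    from_cookie = parse_flag(cookies.get(key))
    if from_cookie is not None:
        return from_cookie

    # 3. No override: use the normally computed toggle state.
    return default
```

For example, `resolve_toggle("new_menu", "https://example.com/editor?ft_new_menu=true", {}, False)` returns `True`.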
48. A/B Test
• Every new feature is A/B tested
• We open the new feature to a % of users
– Define KPIs to check if the new feature is better or worse
– If it is better, we keep it
– If worse, we check why and improve
– If we find flaws, the impact is limited to that % of our users (a kind of Feature Toggle)
49. An interesting side effect on product
• How many times have you had the conversation “what is better”?
– Put the menu on top / on the side
• Well, how about building both and A/B Testing?
50. Marking users with toss value in a cookie
• Anonymous user
– Toss is randomly determined
– Cannot guarantee a persistent experience when changing browsers
• Registered User
– Toss is determined by the user ID
– Guarantee toss persistency across browsers
– Allows setting additional tossing criteria (for example new users only)
– Only use this for sections where the user has to be authenticated
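The two toss rules above can be sketched as follows. The function names, the `toss_<test>` cookie name and the choice to mix the test name into the hash are assumptions; a plain dictionary stands in for the browser's cookies.

```python
import hashlib
import random
from typing import MutableMapping, Optional


def toss_for_user(test_name: str, user_id: str) -> int:
    """Deterministic toss in [0, 100): same user, same value, on any browser."""
    digest = hashlib.sha256(f"{test_name}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def toss_for_request(
    test_name: str,
    user_id: Optional[str],
    cookies: MutableMapping[str, str],
    rng=random,
) -> int:
    if user_id is not None:
        # Registered user: derived from the user ID, no cookie needed.
        return toss_for_user(test_name, user_id)
    # Anonymous user: random toss, remembered in a cookie for this browser only.
    key = f"toss_{test_name}"
    if key not in cookies:
        cookies[key] = str(rng.randrange(100))
    return int(cookies[key])
```

An anonymous visitor who switches browsers gets a fresh cookie and possibly a different toss, which is the persistence limit noted above.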
51.
• Do not mix anonymous and registered tests
• A/B test a percentage of users with optional filters
– New Users Only (Registered users only)
– By language
– By GEO
– By Browser
– user-agent
– OS
– Any other criteria you have on your users
52. A/B Test Features
• A/B Test Override
– Allows setting the value of a test for validation
– Helps support staff experience what users are experiencing
• Override methods
– Via URL parameter
– Via cookie
• Start/Stop Test
• Pause tests
• Bots always get “A”
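A minimal sketch of variant selection that combines the override and the bot rule. The function `choose_variant`, its parameters and the user-agent pattern are hypothetical; real bot detection is considerably more involved than a regular expression.

```python
import re
from typing import Optional

# Simplified stand-in for bot detection (assumption, not a complete list).
BOT_PATTERN = re.compile(r"bot|crawler|spider", re.IGNORECASE)


def choose_variant(
    toss: int,
    percent_b: int,
    user_agent: Optional[str],
    override: Optional[str] = None,
) -> str:
    """Return "A" or "B" for one request.

    toss      -- the visitor's toss value in [0, 100)
    percent_b -- share of traffic that should see the new variant "B"
    override  -- value forced via URL parameter or cookie, if any
    """
    if override in ("A", "B"):
        return override  # lets testers and support see a specific variant
    if BOT_PATTERN.search(user_agent or ""):
        return "A"  # bots always get "A"
    return "B" if toss < percent_b else "A"
```

In this sketch the override is checked before the bot rule, so an internal tool with an unusual user agent can still force "B".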
54. Gradual Deployment
• Assume two components
• We shut down one server and install the new version on it. It is not active yet
• Run the self test
• Activate the new server if it passes the self test
• Continue deploying to the other servers, a few at a time, checking each one with the self test
[Diagram: component B is upgraded from 1.1 to 1.2 a few servers at a time while component A stays at 1.1]
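The gradual deployment steps can be sketched as a loop that upgrades one server first and the rest in small batches. `install`, `self_test` and `activate` are hypothetical callbacks supplied by the deployment tooling, and stopping on the first failed self test is an assumption.

```python
from typing import Callable, List, Sequence


class DeploymentAborted(Exception):
    """Raised when a freshly installed server fails its self test."""


def rolling_deploy(
    servers: Sequence[str],
    install: Callable[[str], None],
    self_test: Callable[[str], bool],
    activate: Callable[[str], None],
    batch_size: int = 2,
) -> List[str]:
    remaining = list(servers)
    # First a single server, then the rest a few at a time.
    batches = [remaining[:1]] + [
        remaining[i : i + batch_size] for i in range(1, len(remaining), batch_size)
    ]
    activated: List[str] = []
    for batch in batches:
        for server in batch:
            install(server)  # new version installed, server not active yet
        for server in batch:
            if not self_test(server):
                raise DeploymentAborted(f"self test failed on {server}")
        for server in batch:
            activate(server)  # only servers that passed take traffic
            activated.append(server)
    return activated
```

Servers that were never reached keep running the old version, so a failed batch leaves the system in a mixed but working state.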
55. Self Test / Post Deployment Test
After each server deployment run a self test before deploying the next server.
• Checking server configuration and topology
– Make sure database is accessible (DB connection string)
– Is the schema the one I expect
– Access required local resources (data files, other config files, templates, etc.)
– Access remote resources
– RPC / REST endpoints reachable and operational
• Server will refuse requests unless it passes the self test
• Allow a way to skip self test (and continue deployment)
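A sketch of a server that refuses requests until its self test passes. The `Server` class, the check names and the 503 response are assumptions; real checks would open a database connection, read config files and call remote endpoints.

```python
from typing import Callable, Dict, List, Tuple

Check = Callable[[], bool]


class Server:
    def __init__(self, checks: Dict[str, Check], skip_self_test: bool = False):
        self._checks = checks
        self._skip = skip_self_test  # escape hatch to continue a deployment
        self.ready = False
        self.failures: List[str] = []

    def run_self_test(self) -> bool:
        self.failures = []
        for name, check in self._checks.items():
            try:
                ok = check()
            except Exception:
                ok = False  # an exception counts as a failed check
            if not ok:
                self.failures.append(name)
        self.ready = self._skip or not self.failures
        return self.ready

    def handle(self, request: str) -> Tuple[int, str]:
        if not self.ready:
            return 503, "self test not passed"
        return 200, f"handled {request}"


def unreachable() -> bool:
    raise ConnectionError("remote endpoint down")  # simulated failure


server = Server({"database": lambda: True, "remote_rpc": unreachable})
server.run_self_test()
```

After this run `server.failures` lists the failing checks by name, which tells the person deploying what to look at.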
57. Backward and Forward compatible
• Assume two components
• We release a new version of one
• Now Rollback the other…
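One common way to keep two components compatible in both directions is a tolerant reader: ignore fields you do not know and default fields that are missing. The sketch below assumes JSON messages; the function `parse_site_event` and the field names `site_id`, `action` and `source` are hypothetical.

```python
import json


def parse_site_event(payload: str) -> dict:
    """Read only the fields this version understands.

    - Unknown fields from a newer sender are ignored (forward compatible).
    - Fields a rolled-back, older sender omits get a default (backward compatible).
    """
    raw = json.loads(payload)
    return {
        "site_id": raw["site_id"],            # required in every version
        "action": raw.get("action", "view"),  # assumed to be added later
    }


# Message from an older sender that does not know about "action".
from_old = parse_site_event('{"site_id": "s1"}')

# Message from a newer sender that added an extra "source" field.
from_new = parse_site_event('{"site_id": "s1", "action": "edit", "source": "mobile"}')
```

With readers written this way, either side can be released or rolled back independently of the other.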
[Diagram: components A and B running mixed versions (1.0, 1.1, 1.2) after a release of one and a rollback of the other]
59. Time machine event =
• Deployment capabilities : “no click” deployment
– Dozens of services , 130+ servers over 3 Data Centers
• Backward and forward compatibility in an extreme field test
– Mixed versions of services / DB with no service downtime
• Empowerment
– The power we give to individuals
• Taking risks and embracing failure
60. CD – prepare to invest…..
• Dev infrastructure - Refactor , Refactor, Refactor
• Testing infrastructure & know how
• Deployment infrastructure & tools
• Automation , Automation , Automation
• Monitoring (business and technical)
– hundreds of aspects
– Using thresholds is a must
– Monitor business KPIs
– Internal & external
– Endless Tuning & learning
61. How does it work – CD Practices
• Test driven development
• Small Development Iterations
• Backwards and Forwards compatible
• Gradual Deployment & Self-Test
• Feature Toggle
• A/B Testing
• Exception Classification
• Production visibility
67. Where are we today?
• We have rewritten our Flash editor product as an HTML5 editor
– In just 4 months
• Introduced Wix 3rd party applications (developers API)
– In just 6 weeks
• We are easily replacing significant parts of our infrastructure
• And we are doing ~50 releases a day!
• Production state changes every 9 minutes.