How do you know what 60 million users like? Wix.com runs hundreds of experiments per month in production to understand which features our users like and which hurt or improve our business. In this talk we'll explain how the engineering team supports product managers in making the right decisions and keeps our product road map on the right path. We will also present some of the open source tools we developed that help us run experiments on humans.
The Art of A/B Testing
1. The Art of A/B Testing
Experimenting on Humans
Aviran Mordo
Head of Engineering
@aviranm
www.linkedin.com/in/aviran
www.aviransplace.com
2.
3. Wix In Numbers
Over 100M users + 1.5M new users/month
Static storage is >5PB of data
3 data centers + 3 clouds (Google, Amazon, Azure)
5B HTTP requests/day
1500 people work at Wix, of which ~600 in Engineering
4.
5. Basic A/B testing
Experiment driven development
PETRI – Wix's 3rd-generation open source experiment system
Challenges and best practices
Complexities and effect on product
Agenda
21. EVERY new feature is A/B tested
We open the new feature to a % of users
Measure success
If it is better, we keep it
If it is worse, we check why and improve
If it is flawed, the impact is limited to that % of our users
Conclusion
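The percentage rollout described above can be sketched as deterministic bucketing. This is a minimal illustration under assumed names and hashing scheme, not Petri's actual implementation:

```python
import hashlib

def in_experiment(user_id: str, experiment: str, percent: int) -> bool:
    """Deterministically map a user to a bucket in 0-99 and expose
    the new feature only if the bucket falls under the rollout %."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100 < percent
```

Because the bucket depends only on the user and the experiment, widening the rollout from 10% to 50% keeps every already-exposed user exposed.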
22.
23.
24. New code can have bugs
Conversion can drop
Usage can drop
Unexpected cross test dependencies
Sh*t happens (Test could fail)
26. First-time visitors = Never visited wix.com
New registered users = Untainted users
Existing registered users = Already familiar with the service
Not all users are equal
32. Halting the test results in loss of data.
What can we do about it?
33. Solution – Pause the experiment!
• Maintain NEW experience for already exposed users
• No additional users will be exposed to the NEW feature
34. PETRI's pause implementation
Use cookies to persist assignment
If the user changes browser, the assignment is unknown
Server-side persistence solves this
You pay in performance & scalability
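The pause semantics can be illustrated with a tiny sketch; the dict stands in for the cookie or server-side store that recorded assignments before the pause, and all names are illustrative:

```python
def assignment_while_paused(user_id: str, persisted: dict) -> str:
    """While the experiment is paused, users whose "B" assignment was
    persisted keep the NEW experience; everyone else falls back to
    the old one, so no additional users are exposed."""
    return persisted.get(user_id, "A")
```

A user exposed before the pause (`persisted["user1"] == "B"`) keeps seeing the new feature; a brand-new visitor gets "A".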
35. Decision (What to do with the data)
Options: keep the feature, improve the code & resume the experiment, or drop the feature
If dropping the feature:
Keep backwards compatibility for exposed users forever?
Migrate users to another equivalent feature
Drop it altogether (users lose data/work)
36.
37. Numbers look good but sample size is small
We need more data!
Expand
Reaching statistical significance
Test Group (B): 25% → 50% → 75% → 100%
Control Group (A): 75% → 50% → 25% → 0%
39. Signed-in user
Test group is determined by the user ID
Guarantees toss consistency across browsers
Anonymous user (Home page)
Test group is randomly determined
Cannot guarantee a consistent experience across browsers
11% of Wix users use more than one desktop browser
Keeping consistent UX
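The two assignment paths above can be sketched as follows; the hashing scheme is an assumption for illustration, not Petri's code:

```python
import hashlib
import random

def assign(experiment: str, user_id=None) -> str:
    """Signed-in users get a group derived from their user ID, so the
    toss is consistent across browsers. Anonymous users get a random
    toss that must be persisted in a cookie - a second browser may
    therefore land in the other group."""
    if user_id is not None:
        digest = int(hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest(), 16)
        return "B" if digest % 2 else "A"
    return random.choice(["A", "B"])
```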
43. # of active experiments | Possible # of states
10 | 1,024
20 | 1,048,576
30 | 1,073,741,824
Possible states >= 2^(# experiments)
Wix has ~1000 active experiments
2^1000 ≈ 1.071509e+301
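The combinatorics are easy to verify:

```python
def possible_states(n_experiments: int, options: int = 2) -> int:
    # Each experiment toggles independently, so N binary experiments
    # yield 2**N reachable production states (more if an experiment
    # has more than two options).
    return options ** n_experiments

possible_states(10)    # 1024
possible_states(30)    # 1073741824
possible_states(1000)  # ~1.071509e+301
```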
44. Supporting 2^N different users is challenging
How do you know which experiment causes errors?
Managing an ever-changing production env.
45. Near real time user BI tools
Override options (URL parameters, cookies, headers…)
Specialized tools
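Override resolution can be sketched as a priority chain over the channels listed above; the `petri_ovr_` key name is made up for illustration and is not Petri's real parameter:

```python
def resolve_assignment(experiment, query, cookies, headers, fallback):
    """Check each override channel (URL parameters, then cookies,
    then headers) before falling back to the normal assignment."""
    key = f"petri_ovr_{experiment}"  # illustrative key name
    for source in (query, cookies, headers):
        if key in source:
            return source[key]
    return fallback(experiment)
```

This lets developers and QA force themselves into a specific group on production without affecting other users.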
52. Enable features by existing content
What happens when you remove a component?
Enable features by the document owner's assignment
The friend now expects to find the new feature on his own docs
Exclude experimental features from shared documents
You are not really testing the entire system
Possible solutions
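The "assign by the document owner" option can be sketched like this; the hash is an assumed stand-in for the real toss:

```python
import hashlib

def group_for_document(experiment: str, owner_id: str) -> str:
    """Key the assignment on the document owner rather than the
    viewer, so every collaborator on a shared site sees the same
    experiment state for that document."""
    digest = int(hashlib.sha256(f"{experiment}:{owner_id}".encode()).hexdigest(), 16)
    return "B" if digest % 2 else "A"
```

The trade-off: per-document consistency instead of per-user consistency, and it requires server-side state, since the owner's assignment context isn't available from other viewers' browsers.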
53.
54. Petri is more than just an A/B test framework
Feature toggle
A/B Test
Personalization
Internal testing
Continuous deployment
Jira integration
Experiments
Dynamic configuration
QA
Automated testing
55. Petri is an open source project
https://github.com/wix/petri
58. Modeled experiment lifecycle
Open source (developed using TDD from day 1)
Running at scale on production
No deployment necessary
Both back-end and front-end experiments
Flexible architecture
Why Petri
Who here does A/B tests?
Who plans to do A/B tests?
A/B test is embedded in our development process
Petri is based on our experience and lessons we learned
You divide your users into groups and measure which group reaches your goal
What does "better" mean?
What is your goal?
Measure conversion to registration
Not just for comparing pages – it is also used during development
The theory – we can make a better gallery
Our goal – make it easier for our users to build their sites (converting to premium)
It is not about winning, it's about not losing
Lessons learned from 5 years of experience
Petri allows PM to manage their tests
A screenshot of the UI we built on top of PETRI
PM added a Premium link in the editor
If we shorten the funnel more users will reach the purchase page, thus increasing our sales
PM added a Premium link in the editor
Why did it fail?
T-Shirt time
Users were taken out of editing context before they were happy with the results
Who thinks we should start with 50%?
Remember, a test could fail
Product manager defines a limited new experiment
We also test new must have features
There is no A version.
The control group just doesn't get it.
We need to improve before releasing to all users.
Lose mobile view?
Unable to update?
Pause is a temporary state until the system improves and the test resumes
Server-side state – performance vs. correctness, cross-datacenter replicas
The end result of every A/B test is reaching a decision. For this we need enough numbers.
Add %, countries, etc.
As discussed in the pause scenario, here too we cannot take away the ‘new’ experience
For anonymous users – this is the best we can do. This means sometimes (~11%) users will see different experiences.
What would you expect the result should be for a bot? A? B?
2nd T-Shirt time!
Production is never in a ‘known’ state
At least 2^N (experiments can have more than 2 options)
It is hard to know, and we don't always know exactly.
Try to understand what was opened recently / recreate and eliminate
Overrides can also apply to a list of users.
The obvious answer may be – allow the friend to edit the component if it's already in the site
But then – what if the friend deletes the component by mistake (or on purpose)? Then, if he's assigned to A, he won't be able to add it back.
Possible solution – assign by site owner instead of by user
(This means you must implement server-side state. Why? Because you don't know what language/geo etc. the site owner had when he got assigned – you only know his user ID.)
Not perfect – the user may experience something else on his own document
Expose features internally to company employees
Select assignment by sites (not only by users)