Imagine: as soon as any developed functionality is submitted into the code repository, it is automatically subjected to the appropriate battery of tests and then released straight into the wild. Setting up a pipeline to do just that is very common, but most organizations hit the same stumbling block: just what IS the appropriate battery of tests? And how do we configure our test framework to support the many places our tests might want to run? Automated build pipelines don't always lend themselves well to the traditional stages of testing. In this hands-on workshop, Melissa will introduce automated test writers to the key principles of automated test design that apply to organizations big and small, allowing them to take full advantage of their pipeline's capabilities without introducing unnecessary bottlenecks. Participants will learn the fundamentals of highly reliable tests that run fast and atomically in order to reproduce a failure every time. They will also explore how to reduce overlap while still maintaining adequate test coverage, which test areas might be most beneficial to combine into a single suite, and which areas might benefit most from being broken out altogether.
Test Design for Fully Automated Build Architectures
1. Test Design for Fully Automated Build Architectures
Melissa Benua / Sr Director of Engineering
STARWEST 2023
2. ABOUT ME
Melissa Benua
Sr Director of Engineering, Platform
mParticle
Get in touch!
mbenua@gmail.com
@queenofcode
Follow my work!
https://www.linkedin.com/in/mbenua/
https://www.slideshare.net/MelissaBenua/
https://github.com/queen-of-code/
https://www.queenofcode.net
3. ABOUT THE TUTORIAL
1. Track Work: Ensure the right change is being made
2. Write Code + Tests: Keep prod and test code together in source control
3. Request a Peer Review: Automated system runs build + unit tests + static analysis
4. Deploy Change for Testing: Validate the change works in a prod-like environment
5. Merge Peer Review: Land code + tests together for deployment
6. Monitor Change in Prod: Ensure nothing unexpected happened with the change
4. KEY SECTIONS
1. Learning continuous test case principles and features
2. Defining the continuous integration pipeline
3. Categorizing test cases into suites
4. Leveraging observability to backstop testing
15. EXERCISE: TEST CASES
Photo Gallery Site
• UX FrontEnd (HTML5 + JS)
• Logic BackEnd (Java)
• Data Store (NoSQL)
• What to test?
• All ideas are good ideas!
17. CI + CD PIPELINE
1. Track Work: Ensure the right change is being made
2. Write Code + Tests: Keep prod and test code together in source control
3. Request a Peer Review: Automated system runs build + unit tests + static analysis
4. Deploy Change for Testing: Validate the change works in a prod-like environment
5. Merge Peer Review: Land code + tests together for deployment
6. Monitor Change in Prod: Ensure nothing unexpected happened with the change
18. UNIT TEST GUIDELINES
• Does not cross application boundaries
• Should support parallelism
• Textual Input -> Output validation
• Does NOT leave the box
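For illustration, a minimal C# (xUnit) test that follows all four guidelines; the TaxCalculator class and its values are hypothetical:

    using Xunit;

    public static class TaxCalculator
    {
        // Pure function under test: no I/O, no shared state.
        public static decimal Calculate(decimal subtotal, decimal rate) => subtotal * rate;
    }

    public class TaxCalculatorTests
    {
        // Pure input -> output validation: the test never touches the
        // network, filesystem, or database, so it never leaves the box
        // and is safe for xUnit to run in parallel with other classes.
        [Fact]
        public void Calculate_AppliesRate_ToSubtotal()
        {
            Assert.Equal(8m, TaxCalculator.Calculate(subtotal: 100m, rate: 0.08m));
        }
    }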
23. MORE DEFINITIONS
Functional Testing: testing that validates functionality is working as intended in requirements (usually an automated test)
Non-Functional Testing: testing that validates aspects like security, performance, scalability, etc. of the software
Smoke Test Suite: suite of functional automated tests that validates the Minimum Viable Product
Regression Test Suite: suite of functional automated tests that validates the Minimum Lovable Product
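One lightweight way to keep smoke and regression cases in the same codebase but runnable as separate suites is trait-based tagging. A sketch with xUnit, assuming a hypothetical CheckoutService:

    using Xunit;

    public class CheckoutTests
    {
        // Smoke: validates the Minimum Viable Product path.
        [Fact]
        [Trait("Suite", "Smoke")]
        public void Checkout_WithValidCart_Succeeds()
        {
            Assert.True(CheckoutService.TryCheckout(cartSize: 1));
        }

        // Regression: broader coverage toward the Minimum Lovable Product.
        [Fact]
        [Trait("Suite", "Regression")]
        public void Checkout_WithEmptyCart_IsRejected()
        {
            Assert.False(CheckoutService.TryCheckout(cartSize: 0));
        }
    }

    // Hypothetical system under test.
    public static class CheckoutService
    {
        public static bool TryCheckout(int cartSize) => cartSize > 0;
    }

The pipeline can then run only the smoke suite on every merge (e.g., dotnet test --filter "Suite=Smoke") and save the full regression suite for a later stage.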
24. EXAMPLE SERVICE ARCHITECTURE
UI App: JS Framework, ASP Frontend, Web Server
Backend App: Auth Service, RESTful API, Cache Service
Data Layer: Database, File Storage
25. TEST FLOW
01 Build Change: compile code, run unit tests
02 Create App Package: package code, create Docker image
03 Deploy App Package: deploy app / start Docker container
04 Validate Product: run auto tests, semi-auto tests, manual tests
28. PAIR CODE CHANGES TO APPROPRIATE TESTS!
• Groups of functionality should have groups of tests
• Microrepo? Put the tests with their code!
• Tag code packages with their test packages
• Use compilation to tell you what tests apply to what code
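One possible way to tag code packages with their test packages is a custom assembly attribute that CI reads via reflection to decide which suites a diff should trigger. This is a hypothetical sketch, not a standard API:

    using System;

    // Hypothetical marker attribute: declares which production
    // package a test assembly covers.
    [AttributeUsage(AttributeTargets.Assembly, AllowMultiple = true)]
    public sealed class CoversPackageAttribute : Attribute
    {
        public string PackageName { get; }
        public CoversPackageAttribute(string packageName) => PackageName = packageName;
    }

    // In the test assembly, e.g. AuthService.Tests:
    // [assembly: CoversPackage("MyApp.AuthService")]

When a change touches MyApp.AuthService, the pipeline can enumerate test assemblies, find the ones tagged with that package, and queue those suites first.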
29. INTEGRATION TEST MATRIX – LOGIN SCENARIO
Frontend App: ASP Frontend, JS FW, Web Server
Backend App: Auth Service, RESTful API, Cache Service
Data Layer: SQL DB, NoSQL DB
30. INTEGRATION TEST MATRIX – API SCENARIO
UI App: JS Framework, ASP Frontend, Web Server
Backend App: Auth Service, RESTful API, Cache Service
Data Layer: SQL DB, NoSQL DB
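As a sketch of what one cell of the login matrix might look like as an automated test, assuming a hypothetical test-environment endpoint (the URL, payload, and expected status are illustrative):

    using System.Net;
    using System.Net.Http;
    using System.Text;
    using System.Threading.Tasks;
    using Xunit;

    public class LoginScenarioTests
    {
        private static readonly HttpClient Client = new HttpClient();

        [Fact]
        [Trait("Suite", "Integration")]
        public async Task Login_WithValidCredentials_ReturnsOk()
        {
            var body = new StringContent(
                "{\"user\":\"test\",\"password\":\"secret\"}",
                Encoding.UTF8, "application/json");

            // Crosses the frontend, Auth Service, and data layer:
            // exactly the rows this scenario lights up in the matrix.
            var response = await Client.PostAsync(
                "https://test-env.example.com/api/login", body);

            Assert.Equal(HttpStatusCode.OK, response.StatusCode);
        }
    }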
31. EXERCISE: MAPPING CASES TO CATEGORIZED SUITES
Photo Gallery Site
• UX FrontEnd (HTML5 + JS)
• Logic BackEnd (Java)
• Data Store (NoSQL)
• What should we run?
• When should we run it?
• How long should we wait?
32. KEY TAKEAWAYS
1. Run the tests most relevant to your changes first
2. Having multiple test suites is cheap
3. Machine time is MUCH cheaper than human time
4. Fail fast and fail often
35. IF YOUR SITE CRASHES ON THE INTERNET BUT ISN’T MONITORED, IS IT REALLY DOWN???
36. PILLARS OF OBSERVABILITY
Logging: detailed information, specific events
Metrics: systemic information, events over time
Tracing: systemic information, specific events
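A minimal sketch of one code path emitting all three pillars, using a plain console log, a hand-rolled counter, and System.Diagnostics.Activity for the trace span; a real telemetry client would replace all three, and the names here are illustrative:

    using System;
    using System.Diagnostics;
    using System.Threading;

    public static class RequestHandler
    {
        private static long _requestCount;

        public static void Handle(string userId)
        {
            // Tracing: systemic information about this specific event.
            var span = new Activity("HandleRequest").Start();

            // Logging: detailed information about a specific event.
            Console.WriteLine($"{DateTime.UtcNow:o} INFO handling request for {userId}");

            // Metrics: systemic information aggregated over time.
            Interlocked.Increment(ref _requestCount);

            span.Stop();
        }
    }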
38. METRIC TYPES
Rate: success requests per second, failure requests per second, events per second
Percentile: request latency, database latency (chart: 60th and 90th percentile markers over sample values)
Number: max items in queue, bytes in use (chart: avg / 90th / 99th over time, 2-Feb through 5-Feb)
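To make the percentile metric type concrete, a nearest-rank percentile over a batch of latency samples; the sample values are made up:

    using System;
    using System.Linq;

    public static class Percentiles
    {
        // Nearest-rank percentile: sort the samples, then take the value
        // at ceil(p/100 * n). Real metric pipelines usually approximate
        // this with histograms instead of sorting raw samples.
        public static double Compute(double[] samples, double p)
        {
            var sorted = samples.OrderBy(s => s).ToArray();
            int rank = (int)Math.Ceiling(p / 100.0 * sorted.Length);
            return sorted[Math.Max(rank - 1, 0)];
        }

        public static void Main()
        {
            var latenciesMs = new[] { 60.0, 75, 50, 90, 55, 80, 70, 65, 85, 300 };
            Console.WriteLine($"p50={Compute(latenciesMs, 50)}ms " +
                              $"p90={Compute(latenciesMs, 90)}ms " +
                              $"p99={Compute(latenciesMs, 99)}ms");
        }
    }

Note how the single 300 ms outlier dominates p99 but leaves p50 untouched; that is why percentiles beat averages for latency.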
41. METRICS AND AUTOMATION
Create Input: A = 2, B = 5, Calls == 11
Function Call:
    void Call(ref int A, ref int B)
    {
        A = DoWork();
        B = DoOtherWork();
        CallsCounter.Increment();
    }
Validate: 5 == A, 0 == B, 12 == Calls
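Spelled out as a runnable test, with stand-ins for DoWork, DoOtherWork, and the counter; the concrete values mirror the slide, everything else is a stub:

    using System.Threading;
    using Xunit;

    public static class CallsCounter
    {
        private static long _value;
        public static long Value => Interlocked.Read(ref _value);
        public static void Increment() => Interlocked.Increment(ref _value);
    }

    public static class Worker
    {
        public static void Call(ref int a, ref int b)
        {
            a = 5;  // stand-in for DoWork()
            b = 0;  // stand-in for DoOtherWork()
            CallsCounter.Increment();
        }
    }

    public class MetricsAsOracleTests
    {
        [Fact]
        public void Call_UpdatesOutputs_AndIncrementsCounter()
        {
            int a = 2, b = 5;                 // Create Input (Calls == 11 before)
            long before = CallsCounter.Value;

            Worker.Call(ref a, ref b);        // Function Call

            Assert.Equal(5, a);               // Validate the outputs...
            Assert.Equal(0, b);
            Assert.Equal(before + 1, CallsCounter.Value); // ...and the metric (12)
        }
    }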
42. KEY TAKEAWAYS
1. Reliable, fast, specific tests are king
2. Know where your cutline is and respect your time
3. Don’t try to boil the ocean – rely on the backstop!
4. Automate what you can and make a plan for the rest
43. Thank you!
GET IN TOUCH!
MELISSA BENUA
MBENUA@GMAIL.COM
TWITTER: @QUEENOFCODE
HTTPS://QUEENOFCODE.NET
Editor's notes
Talk about Microsoft and Bing especially. Key things to hit:
* Worked with continuous builds in some form for my entire career
* Worked as an SDET, as a straight dev, and as a combined engineer
* Created this from scratch at PlayFab
This is an automated pipeline!
Don’t get too in-depth, we will talk more. Talk notes:
* Please speak up if there’s a question
* Feel free to ask questions at any time, or for examples or to deep-dive
* Ask about experience in the room – who is working greenfield? Updating an old existing code base? Working in an existing CI/CD pipeline?
What we are going to talk about: what makes a good test case, when we want to run those test cases, and then how we monitor and report on those test cases
These are my guiding principles of test design – ESPECIALLY KEY when things are fully-automated
* Don’t bog down on what these mean, but give a high-level overview
In a perfect world, we would always run every test
In a perfect world, every test would be super fast and we would also have infinite time
Battlefield triage is key!
If you had time to only debug one of these two issues, which would you do?
The ‘time game’ is a fun way to figure out what tests really matter
Squeaky tests only train people to ignore test failures – and then they WILL ignore a real failure and it will go to prod by accident – ask me how I know!
Try – finally blocks are amazing – no catches required! Any test that touches the file system or DB system must do that.
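A sketch of that pattern: cleanup goes in finally so it runs whether the assertions pass or fail, with no catch swallowing the failure (the file contents are illustrative):

    using System.IO;
    using Xunit;

    public class ReportExportTests
    {
        [Fact]
        public void Export_WritesReport_ToDisk()
        {
            var path = Path.GetTempFileName();
            try
            {
                File.WriteAllText(path, "report-data");  // stand-in for the code under test
                Assert.Equal("report-data", File.ReadAllText(path));
            }
            finally
            {
                // Runs on pass AND fail, so the test never needs
                // a manual purge before it can run again.
                File.Delete(path);
            }
        }
    }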
Obviously – a test can’t be automated if it requires someone to manually purge the DB after it runs
Sometimes tools can help you find flaky tests
Be ruthless with cutting out overlapping coverage – read your code coverage reports and slash away!
Can be nerve-wracking to delete tests
Very tempting to roll lots of things into a single test case to save time, but this makes debugging a nightmare
* Watch out for global static variables. Watch out even more for a shared test database
NEVER (almost never) write a test that relies on Thread.Sleep. Timing issues are horrible to debug! Can you make it synchronous? A callback??
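One way to replace a Thread.Sleep with a callback, sketched with a hypothetical background worker; the real fix depends on what the code under test exposes:

    using System;
    using System.Threading;
    using Xunit;

    // Hypothetical async worker that signals completion via an event.
    public class BackgroundWorkerStub
    {
        public event Action Completed;
        public void Start() => ThreadPool.QueueUserWorkItem(_ => Completed?.Invoke());
    }

    public class CallbackTests
    {
        [Fact]
        public void Worker_SignalsCompletion_WithinTimeout()
        {
            using var done = new ManualResetEventSlim();
            var worker = new BackgroundWorkerStub();
            worker.Completed += done.Set;

            worker.Start();

            // Wake up the instant the callback fires, with a hard upper
            // bound, instead of guessing a duration with Thread.Sleep.
            Assert.True(done.Wait(TimeSpan.FromSeconds(5)), "worker never completed");
        }
    }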
Caching and shared resources can introduce the weirdest and hardest-to-debug test failures. Time-killers! Spend time to work around them beforehand instead of spending time debugging the failure (and then having to work around them anyways)
* Talk about the angst from the shared test Dynamo at PlayFab and the days spent figuring out what happened and how to stop it
The goal is to come up with lots and lots of test cases, which we will later be organizing into test suites
DO NOT get into putting them into suites
DO NOT throw away tests
Unit tests may cross functional boundaries, in that they may test more than one ‘function’. But they shouldn’t test an app.
* Simplest structure means running one build, one set of tests, and calling it done. Few projects can be that simple! Tests grow with scale!
Breaking up a code base is half art and half science. (those are the bullet points)
Ideally every test would run all the time, but almost nobody has time for that.
Test what you can, and know what risk you are swallowing for later
Make sure each stage fails quickly if it’s going to fail. Do NOT propagate failures down the pipe.
‘Integration’ test here is a category. A number of test suites can fall into this category – your smoke test suite, for example.
Speciality is anything that can’t be run automatically and quickly. Maybe it requires manual setup, or maybe it’s difficult to have a machine to validate. Maybe it just takes a MILLION years to run. Maybe it is unreliable.
This is the sample architecture of a fictional service – say, something like that image hosting site from our last example. We’ll use this architecture as a guide for the talking about what tests to write and when to run them.
Remember – if you’re going to have manual steps, it’s in step 4.
This is a sequence diagram that describes the lifecycle of a single change. The tester is actually in control here – we are conditionally running different test suites depending on their judgement of the change
Validation pass includes semi-automated tests
Many more potential test suites in these categories.
* Let’s use all the tests that we came up with before, and now let’s prioritize and move them into test suites.
* Let’s give each test suite a maximum wait time, beyond which a failure is cheaper than waiting for it to pass.
This graph is not scientific! In my experience, the sweet spot is somewhere between 70 and 80% code coverage. Any more than that is a whole lot of time for not much return.
Line coverage just means a line was run. Branch coverage includes proper handling of ifs and switches. It’s generally more diagnostic.
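A tiny illustration of the gap, with made-up numbers: one test can execute every line while still missing a branch:

    public static class Pricing
    {
        public static int ShippingFee(int subtotal)
        {
            var fee = 5;
            if (subtotal > 100)
                fee = 0;
            return fee;
            // A single test with subtotal = 200 executes every line above
            // (100% line coverage), but the subtotal <= 100 branch is
            // never taken: only 50% branch coverage on the if.
        }
    }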
Lets you know what needs to be tested more – a 5% covered package versus a 50% covered package
Quantify the output of your test efforts over time to management! They eat this stuff up (HA HA HA)