We have automation suites for our apps that are set to run on every commit to
master/deploy to Prod, and for a long, long time (almost from the very
beginning) we have been struggling to make them reliable enough.
The tests run on a CI server (TeamCity) using Selenium WebDriver/Grid. We know
the tests work because when we run them locally on our laptops (the team and I
have tried it) they run perfectly every single time.
But when they fail,
they don't always fail at the same spot. Sometimes it's a timeout while waiting
for a web element, sometimes the test ends up on an error page that it shouldn't have
reached in the first place, and we have no idea how it got there... So yeah, it's
frustrating.
The team has
tried a lot of different approaches to debug it: rewriting the setup of each
test to make sure everything is cleaned up at the end of every single test so
that the next one starts with a clean workspace/cache, having Selenium
take a screenshot on every failure to see what happened, trying different
versions of chromedriver/Chrome/Selenium, adding heavy logging of each action taken, and running the tests several times in
a row to see if there was any pattern...
The problem:
Unfortunately,
across our entire test suites, we see a steady share of all test runs
reporting a "flaky" result. We define a "flaky" test result
as a test that exhibits both a passing and a failing result with the same code. The root causes of flaky results
are many: parallel execution, relying on non-deterministic or undefined
behavior, flaky 3rd party code, infrastructure problems, etc. Some of the tests
are flaky because of how test harnesses interact with the UI: sync timing
issues, handshaking, and extraction of SUT state.
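To make the definition above concrete, here is a minimal Python sketch that flags a test as flaky when it has both passed and failed at the same commit. The record format (test name, commit SHA, pass/fail) and the sample data are assumptions made for illustration; a real CI server would expose its history in its own format.

```python
from collections import defaultdict


def find_flaky(results):
    """Return the sorted names of tests that both passed and failed
    at the same commit.

    `results` is an iterable of (test_name, commit_sha, passed) tuples.
    This record shape is hypothetical, chosen only for the example.
    """
    outcomes = defaultdict(set)
    for test_name, commit_sha, passed in results:
        outcomes[(test_name, commit_sha)].add(passed)
    # Same test, same code, both outcomes seen: that is a flaky result.
    return sorted(
        {name for (name, _sha), seen in outcomes.items() if seen == {True, False}}
    )


# Made-up history for illustration only.
history = [
    ("test_login", "abc123", True),
    ("test_login", "abc123", False),     # same commit, different outcome
    ("test_checkout", "abc123", False),
    ("test_checkout", "abc123", False),  # consistently failing, not flaky
    ("test_search", "abc123", True),
    ("test_search", "def456", False),    # different commit, not flaky
]

if __name__ == "__main__":
    print(find_flaky(history))
```

Note that a test that fails consistently, or whose result changes only between commits, is not counted as flaky here; that matches the "same code" part of the definition.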
Even though we have
invested a lot of effort in removing flakiness from tests, overall the
insertion rate is about the same as the fix rate, meaning we are stuck with a
certain rate of tests that provide value but occasionally produce a flaky
result.
Mitigation strategy:
In my opinion, even
after tons of effort to reduce such problematic tests, flaky tests are
inevitable once the test conditions reach a certain level of complexity. We will always have a core set of test
problems that are only discoverable in an integrated end-to-end system, and those tests
will be flaky. The main goal, then, is to manage them appropriately. I prefer
to rely more on repetition, statistics, and runs that do not block the CI
pipeline.
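As a rough illustration of what "repetition and statistics" can look like, the Python sketch below runs a test callable several times and reports its pass rate, then uses a threshold to decide whether the test is consistent enough to block the pipeline. The function names, the 20-run default, and the 0.99 threshold are all assumptions for the example, not recommended values.

```python
def consistency_rate(test_fn, runs=20):
    """Run `test_fn` `runs` times and return the fraction of passing runs.

    A run counts as failed if the callable raises any Exception
    (AssertionError included).
    """
    passes = 0
    for _ in range(runs):
        try:
            test_fn()
            passes += 1
        except Exception:
            pass
    return passes / runs


def may_block_pipeline(rate, threshold=0.99):
    """Only tests at or above the (assumed) threshold gate the CI pipeline."""
    return rate >= threshold


def make_intermittent_test(fail_every):
    """Build a fake test that fails on every `fail_every`-th call.

    Purely a stand-in so the example is deterministic; a real suite
    would pass its actual test callables instead.
    """
    calls = {"n": 0}

    def test():
        calls["n"] += 1
        if calls["n"] % fail_every == 0:
            raise AssertionError("simulated intermittent failure")

    return test


if __name__ == "__main__":
    rate = consistency_rate(make_intermittent_test(fail_every=4), runs=20)
    print(rate, may_block_pipeline(rate))
```

Such a loop is meant to run in a separate, non-blocking job, so a low rate produces a report instead of a red build.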
Just tagging tests
as flaky addresses the problem from the wrong direction, and it loses
potentially valuable information about the root causes. However, I think there are some actions that can help us keep flaky tests to an
acceptable minimum. Consider introducing some of the methods listed below in
your own context. They are split based on implementation difficulty, so you can
plan your efforts accordingly:
[Easy]
- re-run only failed tests. A failed build should keep those tests, mark them, and trigger a second build to execute them.
- use a combination of exploratory testing and automation runs. One of the basics of automation is choosing appropriate candidates (stable and not changed too often).
- do NOT write many GUI system tests; they should be rare and written only when needed. You need to build a test pyramid. There is almost always a possibility to write tests at a lower level.
- if you use parallel test execution, consider moving some (a few) tests into a single-threaded suite.
- re-run tests automatically when they fail during test execution. You can read the test status in the TearDown and, if the test failed, start a new process to execute it again. Some open-source testing frameworks/tools also have annotations (e.g. Android has @FlakyTest, Jenkins has @RandomFail/flaky-test-handler-plugin, Ruby ZenTest has Autotest, and Spring has @Repeat) to label flaky tests that require a few reruns upon failure.
- set up a quarantine section (a separate suite/build job) that runs all newly added tests in a loop for a certain number of executions (a fitness function) to determine whether there is any flakiness in them; during that time they are not yet part of the critical CI path. Execute reliability runs of all your CI tests per build to generate consistency rates. Using those numbers, push product teams to move all tests that fall below a certain consistency level out of the CI tests.
- consider advanced concepts like a combination of XPath and look-and-feel.
- refactor towards the Hermetic pattern: avoid global/shared state or data, and rely on random test run order.
- use a proper test fixture strategy.
- introduce a tool/process that monitors the flakiness rate of all tests and automatically quarantines a test if its flakiness is too high. Quarantining removes the test from the CI critical path and flags it for further investigation.
- introduce a tool/process that detects changes and works to identify the one that caused the test to change its level of flakiness.
- write tests that monitor themselves. If a test fails, look at the root cause from the available log info. Then, depending on what failed (for example, an external dependency), do a smart retry. Is the failure reproduced? Then fail the test.
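The "smart retry" idea from the last item can be sketched in Python roughly as follows. This is only one possible shape under stated assumptions: the `TransientError` class, the choice of which exception types count as an external-dependency failure, and the result dictionary are all hypothetical. A real suite would map its own failures (for example, a lost Grid session) onto the transient category.

```python
class TransientError(Exception):
    """Hypothetical marker for failures caused by an external dependency."""


def run_with_smart_retry(test_fn, max_retries=2,
                         transient=(TransientError, TimeoutError)):
    """Run `test_fn`, retrying only when the failure looks transient.

    Any other exception (e.g. AssertionError) fails the test immediately.
    If a transient failure is reproduced on every attempt, the test fails.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            test_fn()
            # A pass after a retry is still worth recording as flaky.
            return {"passed": True, "attempts": attempts, "flaky": attempts > 1}
        except transient as exc:
            if attempts > max_retries:
                return {"passed": False, "attempts": attempts,
                        "flaky": False, "reason": f"reproduced: {exc!r}"}
        except Exception as exc:
            return {"passed": False, "attempts": attempts,
                    "flaky": False, "reason": repr(exc)}


def make_test_failing_first(n_failures, error):
    """Fake test that raises `error` on its first `n_failures` calls."""
    calls = {"n": 0}

    def test():
        calls["n"] += 1
        if calls["n"] <= n_failures:
            raise error

    return test


if __name__ == "__main__":
    recovering = make_test_failing_first(1, TransientError("backend unavailable"))
    print(run_with_smart_retry(recovering))
```

Keeping the `flaky` flag in the result means a retried pass does not silently hide the failure; it can still be fed into whatever tracks flakiness rates.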
Conclusion:
I know all of the above is far from a perfect or complete solution, but the truth is that you have to constantly invest in detecting, mitigating, tracking, and fixing test flakiness throughout your code base.