Robustness stress testing vs. traditional software testing - A change in perspective
Traditionally, software testing seeks to answer the question “Does the software work?” By contrast, software stress testing asks “Can the software do something unsafe?” This change in perspective alters the kinds of inputs issued to the software under test.
An astronomically large number of different inputs could be sent to a complex software system, so any testing activity will only explore a small portion of this space. For example, on the highway, the vast majority of measurements received by the software controlling an autonomous vehicle will depict standard weather and traffic conditions as well as typical behaviors of other drivers.
When asking “Does the software work?” a tester selects, constructs, or simulates inputs that might be encountered during normal operation. But “normal operation” typically involves very similar inputs.
However, unexpected conditions can present this controller with a much wider range of inputs. Why might such unexpected inputs arise? Perhaps the environment confuses a sensor. Or perhaps noise corrupts the messages that a sensor sends. In some cases, the unexpected input is caused by a bug in the sensor’s firmware – a condition that is particularly worrisome in distributed systems developed by many vendors.
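To make the noise scenario concrete, here is a minimal sketch in Python. The message format (a single big-endian double) and the parser are hypothetical, invented for illustration; the point is how single-bit corruption can slip past even a well-intentioned sanity check:

```python
import struct

def flip_bit(msg: bytes, i: int) -> bytes:
    """Return a copy of msg with bit i inverted, simulating channel noise."""
    buf = bytearray(msg)
    buf[i // 8] ^= 1 << (i % 8)
    return bytes(buf)

def parse_measurement(raw: bytes) -> float:
    """Hypothetical parser: one big-endian double, range-checked in meters."""
    (value,) = struct.unpack(">d", raw)
    if not 0.0 <= value <= 300.0:
        raise ValueError(f"implausible measurement: {value}")
    return value

# Enumerate every possible single-bit corruption of one clean message.
clean = struct.pack(">d", 42.0)
accepted, rejected = [], []
for i in range(len(clean) * 8):
    corrupted = flip_bit(clean, i)
    try:
        accepted.append(parse_measurement(corrupted))
    except ValueError:
        rejected.append(i)
```

Flips of the sign bit or the high exponent bits fail the range check, but flips in the mantissa produce values near 42.0 that the parser accepts silently – corrupted data that flows downstream looking legitimate.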
These kinds of distributed systems count on independent developers adhering to design contracts that may be poorly implemented and that often specify only vaguely how to respond to unexpected conditions.
This situation is worsened by the “vulnerability surfaces” of complex software systems. Not only are large-scale data involved – making it impossible to explore input combinations exhaustively – but so are implicit assumptions about the structure within those data. These assumptions might not hold in the real world, and, when left implicit, they are rarely exercised by traditional testing.
As a very simple example, let’s return to our self-driving car software. It is likely to count on several time signals, such as the time at which a radar sensor last sent a measurement. In “normal operation” this time signal will tick upward at a steady rate.
But how will the controller behave when the time signal isn’t so regular? Asking this question opens up numerous potential time signals. Those that developers can imagine may well be handled properly. Those that aren’t as predictable won’t be explored by traditional testing. Software stress-testing techniques, however, have a good track record for uncovering unexpected combinations of inputs that break implicit assumptions, primarily because such techniques aren’t biased by how developers expect a signal to behave. Instead, they try combinations that are logically possible based only on the data types used at an interface.
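As a sketch of this idea, the generator below derives stress values purely from the parameter’s type – a floating-point timestamp – without assuming anything about how the signal “should” behave. The controller entry point and the specific values are hypothetical, chosen only to illustrate type-based generation:

```python
import math

def exceptional_timestamps(last_normal: float) -> list[float]:
    """Stress values derived only from the fact that the parameter is a float."""
    return [
        0.0,                 # reset to epoch
        -1.0,                # negative time
        last_normal,         # frozen clock: the same timestamp again
        last_normal - 0.5,   # time running backwards
        last_normal + 1e9,   # huge forward jump
        math.inf,            # non-finite values
        -math.inf,
        math.nan,            # NaN silently fails every comparison
    ]

def stress_test(controller, last_normal: float = 1234.5):
    """Feed each value to the controller; unhandled exceptions are failures."""
    failures = []
    for t in exceptional_timestamps(last_normal):
        try:
            controller(t)
        except Exception as exc:
            failures.append((t, type(exc).__name__))
    return failures
```

A controller that computes a rate as `1.0 / (t - last)` crashes the moment the timestamp repeats, and one that gates updates on `t > last` silently stops updating when handed NaN, because every comparison with NaN is false.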
Stress testing intentionally explores your software’s behavior in response to inputs that may seem unlikely but that your software will eventually see “out in the wild”. Even mature software tends to be vulnerable when subjected to stress testing. Results of our research at Carnegie Mellon show that a variety of systems – including commercially available operating systems, widely used open-source libraries, and controllers for automated vehicles – all yield bugs when stress tested.
In addition to our testing tools, we mine for correlations between the structure of the software under test and the bugs discovered. These correlations inform our testing tools, allowing them to proactively structure tests around input combinations known to be dangerous for a particular class of software or type of interface.
All vulnerabilities that we find are demonstrable failure modes of your software and architecture. The only question is: how vulnerable should your software be? Our technologies and methodologies are focused on improving this tradeoff between testing effort and residual risk.