Víctor (Bit-Man) Rodríguez
Algorithm Junkie, Data Structures lover, Open Source enthusiast

Random tests, a random choice? #

As a child I used to make a lot of random choices about which path to follow for a walk or how to solve some issue, and more often than not it led me not only to the unknown but also to places where I wasn't supposed to arrive.

Reading tips about unit testing here and there, I found myself trapped between two contradictory positions on using random values in tests, and it felt like a revival of my childhood. The former guideline establishes that:

"When the boundary cases are covered, a simple way to improve test coverage further is to generate random parameters so that the tests can be executed with different input every time."

while the latter is emphatic against the practice:

"Another 'practice' that must be avoided is writing tests with random input. Using randomized data in a unit test introduces uncertainty. When that test fails, it is impossible to reproduce because the test data changes each time it runs."

I'm more in favor of using random values, but let me explain my position. To start with, both positions make sense: to cover all test cases it would be nice to exercise all input values, but that quickly becomes impossible. One idea would be to implement a sequential search of the test space, so that after many test runs there is a faint probability that an obscure corner case shows up and enlightens our code. The problem is that making a test save where it stopped, so it can resume execution later, is a clear violation of the principle that no test must depend on another test's outcome. It could be argued that a test depending on itself is not a violation of this principle, but if we think in terms of test instances, the same test run with two different sets of inputs is clearly two different test instances.

To avoid dependency between test instances, let's use random inputs; this choice even improves input-space coverage. Sequential coverage of the input space traverses the equivalence classes quite slowly, because each class is covered completely before going to the next one, so errors are discovered only on the move from one equivalence class to the next [1] (let's say that one element of a class is representative enough of the whole class, and exercising it proves the whole class right or wrong). Using random inputs instead means that more than one equivalence class is tested on each test execution, producing a higher coverage ratio (classes visited / test execution) [2].
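As a rough sketch of that coverage argument (the `RandomCoverage` harness and the `equivalenceClass` helper are hypothetical names, not from the original), a handful of uniformly random inputs typically touches several of the four sign-based equivalence classes of `sum(int, int)` in a single run:

```java
import java.util.Random;

public class RandomCoverage {
    // Classify an (a, b) pair into one of the four sign-based
    // equivalence classes of sum(int, int).
    static int equivalenceClass(int a, int b) {
        if (a >= 0 && b >= 0) return 1; // positive + positive
        if (a >= 0)           return 2; // positive + negative
        if (b >= 0)           return 3; // negative + positive
        return 4;                       // negative + negative
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        boolean[] visited = new boolean[5];
        // A handful of random inputs already touches several classes,
        // instead of exhausting one class before moving to the next.
        for (int i = 0; i < 20; i++) {
            int a = rnd.nextInt(21) - 10; // uniform in [-10, 10]
            int b = rnd.nextInt(21) - 10;
            visited[equivalenceClass(a, b)] = true;
        }
        int count = 0;
        for (int c = 1; c <= 4; c++) if (visited[c]) count++;
        System.out.println("classes visited in one run: " + count);
    }
}
```

A sequential scan visits class 4 cases for a long stretch before ever reaching class 1; the random version spreads its samples across classes from the very first iterations.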

As for the article arguing against random values in testing, it claims the uncertainty exists because there is no way to obtain the same data again, since it's random (got it?). The flaw in that assertion is that the article speaks about unit testing, and according to agile practices, once a test fails it is fixed immediately, not left in oblivion hoping it won't fail the next time it runs. Reporting the failure with the expected and obtained values should be enough to start discovering a new equivalence class, write a new test, and fix the buggy code. Returning to my random choices as a child: I wish I had used some breadcrumbs to signal the way back home, so as not to worry my parents the way I did.
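Those breadcrumbs can be made concrete by seeding the random generator and logging the seed: re-running with the same seed reproduces the exact same "random" inputs, which addresses the reproducibility objection. A minimal sketch (the `SeededRandomTest` class and the trivial `sum` stand-in are hypothetical, for illustration only):

```java
import java.util.Random;

public class SeededRandomTest {
    // Stand-in for the method under test (hypothetical).
    static int sum(int a, int b) { return a + b; }

    public static void main(String[] args) {
        // Log the seed up front; re-running with the same seed
        // replays the identical sequence of random inputs.
        long seed = System.nanoTime();
        System.out.println("test seed: " + seed);
        Random rnd = new Random(seed);
        for (int i = 0; i < 100; i++) {
            int a = rnd.nextInt(2001) - 1000; // uniform in [-1000, 1000]
            int b = rnd.nextInt(2001) - 1000;
            // On failure the message carries everything needed to
            // reproduce: the concrete inputs and the seed.
            if (a + b != sum(a, b))
                throw new AssertionError(
                    "sum(" + a + ", " + b + ") failed, seed=" + seed);
        }
        System.out.println("all cases passed");
    }
}
```

With the seed and the failing inputs in the failure message, the "impossible to reproduce" scenario never arises: either value is enough to replay the failing case.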

Some math rant #

Regarding [1], let me show that its first part (each class is covered completely before going to the next one) is not true unless you plan it carefully. Take a sum(int, int) method where the equivalence class boundary is where the numbers change sign; then there are 4 equivalence classes: 1) positive plus positive, 2) positive plus negative, 3) negative plus positive, and 4) negative plus negative. Let's run the test doing a complete scan:

for ( int a = -10; a <= 10; a++ )
    for ( int b = -10; b <= 10; b++ )
        assert a + b == sum(a, b);

In this test case the equivalence classes are not completely exhausted before moving to the next one; instead execution alternates a bit of 4) and a bit of 3) until those are exhausted, then a bit of 2) and a bit of 1) until exhaustion. To exhaust each class completely before moving to the next one:

// Equivalence class 1: positive + positive
for ( int a = 0; a <= 10; a++ )
    for ( int b = 0; b <= 10; b++ )
        assert a + b == sum(a, b);

// Equivalence class 2: positive + negative
for ( int a = 0; a <= 10; a++ )
    for ( int b = -10; b <= 0; b++ )
        assert a + b == sum(a, b);

// Equivalence class 3: negative + positive
for ( int a = -10; a <= 0; a++ )
    for ( int b = 0; b <= 10; b++ )
        assert a + b == sum(a, b);

// Equivalence class 4: negative + negative
for ( int a = -10; a <= 0; a++ )
    for ( int b = -10; b <= 0; b++ )
        assert a + b == sum(a, b);

In this second test case it can be said that the move to the next equivalence class happens only once the current one is exhausted, so statement [1] applies. But in the first case (alternated equivalence class execution) the statement also holds, because what matters is not that an equivalence class is exhausted but that a new one is started: exercising just one case of an equivalence class and making it fail is enough to declare the whole class failing. So [1] holds not because both parts are true (exhaust one class, then exercise the next) but because we can move from one class to another without exhaustion. Testing just one case per equivalence class is then enough: identify the classes, run one case for each, and they are all covered. That would make random testing pointless, because the same point of each equivalence class can be tested every time the test runs:

assert 12 == sum(8, 4);
assert -4 == sum(-8, 4);
assert 4 == sum(8, -4);
assert -12 == sum(-8, -4);

Not so. When equivalence classes are drawn against bugs, we do not know in advance how many of them exist; even when a code analysis is performed and equivalence classes are discovered, we cannot be completely sure that no hidden equivalence class remains. Executing one test case for each known equivalence class may therefore miss the hidden one(s). This is where random testing makes sense: selecting the test values at random gives a greater-than-zero probability of discovering the hidden equivalence classes, provided the random values are drawn from a uniform distribution, meaning that the probability of hitting the tiniest possible equivalence class (just one combination of input parameters) is:

p = 1 / (param 1 space size × param 2 space size × … × param N space size)
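For a concrete instance of that formula (using the same [-10, 10] parameter ranges as the loops above; the `HitProbability` class name is just for illustration), two int parameters with 21 values each give an input space of 441 points, so a single-point equivalence class is hit with probability 1/441 per uniform draw:

```java
public class HitProbability {
    public static void main(String[] args) {
        // Two int parameters, each drawn uniformly from [-10, 10]:
        // 21 values per parameter, so the full input space has
        // 21 * 21 = 441 points.
        long spaceSize = 21L * 21L;
        double p = 1.0 / spaceSize;
        // p = 1/441, roughly 0.00227 per draw.
        System.out.println("p = " + p);
    }
}
```

Over many independent draws the chance of never hitting that class shrinks geometrically, which is exactly why repeated random runs can eventually expose a hidden single-point class that a fixed one-case-per-known-class suite never touches.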

Please enjoy!