Test Patterns In Java

Jaroslav Tulach
$Date: 2009/11/02 14:12:43 $ $Revision: 1.2 $
Latest version of this document can be found at
http://openide.netbeans.org/tutorial/test-patterns.html

Abstract

Testing is an important part of software development. Effective testing is a key factor in reducing the total cost of maintaining any application over its lifetime. It reduces the cost and time of development and can increase savings on quality assurance and, of course, on sustaining engineering. Knowing when to invest in better design, in post-development quality assurance, in manual tests or in automatic testing is a basic difference between successful and unsuccessful software projects in these tough and competitive days.

In this paper we start with general motivation and present automatic tests as a form of functional specification. We then dive deep into hardcore JUnit test examples showing various forms of regression tests: tests verifying algorithm complexity, memory management, data structure sizes, deadlocks and race condition behavior, tests randomly generating new test cases, tests simulating user clicks in the UI, API signature tests, etc.

We assume that you know what tests are and that tests are an important and useful part of software development. This paper shows what the practical usage of a test framework looks like. You will learn techniques, tips and tricks for testing various aspects of real world J2SE applications. We will also give examples of the savings and improvements that were achieved by increased usage of extensive automatic testing in the development of the NetBeans Platform and IDE.

What does "quality of an application" mean?

There are many possible answers to the question "what makes an application have good quality?". Depending on one's standpoint the application may be expected to have a slick UI and a natural work flow, to be acceptably fast, not to crash, etc. These are all good expectations, and let's include them under one general category - specification. If we are good UI designers and if we understand the user's needs, then we can create a good specification, which describes how our application should look.

What we think our app looks like

However this does not mean that our users will be satisfied with the quality of what they get. A good enough specification is just half of what they see. They also need a good enough implementation. Any expectations we have for our application can be (and in the ideal case are) expressed in our specification, but before they reach the user they have to be implemented in code, and it is very likely that the code will not follow the specification fully. There will be differences between the code and the specification.

What our app actually looks like

The amount of difference between our code and our specification is a measure of quality. If the final application does not do what we originally intended it to do, then it is not good enough. Its quality, or perhaps a better term, our confidence in it, is lowered with every difference from the expected behaviour.

Sometimes the application does not do what we expect it to do, sometimes it does more. Both situations are dangerous, but only one of them is easy to find. One can read through the spec and test (manually or automatically) whether everything that is requested is really implemented. So by carefully testing one's application, one can minimize the places where the code offers less than expected. But even this has its limits:

What our app will look like after the next release

Over time, with release after release, regressions occur. The functionality of the code changes: it starts to do new things and, alas, it also stops doing some of what it used to. Of course one can execute the manual test procedures once more with every release, but that is very expensive, as people have to try all specified features from all previous releases, and the accuracy of such findings is often not good enough. As a result the shape of the application code changes from release to release, just as an amoeba changes its shape over time. That is why we call this behaviour the amoeba model, and in the rest of the paper we give advice on ways to fight its unwanted implications.

Tests As A Form Of Specification

This paper claims that writing automated tests is beneficial for the quality of an application. But as we are not really XP programmers, we are not going to insist on tests being a must. We believe they are beneficial, especially in certain areas, but sometimes there are other useful and effective ways to achieve good enough quality.

One of the most important things automated tests do is show the intended use and behaviour of the application's code. We all know how hard it is to maintain code inherited from another person: read it all, analyze what it does and, if a bug is reported, guess whether it is really a bug or some weird feature (i.e. intended behaviour). As a result, when fixing such code, one has to worry whether another hidden feature will get broken. This whole suffering can be greatly eased by automated tests. If written correctly, they contain the expected calls to the application code and the expected results. One can always check them to find out whether a certain use of the code was anticipated or is just a side effect that happens to work.

Automated tests help to fight the amoeba behaviour. Nearly every change to the application code shakes the shape of what the code does, fixing something and breaking something else at once. By executing the tests whenever a change is integrated, one does not prevent all breakages, but at least ensures that the anticipated, intended behaviour of the application remains unchanged.

By running various coverage tools (NetBeans uses emma) one can find which parts of the application code are covered by tests and which are not. This allows us to find and delete code that is not needed, or to improve the automated coverage of such code and receive more of the benefits associated with having automated tests.

An important social aspect of tests is their support for arrogance. Well, in fact it is not real arrogance, it is more effective use of manpower. By having automated and isolated tests, one can refuse bugs that others want to assign to one by finding or writing a test which mimics the buggy behaviour, proving that one's own code behaves correctly and that the bug must be somewhere else. This demonstrably lowers the number of hours spent with a debugger and thus makes more effective use of programmers' time.

On the other hand, there are areas which usually do not benefit much from attempts to provide automated tests. The user interface is one such example. It changes often, and automated tests (although possible - see jemmy.netbeans.org) are hard to write and not fully reliable. Sometimes it may be simpler to invest in manual testing. We have to mention, though, that the NetBeans project has a set of about fifteen UI tests verifying basic functionality of the NetBeans IDE; they are executed continuously and have helped us catch various regressions, but they also often cause false alarms. In spite of all their drawbacks, they are still considered valuable.

From a general point of view, automated tests can be seen as a kind of specification. Sometimes it is enough to write a UI specification or a UML document; for certain aspects it is better to also provide automated test coverage, and sometimes it just does not make sense to do anything other than write automated tests. We'll give some examples later.

Pragmatism vs. Religion

When talking about testing one has to be very careful. People tend to have strong opinions and discussions easily get pretty heated. And not only heated, often also confused. The whole methodology is still evolving and things just have not settled yet. Because of that, let us dedicate a separate section to clearing up possible confusion over definitions.

Interest in automated testing has increased significantly in recent years, and there is no doubt that this is due to the efforts of the extreme programming movement. The influence is so big that many people associate testing and XP together and feel that if you write tests you are doing XP, or that if you do not want to do XP you should not write tests. This is a false impression. XP provides a lot of useful tools (like junit) and explicitly talks about the importance of unit tests written by the developers of the application code, but it is a much larger methodology which (according to the XP proponents) has to be followed fully or not at all. So even though we propose the use of tests, whether or not we do XP is not the question that matters. The important thing is to gain confidence in the application code we produce.

Another misleading question that is often asked is whether we write unit or functional tests. Well, we use junit and xtest as the base for writing and the harness for running our tests, but that does not mean the tests are unit tests. As we will demonstrate later, sometimes it is useful to test the application code in isolation, and sometimes it is more meaningful to set up an environment as close to production as possible. This depends on the aspect we want to verify: in the first case we concentrate more on the functionality of a small unit, in the latter we check how it behaves and cooperates with other components of the system. These two aspects may seem different, but we can use the same tools to check them (e.g. junit and xtest). Again, it does not matter whether we write functional or unit tests - what is important is to force the edge of our application's amoeba to stop shaking.

Often people disagree when it comes to the question of who should write the tests and when. Some say that when the application code is written, it should be passed to the quality department, which will write tests for it. Others (including the XP folks) insist on tests being written together with the application code. We believe that both opinions have some merit. If an application is tested by someone who has not coded it, it will surely get tested in new and unexpected ways, and new and surprising areas where the behaviour of the application (its amoeba shape) does not match the specification will be discovered. Also, it does not make sense to write certain kinds of tests in advance (for example the deadlock test which we will discuss later); it is better to wait for a bug to be reported and write the test to simulate the bug and serve as proof that it really has been fixed. On the other hand, when tests are written by different people, it can lead to the infamous we vs. they separation: people writing application code will not understand the tests, those who write the tests will not fully understand the application code, and if a test fails the two sides start to blame each other while nobody really knows whether the bug is in the test or in the application code. This can get really frustrating, especially if the failures are intermittent and unreproducible in the debugging environment. That is why we share the XP opinion that at least some tests shall be written by the application code developers. Beyond the fact that there is then someone who understands both the code and the tests, the other reason is that application code is different when written with testability in mind.

The biggest improvement of the application code that happens when its developers also start to write the tests is that the application code will no longer be one big monolithic chunk of mutually interlinked code. Instead, for the sake of easy testability, and also to the benefit of the application code's design, the code gets split into smaller units that can exist separately. As such they can then be tested on their own. So instead of having to start the database, fill it with predefined data and then launch the whole application just to verify that an invoice entered by a user in a dialog is processed correctly, one separates the invoice-handling code from the external environment by abstractions. The invoice code then no longer talks directly to the database or requires a user interface. Instead, it relies on an abstract interface to handle its communication with the surrounding environment. In the test, a fake, artificial testing environment is then created which handles all the invoice-processing calls, but without all the references to the rest of the application. This is called separation of concerns, and there are various supporting techniques by which one can achieve it (see mock objects, dependency injection using Spring and many others), but in NetBeans we are using Lookup. Our code is full of abstraction interfaces like:

  abstract class DialogDisplayer {
    public abstract void notify(String msg);
    
    public static DialogDisplayer getDefault() {
      return Lookup.getDefault().lookup(DialogDisplayer.class);
    }
  }

which separate the callers of their methods from the actual implementation. When one writes:

  DialogDisplayer.getDefault().notify("Hello world!");

it is unknown which implementation will actually get called when the code is executed. This allows us to provide one implementation in the whole NetBeans application (one that actually shows a dialog to the user) and a different one when testing the code, which can either print the message or record it for the rest of the test to check that the expected call really happened. This separation of concerns gives us confidence that our units of code are working correctly. True, this does not imply that the whole application is working correctly, but if there were no confidence in the smaller parts, there could be none in the whole application.

The separation of concerns is not the only change that happens when the same people who write the tests can also change the application code. Very often new, otherwise unneeded, methods that verify internal state appear. These usually stay package private, as the tests live in the same Java package and are friends, so no externally visible changes are made, yet the code becomes more verifiable.

Another change is that the code gets more predictable. As long as the only consumer of the application code's output is the user, it does not matter much whether an action happens now or a few milliseconds later. The human will notice nothing (indeed, our usual way of solving problems was to post the action with SwingUtilities.invokeLater, as the human would not notice the delay). This is a coding style that just cannot work with automated testing. The test is as quick as the computer and will observe any non-determinism the application code exhibits. So testing affects the code either by forcing it to remove such unpredictable constructs or by introducing rendezvous methods that allow the tests and the application code to synchronize on the expected state and verify that the behaviour matches expectations. This is much better than the opposite style of postponing an operation and hoping that the human will not notice the delay; once again this helps the application stiffen its amoeba edge.
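To make the rendezvous idea concrete, here is a minimal sketch (the class and method names are ours, not any NetBeans API): the application code counts down a latch once the delayed work has really happened, and the test waits on that latch instead of sleeping and hoping.

  class MessageSender {
    private final java.util.concurrent.CountDownLatch delivered =
        new java.util.concurrent.CountDownLatch(1);

    public void sendLater(final String msg) {
      javax.swing.SwingUtilities.invokeLater(new Runnable() {
        public void run() {
          // ... actually deliver the message ...
          delivered.countDown(); // rendezvous point for tests
        }
      });
    }

    /** Tests call this to block until the posted work has really run. */
    public void waitForDelivery() throws InterruptedException {
      delivered.await();
    }
  }

A test then calls sendLater, waits on waitForDelivery and only afterwards verifies the observable state, so no timing assumptions creep in.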

Properties of Good Tests

The basic, and probably the only, property required of any test is that it has to be useful. As we are proposing tests as the right way to improve the quality of application code, a test has to contribute to that quality in some way. It has to help us shape the amoeba the way we want.

Regardless of any textual specification and corridor talk, sometimes it is not easy to find out what parts of the application code are supposed to do. Tests help with that. They show the intention. If the same programmer who codes the application also provides tests for its behaviour, his expectations about input values, order of calls, thrown exceptions, etc. are going to be visible in those tests. Later, when another person looks at the code, they can more easily sort out what works just by chance and what is the essential and desired behaviour of the application code.

It is good if clever people develop your application and are able to fix bugs. It is, however, much better if your application is maintained by careful people - i.e. those who care a lot about the application's future and would rather invest more work now to prevent the same or a similar problem from reappearing later; those who do not want to spend their lives just patching, patching, and patching broken code. In such an environment, every bug can be seen as a report that the behaviour of the application does not match our expectations. Such a bug is a chance to bring the shaking amoeba edge of the application code closer to the specification. But only if one, together with the fix, also prevents regressions (by writing tests) will the shape of the amoeba be stiff enough not to regress and recreate the same bug in the future.

If we have enough tests and we want them to really prevent regressions, we need to run them regularly to discover their failures. Whether they shall be executed once per release, every week, every day or instantly depends on the size of your project and on the time the execution of the tests consumes. It also depends on how soon you want to catch a regression: the sooner it is discovered, the less painful the identification of the change in application code that caused it. In NetBeans we have a set of tests that take about five minutes and are supposed to be executed before every commit to our shared source base; we call them commit validation. Every developer is supposed to verify that his changes do not break basic functionality of the NetBeans IDE. By warning that something is broken before it gets integrated we minimize the amount of work needed to investigate what is causing the regression. However, this set cannot include all of our tests, which would run for hours, so we also have a suite of tests executed daily that uses the xtest harness to run the tests enumerated in our master configuration file and in each module's individual configuration file. This ensures that every regression is reported within 24 hours of integration, and that is why one only needs to evaluate the changes made in the previous day to find the cause.

Of course, if people use tests written by someone else to verify whether their own work is correct (as the NetBeans project does in commit validation), the tests must be reliable. A failure must mean that something about my code changes is wrong. If this is not true, then people start to lose trust in the whole system, and so it is better to remove randomly failing tests from validation suites as quickly as possible.

Although XP suggests avoiding it, sometimes certain assumptions are made about the whole system and not just individual parts - e.g. a single source file. Such intentions are really hard to discover by reading single source files. This creates a maintenance problem, since even if you have clever people who can read and understand the code, they may not realize that it is necessary to also locate some other, distant source file and update it as well. Just one change and a regression appears! Of course, if the assumption about the system is expressed in an automated test, you will get a notification. In some sense the tests can be used to link physically distinct sources that are logically connected. This gets especially important when one hacks around bugs in foreign code. Writing a hack is ugly, but if that is the only solution to some bug, there can be a strong push to do it. If it works, fine. But as usual with our amoeba, we want to be sure that the hack keeps working with new versions of the code it depends on, and as usual an automated test for the hack's functionality is the cheapest way to ensure that. We used such a hack verification test to check that our workaround for certain weaknesses in the default clipboard implementation works. Not only did the test verify the behaviour on JDK 1.4, but it also caught a change in clipboard handling in a beta version of JDK 1.5, and we could negotiate some changes with the JDK team to make it work again.

Indeed, such widespread relations among unrelated parts of the code have their problems, because the number of extra test dependencies greatly affects the likelihood that the test will remain maintainable as other components change, that sudden test failures can quickly be diagnosed, and that someone changing code will know which tests to run to check the changes before committing. In spite of these drawbacks it makes sense to use tests like this, as they can test otherwise unverifiable assumptions - like whether two pieces of copied code behave the same - as will be discussed later in more detail. However, one has to be careful: the tests have to be executed often (in NetBeans all of them are run daily) and the failures have to be analyzed quickly. Otherwise, due to the wide interrelations, it becomes hard or even close to impossible to identify the code change that caused the failure.

In any case, the only property of a good test is its usefulness. It can help us show the intention, prevent regressions, verify connections between unrelated code or even hacks into foreign code, and sometimes it can serve as an example (we have a shopping cart application with a few checks around it). It does not matter how; if it helps to bring the amoeba's edge closer to our expectations, then it is good.

Testing Framework

When searching our memory we realize that we wrote tests a long, long time ago. The evidence consists of all the main methods in our application sources, hidden in comments or sometimes even left in the code, that set up part of the application in a specific way so one could verify one's own code without being forced to launch the rest. This indicates that programmers really like to write tests and like the separation of concerns, but still, for writing automated tests it is better to use an existing framework than to encode the testing logic into main methods spread everywhere. And if we need a testing framework in Java, then we very likely need junit. It is the flagship of XP in Java, and even though it is not a masterpiece of art, it works well, is easy to use for simple cases and is highly customizable if one needs more complex behaviour.

Tests are written in separate classes in the same packages, and are usually named after the class they test, with the suffix Test:

   public class MyClassTest extends TestCase {
     private int variable;
     private static int counter;

     static {
       // one and only initializations
     }

     public MyClassTest (String name) {
       super (name);
     }

     protected void setUp () throws Exception {
       // do some kind of setup
       variable = counter++;
     }

     protected void tearDown () throws Exception {
       // clean up and possibly post verifications
     }

     // The tests

     public void testFirstInvocation () throws Exception {
       assertEquals ("First means 0", 0, variable);
     }

     public void testSecondInvocation () throws Exception {
       assertEquals ("Second is 1", 1, variable);
     }
   }

When executed by a harness, the static initializer is called once; then an instance of the test class is created for each method whose name starts with test. The sequence setUp, test method, tearDown is then called on each instance. A test is expected to perform some computation and then verify that the result is correct using the various predefined assertions such as assertTrue, assertEquals, assertSame or the unconditional fail. If all tests pass, the harness succeeds; otherwise it reports an error.

JUnit is simple and extensible. That is why the NetBeans team uses its own extension called nbjunit which, instead of TestCase, provides NbTestCase with a few additional useful assertions and other utilities. nbjunit is completely independent of the rest of the NetBeans libraries, so it can easily be used in any Java project. Here is a list of possible ways to get the library:

  • The library can easily be used by anyone running the NetBeans IDE: select Tools, Update Center in the menu and download it. Then you can add the library to the test classpath of any of your projects and start using it.
  • The library has been uploaded to the jpackage.org repository, so if you are on a correctly configured operating system, just use urpmi nbjunit or apt-get install nbjunit and the NetBeans JUnit Extensions and the Insane library will be downloaded and installed.
  • Or you can download the sources for the library from xtest.netbeans.org and build it yourself.

However, a typical NetBeans test uses not only NbTestCase but also Lookup to handle the separation of concerns. This is easily configurable by use of the MockServices support class:

public class MyTest extends NbTestCase {
    public MyTest(String name) {
        super(name);
    }
    protected void setUp() throws Exception {
        super.setUp();
        org.netbeans.junit.MockServices.setServices(DD.class);
    }
    public static class DD extends DialogDisplayer {...}
}
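
For illustration, the elided DD class could be as small as the following sketch, based on the simplified DialogDisplayer shown earlier; it records the message instead of showing a dialog, so the rest of the test can assert that the expected notification really happened (the field name is ours):

    public static class DD extends DialogDisplayer {
        static String lastMessage; // remembered for the test to inspect

        public void notify(String msg) {
            lastMessage = msg; // no dialog is shown, the call is only recorded
        }
    }

After the tested code runs, the test simply checks DD.lastMessage.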

For more sophisticated cases you can do this manually: a static one-time-only initializer registers the test's own implementation of Lookup, providing the environment needed for the test:

public class MyTest extends NbTestCase {
 static {
   System.setProperty("org.openide.util.Lookup", Lkp.class.getName());
 }

 public MyTest(String name) {
   super(name);
 }

 public static final class Lkp extends org.openide.util.lookup.AbstractLookup {
    public Lkp() {
        this(new org.openide.util.lookup.InstanceContent());
    }

    private Lkp(org.openide.util.lookup.InstanceContent ic) {
        super(ic);
        ic.add(new DD());
        ic.add(...);
    }
  }
}

Of course junit is flexible enough that one can use other extensions, and possibly also a different harness (recently there has been a lot of buzz around TestNG), but as our examples are going to use NbTestCase, we thought it reasonable to explain our specific terminology.

When there are enough tests

While writing tests, people may ask: how many of them should be written? The simple answer is to write tests as long as they are useful. The more precise, more complex and less clear answer is covered in this chapter.

There are various tools out there that help to measure test coverage. We have selected emma for measuring the coverage of our application code by our tests. When invoked (for example from the popup menu of any project from NetBeans.org) it instruments the application code and invokes automated tests on it. While running, it collects information about all called methods, visited classes and lines and then it shows a summary in a web browser.

Counting coverage by visited methods is a very rough criterion, but it can be surprisingly hard to get close to 100%. Even if you succeed, there is no guarantee that the resulting application code works correctly. Every method has a few input parameters, and knowing that it succeeded once with one selection of them does not say anything about the other cases.

It is much better to count the coverage by branches or lines. When there is an if (...) { x(); } else { y(); } statement in the code of your method, you want to be sure that both methods, x and y, will be called. The emma tool supports this, and by helping us make sure that every line is visited, it gives us confidence that our application code does not contain useless lines.

Still, the fact that a line is visited once does not mean that our application code is not buggy.

  private int sum = 10;
  public int add(int x) {
    sum += x;
    return sum;
  }
  public int percentage(int howMuch) {
    return 100 * howMuch / sum;
  }

It is good if both methods get executed, and fine if we test them with various parameters - still, we can get an error if we call add (-10); percentage (5), because sum will be zero and division by zero is forbidden. To be sure that our application is not vulnerable to problems like this, we would have to test each method in each possible state of the memory it depends on (e.g. each value of the sum variable), and only that would give us the ultimate proof that our application code works correctly in a single-threaded environment.
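
A sketch of a test exposing exactly this case, assuming the two methods above live in a hypothetical class Sums, could look like:

   public class SumsTest extends TestCase {
     public void testPercentageAfterNegativeAdd() {
       Sums s = new Sums(); // hypothetical class containing add and percentage
       s.add(-10); // drives the internal sum to zero
       try {
         s.percentage(5);
         fail("ArithmeticException expected, sum is zero");
       } catch (ArithmeticException ex) {
         // expected: the state sum == 0 was never anticipated by the author
       }
     }
   }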

But there is another problem - Java is not single threaded. A lot of applications start new threads by themselves, and even if they do not, there is the AWT event dispatch thread, the finalizer thread, etc. So one has to count on some amount of non-determinism. Sometimes the garbage collector just kicks in and removes some unneeded objects from memory, which can change the behaviour of the application - we used to have a never-ending loop which could be reproduced only when two Mozilla browsers and an Evolution client were running, because only then was memory scarce enough to trigger the garbage collector. This kind of coverage is unmeasurable.

That is why we suggest that people use code coverage tools as a sanity check that something is not seriously under-tested. But it is necessary to remind ourselves that however high the coverage is, it does not fully prevent our application code from having bugs. So, in order to help fight the strange moves of an application's amoeba shape, we suggest writing a test when something gets broken: when there is a bug report, write a test to verify the fix and prevent regressions. That way the coverage is going to be focused on the code where it matters - the code that really was broken.

AWT Testing

Automated user interface testing is hard, especially in Java, which runs on a variety of platforms, each with slightly different behaviour - sometimes keyboard focus follows the mouse, sometimes it does not. Swing comes with different look and feels, and our automated tests, in order to be useful, have to overcome this and work reliably. That is hard to achieve.

Rule #1 is clear: avoid AWT testing. Separate your application code into two parts - application logic and actual presentation. Test your models separately from the UI - writing automated tests for them should be possible, as they are models and are independent of the presentation (provided, of course, that the design follows the model-view-controller separation, as Swing does). For example, instead of trying to write a test for your checkbox, write a test for its ToggleButtonModel, make sure it works, and then let the UI simply delegate to that model (this is what we did when writing StoreGroup and its tests). That gives you confidence in your code, as the logic is tested and the UI is simple enough that a manual test once per release is enough to guarantee it works as it should.
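
As a minimal sketch of such model-level testing, here is a test that exercises Swing's own ToggleButtonModel without ever realizing a component on screen (plain junit, nothing NetBeans-specific):

   public class ToggleModelTest extends TestCase {
     public void testSelectionIsKeptByTheModel() {
       javax.swing.JToggleButton.ToggleButtonModel model =
         new javax.swing.JToggleButton.ToggleButtonModel();

       model.setSelected(true); // no checkbox is ever shown
       assertTrue("The model remembers the selection", model.isSelected());

       model.setSelected(false);
       assertFalse("The model can be deselected again", model.isSelected());
     }
   }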

Sometimes it is possible to avoid showing UI components, but one still has to run the test inside the AWT event dispatch thread, because the code follows the Swing threading model (i.e. everything happens in the AWT thread). The simplest library-independent approach is to use invokeAndWait:

    public void testSomething () throws Exception {
      javax.swing.SwingUtilities.invokeAndWait (new Runnable () {
        public void run () {
          callMethodThatDoesTheTest ();
        }
      });
    }

The logic of the test is in callMethodThatDoesTheTest, and it is executed in the AWT thread. The above example, however, does not handle exceptions correctly, and if we try to, the code gets a bit more complicated (see the run method in NbTestCase). That is why we allow tests to simply override runInEQ, and NbTestCase handles the rest automatically:

    public class MyTest extends NbTestCase {
      protected boolean runInEQ () {
        return true;
      }
    }

See for example CookieActionTest.java, which needs to run in the AWT event thread as it tests a Swing-like action implementation.
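
For readers not using NbTestCase, the following sketch shows roughly what "handling exceptions correctly" means: any failure thrown inside the event thread is captured and rethrown in the test thread so the harness can report it (a simplified version of what NbTestCase does; the helper name is ours):

    private void runInEventThread (final Runnable testLogic) throws Exception {
      final Throwable[] thrown = new Throwable[1];
      javax.swing.SwingUtilities.invokeAndWait (new Runnable () {
        public void run () {
          try {
            testLogic.run ();
          } catch (Throwable t) {
            thrown[0] = t; // remember the failure raised in the AWT thread
          }
        }
      });
      if (thrown[0] instanceof Exception) {
        throw (Exception) thrown[0];
      }
      if (thrown[0] instanceof Error) {
        throw (Error) thrown[0]; // assertion failures are Errors in junit 3
      }
    }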

In some cases the test code itself runs outside of the AWT event queue, but somewhere inside the application code certain actions are posted to the AWT thread. In such a case it may be handy (and often necessary) to wait for them to finish before continuing the test. The following method may then be useful:

    private void waitEQ () throws Exception {
      javax.swing.SwingUtilities.invokeAndWait (new Runnable () { public void run () {} });
    }

It posts an empty Runnable into the AWT event thread and waits for it to finish. As the queue of runnables is FIFO, the empty runnable is scheduled after all tasks already posted by the application, and when it is finally executed one can be sure that all the application's delayed tasks in the AWT event queue are finished as well. See DataEditorSupportTest.java for an example of a test that needs to wait while the application code finishes actions posted to the AWT event thread.

There are situations where the generic mock object approach can be useful for AWT testing as well. For example, in order to test UI environments that do not support custom cursor definitions, UtilitiesTest defines its own AWT Toolkit that does not support custom cursors. In another example (see CloneableEditorUserQuestionTest.java), a mock object for DialogDescriptor is used to fake the communication with the user. It replaces the DialogDescriptor, which in a production environment shows a dialog and interacts with the user through a UI component, with a headless implementation that immediately returns preset values and thus allows automated verification of application code that itself communicates with humans.

If you cannot split your code into logic and UI and you absolutely have to write automated tests, then use Jemmy. It is a junit extension that operates on realized UI components and allows automated navigation through dialogs. An excellent introduction to Jemmy can be found in the Jemmy Testing Toolkit presentation. For NetBeans UI testing there is an extension of Jemmy called Jellytools. You can also watch a flash demo showing the usage of testing tools in NetBeans; it is available on the automated testing tools overview page.

Algorithm Complexity Tests

A very important but fragile piece of functionality that really deserves automated testing is performance. Sooner or later nearly everyone finds that something in the application seems slow. The natural response is to get the profiler, run it on the application, find out what is wrong and fix it. This works, and we all do it, but the question is: how effective is it? As far as we know, profiling is usually done when the application code is ready and all features are implemented. When performance improvements are applied at that time, it is clear that they will stay until the release (as all other integrations are already done). That means we shaped our amoeba for the release, but what will happen by the next one? Will the amoeba not change shape again?

Of course it will! After the release (and all the profiling effort) a new round of feature integration starts, and for a certain period nobody cares about performance. The new code surely changes the assumptions of the old one and very likely negates the performance improvements made during the hectic end of the previous release. So it is time to take the profiler again, find what is wrong, provide improvements that make the application acceptable enough, and start the whole vicious circle again. Are you surprised? You should not be, it really works this way. Is there a better way? Yes, let's analyze it.

During the hectic profiling time, when one uses the profiler to find hotspots and speed them up, one should invest a bit of time in writing a speed test that demonstrates what is wrong. That test will first of all serve as proof that the performance problem has been fixed, but (most importantly) also as a continuous reminder that the intention of this code is to be fast, and as a warning when some day that intention gets broken. This will prevent the amoeba from regressing and save a lot of work when the next release is being profiled.

The basic idea behind a speed test is simple: execute the same algorithm on data sets of different sizes and confirm that the time matches our expectations (e.g. it is constant, linear, quadratic, etc.). This can easily be written in any test harness, including plain junit. The following test accesses the middle element of a linked list and checks that the access time is constant - well, it checks that the slowest time is no more than three times slower than the first one:

   public class WhyIsTheAccessToListSlowTest extends TestCase {
     private int size;
     private List toCheck;
     private long time;
     private static long one = -1;

     protected void setUp () {
       size = Integer.valueOf (getName ().substring (4)).intValue ();
       toCheck = new LinkedList (Collections.nCopies (size, "Ahoj"));
       time = System.currentTimeMillis ();
     }

     protected void tearDown () {
       long t = System.currentTimeMillis () - time;
       if (one == -1) {
         one = t;
       } else {
         if (t > one * 3) {
           fail ("The time is just too long");
         }
       }
     }

     private void doTest () {
       for (int i = 0; i < 10000; i++) {
         toCheck.get (size / 2);
       }
     }

     public void test10 () { doTest (); }
     public void test100 () { doTest (); }
     public void test1000 () { doTest (); }
     public void test10000 () { doTest (); }
   }

This works and really can discover that access to the middle of a linked list is too slow, but such measurements can be influenced by various external events. First, the garbage collector can be invoked during one of the tests and effectively stop the execution, making the measured time too big. Or the HotSpot compiler can step in and decide to compile the application code to make it faster; as a result one of the tests will take longer because the compilation slows it down, while the tests executed after it will be faster as they run the compiled code, much faster than the interpreted one. We have really observed such random failures, and that is why we created NbTestCase.speedSuite, a junit-like wrapper around our test cases that can execute the tests several times to eliminate the influence of the garbage collector and the HotSpot compiler. The results are excellent: just by allowing the test to restart itself a few times in case of failure, the non-deterministic factors of the external environment were eliminated, and we have had no random failures in our speed tests for more than a year. Here is the previous test rewritten in our speed suite style:

   public class WhyIsTheAccessToListSlowTest extends NbTestCase {
     private int size;
     private List toCheck;

     public static NbTestSuite suite () {
         return NbTestSuite.speedSuite (
            WhyIsTheAccessToListSlowTest.class, /* what tests to run */
            10 /* ten times slower */, 
            3 /* try three times if it fails */
        );
     }

     protected void setUp () {
       size = getTestNumber ();
       toCheck = new LinkedList (Collections.nCopies (size, "Ahoj"));
     }

     private void doTest () {
       for (int i = 0; i < 10000; i++) {
         toCheck.get (size / 2);
       }
     }

     public void test10 () { doTest (); }
     public void test100 () { doTest (); }
     public void test1000 () { doTest (); }
     public void test10000 () { doTest (); }
   }

An example of such a test from the NetBeans code base can be found in DataShadowSlowness39981Test.java, and we can confirm that it helped us prevent regressions at the edge of our application's amoeba.

Memory Allocation Tests

One of the least specified things in conventional project documentation is the memory model of the application. That is not so surprising, given that there is no generally known methodology for designing the memory model of an application. It is surprising nonetheless, because without ensuring that you manage memory effectively, do not allocate more than needed and properly release what is no longer needed, one can hardly write an application that is supposed to run for days.

Again, the classical model is to code the application, start the profiler and search for possible memory leaks; when one is found, return to the code, fix it, and so on and on, until the application is in releasable shape. And again, as described several times already, this is not effective if one plans to release more than one version of the application, because after a while all the improvements from the profiler-hunting phase, which helped bring the amoeba into better shape, will regress, unless we make sure they are continuously tested.

Standard junit does not offer much in this area, so we had to write a few extensions for our NbTestCase, based on, or supported by, our memory inspection library called Insane.

The first thing one has to fight regarding memory management in modern object-oriented languages like Java is memory leaks. The problem is not that an application might address an unknown area of memory - that is not possible thanks to the garbage collector - but that sometimes (also due to the garbage collector) objects one would like to disappear still remain in memory. If a certain operation always leaves garbage behind after its execution, then after a few executions one finds that free memory keeps shrinking and the whole application gets slower and slower. For this NbTestCase offers the method assertGC:

  Object obj = ...;
  WeakReference ref = new WeakReference (obj);
  obj = null;
  assertGC ("The object can be released", ref);

If you believe that after some operation an object should no longer be needed in memory, you just create a WeakReference to it, clear your own reference to the object and ask assertGC to try to release the object from memory. assertGC tries hard to force garbage collection of the object: it calls System.gc a few times, allocates some memory, explicitly invokes finalizers, and if the WeakReference is cleared, it returns successfully. If not, it invokes the Insane library and asks it to find a reference chain that keeps the object in memory. A possible failure could then look like:

junit.framework.AssertionFailedError: Represented object shall disappear as well:
private static final java.lang.ref.ReferenceQueue org.netbeans.modules.adaptable.SingletonizerImpl$AdaptableRef.QUEUE->
java.lang.ref.ReferenceQueue@17ace8d->
org.netbeans.modules.adaptable.SingletonizerImpl$AdaptableRef@bd3b2d->
org.netbeans.modules.adaptable.SingletonizerImpl$AdaptableRef@14653a3->
java.lang.String@130c132
        at org.netbeans.junit.NbTestCase.assertGC(NbTestCase.java:900)
        at org.netbeans.modules.adaptable.SingletonizerTest.doFiringOfChanges(SingletonizerTest.java:177)
        at org.netbeans.modules.adaptable.SingletonizerTest.testFiringOfChangesOnAllObjects(SingletonizerTest.java:116)
        at org.netbeans.junit.NbTestCase.runBare(NbTestCase.java:135)
        at org.netbeans.junit.NbTestCase.run(NbTestCase.java:122)

which can be read as: there is a static field QUEUE which points to a ReferenceQueue and, through two AdaptableRefs, holds in memory the String which we wanted to garbage collect.
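
For illustration, a complete test method using assertGC inside a class extending NbTestCase might look like this toy sketch, where a HashMap stands in for real application code that is supposed to release a cached value:

  public void testRemovedValueCanBeGarbageCollected() {
    java.util.Map cache = new java.util.HashMap();
    Object value = new Object();
    cache.put("key", value);

    // the operation under test: the cache releases the value
    cache.remove("key");

    java.lang.ref.WeakReference ref = new java.lang.ref.WeakReference(value);
    value = null; // drop the test's own strong reference
    assertGC("Value removed from the cache can be collected", ref);
  }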

Another thing that may affect the performance of application code is the size of its data structures. If you know that thousands of instances of a certain object are going to be kept in memory simultaneously, you do not want each of them to occupy 1000 bytes or more; you want to minimize its size. Again, this can be observed by profiling, or it can be a decision thought out well in advance, but the usual problem remains: we need to ensure that the size constraint will not regress from release to release. For that our NbTestCase provides the assertSize check:

 class Data {
   int value;
 }
 Object measure = new Data();
 assertSize ("The object is small", 16, measure);

It uses the Insane library to traverse the graph of all objects referenced from the measure variable and computes the amount of memory occupied. Then it compares that value with the expected one; if it is lower or equal, the test passes. Otherwise it fails, printing the sizes of the individual elements to let the programmer analyze the failure:

junit.framework.AssertionFailedError: Instance is small: leak 8 bytes  over limit of 64 bytes
  org.netbeans.modules.adaptable.SingletonizerImpl$AdaptableImpl: 1, 24B
  org.netbeans.modules.adaptable.SingletonizerImpl$AdaptableRef: 1, 32B
  $Proxy0: 1, 16B
        at org.netbeans.junit.NbTestCase.assertSize(NbTestCase.java:937)
        at org.netbeans.modules.adaptable.SingletonizerTest.testProvidesImplementationOfRunnable(SingletonizerTest.java:58)
        at org.netbeans.junit.NbTestCase.runBare(NbTestCase.java:135)
        at org.netbeans.junit.NbTestCase.run(NbTestCase.java:122)

So it can be seen that the AdaptableImpl references AdaptableRef and Proxy which together with their fields consume 72 bytes, which is more than the expected 64.

The size of a plain java.lang.Object instance, which has no fields, is 8 bytes. Adding one field holding an int or a reference to another object increases the size to 16 bytes. Adding a second such field keeps the size at 16 bytes; a third, however, increases it to 24 bytes. From that it seems to make sense to round the number of fields in an object up to an even number, and in really sensitive places we sometimes really do that. It has to be noted, however, that the computed sizes are logical ones; the actual amount of occupied memory depends on the implementation of the virtual machine and can differ, but that should be fine, as the test of logical size expresses the intention of the programmer, which is independent of the actual virtual machine architecture.
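
The effect can be captured directly in a test. The following sketch (with made-up classes, and assuming the logical sizes reported by assertSize match the numbers quoted above) encodes the intention, so that an accidentally added field makes the test fail:

  static class OneField    { int a; }
  static class TwoFields   { int a; int b; }
  static class ThreeFields { int a; int b; int c; }

  public void testLogicalSizesOfSmallObjects() {
    assertSize("Plain object", 8, new Object());
    assertSize("One field fits into 16 bytes", 16, new OneField());
    assertSize("The second field is still free", 16, new TwoFields());
    assertSize("The third field crosses to 24 bytes", 24, new ThreeFields());
  }

The helper classes are declared static so that their instances do not carry a hidden reference to the enclosing test, which Insane would otherwise traverse as well.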

We have found both assertGC and assertSize very valuable in stiffening the edge of our application's amoeba. By writing tests using these asserts we can specify the expected behaviour of our application. So these tests become part of our functional specification; and not only that, being automated, they are an active specification that verifies its own validity every time we execute it.

Randomized Tests

Most of the ways of testing application code we have discussed are useful when you find out that something is wrong and you want your fix to last, stiffening the amoeba shape closer to the desired look of the application. Tests usually do not help much in discovering differences between the specification and reality. The one exception, however, is randomized tests: they exercise the code in new, unusual ways and thus can discover new and unusual problems in the application code.

The basic idea is simple: just use a random number generator to drive what your test does. If, for example, you support the operations add and remove, use the generator to randomly choose their order and parameters:

  List list = new ArrayList ();
  Random random = new Random ();
  int count = random.nextInt (10000);
  for (int i = 0; i < count; i++) {
    boolean add = list.isEmpty () || random.nextBoolean ();
    if (add) {
       list.add (random.nextInt (list.size () + 1), new Integer (random.nextInt (100)));
    } else {
       list.remove (random.nextInt (list.size ()));
    }
  }

This does not invent new ways to call your code, just new orders of calls, which can reveal surprising problems, because not all combinations of operations have been anticipated by the programmer and some of them may lead to failures.

An important feature of any test is its reproducibility, or at least a clear failure report. It is fine to know there is a bug in our code, but if we do not know how to reproduce it, we may not be able to analyze and fix it. The reproducibility of random tests is even more important because, in fact, we do not know the sequence of operations that is really being performed. The first step to achieve it is not to create the random generator blindly, but to use an initial seed which, when passed again, generates the same sequence of numbers. If you look at the implementation of the default Random constructor, you will find that the initial seed is set to the current time, so we can mimic that behaviour by writing:

  private void doRandomOperations (long seed) throws Throwable {
    Random random = new Random (seed);
    try {
        // do the random operations
    } catch (AssertionFailedError err) {
        AssertionFailedError ne = new AssertionFailedError (
            "For seed: " + seed + " was: " + err.getMessage ()
        );
        throw ne.initCause (err);
    }
  }
  public void testSomeNewRandomScenarioToIncreaseCoverage () throws Throwable {
    doRandomOperations (System.currentTimeMillis ());
  }

which knows the initial seed and prints it as part of the failure message if the test fails. In such a case we can then increase the coverage by adding methods like

    public void testThisUsedToFailOnThursday () throws Exception {
      doRandomOperations (105730909304L);
    }

which exactly repeats the set of operations that once led to a failure. We used this approach, for example, in AbstractMutableLazyListHid.doRandomTest.

One problem when debugging such a randomized test is that a single number defines a long sequence of operations. It is not easy for most people to imagine the sequence just by looking at the number, and that is why it can be useful to provide better output, so that instead of calling doRandomOperations (105730909304L); one can create a test that shows exactly what is happening in it. To achieve this we can modify the testing code not only to execute the random steps, but also to generate a more usable error message for a potential failure:

  private void doRandomOperations (long seed) throws Throwable {
    // list is the List under test, e.g. a field of the test class
    Random random = new Random (seed);
    StringBuffer failure = new StringBuffer ();
    try {
      int count = random.nextInt (10000);
      for (int i = 0; i < count; i++) {
        boolean add = list.isEmpty () || random.nextBoolean ();
        if (add) {
          int index = random.nextInt (list.size () + 1);
          Object value = new Integer (random.nextInt (100));
          list.add (index, value);
          
          failure.append ("  list.add(" + index + ", new Integer (" + value + "));\n");
        } else {
          int index = random.nextInt (list.size ());
          list.remove (index);
          
          failure.append ("  list.remove(" + index + ");\n");
          
        }
      }
    } catch (AssertionFailedError err) {
        AssertionFailedError ne = new AssertionFailedError (
            "For seed: " + seed + " was: " + err.getMessage () + " with operations:\n" + failure
        );
        throw ne.initCause (err);
    }
  }

which, in case of an error, will generate human-readable code for the failure, like:

  list.add(0, new Integer (30));
  list.add(0, new Integer (11));
  list.add(1, new Integer (93));
  list.remove (0);
  list.add(1, new Integer (34));

We used this technique to generate, for example, AbstractMutableFailure1Hid.java, which is long, but more readable and debuggable than a single seed number.

Randomized tests not only help us prevent regressions in the amoeba shape of our application by allowing us to record the failing seeds, but (which is a unique capability among tests) they can also help discover new areas where the shape does not match our expectations.

Reusing Tests

The junit framework provides a lot of freedom and allows its users to customize it in nearly unrestricted ways (which is how we could create NbTestCase). The standard way of writing tests, prefixing method names with test, does not need to be followed, and one can create one's own style. Sometimes that is useful, sometimes necessary, but often the built-in standard is enough, because it is well thought out and offers a lot. Even multiple reuse of one test in different configurations is possible.

The simplest way of reusing a test is to let it call a protected factory method in one class and override it in a subclass with a different implementation:

    public class WhyIsTheAccessToListSlowTest extends NbTestCase {
      private int size;
      private List toCheck;

      // blabla

      protected void setUp () {
        size = Integer.valueOf (getName ().substring (4)).intValue ();
        //
        // calls the factory method to create the actual
        // instance of the list
        toCheck = createList (size);
      }

      // here come the test methods
      // imagine some that use the toCheck field initialized in setUp

      
      /** The factory method with default implementation.
      */
      protected List createList (int s) {
         return new LinkedList (Collections.nCopies (s, "Ahoj"));
      }
    }

    public class WhyIsArrayListFastTest extends WhyIsTheAccessToListSlowTest {
      protected List createList (int s) {
         return new ArrayList (Collections.nCopies (s, "Ahoj"));
      }
    }

This example creates two sets of tests, both running over a List, but each configured differently. The same could also be done in a more advanced way by using factories and manual test creation, as we did in AbstractLookupBaseHid.java, AbstractLookupTest.java and ProxyLookupTest.java, but in a lot of cases the built-in inheritance in junit is enough. Used this way, one easily gets twice as many tests, covering twice as many scenarios.

Writing one test and using it for more configurations can be very useful when one has a family of various implementations of the same interface that can be assembled by the final user into various, increasingly complicated configurations. Imagine for example that one writes an implementation of java.io.InputStream and provides a test to verify that it works correctly. But in real situations the stream is not going to be used directly, it will be wrapped in java.io.FilterInputStream or one of its subclasses. That means we want to write another layer of test that executes the same operations as the previous test, but on our stream wrapped in a FilterInputStream, and yet another on a FilterInputStream with overridden methods. If these tests pass, we will have more confidence that our implementation will really work in various configurations. But then we realize that usually the stream will also be wrapped in a java.io.BufferedInputStream for performance reasons. Well, that is easy: to ensure that everything works smoothly, we just create a new layer and configure the test to use BufferedInputStream.
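
A minimal sketch of such layering can reuse the factory-method pattern shown above; here ByteArrayInputStream stands in for the hypothetical implementation under test, and the second class (in its own file) re-runs all the same tests over a buffered wrapper:

    import java.io.*;
    import junit.framework.*;

    public class MyInputStreamTest extends TestCase {
      protected InputStream createStream (byte[] data) {
        // stands in for the hypothetical MyInputStream under test
        return new ByteArrayInputStream (data);
      }

      // all test methods operate only on streams obtained from createStream
      public void testReadReturnsAllBytesAndThenEOF () throws Exception {
        InputStream is = createStream (new byte[] { 1, 2, 3 });
        assertEquals (1, is.read ());
        assertEquals (2, is.read ());
        assertEquals (3, is.read ());
        assertEquals ("End of stream", -1, is.read ());
      }
    }

    /** In a separate file: the same tests, run over a buffered wrapper. */
    public class BufferedMyInputStreamTest extends MyInputStreamTest {
      protected InputStream createStream (byte[] data) {
        return new BufferedInputStream (super.createStream (data));
      }
    }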

We used this technique in our implementation of javax.swing.ListModel, which has very special semantics that are themselves pretty complex (LazyListModelTest.java) and required a lot of testing; but the model also needs to be propagated through various layers of our APIs with unchanged semantics, so we added three additional layers (LazyVisualizerTest.java, LazyVisualizerOverLazyChildrenKeysTest.java, LazyVisualizerOverLazyChildrenKeysAndFilterNodeTest.java). Whenever there was a failure, we could immediately find out which part of our code needed fixing. If all four tests failed, then the problem was in the basic algorithm; if the basic test passed and one of the more complicated ones did not, we immediately knew which layer of our code needed to be fixed. This was very handy for debugging. One did not need to step through the behaviour of all the tested code; it was enough to pay attention to the code in the problematic layer.

This kind of setup has also proved very valuable for randomized tests. Whenever there was a failure, we recorded the seed and created a fixed test repeating the behaviour of the failed random test. And again, if all four suites failed, we knew that the basic algorithm was bad; if some of them worked, we knew which part of the application to investigate to find the problem.

A surprising place where layered-style tests can be helpful is the eternal fight between people who want more reuse of code and people who are afraid to allow it. We usually require at least three different uses of a certain piece of functionality before we even consider generalizing it and maintaining it as an API. The reason is that reuse comes with significant additional costs: one needs to find a reasonable generalization that suits all the known uses, and one has to be more careful when developing such publicly shared parts (e.g. write more tests), otherwise things can quickly get broken and the whole benefit of code reuse is gone. That is why we sometimes suggest copying the code instead. We know that copy-based programming is not nice and that it has its own problems, but sometimes it really is more convenient, especially when its biggest problem - the possibility that the copied code gets out of sync - can easily be overcome by a test. Assuming the original code is well tested, it is not hard, when copying the code, to modify its test to use a factory for the creation of the tested object and, together with the copied code, create a test that changes the setup to create an object matching the new scenario and executes all the old tests on it. We used this in TopComponentGetLookupTest, which sets up its environment in the setUp method and defines a set of tests that shall pass. The test is then extended by a different module's ExplorerUtilCreateLookupTest. This effectively prevents the code from getting out of sync: if any future fix in the original code changes the behaviour or adds a new feature covered by tests, then the next run of the automated tests will fail on the copied code. As the inherited test executes the same operations on the now out-of-sync copy, we will be properly notified to update our copy of the application code.

More layers of the same tests is a very valuable and powerful pattern that not only helps to form the amoeba into a more desirable shape, but also effectively uses the work invested into writing one test, as it exploits its power in more situations and helps to more quickly analyze the area that caused the test to fail.

Testing Foreign Code

Whenever one designs an interface that others can implement, one exposes oneself to possible problems caused by wrong implementations. It is very likely that at least one implementor will not do everything correctly and something will go wrong. We have faced that with our virtual file system API. It provides a generic API to access resources in regular operating system files, in ZIP and JAR archives, in version controlled files, on FTP servers, and many more. Clients work just with the API, and if there is a bug it ends up reported against the generic framework, regardless of the actual implementation that is often responsible for the faulty behaviour. To prevent this, another step of layered tests can be used as a very good solution.

The provider of an API that allows other implementors to plug in can write a generic set of tests that describe the properties that each implementation shall have. These tests contain an abstract factory interface that implementors shall provide together with their implementation of the application code. Their factory sets up the environment for their code to work and the rest of the test then verifies, on the pre-set-up object, that all the required properties are satisfied. This is often referred to as a TCK - test compatibility kit. In a way it is a layered test, but with a generic interface, not knowing all the implementors that are going to reuse it.

The already mentioned virtual file system library has such a TCK, whose heart is formed by a factory class with one create and one cleanup method:

   public abstract class FileSystemFactoryHid extends NbTestSetup {
     protected abstract FileSystem[] createFileSystem(String testName, String[] resources) throws IOException;        
     protected abstract void destroyFileSystem(String testName) throws IOException;   
   }

All the tests use these methods to get the FileSystem to operate on, and the various implementations provide their implementation and select the test sets that shall be executed. So the test for access to ZIP and JAR resources can do:

   public class JarFileSystemTest extends FileSystemFactoryHid {
    public static Test suite() {
        NbTestSuite suite = new NbTestSuite();
        suite.addTestSuite(RepositoryTestHid.class);                
        suite.addTestSuite(FileSystemTestHid.class);        
        suite.addTestSuite(FileObjectTestHid.class);
        suite.addTestSuite(URLMapperTestHidden.class);
        suite.addTestSuite(URLMapperTestInternalHidden.class);
        suite.addTestSuite(FileUtilTestHidden.class);                        
        return new JarFileSystemTest(suite);
    }
    protected void destroyFileSystem (String testName) throws IOException {}
    protected FileSystem[] createFileSystem (String testName, String[] resources) throws IOException{
      return new FileSystem[] { new JarFileSystem (createJarFile (testName, resources)) };
    }
   }

Other types of file system plugins do similar things, just create different instances of the filesystem.

As usual, this helps to fight the application amoeba. Moreover, a Test Compatibility Kit helps to improve regular activities a software engineering organization needs to handle. For example, it simplifies the lifecycle of a bug report: it makes it much easier to find out which part of the system is buggy and lowers the number of reassignments that a bug needs before it finds the right owner. We really lowered the number of "assigning to you for evaluation" bug transfers by introducing TCKs. In some sense a TCK supports the best incarnation of programmer's arrogance - if you provide an implementation, also run the TCK, or your bug reports will not be taken seriously.

Testing Complete Rewrite Compatibility

An interesting example of using the TCK approach for one's own good is the case of a complete rewrite of some class. Imagine that there are maintenance problems with a class. Its implementation is not good, it is buggy and it also needs to be improved to handle a bit more. Nobody believes that it is possible or reasonable to enhance the existing code; it is better to throw it away and write it from scratch. However there is a concern that by doing the rewrite the functionality of the class might change and thus affect everyone who is using it. Imagine that the situation gets even more complicated because the original author of the code just did not know how important tests are and the class is heavily undertested. What shall one do?

A possible and highly recommended technique is to create a test that compares the behaviour of the two implementations and verifies that they behave the same with respect to what is being tested. The first step is to move the old implementation from the application code to the test code. So instead of being deleted, the old unmaintainable class is going to become the template for the expected behaviour of the new code.

The testing code is then going to use a facade - an abstraction layer - so instead of calling the methods on the class itself, it calls methods on some abstract interface, just like tests written for a TCK would do. The actual implementation of the interface then calls either the old code or the new code. If the test passes in both of these setups, then it can be believed that the new implementation matches the behaviour of the old implementation to the known extent. By the way, it should be noted that this is a very suitable situation for using randomized tests, because they can just generate a random sequence of calls, apply it to both of the implementations at once and verify that they both produce the same results.
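
A minimal sketch of such a comparison test; the Stack facade and both implementations below are invented stand-ins (in a real case the old copied class and the new rewrite would be plugged in), and the seed is recorded so a failing random sequence can later be turned into a fixed test:

   import java.util.ArrayDeque;
   import java.util.LinkedList;
   import java.util.Random;
   import junit.framework.TestCase;

   public class RewriteCompatibilityTest extends TestCase {
     // the facade both implementations are called through
     interface Stack {
       void push(int value);
       int pop();
       boolean isEmpty();
     }

     // stand-in for the old, unmaintainable class moved into the test sources
     static class OldStack implements Stack {
       private final LinkedList<Integer> data = new LinkedList<Integer>();
       public void push(int value) { data.addLast(value); }
       public int pop() { return data.removeLast(); }
       public boolean isEmpty() { return data.isEmpty(); }
     }

     // stand-in for the rewritten class living in the application code
     static class NewStack implements Stack {
       private final ArrayDeque<Integer> data = new ArrayDeque<Integer>();
       public void push(int value) { data.push(value); }
       public int pop() { return data.pop(); }
       public boolean isEmpty() { return data.isEmpty(); }
     }

     public void testRandomSequenceOfCallsBehavesTheSame() {
       long seed = System.currentTimeMillis();
       Random r = new Random(seed);
       Stack oldImpl = new OldStack();
       Stack newImpl = new NewStack();
       for (int i = 0; i < 1000; i++) {
         if (r.nextBoolean()) {
           int value = r.nextInt(100);
           oldImpl.push(value);
           newImpl.push(value);
         } else {
           // both implementations must agree on emptiness and on the popped value
           assertEquals("seed " + seed, oldImpl.isEmpty(), newImpl.isEmpty());
           if (!oldImpl.isEmpty()) {
             assertEquals("seed " + seed, oldImpl.pop(), newImpl.pop());
           }
         }
       }
     }
   }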

NetBeans used this approach for example during an attempt to rewrite CookieSet into a new and enhanced implementation. The old implementation was copied into the test, called OldCookieSetFromFebruary2005, and the test could compare the behaviour of the actual implementation with that of the class from Feb 2005 and confirm or reject that the new code behaves the same. This turned into a very useful verification, as it led us to postpone the integration of changes into CookieSet because we were not able to reach full compatibility easily at all.

Deadlock Test

Fighting with deadlocks is a sad destiny of any multithreaded application. The problem field has been under extensive research because it causes huge problems for every writer of an operating system. Most of the applications are not as complex as operating systems, but as soon as you allow foreign code to run in your application, you basically have to fight with the same set of problems.

In spite of the huge research efforts, no simple answer or solution has been found. We know that there are four necessary and sufficient conditions for a deadlock to be created:

  1. Mutual exclusion condition - there has to be a resource (lock, execution queue, etc.) that can be owned by just one thread
  2. Non-preemptive scheduling condition - it is not possible to take away or release a resource already assigned by anyone else than its owner
  3. Hold and wait condition - a thread can wait for a resource indefinitely and can hold it indefinitely
  4. Resources can be acquired incrementally - one can ask for a new resource (lock, execution queue), while already holding another one

But we do not know what the code that prevents at least one of these conditions from appearing shall look like, and we definitely do not know how to do a static analysis over source code to check whether a deadlock can or cannot appear.

The basic and in fact very promising advice for a programmer in a language with threads and locks, like Java, is: do not hold any lock while calling foreign code. By following this rule one eliminates the fourth condition, and as all four must be satisfied to create a deadlock, we may believe we have found the ultimate solution to deadlocks. But in fact it is sometimes very hard to satisfy such a restriction. Can the following code deadlock?

  private HashSet allCreated = new HashSet ();

  public synchronized JLabel createLabel () {
    JLabel l = new JLabel ();
    allCreated.add (l);
    return l;
  }

It feels safe, as the only real call is to HashSet.add and HashSet does not use synchronized at all. But in fact there is a lot of room for failures. The first problem is that JLabel extends JComponent and somewhere in its constructor it acquires the AWT tree lock (JComponent.getTreeLock()). And if someone writes a component that overrides:

    public Dimension getPreferredSize () {
      JLabel sampleLabel = createLabel ();
      return sampleLabel.getPreferredSize ();
    }

we are in danger of deadlock, as getPreferredSize is often called when a component is painted, while the AWT tree lock is held. So even though we tried really hard not to call foreign code, we did it. The second and even less visible problem is the implementation of HashSet. It uses Object.hashCode() and Object.equals, which again can end up calling virtually anywhere (any object can override them), and if that implementation acquires another lock, we can get into similar, but even less expected, problems.
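
For illustration only (this class is made up, not taken from any real code base), an element like the following silently drags yet another lock into the innocent-looking HashSet.add call:

   class ConfigurationKey {
     private static final Object CONFIG_LOCK = new Object();
     private final String name;

     ConfigurationKey(String name) { this.name = name; }

     public int hashCode() {
       // called from HashSet.add while the caller already holds its own lock;
       // acquiring CONFIG_LOCK here is exactly the incremental lock acquisition
       // described by the fourth deadlock condition
       synchronized (CONFIG_LOCK) {
         return name.hashCode();
       }
     }

     public boolean equals(Object o) {
       synchronized (CONFIG_LOCK) {
         return o instanceof ConfigurationKey
           && name.equals(((ConfigurationKey) o).name);
       }
     }
   }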

Talking about possible solutions for deadlocks would provide enough material for its own article, so let us return to the topic of this one - writing tests.

In Java, the solution to a deadlock is often easy. Whenever the application freezes, the user can produce a thread dump and from that we can get a description of the problem. From there it is just a step to a fix: lock on another lock, or use SwingUtilities.invokeLater to reschedule the code in question from the dangerous section to some later time. We used this style for a few years and the result is that our code started to be unpredictable and we have not really fixed many of the deadlocks - when we modified the code to fix one, we often created a new one. My favourite example is the pair of changes made in our classes on Jun 26, 2000 and Feb 2, 2004. Both tried to fix a deadlock, and the second one effectively returned the state back to what it was prior to the first integration. That means we successfully shifted the amoeba shape of our application code to fix a deadlock in the year 2000, and four years later we just shifted it once more to improve one part, but regress with respect to the 2000 fix. This would never have happened if together with the first fix we had also integrated a test!

A test for a deadlock!? Yes, a test for a deadlock. However surprising that may sound, it is possible and often not that hard (we often write a test for a deadlock in about two hours and we never needed more than a day). Beyond the automated nature of such a test, it also gives the developer confidence that he really fixed something, which is usually not fully obvious with deadlock fixes, as they typically cannot be reproduced and thus verified by anyone. Also, when there is a test, one can choose a simpler solution that fixes the problem instead of inventing an intellectually elegant, but in fact complicated, one. The result is that the art of deadlock fixing turns into regular engineering work. And we all want our applications to be developed by engineers, do we not?

Writing a test for a deadlock is not that hard. In our imaginary situation with createLabel we could do that by writing a component, overriding getPreferredSize, stopping the thread and waiting while another one locks the resources in the opposite way:

    
    public class CreateLabelTest extends TestCase {

      public void testSimulateTheDeadlock () {
        MyComponent c = new MyComponent ();
        c.validate ();
      }

      private static class MyComponent extends JComponent
      implements Runnable {

        public synchronized Dimension getPreferredSize () {
          // createLabel() is the factory method under test from the snippet above
          JLabel sampleLabel = createLabel ();

          new Thread (this).start ();
          try {
            wait (1000);
          } catch (InterruptedException ex) {
            // ignore and continue
          }

          assertNotNull ("Also can create label", createLabel ());

          return sampleLabel.getPreferredSize ();
        }

        public void run () {
          assertNotNull ("We can create the label", createLabel ());

          synchronized (this) {
            notifyAll ();
          }
        }
      }
    }

The test works with two threads. One creates a component and validates it, which results in a callback to getPreferredSize under the AWT tree lock; at this moment we start another thread and wait a while for it to acquire the createLabel lock. Under the current implementation the other thread blocks in the JLabel constructor, and as soon as our thread continues (after 1000ms) we create the deadlock. There can be a lot of fixes, but the simplest one is very likely to synchronize on the same lock as the JLabel constructor does:

  // getTreeLock() is an instance method, but it returns the single lock
  // shared by all AWT components, so we can obtain it from any component
  private static final Object TREE_LOCK = new java.awt.Panel ().getTreeLock ();

  public JLabel createLabel () {
    synchronized (TREE_LOCK) {
      JLabel l = new JLabel ();
      allCreated.add (l);
      return l;
    }
  }

The fix is simple - much simpler than the test - but without the test, we would not fix the shape of our amoeba. So the time spent writing the test is likely to get paid back.

Often the test can be written using an already existing API, like in our case the getPreferredSize method (as for example in our test). Only in special situations does one need to introduce a special method that helps the test simulate the problem (we used that in our howToReproduceDeadlock40766(boolean) called from PositionRef.java). Anyway, deadlock tests are pure regression tests - one writes them when a bug is reported, nobody is going to write them in advance. At the beginning it is much wiser to invest in good design, but as we explained earlier, there is no really universal theory on how to prevent deadlocks, so one should know what to do when a deadlock appears, and for that we suggest that testing is the best way with respect to our amoeba shape.

Testing Race Conditions

While certain problems with multiple threads and their synchronization are hard to anticipate, like the deadlocks mentioned earlier, sometimes it is possible and useful to write a test to verify that various problems with parallel execution are correctly handled.

We faced such a problem when we were asked to write a startup lock for NetBeans. The goal was to handle the situation when a user starts the NetBeans IDE for the second time, warn him that another instance of the program is already running, and then exit. This is similar to the behaviour of Mozilla or OpenOffice. We decided to allocate a socket server and create a file in a well known location with the port number written to it. Each newly started NetBeans IDE could then verify whether a previously running instance is active or not (by reading the port number and trying to communicate with it).

The major problem we had to handle was the situation when the user starts more NetBeans IDEs at once. This can happen through extra clicks on the desktop icon or by dragging and dropping more files onto the desktop icon of our application. Then more processes are started and they start to compete for the file and its content. The sequence for one process looks like this:

    if (lockFile.exists ()) {
      // read the port number and connect to it
      if (alive) {
         // exit
         return;
      }
    }
    // otherwise try to create the file yourself
    lockFile.createNewFile();
    DataOutputStream os = new DataOutputStream(new FileOutputStream(lockFile));
    ServerSocket server = new ServerSocket(0); // bind to any free port
    int p = server.getLocalPort();
    os.writeInt(p);

but it can be interrupted by the system at any time, and instead of executing all of this as an atomic operation, the control can be passed to a competing process which performs the same actions. What happens when one process creates the file and another tries to read it before the port number is written to it? What if there is a file left over from a previous (killed) execution? What happens when the test for file existence fails, but by the time we try to create the file it already exists?

All these questions have to be asked if one wants to have really good confidence in the application code. In order to get the confidence we wanted, we inserted a lot of check points into our implementation of the locking, so the code became a modified version of the previous snippet:

    enterState(10, block);
    if (lockFile.exists ()) {
      enterState(11, block);
      // read the port number and connect to it
      if (alive) {
         // exit
         return;
      }
    }
    // otherwise try to create the file yourself
    enterState(20, block);
    lockFile.createNewFile();
    DataOutputStream os = new DataOutputStream(new FileOutputStream(lockFile));
    ServerSocket server = new ServerSocket(0); // bind to any free port
    enterState(21, block);
    int p = server.getLocalPort();
    enterState(22, block);
    os.writeInt(p);
    enterState(23, block);

where the enterState method does nothing in a real production environment, but in a test it can be instructed to block at a specific check point. So we can write a test where we start two threads, instruct one of them to stop at check point 22, and then let the second one run and observe how it handles the case when the file already exists but the port is not yet written to it.
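
A sketch of how such an enterState method can be implemented; the names and details are ours for illustration, the real NetBeans code differs:

    final class CheckPoints {
      // set by the test before the threads are started; -1 means production behaviour
      static volatile int blockAtState = -1;
      static final Object LOCK = new Object();

      static void enterState(int state, boolean block) {
        if (!block || state != blockAtState) {
          return; // nobody asked us to stop here, do nothing
        }
        synchronized (LOCK) {
          try {
            LOCK.notifyAll();  // tell the test that the check point has been reached
            LOCK.wait(5000);   // and let the competing thread run for a while
          } catch (InterruptedException ex) {
            // ignore and continue
          }
        }
      }
    }

The test then sets CheckPoints.blockAtState to 22 before starting the first thread, waits on CheckPoints.LOCK until the check point is reached, and only then lets the competing thread run against the half-written lock file.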

This approach worked pretty well and, despite the skeptical opinions we heard when we started to solve this problem, we got about 90% of the behaviour right before we integrated the first version. Yes, there was still more work to do and bugs to be fixed, but because we had really good automated tests for the behaviour we had implemented, our amoeba edge was well stiffened and we had enough confidence that we could fix all the outstanding problems.

Analyzing Random Failures

Those 10% of random failures mentioned in the previous part turned, as usual, into more work than just another 10% of additional tests and a few fixes. They inspired this whole new part, as dealing with failures that happen just from time to time, and usually on a computer that you do not own, requires more sophisticated techniques for tracking them down.

The problem with parallel execution is that there is really not much help for anyone who wants to use it correctly. The methodology is either weak, missing, or too concentrated on specific cases, and the debuggers are not really ready to push the debugged applications to their parallel limits, so in order to move anywhere, people resort to the oldest solution - println and logging. The old approach is pretty simple - add log messages into your code, run it a few times, wait until it starts to misbehave and then try to figure out from the log file what went wrong and fix it. In the case of automated tests a similar approach can be used. Enhance the application code and also the tests with logging, and if the test fails, output all the collected log messages as part of the failure report.

We achieved this by writing our own implementation of ErrorManager (which is a NetBeans class used for logging and error reporting), but one can do the same in any test by using java.util.logging and implementing its Handler. The implementation has to be registered at the beginning of the test, has to capture all logged messages and, in the case of a failure, make them part of the failure message:

   public class MyTest extends NbTestCase {
     static {
       System.setProperty ("org.openide.util.Lookup", "MyTest$Lkp");
     }
     public MyTest (String name) {
       super (name);
     }
     protected void runTest () {
       ErrManager.messages.clear ();
       try {
         super.runTest ();
       } catch (AssertionFailedError err) {
         throw new AssertionFailedError (err.getMessage() + " Log:\n" + ErrManager.messages);
       }
     }
     
     public void testYourTest() throws Exception {
        // invoke some code
        ErrorManager.getDefault().log ("Do some logging");
        // another code
        ErrorManager.getDefault().log ("Yet another logging");
     }

     public static final class Lkp extends org.openide.util.lookup.AbstractLookup {
       public Lkp () {
         this (new org.openide.util.lookup.InstanceContent ());
       }
       private Lkp (org.openide.util.lookup.InstanceContent ic) {
         super (ic);
         ic.add (new ErrManager ());
       }
     }
     private static final class ErrManager extends org.openide.ErrorManager {
       public static final StringBuffer messages = new StringBuffer ();
        
       public void log (int severity, String s) {
         messages.append (s);
         messages.append ('\n');
       }
       public void notify (int severity, Throwable t) {
         messages.append (t.getMessage ());
         messages.append ('\n');
       }
    } 
  }

The logging can be done by the test to mark important sections of its progress, but the main advantage is that your code shall be full of ErrorManager.log or (if you use standard Java logging) java.util.logging.Logger.log calls. The test then collects messages from all the places and, in the case of a failure, provides a complete and detailed (well, depending on how often and how usefully the log messages are emitted) description of the failure, which can then be analyzed and either fixed or the logging made more detailed to help track the problem down (as we did for example in ChildrenKeysIssue30907Test.java).

Sometimes people are reluctant to analyze random failures in tests, dismissing them as something that does not affect the production code. In fact, it may turn out that the problem is in the test and the code is OK, but relying on this is usually false hope. Without a deeper understanding of the problem, it can just as well be in the application code, and even if it is not reproducible all the time, when it occurs it can have huge consequences. If we want to have enough trust in the behaviour of our application and make its amoeba shape less amoebic, logging in the application code and logging friendly tests turn out to be very useful tools.

Advanced usage of Logging

The NetBeans project started to introduce tests that collect logged output slowly and only in the most exposed places, where the failures were often due to overly complicated usage of multiple threads. However one should remember that every application written in Java is non-deterministic, even if it consciously uses just one thread. There is always the garbage collector that starts at random moments, depending on conditions external to the tested program, and when dealing with visual components there is at least the AWT event dispatch thread, which can also receive events in a not strictly defined order. As a result it turned out that the logging support is pretty useful in almost any test. Because of that, NetBeans decided to support it in its NbJUnit extensions.

The simplest add-on is support for collecting messages logged during a test run. To get this, one can just override the logLevel method in any subclass of NbTestCase. Let's look at the following example:

   public class LogCollectingTest extends NbTestCase {
     public LogCollectingTest(String n) { super(n); }

     protected Level logLevel() {
       return Level.ALL;
     }

     public void testMethodOne() { /* .... */ }
     public void testMethodTwo() { /* .... */ }
     public void testMethodThree() { /* .... */ }
   }

It says that log messages of all levels shall be collected, and as such, if any of the test methods fails, the reported exception is going to contain the output of all messages logged during the execution of the test, in a format like:

[name.of.the.logger] THREAD: threadName MSG: The message

Of course the text in the exception message is truncated to be of reasonable length, but the full log report is also available. In the directory returned by NbTestCase.getWorkDir(), which would usually be /tmp/tests/testLogCollectingTest/testMethodOne, there is a file called testMethodOne.log which contains the last 1MB of logged output. That should be enough for cases when one needs a deeper analysis of some complex logging code, as most of the recently logged information will be there. It shall also be mentioned that if one is using xtest to execute the tests, the generated HTML report includes a copy of the workdir, including the output log file.

Sometimes the logged messages can get pretty long, and especially in the case of random failures it may not be completely easy to analyse what is going on in the test. The best strategy we have discovered so far to fight such a situation is to add a fail("Ok"); line into the place in the test where the execution randomly fails. This is going to generate a log file as well, and the two log files - the one from the real failure and the one from the correct run - can then be compared and diffed against each other. Of course it is wise to replace all the @543hac5f messages in the output with something more neutral, as each run is going to have different memory locations. After eliminating this difference, it is generally possible to get a reasonable diff, understand the difference between the two runs and find the root cause of the failure.
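
In code the trick is as trivial as it sounds; a hypothetical example:

   import java.util.logging.Level;
   import org.netbeans.junit.NbTestCase;

   public class SometimesFailingTest extends NbTestCase {
     public SometimesFailingTest(String name) { super(name); }

     protected Level logLevel() { return Level.FINE; }

     public void testThatFailsOnceInAWhile() throws Exception {
       // ... the real steps of the test would go here ...

       // temporarily added: even a successful run now "fails" and dumps its log,
       // so it can be diffed against the log of a genuinely failing run
       fail("Ok");
     }
   }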

There may be two possible reasons why such an understanding of the differences between log files cannot be reached. The first is that there may not be enough logging. The information recorded in the log file is just too coarse to provide meaningful information about what has been happening during the test run. Indeed, the fix is to insert more logging messages into the tested code. Using the NbTestCase.logLevel method one can add more logging on various levels, like Level.FINE, Level.FINER and Level.FINEST, and enable different messages in different test cases. In the case of a very suspicious random failure which can potentially influence a critical piece of code, it may even make sense to log information about each line of code that gets executed, possibly with the arguments sent to various methods and the state of local variables. That shall generate enough information to allow a detailed understanding of the reasons for the random failures.

On the other hand, there is a hidden catch: the more logging is put into the application, the more its behaviour is going to differ from an execution of the application without logging enabled. Every call to Logger.log is going to add additional delays, formatting of the recorded messages is going to take a while, etc. That is why a failure that might occur in one third of cases may nearly disappear due to all the logging code around. Indeed, such a failure gets harder and harder to repeat and thus to evaluate. One can nearly give up on trying to emulate it in a debugger, or in the worst case, one cannot even reproduce such a failure on one's own computer. However, from time to time the failure appears on someone's machine. In NetBeans we have a farm of testing machines running the tests every day in various configurations, and this really helps to track such rare errors down. Although we may never face the bug on the developers' workstations, from time to time we get a report from the testing infrastructure with full logging information and we can start to hunt the bug down.

The other side of the problem is that there may already be too many logging messages and it is hard to tell where they come from. For example, if the test executes a series of steps which result in repeated calls to the same place in the application code, which of course prints the same log messages, one can easily get lost. This is especially hard if the application does some computation in a loop, repeating the same or at least similar messages on and on. In such a case the really helpful advice is to put logging messages into the testing code itself. Those messages are the anchors that provide basic orientation in the log output. When comparing the differences between the successful and the unsuccessful run, one can first of all locate the test messages and then compare the differences between the application messages inserted among them. Basically, the best advice for fighting random failures is to do a lot of logging, capture it in tests and print enough logging messages in the application as well as in the test code itself. Then the likelihood of understanding the root cause of the failure increases dramatically.

A bit different use of logging is testing that something really has been logged. For example, when one adds a new method to an API that shall for some reason be overridden by subclasses, it might be desirable to let already existing classes, compiled against the previous version of the class, know about it at runtime:

    protected boolean overideMePlease() {
        Logger.getLogger("my.logger").warning("subclasses are supposed to override overideMePlease() method!");
        // some default
        return true;
    }

Of course, everyone who is dead set on testing wants to be sure that the warning really gets printed in the right situations. To do that, one can register one's own logger, or in NetBeans prior to 5.0 one's own ErrorManager (like in AsynchronousTest.java), and make sure that its warning method is called. However, as this was a very common pattern in NetBeans, there is a special utility method in NbJUnit that handles everything itself. So one can just do:

    CharSequence log = Log.enable("my.logger", Level.ALL);
    // do the necessary actions that end up calling overideMePlease()
    if (log.toString().indexOf("supposed to override") == -1) {
      fail("Warning shall be printed: " + log);
    }

This is just a little utility method, but it helps to easily access and analyze what gets logged from inside the test case. Together with the rest of the support in org.netbeans.junit.NbTestCase and org.netbeans.junit.Log classes this should provide good enough support for fighting with the unwanted shaking of the amoeba's edge using logging.

Execution Flow Control using Logging

This section builds on the previous description of logging support and on the earlier sections discussing tests for race conditions and tests that try to simulate deadlocks.

Both of these kinds of tests need some kind of hook in the code that allows the testing environment to simulate obscure situations - either to corrupt some internal data structure or to stop a thread in a place that is going to cause a deadlock with another executing thread. The previous parts suggested putting special methods accessible from the testing code into the application to allow such execution control: either to have code full of inserted statements like enterState(10, block), or to provide some overridable method like howToReproduceDeadlock40766(boolean) and make it do something insane in the test. This is possible, but actually there is a small improvement which is not easy to find, but once discovered it seems so natural and easy to use that one just has to wonder why it did not come to mind earlier: instead of putting special methods in the code, one can use logging!

Logging? Yes, logging! The beauty of such a solution is that logging is, or at least should be, a natural part of every at least slightly complicated program anyway, so one does not have to obscure the code with enterState and howToReproduceDeadlock40766(boolean); instead one can just use:

    logger.finest("Enter state 20");
    // or
    logger.finest("reproduceDeadlock40766now");

The first log message is fully natural, the second is a bit suspicious because of its name, but still it is not so strange as to be seen as an alien piece of code unrelated to the program. Logging just belongs to programs.

Now the testing code can register its own Handler and in its publish method do all the wild things:

    class Reproduce40766 extends Handler {
      public void publish(LogRecord rec) {
        if ("reproduceDeadlock40766now".equals(rec.getMessage())) {
          // block and let the other thread run, as shown
          // in Deadlock40766Test.howToReproduceDeadlock40766(boolean)
        }
      }
      public void flush() {}
      public void close() {}
    }
    Logger.getLogger("").addHandler(new Reproduce40766());

So instead of littering the application code with special hacks, one can achieve the same by adding a logging handler and analyzing the messages passed to it during the execution of the test. The application code stays clean and the test gets as powerful as the number of messages logged in the application code allows, because each logged message is a chance for the test to influence the behaviour of the application. Taken to an extreme, it shall be possible to fully control the behaviour of a multithreaded program by suspending all threads except the one that shall be executing. Imagine the following program:

    class Parael implements Runnable {
      public void run() {
        Random r = new Random();
        for (int i = 0; i < 10; i++) {
          try {
            Thread.sleep(r.nextInt(100));
          } catch (InterruptedException ex) {}
          Logger.global.log(Level.WARNING, "cnt: {0}", new Integer(i));
        }
      }
      public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(new Parael(), "1st");
        Thread t2 = new Thread(new Parael(), "2nd");
        t1.start(); t2.start();
        t1.join(); t2.join();
      }
    }

The program runs two threads, each of them counting to ten while pausing for a random number of milliseconds between increments. Obviously the threads are going to run in parallel and the speed of their counting is going to be random. This can be easily verified by a simple NbTestCase with logging enabled:

  public class ParaelTest extends NbTestCase {
    public ParaelTest(String testName) { super(testName); }
    protected Level logLevel() { return Level.WARNING; }
    public void testMain() throws Exception {
      Parael.main(null);
      fail("Ok, just print logged messages");
    }    
  }

When executed once, the output can look, for example, like the first listing below; the next time it can look like the second one:

[global] THREAD: 2nd MSG: cnt: 0
[global] THREAD: 1st MSG: cnt: 0
[global] THREAD: 2nd MSG: cnt: 1
[global] THREAD: 2nd MSG: cnt: 2
[global] THREAD: 2nd MSG: cnt: 3
[global] THREAD: 2nd MSG: cnt: 4
[global] THREAD: 1st MSG: cnt: 1
[global] THREAD: 1st MSG: cnt: 2
[global] THREAD: 2nd MSG: cnt: 5
[global] THREAD: 2nd MSG: cnt: 6
[global] THREAD: 1st MSG: cnt: 3
[global] THREAD: 1st MSG: cnt: 4
[global] THREAD: 2nd MSG: cnt: 7
[global] THREAD: 1st MSG: cnt: 5
[global] THREAD: 2nd MSG: cnt: 8
[global] THREAD: 2nd MSG: cnt: 9
[global] THREAD: 1st MSG: cnt: 6
[global] THREAD: 1st MSG: cnt: 7
[global] THREAD: 1st MSG: cnt: 8
[global] THREAD: 1st MSG: cnt: 9

[global] THREAD: 2nd MSG: cnt: 0
[global] THREAD: 1st MSG: cnt: 0
[global] THREAD: 2nd MSG: cnt: 1
[global] THREAD: 1st MSG: cnt: 1
[global] THREAD: 2nd MSG: cnt: 2
[global] THREAD: 1st MSG: cnt: 2
[global] THREAD: 2nd MSG: cnt: 3
[global] THREAD: 2nd MSG: cnt: 4
[global] THREAD: 1st MSG: cnt: 3
[global] THREAD: 2nd MSG: cnt: 5
[global] THREAD: 2nd MSG: cnt: 6
[global] THREAD: 1st MSG: cnt: 4
[global] THREAD: 1st MSG: cnt: 5
[global] THREAD: 2nd MSG: cnt: 7
[global] THREAD: 1st MSG: cnt: 6
[global] THREAD: 2nd MSG: cnt: 8
[global] THREAD: 1st MSG: cnt: 7
[global] THREAD: 2nd MSG: cnt: 9
[global] THREAD: 1st MSG: cnt: 8
[global] THREAD: 1st MSG: cnt: 9

Of course, when the program is executed again and again, the output is unlikely to look exactly the same as before. That is fine: when two threads run in parallel, the result is obviously non-deterministic. But imagine that one of the execution orders is somehow special. For example, that it is known to cause a race condition or a deadlock. Indeed such wrong behaviour should be fixed, but in line with the intentions of this whole paper, it shall also be simulated, otherwise the fight with the amoeba is never going to end. That is why one should try to write a test that simulates the order of execution that is known to be broken. For example, let us try to generate the highly unlikely sequence where each thread increments and prints one number, goes to sleep and lets the other thread run. It is very improbable that such output would happen randomly, so it is reasonable to question whether such a test can be written at all. But it can! With a little help from logging, here is the code that forces the threads to behave deterministically and always output just one number each before yielding to the other:

public class ParaelTest extends NbTestCase {
  public ParaelTest(String testName) { super(testName); }
  protected Level logLevel() { return Level.WARNING; }
  public void testMain() throws Exception {
    Logger.global.addHandler(new BlockingHandler());
    Parael.main(null);
    fail("Ok, just print the logged output");
  }
  private static final class BlockingHandler extends Handler {
    boolean runSecond;
    public synchronized void publish(LogRecord record) {
      if (!record.getMessage().startsWith("cnt")) return;
      if (runSecond == Thread.currentThread().getName().equals("2nd")) {
        notify();
        runSecond = !runSecond;
      }
      try {
        wait(500);
      } catch (InterruptedException ex) {}
    }
    public void flush() {}
    public void close() {}
  }
}  

When the test is executed, it can really be seen that each thread adds one to its counter and gives execution control to the other thread. And this behaves more or less deterministically - nearly every execution of the test yields:

[global] THREAD: 1st MSG: cnt: 0
[global] THREAD: 2nd MSG: cnt: 0
[global] THREAD: 1st MSG: cnt: 1
[global] THREAD: 2nd MSG: cnt: 1
[global] THREAD: 1st MSG: cnt: 2
[global] THREAD: 2nd MSG: cnt: 2
[global] THREAD: 1st MSG: cnt: 3
[global] THREAD: 2nd MSG: cnt: 3
[global] THREAD: 1st MSG: cnt: 4
[global] THREAD: 2nd MSG: cnt: 4
[global] THREAD: 1st MSG: cnt: 5
[global] THREAD: 2nd MSG: cnt: 5
[global] THREAD: 1st MSG: cnt: 6
[global] THREAD: 2nd MSG: cnt: 6
[global] THREAD: 1st MSG: cnt: 7
[global] THREAD: 2nd MSG: cnt: 7
[global] THREAD: 1st MSG: cnt: 8
[global] THREAD: 2nd MSG: cnt: 8
[global] THREAD: 1st MSG: cnt: 9
[global] THREAD: 2nd MSG: cnt: 9    

The basic trick that allows such a miracle to happen is the interception of the log messages. The special BlockingHandler, registered by the test before the application code gets started, analyzes the messages sent to it by the application threads and suspends and resumes them to simulate the execution order in which each thread adds one number and then gets suspended by the operating system in favour of the other thread.

The beauty of such a solution is that the application code does not really know that its execution flow is going to be controlled by the test. If one looked at the application code itself, one could not guess that there is a test that does such wild things to it, because the actual application code looks natural. Just due to the possibility of intercepting logging, the test gets the chance to influence the execution flow and as a result simulate even the wildest and most improbable execution orders.

Indeed, writing the BlockingHandler correctly may not be easy, especially if there are more than two threads that need to interact and if the messages that need to be analyzed are not that simple. That is why the NetBeans JUnit extensions library contains a support method called Log.controlFlow, which registers the handler and does all the thread manipulation itself. The only thing that is needed is to specify the expected order of messages. A very nice thing is that the format of the expected messages is THREAD: name MSG: message, which exactly matches the output reported by NbTestCase when capturing of log messages is enabled (by overriding the logLevel method). So one can just copy the output and feed it to the controlFlow method, possibly without any modifications. However, as real world messages sent to a logger often contain content specific to each run (like @af52h442 identifying the location of an object in memory), it is possible to use regular expressions to describe the expected messages. So here is one possible rewrite of our test that uses the Log.controlFlow method to simulate the one by one order of execution:

public class ParaelTest extends NbTestCase {
    public ParaelTest(String testName) { super(testName); }
    protected Level logLevel() { return Level.WARNING; }
    public void testMain() throws Exception {
        org.netbeans.junit.Log.controlFlow(Logger.global, null,
            "THREAD: 1st MSG: cnt: 0" +
            "THREAD: 2nd MSG: .*0" +
            "THREAD: 1st MSG: ...: 1" +
            "THREAD: 2nd MSG: cnt: 1" +
            "THREAD: 1st MSG: cnt: 2" +
            "THREAD: 2nd MSG: cnt: 2" +
            "THREAD: 1st MSG: cnt: 3" +
            "THREAD: 2nd MSG: cnt: 3" +
            "THREAD: 1st MSG: cnt: 4" +
            "THREAD: 2nd MSG: cnt: 4" +
            "THREAD: 1st MSG: cnt: 5" +
            "THREAD: 2nd MSG: cnt: 5" +
            "THREAD: 1st MSG: cnt: 6" +
            "THREAD: 2nd MSG: cnt: 6" +
            "THREAD: 1st MSG: cnt: 7" +
            "THREAD: 2nd MSG: cnt: 7" +
            "THREAD: 1st MSG: cnt: 8" +
            "THREAD: 2nd MSG: cnt: 8" +
            "THREAD: 1st MSG: cnt: 9" +
            "THREAD: 2nd MSG: cnt: 9",
            500
        );
        Parael.main(null);
        fail("Ok, just print the logged output");
    }
}    

Indeed, this may not look like a big simplification, but usually the important order of execution affects just some local place, and then the script to replay the execution flow can be much smaller. For example, imagine that we just want thread 1st to print 5 first, then thread 2nd to print 2, and then thread 1st to continue with 6. Here is the script:

public class ParaelTest extends NbTestCase {
    public ParaelTest(String testName) { super(testName); }
    protected Level logLevel() { return Level.WARNING; }
    public void testMain() throws Exception {
        org.netbeans.junit.Log.controlFlow(Logger.global, null,
            "THREAD: 1st MSG: cnt: 5" +
            "THREAD: 2nd MSG: cnt: 2" +
            "THREAD: 1st MSG: cnt: 6",
            5000
        );
        Parael.main(null);
        fail("Ok, just print the logged output");
    }
}

As can be seen, the log messages inside the application can become a very useful and valuable tool for preventing unwanted moves of the amoeba's edge. Not only can such messages be enabled during execution of the application itself, they should also be on and be captured when the code is executed in the testing environment. Not only is having the log of a failed test going to help analyze its root cause, but if needed, the log can also serve as a script that controls the execution flow, and thus becomes very useful for simulating extreme situations the application code can get into. Indeed, this style of logging control can be taken even further. One possible direction of enhancement is better log generation, another is better replaying of such log files. On the side of log capturing, an interesting thing to know is when a context switch - a transfer of execution from one thread to another - is happening; for that a tool like dtrace could be very useful. The interesting implication of knowing about context switches and being able to replay them is that this can in fact turn a parallel, non-deterministic application into a deterministic one (of course only on single processor machines). Other possible improvements might include the use of AOP or other bytecode patching techniques to insert logging code everywhere and thus get information about the application even if it is not fully ready to provide it. Everything is possible, but only the future is going to tell which of such improvements are useful. For now it is enough to say that NetBeans is successfully using logging, and control flow based on logging, to fight the amoeba effect in its APIs.

Testing Ant Scripts

This part of the testing series is going to talk about ways to test Ant tasks and Ant build scripts. However, it should also be generally useful: in fact it describes how to test code that needs to run in its own container, so it can be applied to write tests that run inside apt, etc.

As may be known, the NetBeans IDE heavily depends on Ant. It is not just a matter of being able to cooperate with Ant; the dependency is much deeper - basically everything that is related to development is not done by the NetBeans IDE itself, but is delegated to the underlying Ant infrastructure. This has many benefits, especially for the end user. It is possible to have one's own build.xml and that is all that is needed to integrate with the NetBeans IDE. On the other hand, this flexibility also puts certain restrictions on the application code in the NetBeans IDE. Basically, it is forbidden to do compilation, execution, deployment, debugging, javadoc generation, etc. from inside the IDE itself; it has to be done by Ant tasks. This guarantees that the user can always execute the same action from the IDE as well as from the command line. As a result, the programmers who write the NetBeans IDE had to learn how to write, debug and maintain Ant tasks. Obviously, some of them, who follow the philosophy of this article, also had to learn how to write tests checking that the tasks behave correctly.

An Ant task is a class that extends a predefined Ant class, has some setters and contains an implementation of the execute method. This class then gets instantiated by the Ant runtime container, some of its setter methods get called and then, after a while, its execute method may be called to do the actual work:

  class MyTask extends Task {
    private String n1, n2;
    public void setN1(String x) { n1 = x; }
    public void setN2(String x) { n2 = x; }

    public void execute() throws BuildException {
      if (n1 == null) throw new BuildException("N1 needs to be specified");
      if (n2 == null) throw new BuildException("N2 needs to be specified");
      if (n1.equals(n2)) {
        return;
      }
      throw new BuildException(n1 + " != " + n2);
    }
  }

Indeed, it is not that complex to write a simple unit test that just instantiates the class, passes the necessary values to the setters, executes the execute method and verifies whether the BuildException is thrown or not. However, the reason this is so easy is that the task itself does not use any services from the Ant runtime container. It does not ask for the location of the build script file using Task.getProject().getProjectFile(). It does not need to know or modify values of project properties via Task.getProject().getProperty or Task.getProject().setProperty. It does not try to instantiate other tasks that it might want to delegate to using Task.getProject().createTask. However, these are the operations that common Ant tasks usually do. Satisfying such requirements in a fully isolated execution environment becomes really hard. Of course, one can try to provide mock objects for all the necessary container classes, but there are not many people around who know enough details of Ant to really do it correctly, and that is why a slightly more troublesome, but workable, solution is to test the tasks using directly generated Ant build scripts:

  public class MyTaskTest extends NbTestCase {
    private File script;

    protected void setUp() throws Exception {
      script = extractString(
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
        "<project name=\"MyTaskTest\" basedir=\".\" default=\"all\" >" +
        "  <taskdef name=\"mytask\" classname=\"my.pkg.MyTask\" classpath=\"${java.class.path}\"/>" +
        "<target name=\"all\" >" +
        "  <mytask n1="hello" n2="${n2.value}"/>
        "</target>" +
        "</project>"
      );
    }

    public void testMyTaskWhenEqual() throws Exception {
      execute(script, new String[] { "-Dn2.value=hello" });
    }
  }

The meaning of the test is clear. First it generates an Ant build script, stores its location in the script field, and then invokes the Ant runtime on top of that script while passing it a value for the property n2.value. So the whole magic is just in the implementation of extractString and execute. Both are implemented in NetBeans' PublicPackagesInProjectizedXMLTest.java. Because the implementation of extractString is pretty simple, let's concentrate on the execute method. If it were not necessary to capture the output and to prevent the Ant container from calling System.exit, then the code would consist of just a call to org.apache.tools.ant.Main.main. But because we need to know whether the execution succeeded or failed, the method also installs a security manager that intercepts calls to System.exit while remembering the exit code. It also replaces System.out and System.err to capture the output of the program, so it can be further analyzed if necessary. Support like this proved to be good enough for testing and debugging NetBeans' own Ant tasks in their natural container (a sketch of what such an execute helper can look like is shown after the next example). One can even check that a certain script is going to fail:

    public void testMyTaskWhenNotEqual() throws Exception {
      try {
        execute(script, new String[] { "-Dn2.value=hi" });
        fail("This should fail as hello != hi");
      } catch (ExecutionError ex) {
        // ok, this should fail on exit code
      }
    }
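
For completeness, here is a sketch of what such an execute helper can look like. It is a simplification with invented names (AntRunner and its ExecutionError), not the actual NetBeans implementation, and it assumes Ant is on the test classpath:

   import java.io.ByteArrayOutputStream;
   import java.io.File;
   import java.io.PrintStream;
   import java.security.Permission;

   final class AntRunner {
     static class ExecutionError extends AssertionError {
       final int exitCode;
       ExecutionError(int exitCode) {
         super("Ant exited with code " + exitCode);
         this.exitCode = exitCode;
       }
     }

     // thrown by our security manager instead of really exiting the VM
     private static class ExitException extends SecurityException {
       final int status;
       ExitException(int status) { this.status = status; }
     }

     static String execute(File script, String[] args) throws ExecutionError {
       PrintStream oldOut = System.out;
       PrintStream oldErr = System.err;
       SecurityManager oldSm = System.getSecurityManager();
       ByteArrayOutputStream captured = new ByteArrayOutputStream();
       try {
         PrintStream capture = new PrintStream(captured);
         System.setOut(capture);
         System.setErr(capture);
         System.setSecurityManager(new SecurityManager() {
           public void checkPermission(Permission perm) { /* allow everything */ }
           public void checkExit(int status) {
             throw new ExitException(status); // intercept System.exit
           }
         });

         String[] antArgs = new String[args.length + 2];
         antArgs[0] = "-f";
         antArgs[1] = script.getAbsolutePath();
         System.arraycopy(args, 0, antArgs, 2, args.length);
         try {
           org.apache.tools.ant.Main.main(antArgs);
         } catch (ExitException ex) {
           if (ex.status != 0) {
             throw new ExecutionError(ex.status);
           }
         }
         return captured.toString();
       } finally {
         System.setSecurityManager(oldSm);
         System.setOut(oldOut);
         System.setErr(oldErr);
       }
     }
   }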

Also, and this is very important for NetBeans, it is possible to test that the build scripts themselves are correct, not just our own tasks. As has been said, the NetBeans IDE fully relies on Ant. That means whatever the user does in the GUI gets reflected in an Ant build script. The NetBeans IDE tries as much as possible to reuse existing Ant tasks, so sometimes the generated script just references standard Ant tasks, yet it has to work correctly. Maybe it is even more important not to make mistakes when generating the build scripts than when writing our own APIs! The scripts are likely to be used by all users of the NetBeans IDE, while our APIs are just for those daring enough to develop on top of the NetBeans Platform. That is why we found it fully natural to write tests that verify the generated build scripts. One such test can be found here. It generates the build.xml file, invokes the Ant framework on one of its targets and then just verifies that the generated files are in their expected locations and that they contain the right content. Needless to say, this kind of testing would be completely impossible without embedding the Ant runtime container in our testing infrastructure.

Some people may still say that this would all be much more easily done by hand. Maybe, but the hidden catch is repeatability. From time to time the project capabilities are extended with new features and each new feature makes the build script more complex. Ensuring by manual testing that a change to the build script generation code, or possibly to the shared build script, does not break anything is beyond the abilities of a single human being. Indeed, the QA guys can dedicate enough time before a release to ensure everything works, but it is just much cheaper to execute automated tests and verify that all possible expectations about the behaviour of the build scripts are still satisfied. We have seen this with our work on apisupport, which was full of never-ending and repeating regressions because the build scripts used there are pretty complex. As soon as we started to really care and invest in writing tests, even the most complex and most obscure setups were known to work. Our amoeba started to grow in the right direction and our quality assurance suddenly had (nearly) nothing to do.

Summary

If you got this far, you very likely saw a lot of convincing examples that writing automated tests is useful. Tests can be seen as a form of specification that helps to stiffen the Amoeba's edge (at least from one side - what is supposed to work really works). Tests can significantly reduce the cost of maintenance of an application. Tests form a nice, and in our opinion necessary, add-on to other types of specification and shall be used together with documentation, use cases, UI specs and javadocs every time one is serious about the application's future. This has all been pointed out so many times that it does not really make sense to repeat it once again. That is why, instead, let us use the summary to explain why people do not write tests:

One of our friends worked in an optical fibre company and one day his manager told him: boy, our company is going to pay for a trip to Malaysia for you. Not bad, it is a long flight from Europe, but the manager forgot to tell him that this was the last thing they had decided to pay him for. He was sent to Malaysia to teach cheaper workers how to do his work, and as soon as he returned he was fired and replaced by them.

By writing automated tests you are contributing to the shared knowledge of the project you are working on, and in fact you are making yourself more easily replaceable. This is good for the project, but as the fibre example shows, it can be dangerous for the workers. Is it even useful to have a successful product when it is no longer yours? Maybe not, but our NetBeans project experience shows that without continuous improvements to quality, the whole project would have become obsolete a long time ago. Tests are a must if we want to keep our jobs. That is why most of us write them. And if someone does not? Well, we at least know why. He may be afraid of losing his job in favour of someone else. But maybe he is just not good enough and has to be afraid...

What is your attitude? Do you care about the future of your project or are you just afraid to lose your job? Are you going to write automated tests now?
