Test Patterns In Java
Jaroslav Tulach
$Date: 2006/12/06 07:45:11 $ $Revision: 1.68 $
Latest version of this document can be found at
http://openide.netbeans.org/tutorial/test-patterns.html
Abstract
Testing is an important part of software development.
Effective testing is a key factor in reducing total cost of maintenance of
any application over its lifetime. It reduces the cost and time of
development, can increase savings on quality assurance and of course on
sustaining. Knowing when to invest in better design,
in post development quality assurance, in manual tests or in automatic
testing forms a basic difference between successful and unsuccessful
software projects in these tough and competitive days.
In this paper we start with general motivation and present automatic
tests as a form of functional specification. We then quickly dive
deep down into hardcore JUnit test examples showing various forms of
regression tests verifying algorithm complexity, memory management,
data structure sizes, deadlocks, race condition behavior and tests
randomly generating new test cases, simulating user
clicks in the UI, API signature tests, etc.
We assume that you know what tests are and that tests are an important
and useful part of software development. This presentation will show
what practical usage of a test framework looks like.
You will learn techniques, tips and tricks for testing various aspects of
real world J2SE applications. We will also give examples of the savings
and improvements that were achieved by increased usage of extensive
automatic testing in the development of NetBeans platform and IDE.
What does "quality of an application" mean?
There are many possible answers to the question "what makes an application have good quality?".
Depending on one's standpoint the application can be expected to have a
slick UI and a natural work flow, to be acceptably fast, not to crash from
time to time, etc.
These are all good expectations, and let's include them under one general
category - specification. If we are good UI designers and if we can
understand the user needs, then we can create a good specification, which
describes how our application should look.
However this does not mean that our users will be satisfied with the quality
of what they get. A good enough specification is just half of what they
see. They also need a good enough implementation. Any expectations we have
of our application can be (and in the ideal state are) expressed in our
specification, but before they get to the user they have to be implemented
in code, and it is very likely that the code will not follow
the specification fully. There will be differences between
the code and the specification.
The amount of difference between our code and our specification
is the measure of quality. If the final
application is not doing what we originally intended it to do, then it
is not good enough. Its quality, or perhaps better our confidence in it,
is lowered with every difference from the expected behaviour.
Sometimes the application does not do what we expect it to do, sometimes
it does more. Both situations are dangerous, but only one of them
is easy to find. One can read through the spec and test (manually or
automatically) whether everything that is requested is really implemented.
So by carefully
testing one's application, one can minimize the places where the code
offers less than expected. But even this has its limits:
Over time, with each new release, regressions occur. The
functionality of the code changes: it starts to do new things
and, alas, it also stops doing what it used to do. Of course one can execute
the manual test procedures once more with every release, but that is first of all
very expensive, as people have to try all specified features from all
previous releases, and the accuracy of such findings is often not good enough.
As a result the shape of the application code changes from release
to release, as an amoeba changes its shape over time. That is
why we call this behaviour the amoeba model, and in the rest of the
paper we give advice on ways to fight its unwanted implications.
Tests As A Form Of Specification
This paper claims that writing automated tests is beneficial for the quality
of an application. But as we are not really XP programmers, we are not
going to insist on tests being a must. We believe they are beneficial,
especially in certain areas, but sometimes there are other useful and
effective ways to achieve good enough quality.
One of the most important things automated tests do is that they show
the intended use and behaviour of the application's code. We all
know how hard it is to maintain code inherited from another person - read
it all, analyze what it does and, if a bug is reported, just guess whether
that is really a bug or some weird feature (i.e. intended behaviour).
As a result, when fixing such code, one has to
fear that another hidden feature will get broken.
This whole suffering can be greatly reduced by automated tests.
If written correctly, they contain the expected calls to the application code
and the expected results. One can always check them to find
out whether a certain use of the code was anticipated or is just a side
effect that happens to work.
Automated tests help to fight the amoeba
behaviour. Nearly every change to the application code shakes the shape
of what its code does, fixing something and breaking something else at
once. By executing the tests whenever a change in code is integrated,
one does not prevent every breakage, but at least
ensures that the anticipated behaviour of the application remains
unchanged.
By running various coverage tools (NetBeans uses emma) one can find which
parts of the application code are covered and which are not. This allows us to
find and delete code that is not needed, or to improve the automated coverage
for parts of such code and receive more of the benefits associated with
having automated tests.
An important social aspect of tests is their support for arrogance.
Well, in fact it is not real arrogance, it is more effective use of
manpower. By having automated and isolated tests, one can reject
bugs that others want to assign to one's code by finding or writing tests
which mimic the buggy behaviour and proving that the behaviour of one's code
is ok and the bug must be somewhere else. This lowers the number
of hours spent with a debugger and thus uses the time
given to programmers more effectively.
On the other hand, there are areas which usually do not benefit much
from attempts to provide automated tests. The user interface is one such
example. It changes often and automated tests (although possible - see
jemmy.netbeans.org) are hard to
write and are not fully reliable. Sometimes it might be simpler to
invest in manual testing. But we have to mention that the
NetBeans project has a set of about fifteen UI tests to verify basic
functionality of the NetBeans IDE. They are executed continuously and they
have helped us to catch various
regressions, but they often also just cause false alarms. In spite of all
their drawbacks, they are still accepted as valuable.
From a general point of view, automated tests can be seen as a kind of
specification. Sometimes it is enough to write a UI specification or
UML document, for certain aspects it is better to also provide automated
test coverage, and sometimes it just does not make sense to do anything
other than an automated test. We'll give some examples later.
When talking about testing one has to be very careful. People tend
to have strong opinions and discussions easily get pretty heated.
And not only heated, often also confused. The whole methodology
is still evolving and things just have not settled yet.
Due to that, let us dedicate a separate section to clear up
possible definition confusion.
The interest in automated testing has increased significantly in recent years
and there is no doubt that this was caused by the efforts of
the extreme programming movement.
The influence is so big, that many people associate testing and XP
together and feel that if you write tests you are doing XP or that
if you do not want to do XP you should not write tests.
This is a false impression.
XP provides a lot of useful tools (like junit)
and explicitly talks about the importance of writing unit tests by developers
of the application code, but it is a much larger methodology which (according
to the XP proponents) has to be followed fully or not at all.
So even though we propose the use of tests, whether
we do XP or not is not the question that matters.
The important thing is to gain confidence in the application code we produce.
Another misleading question that is often asked is whether we write
unit or functional tests. Well, we use junit and xtest
as a base for writing and a harness for running our tests, but that does not
mean the tests are unit ones. As we will demonstrate later, sometimes
it is useful to test the application code in isolation, sometimes it is more
meaningful to set up an environment as close to production as possible.
This depends on the aspect we want to verify - in the first case
we concentrate more on the functionality of a small unit, in the latter we
check how it behaves and cooperates with other components of the system.
These aspects may seem different, but we can use the same tools
to check them (e.g. junit and xtest).
Again, it does not matter whether we write functional or unit tests -
what is important is to force the amoeba's edge
of our application to stop shaking.
Often people disagree when it comes to the question of who should write the
tests and when. Some say that when the application code is written,
it should be passed to the quality department which will write tests for it.
Others (including XP folks) insist on tests being written together
with the application code. We believe that both opinions have some merit.
If an application is tested by someone who has not coded it, it will surely
get tested in new and unexpected ways, and new and surprising
areas where the behaviour of the application (its amoeba
shape) does not match the specification will be discovered. Also, it
does not make sense to write some kinds of tests in advance (for example the
deadlock test which we will discuss later); it is better to wait for
a bug to be reported and write the test to simulate the bug and to prove that
it really has been fixed. On the other hand, when tests are written by
different folks, it can lead to the infamous we vs. they separation:
People writing application code will not understand the tests and those
who write the tests will not fully understand the application code and if
a test fails these two sides start to blame each other while nobody will
really know whether the bug is in test or in application code. This can
get really frustrating especially if the failures are intermittent and
unreproducible in the debugging environment.
That is why we share the XP opinion that at least some tests
should be written by the application code developers. Besides the fact
that there is then someone who understands both the code and the tests,
the other reason is that the application code is different if
written with testability in mind.
The biggest improvement of the application code that happens when
its developers also start to write the tests is that the
application code will no longer be
one big monolithic chunk of mutually interlinked code. Instead, for the
sake of easy testability and also for the benefit of the application
code design, the code gets split into
smaller units that can exist separately.
As such they can then be tested on their own.
So instead of having to start the database, fill it with predefined data
and then launch the whole application just to verify that an invoice
entered by a user in a dialog is processed correctly, one separates the
invoice handling code from the external environment by abstractions.
The invoice code no longer talks directly to the database or requires
a user interface. Instead, it relies on an abstract interface
to handle its communication with the surrounding environment. In the test,
a fake testing environment is created which
handles all the invoice processing calls, but without
all the references to the rest of the application.
This is called separation of concerns and there are various
supporting techniques with which one can achieve it (see mock objects,
dependency injection using spring and many others),
but in NetBeans we are using Lookup.
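The invoice scenario can be sketched in plain Java without any framework.
Everything in the sketch is hypothetical and chosen only for illustration
(InvoiceEnvironment, InvoiceProcessor, RecordingEnvironment and the
validation rule); it is not NetBeans code. It shows the application logic
talking to an abstraction, and a test-only implementation that records
calls instead of touching a database or a dialog:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical abstraction: everything the invoice code needs from its environment.
interface InvoiceEnvironment {
    void store(String customer, long totalCents);   // a database in production
    void notifyUser(String message);                // a dialog in production
}

// Application logic under test; it knows nothing about databases or Swing.
final class InvoiceProcessor {
    private final InvoiceEnvironment env;

    InvoiceProcessor(InvoiceEnvironment env) {
        this.env = env;
    }

    void process(String customer, long unitPriceCents, int quantity) {
        if (quantity <= 0) {
            env.notifyUser("Quantity must be positive");
            return;
        }
        env.store(customer, unitPriceCents * quantity);
    }
}

// Fake environment used only by tests: it records calls instead of acting on them.
final class RecordingEnvironment implements InvoiceEnvironment {
    final List<String> stored = new ArrayList<>();
    final List<String> messages = new ArrayList<>();

    @Override
    public void store(String customer, long totalCents) {
        stored.add(customer + "=" + totalCents);
    }

    @Override
    public void notifyUser(String message) {
        messages.add(message);
    }
}

class InvoiceProcessorDemo {
    public static void main(String[] args) {
        RecordingEnvironment env = new RecordingEnvironment();
        InvoiceProcessor processor = new InvoiceProcessor(env);
        processor.process("ACME", 250, 4);   // valid invoice, gets stored
        processor.process("ACME", 250, 0);   // invalid quantity, user is notified
        System.out.println(env.stored);
        System.out.println(env.messages);
    }
}
```

A test can then assert on the recorded lists, so neither a database nor a
visible dialog is needed to verify the processing logic.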
Our code is full of abstraction interfaces like:
import org.openide.util.Lookup;

abstract class DialogDisplayer {
    public abstract void notify(String msg);

    // the actual implementation is looked up at runtime
    public static DialogDisplayer getDefault() {
        return Lookup.getDefault().lookup(DialogDisplayer.class);
    }
}
which separate the caller of its methods from the actual implementation. When
one writes:
DialogDisplayer.getDefault().notify("Hello world!");
it is unknown which implementation will actually get called when the code
is executed. This allows us to provide one implementation in the
whole NetBeans application (one that actually shows a dialog to the user)
and a different one when testing
the code, which can either print the message or record it for the rest
of the test to check that the expected call really happened.
This separation of concerns gives us confidence that our
units of code are working correctly. It is true that this does not
imply that the whole application is working correctly, but if there was
no confidence in the smaller parts, there could be none in
the whole application.
The separation of concerns is not the only change that happens when
the same people who write the tests can also change the application code.
Very often new, otherwise unneeded, methods that verify internal state
appear. These usually stay package private as the tests are packaged in
the same java package and are friends, so no externally visible
changes are done, still the code gets more verifiable.
Another change is that the code gets more predictable. As long as the only
consumer of the application's output is the user, it does not matter much whether
an action happens now or a few milliseconds later. The human will observe
nothing (actually our usual style to solve problems was to post the action
to SwingUtilities.invokeLater as the human would not notice
the delay). This is a coding style that just cannot work with automated
testing. The test is as quick as the computer and will observe any non-determinism
in the application code. So testing affects the code by either
forcing it to remove such non-predictable constructs or by introducing
rendezvous methods that allow the tests and application code to synchronize
on the expected state and verify that the behaviour matches the expectations.
This is indeed much better than the opposite style of postponing an operation
to happen later and hoping that the human will not notice the delay. Once
again this helps the application to stiffen its amoeba edge.
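One possible shape of such a rendezvous method is sketched below. The
class BackgroundSaver and its methods are hypothetical; the sketch only
assumes a component that does its work in another thread and uses a
java.util.concurrent.CountDownLatch to let a test wait for the expected
state instead of sleeping and hoping:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical component that saves in the background, as UI code often does.
final class BackgroundSaver {
    private final CountDownLatch saved = new CountDownLatch(1);
    private volatile String lastSaved;

    void saveLater(String text) {
        Thread worker = new Thread(() -> {
            lastSaved = text;      // the delayed work
            saved.countDown();     // signal that the expected state is reached
        }, "background-saver");
        worker.setDaemon(true);
        worker.start();
    }

    // Rendezvous method intended for tests: block until the save has happened.
    boolean waitSaved(long timeoutMillis) {
        try {
            return saved.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    String lastSaved() {
        return lastSaved;
    }
}

class BackgroundSaverDemo {
    public static void main(String[] args) {
        BackgroundSaver saver = new BackgroundSaver();
        saver.saveLater("hello");
        // Without the rendezvous the next check would race with the worker thread.
        boolean finished = saver.waitSaved(5000);
        System.out.println(finished + " " + saver.lastSaved());
    }
}
```

The timeout keeps a broken implementation from hanging the test run; the
test fails instead of blocking forever.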
Properties of Good Tests
The basic and probably only property of any test is that it has to be
useful. As we are proposing tests as the right way to improve
the quality of application code, a test has to contribute in some way to
that quality. It has to help us shape the amoeba in the way we want.
Regardless of any textual specification and talks made in corridors,
sometimes it is not easy to find out what parts of the application code
are supposed to do. Tests help with that. They show the intention.
If the same programmer who codes the application also provides a test for
its behaviour, it is clear that his expectations on input values, on
order of calls, throwing exceptions, etc. are going to be visible in
his tests. Later, when another person looks at the code, he or she can
sort out more easily what works just by chance and what is the
essential and desired behaviour of the application code.
It is good if clever people develop your application and are able to fix
bugs. It is however much better if your application is being maintained
by careful people - i.e. those who care a lot about the application's
future and would rather invest more work now to prevent
the same or a similar problem from reappearing later. Those who are just
more careful and do not want to spend their lives patching,
patching, and patching broken code.
In such an environment, every bug can be seen as a possible report
that the behaviour of the application does not match our expectations.
Such a bug is a
chance to get the shaking amoeba edge of the application
code closer to the specification. But only if one also prevents
regressions by writing tests together with the fix will the shape of the
amoeba be stiff enough not to
regress and recreate the same bug in the future.
If we have enough tests and we want them to really prevent regressions,
we need to run them regularly to discover their failures. Whether they
should be executed once per release, every week, every day or instantly depends
on the size of your project and on the time the execution of tests consumes.
It also depends on how soon you want to catch the regression. The sooner it
is discovered, the less painful is the identification of the change in
application code that caused it. In NetBeans we have a set of tests that take
about five minutes and are supposed to be executed before every commit to
our shared source base; we call them commit validation.
Every developer is supposed to verify that his changes are not really bad
and do not break some basic functionality of the NetBeans
IDE. By warning that something is broken before it really gets integrated
we minimize the amount of work needed to investigate what is causing
the regression. However, these tests cannot include all of our tests, which
would run for hours, so we also have a suite of tests executed daily
that uses the xtest harness to run the tests enumerated in our
master configuration file and each module's individual
configuration file. This ensures that every regression is
reported within 24 hours after integration, so one needs to evaluate
only the changes made in the previous day to find out the cause.
Of course, if people use tests written by someone else to verify whether
their own work is correct (as the NetBeans project does in
commit validation),
the tests must be reliable. A failure must mean that something
is wrong with my code changes. If this is not true, then people start
to lose trust in the whole system, and so it is better to remove
randomly failing tests from validation suites as quickly as possible.
Although XP suggests avoiding it,
sometimes certain assumptions are made about the whole system and
not just about individual parts such as a single source file. Such intentions
are really hard to discover by reading single source files. This
forms a maintenance problem since even if you have clever people
that can read and understand the code, they may not realize that
it is necessary to also locate some distinct source file and change it as
well.
Only one of them is changed, and a regression appears!
Of course if the assumption about the system
is expressed in an automated test, you will get a notification. In some
sense the tests can be used to link physically distinct sources
that are logically connected together. This gets especially
important when one does hacks around bugs in foreign code.
Writing a hack is ugly, but if that is the only solution to some bug,
there can be a strong push to do it. If it works, ok. But as usual
with our amoeba, we want to be sure that
the hack works with new versions of code we own. And as usual an automated
test for the hack functionality is the cheapest way to ensure that.
We used this hack verification test to check that our workaround
for certain weaknesses in the default implementation of the
clipboard works. Not only did the test verify the behaviour on JDK 1.4,
but it also
caught a change in the clipboard handling in a beta version of JDK 1.5 and
we could negotiate some changes with the JDK team to make it work again.
Indeed such widespread relations among unrelated parts of code have their
problems, because the number of extra test dependencies greatly affects
the likelihood that the test will be maintainable as other components
change, that sudden test failures can quickly be diagnosed, and that
someone changing code will know to run the right tests to check the
changes before committing. In spite of these drawbacks it makes sense
to use types of tests like this as they can test otherwise unverifiable
assumptions - like whether two pieces of copied code behave the same -
this will be discussed later in more detail. However one has to be careful - the
tests have to be executed often (in NetBeans all of them are run daily)
and the failures have to be analyzed quickly. Otherwise, due to wide
interrelations, it becomes hard or even close to impossible to identify
the code change that caused the failure.
Anyway the only property of a good test is its usefulness. It can help us
to show the intention, prevent regressions, verify connections of
unrelated code or even hacks into foreign code and sometimes it
can serve as an example (we have a
shopping cart application with a few
checks around). It does not matter how, but if it helps to bring
the amoeba's edge closer to our expectations,
then it has to be good.
Testing Framework
When searching our memory we realize that we wrote tests
a long, long time ago. The evidence consists of all the main methods
in our application sources, hidden in comments or sometimes even left in the
code, that set up a part of the application in a specific way so that
one can just verify one's own code and not be forced to launch the rest.
This indicates that programmers really like to write tests and like the
separation of concerns, but still, for writing automated tests, it is
better to use an existing framework than to encode the testing logic
into main methods spread everywhere. And if we need a testing framework
in Java, then we very likely need junit.
It is the flagship of the XP in Java, and even though it is not a masterpiece
of art, it works well, is easy to use for simple cases and is highly customizable
if one needs more complex behaviour.
Tests are written in separate classes, in the same packages, and are
usually named after the class they test with the postfix Test:
import junit.framework.TestCase;

public class MyClassTest extends TestCase {
    private int variable;
    private static int counter;

    static {
        // one and only initializations
    }

    // the constructor name must match the class name
    public MyClassTest (String name) {
        super (name);
    }

    protected void setUp () throws Exception {
        // do some kind of setup
        variable = counter++;
    }

    protected void tearDown () throws Exception {
        // clean up and possibly post verifications
    }

    // The tests; the expected values assume that the harness runs
    // the methods in the order in which they are declared
    public void testFirstInvocation () throws Exception {
        assertEquals ("First means 0", 0, variable);
    }

    public void testSecondInvocation () throws Exception {
        assertEquals ("Second is 1", 1, variable);
    }
}
When executed by a harness, the static initialization is called once,
then an instance of the test is created for each method with the prefix test.
The sequence of setUp, test method, tearDown is
then called on each instance. The test is expected to perform some computations
and then verify that the result is correct by using various predefined assertions like
assertTrue, assertEquals, assertSame
or an unconditional fail. If all tests pass, the harness
succeeds, otherwise it reports an error.
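The lifecycle described above can be imitated with a few lines of
reflection. The following is only a sketch of the idea, not how junit is
implemented; MiniHarness and LifecycleSample are hypothetical names and
the sample class deliberately does not depend on junit. It creates a fresh
instance for every method starting with test and wraps each call in setUp
and tearDown:

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// A stand-in for a test class; every instance gets its own number.
class LifecycleSample {
    static final List<String> LOG = new ArrayList<>();
    private static int created;
    private final int id = ++created;

    public void setUp()    { LOG.add("setUp#" + id); }
    public void testA()    { LOG.add("testA#" + id); }
    public void testB()    { LOG.add("testB#" + id); }
    public void tearDown() { LOG.add("tearDown#" + id); }
}

final class MiniHarness {
    // Runs every public no-argument method whose name starts with "test",
    // each on a fresh instance, and returns the number of executed tests.
    static int run(Class<?> testClass) {
        Method[] methods = testClass.getMethods();
        // Reflection does not guarantee an order, so sort by name for repeatability.
        Arrays.sort(methods, Comparator.comparing(Method::getName));
        int executed = 0;
        try {
            for (Method m : methods) {
                if (!m.getName().startsWith("test") || m.getParameterCount() != 0) {
                    continue;
                }
                Object instance = testClass.getDeclaredConstructor().newInstance();
                testClass.getMethod("setUp").invoke(instance);
                try {
                    m.invoke(instance);
                } finally {
                    testClass.getMethod("tearDown").invoke(instance);
                }
                executed++;
            }
        } catch (ReflectiveOperationException ex) {
            throw new IllegalStateException(ex);
        }
        return executed;
    }

    public static void main(String[] args) {
        System.out.println(run(LifecycleSample.class) + " tests: " + LifecycleSample.LOG);
    }
}
```

Because each test method gets its own instance, instance fields never leak
from one test to the next; only static state, like the counter in the
example above, survives between tests.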
JUnit is simple and extensible. That
is why the NetBeans team is using its own
extension called nbjunit which, instead of TestCase,
provides NbTestCase
with a few additional assertions and other useful utilities.
The nbjunit library is completely independent of the rest of the
NetBeans libraries, so it can be easily used in any Java project.
There are several ways to get the library:
- The library can be easily used by anyone running the NetBeans IDE:
  select Tools, Update Center in the menu and download it. Then you can add
  the library to the test classpath of any of your projects and start to use it.
- The library has been uploaded to the jpackage.org repository, so if you
  are on a correctly configured operating system, just use
  urpmi nbjunit or apt-get install nbjunit
  and the NetBeans JUnit Extensions and Insane Library will be downloaded
  and installed.
- Or you can download the sources for the library from xtest.netbeans.org
  and build it yourself.
However a typical NetBeans test not only uses NbTestCase,
but also Lookup to handle the separation of concerns.
This is easily configurable by use of the MockServices support class:
public class MyTest extends NbTestCase {
    public MyTest(String name) {
        super(name);
    }

    protected void setUp() throws Exception {
        super.setUp();
        org.netbeans.junit.MockServices.setServices(DD.class);
    }

    public static class DD extends DialogDisplayer {...}
}
For more sophisticated cases you can do this manually, so the static one-time-only
initializer registers the test's own implementation of the Lookup
providing the environment needed for the test:
public class MyTest extends NbTestCase {
    static {
        System.setProperty("org.openide.util.Lookup", Lkp.class.getName());
    }

    public MyTest(String name) {
        super(name);
    }

    public static final class Lkp extends org.openide.util.lookup.AbstractLookup {
        public Lkp() {
            this(new org.openide.util.lookup.InstanceContent());
        }

        private Lkp(org.openide.util.lookup.InstanceContent ic) {
            super(ic);
            ic.add(new DD());
            ic.add(...);
        }
    }
}
Of course junit is flexible enough, so
one can use other extensions, and possibly also a different harness (recently
there was a lot of buzz around TestNG), but
as our examples are going to use NbTestCase, we thought it
reasonable to explain our specific terminology.
When There Are Enough Tests
While writing tests, people can ask: how many of them should be written?
The simple answer is to write tests while they are useful. The more precise,
more complex and less clear answer is going to be covered in this chapter.
There are various tools out there that help to measure test coverage.
We have selected emma for measuring
the coverage of our application code by our tests. When invoked (for
example from the popup menu of any project from NetBeans.org) it
instruments the application code and invokes automated tests on it.
While running, it collects information about all called methods,
visited classes and lines and then it shows a summary in a web browser.
Counting coverage by visited methods is a very rough criterion, but it
can be surprisingly hard to get close to 100%. But even if you succeed,
there is no guarantee that the resulting application code works correctly.
Every method has a few input parameters, and knowing that it succeeded
once with one selection of them does not say anything about the other
cases.
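A tiny illustration of why method coverage is so rough follows. The class
Shipping, its pricing rule and the hand-made VISITED set are hypothetical;
a real coverage tool would do the recording through instrumentation. One
call is enough to mark the method as covered, yet one of its two outcomes
has never been executed:

```java
import java.util.Set;
import java.util.TreeSet;

final class Shipping {
    // Hand-made "instrumentation": a coverage tool would record this for us.
    static final Set<String> VISITED = new TreeSet<>();

    // Hypothetical pricing rule: orders of 10000 cents or more ship for free.
    static int costCents(int orderTotalCents) {
        VISITED.add("method");
        if (orderTotalCents >= 10_000) {
            VISITED.add("free-branch");
            return 0;
        }
        VISITED.add("paid-branch");
        return 500;
    }
}

class ShippingCoverageDemo {
    public static void main(String[] args) {
        Shipping.costCents(2_000);              // one call: the method counts as covered
        System.out.println(Shipping.VISITED);   // the free-shipping branch is still missing
        Shipping.costCents(10_000);             // a second input reaches the other branch
        System.out.println(Shipping.VISITED);
    }
}
```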
It is much better to count the coverage by branches or lines. When there
is an if (...) { x(); } else { y(); } statement in the code
of your method, you want to be sure that both methods, x and y, will be
called. The emma tool supports this
and by helping us to be sure that every line is visited, it gives us
confidence that our application code does not contain useless lines.
Still, the fact that a line is visited once does not mean that our
application code is not buggy.
private int sum = 10;

public int add(int x) {
    sum += x;
    return sum;
}

public int percentage(int howMuch) {
    // fails with ArithmeticException when sum is zero
    return 100 * howMuch / sum;
}
It is good if both methods get executed, and fine if
we test them with various parameters - still we can get an error if we call
add (-10); percentage(5), because the sum will be
zero and division by zero is forbidden. To be sure that our application
is not vulnerable to problems like this, we would have to test each
method in each possible state of memory it depends on (e.g. each value
of sum variable) and that would give us the ultimate
proof that our application code works correctly in a single threaded
environment.
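The failing sequence can be reproduced directly. The sketch below wraps the
two methods in a hypothetical class Ratio and adds a small helper,
failsAfterAdding, that exists only for this illustration. The same lines
are executed in both calls, but only one state of sum makes them fail:

```java
final class Ratio {
    private int sum = 10;

    int add(int x) {
        sum += x;
        return sum;
    }

    int percentage(int howMuch) {
        return 100 * howMuch / sum;   // integer division; throws when sum == 0
    }
}

class RatioDemo {
    // Returns true when add(delta) followed by percentage(5) ends in a division by zero.
    static boolean failsAfterAdding(int delta) {
        Ratio ratio = new Ratio();
        ratio.add(delta);
        try {
            ratio.percentage(5);
            return false;
        } catch (ArithmeticException ex) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsAfterAdding(10));    // sum is 20, percentage works
        System.out.println(failsAfterAdding(-10));   // sum is 0, the same lines now throw
    }
}
```

Both calls yield full line coverage of Ratio, which is exactly why line
coverage alone cannot prove the absence of bugs.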
But there is another problem - Java is not single threaded. A lot of applications
start new threads by themselves, and even if they do not, there is the
AWT event dispatch thread, the finalizer thread, etc. So one has to count
on some amount of non-determinism. Sometimes the garbage collector just
kicks in and removes some unneeded objects from memory, which can change
the behaviour of the application - we used to have a never ending loop, which
could be simulated only if two mozilla browsers and an evolution client were
running, as then the memory was small enough to invoke the garbage collector.
This kind of coverage is unmeasurable.
That is why we suggest that people use code coverage tools as a
sanity check that something is not seriously under-tested. But it is necessary
to remind ourselves that however high the coverage is, it does not fully
protect our application code from having bugs. So, in order to help
to fight the strange moves of an application's amoeba
shape, we suggest writing a test when something gets broken - when there is a
bug report, write a test to verify it and prevent regressions. That way
the coverage is going to be focused on the code where it matters - the code
that really was broken.
AWT Testing
Automated user interface testing is hard, especially in Java, which runs
on a variety of platforms, each with slightly different behaviour - sometimes
keyboard focus follows the mouse, sometimes it does not. Swing comes with
different look and feels and our automated tests, in order to be useful,
have to overcome this and work reliably. And this is hard to achieve.
Rule #1 is clear: Avoid AWT testing. Separate your application code into
two parts - application logic and actual presentation. Test your models
separately from the UI - writing automated tests for them should be possible,
as they are the models and are independent of the presentation
(of course only if the design follows model-view-controller separation, as
Swing does). For
example instead of trying to write a test for your checkbox, write a test
for its ToggleButtonModel, make sure it works and then let the UI
simply delegate to that model (this is what we did when writing
StoreGroup and its
tests). That gives you confidence in your code as the logic is
tested and the UI is simple enough that a manual test once a release is enough
to guarantee it works as it should.
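A minimal sketch of this approach, assuming a hypothetical
AutoSavePreference class as the application logic behind a checkbox: only
Swing's JToggleButton.ToggleButtonModel is created, and no component is
ever shown.

```java
import java.awt.event.ItemEvent;
import java.awt.event.ItemListener;
import javax.swing.JToggleButton;

// Hypothetical application logic: a preference driven by a checkbox-like model.
final class AutoSavePreference implements ItemListener {
    private boolean enabled;

    @Override
    public void itemStateChanged(ItemEvent e) {
        enabled = (e.getStateChange() == ItemEvent.SELECTED);
    }

    boolean isEnabled() {
        return enabled;
    }
}

class ToggleModelDemo {
    public static void main(String[] args) {
        // Only the model is created; no checkbox and no window are involved.
        JToggleButton.ToggleButtonModel model = new JToggleButton.ToggleButtonModel();
        AutoSavePreference preference = new AutoSavePreference();
        model.addItemListener(preference);

        model.setSelected(true);    // what a click on the checkbox would do
        System.out.println(preference.isEnabled());
        model.setSelected(false);
        System.out.println(preference.isEnabled());
    }
}
```

In the real UI a JCheckBox would be given the same model through
setModel, so the tested logic and the visible component share one state.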
Sometimes it is possible to avoid showing UI components, but one has to
run the test inside the AWT dispatch thread, because the code follows
the Swing threading model (i.e. everything happens in the AWT thread). The
simplest library independent approach for this is to use
invokeAndWait:
public void testSomething () throws Exception {
    javax.swing.SwingUtilities.invokeAndWait (new Runnable () {
        public void run () {
            callMethodThatDoesTheTest ();
        }
    });
}
The logic of the test method is in callMethodThatDoesTheTest
and it is executed in AWT. The above example however does not handle
exceptions correctly and if we try to, the code gets a bit more
complicated (see the run method in
NbTestCase) and that is why we allow tests to just override runInEQ
and NbTestCase handles the rest for them automatically:
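public class MyTest extends NbTestCase {
    public MyTest (String name) {
        super (name);
    }
    // ask NbTestCase to run the test methods in the AWT event dispatch thread
    protected boolean runInEQ () {
        return true;
    }
}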
See for example
CookieActionTest.java
that needs to run in the AWT event thread as it tests Swing-like action
implementation.
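For readers who do not use NbTestCase, the exception handling mentioned
above could look roughly like the following sketch. It is not the
NbTestCase implementation; EventQueueRunner is a hypothetical helper that
unwraps the InvocationTargetException thrown by invokeAndWait so that a
failure inside the event thread reaches the calling thread:

```java
import java.lang.reflect.InvocationTargetException;
import javax.swing.SwingUtilities;

final class EventQueueRunner {
    // Runs the task in the AWT event dispatch thread and rethrows whatever
    // it throws in the calling thread, so the harness sees the failure.
    static void runInEventQueue(Runnable task) {
        try {
            SwingUtilities.invokeAndWait(task);
        } catch (InvocationTargetException ex) {
            Throwable cause = ex.getCause();
            if (cause instanceof RuntimeException) {
                throw (RuntimeException) cause;
            }
            if (cause instanceof Error) {
                throw (Error) cause;   // includes failed assertions
            }
            throw new IllegalStateException(cause);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        }
    }
}

class EventQueueRunnerDemo {
    public static void main(String[] args) {
        boolean[] ranInEventThread = new boolean[1];
        EventQueueRunner.runInEventQueue(
            () -> ranInEventThread[0] = SwingUtilities.isEventDispatchThread());
        System.out.println("ran in event thread: " + ranInEventThread[0]);

        try {
            EventQueueRunner.runInEventQueue(() -> {
                throw new AssertionError("failure inside the event thread");
            });
        } catch (AssertionError expected) {
            System.out.println("propagated: " + expected.getMessage());
        }
    }
}
```

Note that invokeAndWait must not be called from the event dispatch thread
itself, so a helper like this is only for test code running outside of it.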
In some cases, the test code itself runs outside of the AWT event queue, but
somewhere inside the application code certain actions are posted to AWT.
In such a case it may be handy (and often necessary) to wait for the
execution to finish before continuing the test. Then the following
method may be useful:
private void waitEQ () throws Exception {
    javax.swing.SwingUtilities.invokeAndWait (new Runnable () {
        public void run () {
            // intentionally empty, we only wait for our turn in the queue
        }
    });
}
It posts an empty Runnable into the AWT event thread and waits for it to finish.
As the queue of runnables is FIFO, the runnable is scheduled at the end
after all tasks posted by the application and when it is finally executed
one can be sure that all delayed tasks of the application in the AWT event queue
are over as well. See
DataEditorSupportTest.java
for an example of a test that needs to wait while the application code
finishes some actions posted to the AWT event thread.
There are situations when the generic mock object
approach can
be useful for AWT testing as well. For example in order to test
UI environments that do not support custom cursor definition the
UtilitiesTest
defined its own AWT Toolkit that does not support custom cursors.
In another example (see
CloneableEditorUserQuestionTest.java)
another mock object for DialogDescriptor is used to fake the communication with the user.
It replaces the DialogDescriptor, which in a production
environment shows a dialog and interacts with a user by showing a UI component
with a headless implementation that returns immediately preset values
and thus allows automated verification of application code that itself
communicates with humans.
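The idea of a headless stand-in that returns preset values can be sketched independently of the NetBeans dialog classes. All names below (UserQuestion, PresetAnswers, Reloader) are hypothetical and exist only for this illustration. The real test replaces the NetBeans dialog infrastructure instead.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

/** Hypothetical abstraction of "ask the user"; production code would show a dialog. */
interface UserQuestion {
    boolean confirm(String message);
}

/** Headless stand-in that answers immediately with preset values. */
final class PresetAnswers implements UserQuestion {
    private final Deque<Boolean> answers = new ArrayDeque<>();
    /** Questions asked so far, so that a test can verify them. */
    final Deque<String> asked = new ArrayDeque<>();

    PresetAnswers(Boolean... presetAnswers) {
        answers.addAll(Arrays.asList(presetAnswers));
    }

    @Override
    public boolean confirm(String message) {
        asked.add(message);
        if (answers.isEmpty()) {
            throw new AssertionError("Unexpected question: " + message);
        }
        return answers.removeFirst();
    }
}

/** Hypothetical application code that talks to the user only through the abstraction. */
final class Reloader {
    private Reloader() {
    }

    static String reload(UserQuestion ui, String current, String onDisk) {
        return ui.confirm("File changed on disk. Reload?") ? onDisk : current;
    }
}
```

A test can then call Reloader.reload with new PresetAnswers(true) or new PresetAnswers(false) and verify both branches without any dialog being shown.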
If you cannot split your code into logic and UI and you absolutely have
to write automated tests, then use Jemmy.
It is a junit extension that operates on realized UI components and allows
automated navigation on dialogs. An excellent introduction to jemmy can
be found in the
Jemmy Testing Toolkit presentation. For NetBeans
UI testing there is an extension of Jemmy which
is called Jellytools.
Also you can watch a flash demo showing usage of testing tools in
NetBeans. It is available at the
automated testing tools
overview page.
Algorithm Complexity Tests
A very important but fragile piece of functionality that really deserves
automated testing is performance. Nearly everyone finds out that something
seems to be slow in the application. The natural resolution is to get the
profiler, run it on the application, find out what is wrong and fix it.
This works, and we all do it, but the question is how effective is this?
As far as we know the profiling is usually done when the application code is
ready and all features are implemented. When performance improvements are
applied at that time, it is clear that they will stay till the release
(as all other integrations are already done). That means that we
shaped our amoeba for the release, but what will
happen by the next one, will the amoeba not
change shape again?
Of course it will! After the release (and all the profiling effort) a
new round of feature integration starts and for a certain period of time
nobody will care about the performance. The new code surely changes
expectations of the old one and very likely negates the performance
improvements made in the hectic period at the end of the previous release.
So it is the time to take the profiler, find what is wrong, provide improvements
that make the application state acceptable enough and start the whole
vicious circle again. Are you surprised? You should not be, it really
works this way. Is there a better way? Yes, let's analyze it.
During the hectic profiling time, when one uses the profiler to find hotspots and
provide speed ups to them, one invests a bit of time to write a
speed test to demonstrate what is wrong. That test will first of
all serve as a proof that the performance problem has been fixed, but
(most importantly) also as a continuous reminder that the intention of this
code is to be fast and as a warning that some day this intention was
broken. This will prevent the amoeba from regressing
and save a lot of work when the new release is being profiled.
The basic idea behind a speed test
is simple. Just execute the
same algorithm on a data set of different sizes and confirm that the time
matches our expectations (e.g. it is constant, linear, quadratic, etc.).
This can easily be written in any test harness, including plain
junit.
The following test accesses the middle element of a
linked list and checks that the access time is constant. More precisely,
it fails if any later test takes more than three times as long as the
first one:
public class WhyIsTheAccessToListSlowTest extends TestCase {
    private int size;
    private List toCheck;
    private long time;
    // duration of the first executed test, used as the baseline
    private static long one = -1;
    protected void setUp () {
        // the test name encodes the size, e.g. "test1000"
        size = Integer.valueOf (getName().substring (4)).intValue();
        toCheck = new LinkedList (Collections.nCopies (size, "Ahoj"));
        time = System.currentTimeMillis();
    }
    protected void tearDown () {
        long t = System.currentTimeMillis() - time;
        if (one == -1) {
            one = t;
        } else {
            if (t > one * 3) {
                fail ("The time is just too long");
            }
        }
    }
    private void doTest () {
        for (int i = 0; i < 10000; i++) {
            toCheck.get (size / 2);
        }
    }
    public void test10 () { doTest (); }
    public void test100 () { doTest (); }
    public void test1000 () { doTest (); }
    public void test10000 () { doTest (); }
}
This works and really can discover that access to the middle of
the linked list is too slow, but
such measurements can be influenced by various external events: first
a garbage collector can be invoked during one of the tests and effectively
stop the execution, making the time measurement too big. Or, a hotspot
compiler can step in and decide to compile the application code to make it
faster. As a result one of the tests will just take longer time as the
hotspot compilation will slow it down, and the later tests executed after
it will be faster as they will run the compiled code, much faster than the
interpreted one. We have really observed such random failures and that is
why we created NbTestSuite.speedSuite, a
junit-like wrapper around our
test cases that can execute the tests several times to eliminate the influence
of the garbage collector and hotspot compiler.
The results are excellent: just by allowing the test to restart itself
a few times in case of failure, the nondeterministic factors of the external
environment were eliminated and
we had no random failures in our speed tests for more than a
year.
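The restart idea can be sketched in plain Java. The code below is a hypothetical illustration with invented names (SpeedCheck, staysWithinLimit). It is not the implementation of speedSuite. The measurement is passed in as a function, which is an assumption made here so that the sketch can be exercised with fake timings.

```java
import java.util.function.IntToLongFunction;

/** Hypothetical sketch of the retry idea, not the real speedSuite code. */
final class SpeedCheck {
    private SpeedCheck() {
    }

    /**
     * @param measure returns the time needed for a data set of the given size
     * @param sizes data set sizes, the first one is the baseline
     * @param maxSlowdown how many times slower than the baseline a run may be
     * @param attempts how many times the whole measurement may be repeated
     * @return true if at least one attempt stayed within the limit
     */
    static boolean staysWithinLimit(IntToLongFunction measure, int[] sizes,
                                    int maxSlowdown, int attempts) {
        for (int attempt = 0; attempt < attempts; attempt++) {
            if (singleAttempt(measure, sizes, maxSlowdown)) {
                return true; // one clean run is enough
            }
        }
        return false; // too slow in every attempt, so probably a real problem
    }

    private static boolean singleAttempt(IntToLongFunction measure, int[] sizes,
                                         int maxSlowdown) {
        // guard against a zero baseline on coarse clocks
        long baseline = Math.max(1, measure.applyAsLong(sizes[0]));
        for (int i = 1; i < sizes.length; i++) {
            if (measure.applyAsLong(sizes[i]) > baseline * maxSlowdown) {
                return false;
            }
        }
        return true;
    }
}
```

A single disturbed run (for example one hit by the garbage collector) is forgiven, while code that is really too slow fails in every attempt.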
Here is the previous test rewritten in our speed suite
style:
public class WhyIsTheAccessToListSlowTest extends NbTestCase {
    private int size;
    private List toCheck;
    public static NbTestSuite suite () {
        return NbTestSuite.speedSuite (
            WhyIsTheAccessToListSlowTest.class, /* what tests to run */
            10 /* ten times slower */,
            3 /* try three times if it fails */
        );
    }
    protected void setUp () {
        size = getTestNumber ();
        toCheck = new LinkedList (Collections.nCopies (size, "Ahoj"));
    }
    private void doTest () {
        for (int i = 0; i < 10000; i++) {
            toCheck.get (size / 2);
        }
    }
    public void test10 () { doTest (); }
    public void test100 () { doTest (); }
    public void test1000 () { doTest (); }
    public void test10000 () { doTest (); }
}
An example of such test from NetBeans
code base can be found for example at
DataShadowSlowness39981Test.java
and we can confirm that it helped us prevent regressions in
our application amoeba edge.
Memory Allocation Tests
One of the least specified things in conventional project documentation
is the memory model of your application. It is not that surprising
given the fact that there is no
generally known methodology for how to design the memory model of an application.
However it is very surprising, because without ensuring that you
manage your memory effectively, do not
allocate more than needed and properly deallocate what is no longer needed, one can hardly
write an application that is supposed to run for days.
Again, the classical model is to code the application, start the profiler
and search for possible memory leaks. When found, return back to the code,
fix it and on and on and on, until the application is in a releasable
shape. And again, as described many times, this is not effective if one
plans to release more than one version of the application. After a time,
all the improvements from the profiler hunting
phase, which helped
to improve the amoeba shape, will regress,
unless we make sure they are continuously tested.
The standard junit does not offer much
in this area, so we had to write a few extensions for our
NbTestCase,
both based on, or supported by, our memory inspection library called
Insane.
The first thing one has to fight against
regarding memory management in modern object oriented languages
like Java is memory leaks. The problem is not that an application would
address an unknown area in memory - that is not possible thanks to the garbage
collector - but sometimes (also due to the garbage collector) objects
that one would like to disappear still remain in memory. If a certain
operation always leaves garbage after its execution, after a few
executions one can find that the free memory is shrinking and shrinking
and the whole application is getting slower and slower. For this the
NbTestCase
offers the method assertGC:
Object obj = ...;
WeakReference ref = new WeakReference (obj);
obj = null;
assertGC ("The object can be released", ref);
If you believe that after some operation an object shall be no longer needed in memory, you just create a WeakReference
to it, clear your reference to the object and ask the assertGC to try to release the object from memory.
The assertGC tries hard to force garbage collection of the object: it calls System.gc a few times, allocates some memory,
explicitly invokes finalizers and, if the WeakReference is cleared, successfully returns. If not, it
invokes the insane library and asks it
to find a reference chain that keeps the object in memory. Possible failure could then look like:
junit.framework.AssertionFailedError: Represented object shall disappear as well:
private static final java.lang.ref.ReferenceQueue org.netbeans.modules.adaptable.SingletonizerImpl$AdaptableRef.QUEUE->
java.lang.ref.ReferenceQueue@17ace8d->
org.netbeans.modules.adaptable.SingletonizerImpl$AdaptableRef@bd3b2d->
org.netbeans.modules.adaptable.SingletonizerImpl$AdaptableRef@14653a3->
java.lang.String@130c132
at org.netbeans.junit.NbTestCase.assertGC(NbTestCase.java:900)
at org.netbeans.modules.adaptable.SingletonizerTest.doFiringOfChanges(SingletonizerTest.java:177)
at org.netbeans.modules.adaptable.SingletonizerTest.testFiringOfChangesOnAllObjects(SingletonizerTest.java:116)
at org.netbeans.junit.NbTestCase.runBare(NbTestCase.java:135)
at org.netbeans.junit.NbTestCase.run(NbTestCase.java:122)
which can be read as follows: there is a static field QUEUE which points to a ReferenceQueue and, through
two AdaptableRef instances, it holds in memory the String that we wanted to garbage collect.
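The "tries hard" part of such a check can be sketched in a few lines of plain Java. This is a hypothetical simplification with invented names (GcProbe, isCollected). It is not the code of assertGC, it leaves out the finalizer step and it does not report the reference chain as the insane library does. Note also that System.gc is only a hint, so the number of rounds and the allocation size are arbitrary assumptions.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical simplification of the idea behind assertGC. */
final class GcProbe {
    private GcProbe() {
    }

    /** Repeatedly asks for a collection; returns true once the reference is cleared. */
    static boolean isCollected(WeakReference<?> ref) {
        List<byte[]> pressure = new ArrayList<>();
        for (int i = 0; i < 20 && ref.get() != null; i++) {
            System.gc(); // only a hint to the virtual machine
            pressure.add(new byte[1 << 20]); // allocate 1 MB to encourage the collector
        }
        return ref.get() == null;
    }
}
```

A test would create the WeakReference, drop its own strong reference and fail if isCollected returns false.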
Another thing that may affect performance of an application code is the size of its data structures.
If you know that a certain object is going to be simultaneously kept in memory in thousands of instances,
you do not want it to occupy 1000 bytes or more. You want to minimize its size. Again, this can be
observed in profiling, or this can be a well-in-advance thought-out decision, but the usual problem remains -
we need to ensure that from release to release the size constraint will not regress. For that our
NbTestCase
provides assertSize check:
class Data {
int value;
}
Object measure = new Data();
assertSize ("The object is small", 16, measure);
It uses the insane library to traverse the graph
of all objects referenced from the measure variable and computes the
amount of occupied memory. Then it
compares the value with the expected one and if lower or equal, it passes. Otherwise it fails, printing
sizes of individual elements to let the programmer analyze the failure:
junit.framework.AssertionFailedError: Instance is small: leak 8 bytes over limit of 64 bytes
org.netbeans.modules.adaptable.SingletonizerImpl$AdaptableImpl: 1, 24B
org.netbeans.modules.adaptable.SingletonizerImpl$AdaptableRef: 1, 32B
$Proxy0: 1, 16B
at org.netbeans.junit.NbTestCase.assertSize(NbTestCase.java:937)
at org.netbeans.modules.adaptable.SingletonizerTest.testProvidesImplementationOfRunnable(SingletonizerTest.java:58)
at org.netbeans.junit.NbTestCase.runBare(NbTestCase.java:135)
at org.netbeans.junit.NbTestCase.run(NbTestCase.java:122)
So it can be seen that the AdaptableImpl references AdaptableRef and Proxy
which together with their fields consume 72 bytes, which is more than the expected 64.
The size of a simple java.lang.Object instance which has no fields
is 8 bytes. When adding one field with an integer or a reference to another object
the size increases to 16 bytes. When adding a second such field, the size stays
at 16 bytes. The third one however increases it to 24 bytes. From that it
seems that it makes sense to round the number of fields in an object up
to a multiple of two, and we are sometimes, in really sensitive places, really doing that.
However it has to be noted that the computed sizes are logical
ones.
The actual amount of occupied memory depends on the implementation of the
virtual machine and can be different, but that shall be ok, as the
test of logical size
expresses the intention of the programmer,
which is independent of the actual virtual machine architecture.
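The numbers quoted above (8, 16, 16 and 24 bytes) are consistent with a simple rule: an 8 byte header plus 4 bytes per int or reference field, rounded up to a multiple of 8. The following sketch reproduces that arithmetic. The rule is our reading of the quoted numbers, not a documented formula of the insane library, and the class name LogicalSize is hypothetical.

```java
/** Hypothetical helper reproducing the logical size arithmetic described in the text. */
final class LogicalSize {
    private static final int HEADER = 8;    // a java.lang.Object without fields
    private static final int FIELD = 4;     // one int or one reference (assumed)
    private static final int ALIGNMENT = 8; // sizes grow in 8 byte steps (assumed)

    private LogicalSize() {
    }

    /** Logical size of an object with the given number of int or reference fields. */
    static int ofFields(int fieldCount) {
        int raw = HEADER + FIELD * fieldCount;
        // round up to the next multiple of ALIGNMENT
        return (raw + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }
}
```

With this rule every odd field wastes 4 bytes of padding, which is why an even number of fields uses the space fully.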
We have found both assertGC and assertSize very valuable
in stiffening the edge of our application's amoeba. By
writing
tests using these asserts
we can specify the expected
behaviour of our application. So these tests became part of our
functional specification, and not only that, being automated tests, they are an
active specification that verifies its validity every time we execute them.
Randomized Tests
Most of the ways to test the application code we have discussed are useful
when you find out that something is wrong and you want your fix to last
and stiffen the amoeba shape closer to the desired
look of the application. Tests usually do not help much in discovering
differences between the specification and reality. The one exception however
are randomized tests - they help to test the code in new, unusual ways and
thus can discover new and unusual problems of the application code.
The basic idea is simple, just use a random number generator to drive
what your test does. If, for example, you support operations
add and remove, use the
generator to randomly specify their order and parameters:
Random random = new Random ();
int count = random.nextInt (10000);
for (int i = 0; i < count; i++) {
    // removal is possible only from a non-empty list
    boolean add = list.isEmpty () || random.nextBoolean ();
    if (add) {
        // list.size () is a valid index too, it appends to the end
        list.add (random.nextInt (list.size () + 1), new Integer (random.nextInt (100)));
    } else {
        list.remove (random.nextInt (list.size ()));
    }
}
This will not invent new ways to call your code, just new orders of
calls that can reveal surprising problems, because not all
combinations of operations have to be anticipated by the programmer
and some of them may lead to failures.
An important feature of each test is its reproducibility or at least
clear failure report. It is fine that we know there is a bug in our
code, but if we do not know how to reproduce it, we may not be able
to analyze and fix it. The reproducibility of random tests is even
more important, as in fact, we do not know the sequence of computations
that is really being performed. The first
step to achieve it is to not create a random generator blindly, but
use an initial seed which, when passed repeatedly, generates the same
sequence of numbers. If you look at the implementation of the default
Random constructor, you will find out that the initial
seed is derived from the current time, so we can mimic the behaviour by
writing:
private void doRandomOperations (long seed) throws Throwable {
Random random = new Random (seed);
try {
// do the random operations
} catch (AssertionFailedError err) {
AssertionFailedError ne = new AssertionFailedError (
"For seed: " + seed + " was: " + err.getMessage ()
);
throw ne.initCause (err);
}
}
public void testSomeNewRandomScenarioToIncreaseCoverage () throws Throwable {
doRandomOperations (System.currentTimeMillis ());
}
which knows the initial seed and prints it as part of the failure message
if the test fails. In such a case we can then increase the coverage
by adding methods like
public void testThisUsedToFailOnThursday () throws Throwable {
    doRandomOperations (105730909304L);
}
which exactly repeat the set of operations that once led to a failure.
We used this approach for example in
AbstractMutableLazyListHid.doRandomTest.
One problem when debugging such randomized test is that by
specifying one number, a long sequence of operations is defined.
It is not easy for most people to imagine the sequence by just
looking at the number and that is why it can be useful to provide
better output so instead of calling doRandomOperations (105730909304L);
one can create a test that exactly shows what is happening in it.
To achieve this we can modify the testing code not only to
execute the random steps, but also generate a more usable error
message for potential failure:
private void doRandomOperations (long seed) throws Throwable {
    Random random = new Random (seed);
    StringBuffer failure = new StringBuffer ();
    try {
        int count = random.nextInt (10000);
        for (int i = 0; i < count; i++) {
            // removal is possible only from a non-empty list
            boolean add = list.isEmpty () || random.nextBoolean ();
            if (add) {
                // list.size () is a valid index too, it appends to the end
                int index = random.nextInt (list.size () + 1);
                Object value = new Integer (random.nextInt (100));
                list.add (index, value);
                failure.append (" list.add(" + index + ", new Integer (" + value + "));\n");
            } else {
                int index = random.nextInt (list.size ());
                list.remove (index);
                failure.append (" list.remove(" + index + ");\n");
            }
        }
    } catch (AssertionFailedError err) {
        AssertionFailedError ne = new AssertionFailedError (
            "For seed: " + seed + " was: " + err.getMessage () + " with operations:\n" + failure
        );
        throw ne.initCause (err);
    }
}
which in case of error will generate human readable code for the failure
like:
list.add(0, new Integer (30));
list.add(0, new Integer (11));
list.add(1, new Integer (93));
list.remove(0);
list.add(1, new Integer (34));
We used this technique to generate for example
AbstractMutableFailure1Hid.java
which is long, but more readable and debuggable than one seed number.
The randomized tests not only help us to prevent regressions
in the amoeba shape of our application by
allowing us to specify the failing seeds, but also (which is a unique
functionality in testing) help to discover new areas
where our shape does not match our expectations.
Reusing Tests
The junit framework provides a lot of freedom and allows
any of its users to customize it in nearly unrestricted ways (so we
could create the
NbTestCase).
The standard way of writing tests by prefixing the method name with test
does not need to be followed and one can create one's own style. Sometimes
that is useful, sometimes necessary, but often the built-in standard
is enough, because it is well thought out and offers a lot. Even multiple reuse
of one test in different configurations is possible.
The simplest way of reusing a test is to let it call a protected factory method
in one class and override it in a subclass with a different implementation:
public class WhyIsTheAccessToListSlowTest extends NbTestCase {
    private int size;
    private List toCheck;
    // blabla
    protected void setUp () {
        // the test name encodes the size, e.g. "test1000"
        size = Integer.valueOf (getName ().substring (4)).intValue ();
        //
        // calls the factory method to create the actual
        // instance of the list
        toCheck = createList (size);
    }
    // here comes the testing methods
    // imagine some that use the toCheck field initialized in setUp
    /** The factory method with default implementation.
    */
    protected List createList (int s) {
        return new LinkedList (Collections.nCopies (s, "Ahoj"));
    }
}
public class WhyIsArrayListFastTest extends WhyIsTheAccessToListSlowTest {
protected List createList (int s) {
return new ArrayList (Collections.nCopies (s, "Ahoj"));
}
}
This example creates two sets of tests, both running over a List
but in both cases configured differently.
This could also be done in a more advanced way by usage of
factories and manual test creation as we did in
AbstractLookupBaseHid.java, AbstractLookupTest.java, ProxyLookupTest.java,
but for a lot of cases, the built-in inheritance in junit
is enough. If used one can easily get twice as many tests,
covering twice as many scenarios.
Writing one test and using it for more configurations can be very useful
when one has a family of various implementations of the same interface
that can be assembled by the final user into various increasingly
complicated configurations. Imagine for example that one writes an
implementation of java.io.InputStream and provides a test
to verify that it works correctly. But in real situations, the stream
is not going to be used directly, it will be wrapped by java.io.FilterInputStream
or its subclass. That means we want to write another layer of test
that executes the same operations as the previous test, but on our stream
wrapped with FilterInputStream and yet another by a
FilterInputStream with overridden methods. If these
tests work, we will have more confidence that our implementation will
really work on various configurations. But then we realize that usually
the stream will also be wrapped with java.io.BufferedInputStream
for performance reasons. Well, that is easy, to ensure that everything
will work smoothly, we just create a new layer and configure the
test to use BufferedInputStream.
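The layering just described can be sketched as follows. To keep the example self-contained it does not use junit, and ByteArrayInputStream stands in for our own stream implementation. The class names are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical base layer: the check is written once against a factory method. */
class StreamContract {
    /** Creates the stream under test; ByteArrayInputStream stands in for our own stream. */
    protected InputStream create(byte[] data) {
        return new ByteArrayInputStream(data);
    }

    /** Reads everything back and verifies the content and the end of stream marker. */
    public final void checkReadsAllBytes() {
        byte[] expected = { 1, 2, 3, 4, 5 };
        try (InputStream in = create(expected)) {
            for (byte b : expected) {
                int read = in.read();
                if (read != b) {
                    throw new AssertionError("Expected " + b + " but was " + read);
                }
            }
            if (in.read() != -1) {
                throw new AssertionError("Expected end of stream");
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}

/** Second layer: the same check with the stream wrapped in a FilterInputStream. */
class FilteredStreamContract extends StreamContract {
    @Override
    protected InputStream create(byte[] data) {
        return new FilterInputStream(super.create(data)) { };
    }
}

/** Third layer: the same check again, this time through a BufferedInputStream. */
class BufferedStreamContract extends FilteredStreamContract {
    @Override
    protected InputStream create(byte[] data) {
        return new BufferedInputStream(super.create(data));
    }
}
```

Each subclass adds one wrapper on top of the previous layer, so a failure that appears only in BufferedStreamContract points directly at the interaction with buffering.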
We have used this technique in our implementation of
javax.swing.ListModel with a very special semantic, that
itself is pretty complex
(LazyListModelTest.java)
and required a lot of testing, but then it needs to be propagated
through various layers of our APIs with unchanged semantics, so
we added three additional layers
(LazyVisualizerTest.java,
LazyVisualizerOverLazyChildrenKeysTest.java,
LazyVisualizerOverLazyChildrenKeysAndFilterNodeTest.java).
Whenever there was a failure, we could immediately find out which
part of our code needs fixing. If all four tests failed, then the problem
was in the basic algorithm; if the basic test passed and one of the
more complicated ones did not, we immediately knew which layer in our
code needs to be fixed. This was very handy for debugging. One did not
need to step through behaviour of all tested code, it was enough to just pay attention
to code in the problematic layer.
This kind of setup has also been found very valuable for randomized tests.
Whenever there was a failure, we recorded the seed and created a fixed
test repeating the behaviour of the failed random test. And again, if
all four suites failed, we knew that the basic algorithm was bad; if some
of them worked, we knew which part of the application to investigate to find
the problem.
A surprising place where the layered style tests can be helpful is
the eternal fight between people who want more reuse of the code and people that
are afraid to allow it. We usually require at least three different uses
of a certain functionality before we even consider changing it and maintaining
it as an API. The reason is that the reuse
is associated with significant additional costs.
One needs to find a reasonable generalization that suits
all the known uses and one has to be more careful when developing such
publicly shared parts (e.g. write more tests), otherwise things can quickly
get broken and the whole benefit of code reuse is gone. That is why we
sometimes suggest copying the code instead. We know that
copy based programming is not nice and that it has its own problems, but
sometimes that is really more convenient. Especially when its biggest
problem - i.e. the possibility that the copied code gets out of sync -
can be easily overcome by a test. Supposing that the original code is well
tested, it is not hard, when copying the code, to modify its test to
use a factory for creation of the tested object and, together with the code,
also create a test that changes the setup to create an object matching the
new scenario and executes all the old tests on it. We used this in
TopComponentGetLookupTest
which sets up its environment in setUp method and defines a set of tests that shall pass.
The test is then extended by a different module's
ExplorerUtilCreateLookupTest.
This effectively prevents the code from getting out of sync.
If any future fix in the original code changes the behaviour or
adds a new feature covered with tests, then the next run of automated
tests will fail on the copied code, as the inherited test executes the
same operations on the now out-of-sync copy. We will thus
be properly notified to update our copy of the application code.
Having more layers of the same tests is a very valuable and powerful
pattern that not only helps to form the amoeba into
a more desirable shape, but also effectively uses the work invested into
writing one test, as it exploits its power in more situations and helps
to more quickly analyze the area that caused the test to fail.
Whenever one designs an interface that others can implement, one
exposes himself to possible problems caused by wrong implementations.
It is very likely that at least one implementor will not do everything
correctly and something will go wrong. We have faced that with our
generic wrapper
virtual file system api. It provides a generic API to access
resources of regular operating system files, in ZIP and JAR archives,
version controlled files, ftp archives, and many more. Clients work
with just the API and if there is a bug it ends up reported against
the generic framework, regardless of the actual implementation that
is often responsible for the faulty behaviour.
To prevent this, another step of layered tests can be used
as a very good solution.
The provider of the API that allows other implementors to plug-in, can
write a generic set of tests that describe the properties that each
implementation shall have. These tests contain an abstract
factory interface that implementors shall provide together with
their implementation of the application code. Their factory sets up the
environment for their code to work and the rest of the test then
verifies on the pre-setup object that all the required properties
are satisfied. This is often referred to as TCK - test compatibility
kit. In a way, it is a layered test, but with a generic interface,
not knowing all the implementors that are going to reuse it.
The already mentioned
virtual file system library has such a TCK, the heart of which is
formed by a factory class with one create and one cleanup method:
public abstract class FileSystemFactoryHid extends NbTestSetup {
protected abstract FileSystem[] createFileSystem(String testName, String[] resources) throws IOException;
protected abstract void destroyFileSystem(String testName) throws IOException;
}
All the tests use these methods to get the FileSystem to
operate on and the various implementations provide their implementation
and select the test sets that shall be executed. So the accesses to ZIP and
JAR resources can do:
public class JarFileSystemTest extends FileSystemFactoryHid {
public static Test suite() {
NbTestSuite suite = new NbTestSuite();
suite.addTestSuite(RepositoryTestHid.class);
suite.addTestSuite(FileSystemTestHid.class);
suite.addTestSuite(FileObjectTestHid.class);
suite.addTestSuite(URLMapperTestHidden.class);
suite.addTestSuite(URLMapperTestInternalHidden.class);
suite.addTestSuite(FileUtilTestHidden.class);
return new JarFileSystemTest(suite);
}
protected void destroyFileSystem (String testName) throws IOException {}
protected FileSystem[] createFileSystem (String testName, String[] resources) throws IOException{
return new FileSystem[] { new JarFileSystem (createJarFile (testName, resources)) };
}
}
Other types of file system plugins do similar things, just create
different instances of the filesystem.
As usual, this helps to fight with the application amoeba.
Moreover a Test Compatibility Kit helps to improve regular activities
a software engineering organization needs to handle. For example it simplifies
the lifecycle of a bug report: it makes it much easier to find out which part of
the system is buggy and lowers the number of reassignments that a bug needs
to find the right owner. We really lowered the number of "assigning to you
for evaluation"
bug transfers by introducing TCKs.
In some sense a TCK supports the best
incarnation of programmer's arrogance:
if you have an implementation,
also use the TCK or your bug reports will not be taken seriously.
Testing Complete Rewrite Compatibility
An interesting example of using the TCK approach for one's own good
is the case of a complete rewrite of some class. Imagine that there are
maintenance problems with one class. Its implementation is not good,
it is buggy and it also needs to be improved to handle a bit more. Nobody
believes that it is possible or reasonable to enhance the existing code,
it is better to throw it away and write from scratch. However there is a
concern that by doing the rewrite the functionality of the class might change
and thus affect everyone who is using it. Imagine that the situation
gets even more complicated as the original author of the code just did
not know how important tests are and the class is heavily undertested.
What shall one do?
A possible and highly recommended technique is to create a test that
is going to compare the behaviour of two implementations and verify that
they do behave the same with respect to what is being tested. The first
step is to move the old implementation from the application code to the
test code. So instead of being deleted the old unmaintainable class
is going to become the template for the expected behaviour of the new
code.
The testing code is then going to use a facade - an abstraction layer
so instead of calling the methods on the class itself, it is going to
call methods on some abstract interface, just like tests written for
TCK would do. The actual implementation of the interface is then going
to either call the old code or the new code. If the test passes in both
of these setups then it can be believed that the functionality of the
old implementation has been preserved well enough and that the new
implementation matches its behaviour to the known extent. It should also be noted
that this is a very suitable situation for using the randomized tests, because
they can just generate a random sequence of calls, apply them to both
implementations at once and verify that they both produce the same
results.
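A minimal sketch of such a comparison follows. It is a hypothetical illustration: LinkedList plays the role of the old implementation, ArrayList the role of the rewrite, and the names RewriteComparison, Facade, over, compare and checkSeed are invented for this example.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch: LinkedList plays the old code, ArrayList the rewrite. */
final class RewriteComparison {
    private RewriteComparison() {
    }

    /** The facade: the only operations the test is allowed to use. */
    interface Facade {
        void add(int index, Integer value);
        Integer remove(int index);
        int size();
    }

    static Facade over(final List<Integer> list) {
        return new Facade() {
            public void add(int index, Integer value) { list.add(index, value); }
            public Integer remove(int index) { return list.remove(index); }
            public int size() { return list.size(); }
        };
    }

    /** Applies the same random operations to both facades and compares the results. */
    static void compare(Facade oldImpl, Facade newImpl, long seed) {
        Random random = new Random(seed);
        int count = 1 + random.nextInt(1000); // at least one operation
        for (int i = 0; i < count; i++) {
            if (oldImpl.size() == 0 || random.nextBoolean()) {
                int index = random.nextInt(oldImpl.size() + 1);
                Integer value = Integer.valueOf(random.nextInt(100));
                oldImpl.add(index, value);
                newImpl.add(index, value);
            } else {
                int index = random.nextInt(oldImpl.size());
                Integer removedOld = oldImpl.remove(index);
                Integer removedNew = newImpl.remove(index);
                if (!removedOld.equals(removedNew)) {
                    throw new AssertionError("For seed: " + seed + " step " + i
                        + " old removed " + removedOld + " but new removed " + removedNew);
                }
            }
            if (oldImpl.size() != newImpl.size()) {
                throw new AssertionError("For seed: " + seed + " sizes differ at step " + i);
            }
        }
    }

    /** Convenience entry point comparing the two stand-in implementations. */
    static void checkSeed(long seed) {
        compare(over(new LinkedList<Integer>()), over(new ArrayList<Integer>()), seed);
    }
}
```

As with the other randomized tests, the seed is part of every failure message, so a failing sequence can be turned into a fixed regression test.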
NetBeans used this approach for example during an attempt to rewrite
CookieSet to a new and enhanced implementation.
The old implementation was copied to
the test and called OldCookieSetFromFebruary2005, so
the test could compare the behaviour of the new implementation with that of the
class from Feb 2005 and confirm or reject that the
new code behaves the same. This turned out to be a very useful verification
as it led us to postpone the integration of changes into CookieSet,
because we were not able to reach full compatibility easily.
Deadlock Test
Fighting with deadlocks is a sad destiny of any multithreaded application.
The problem field has been under extensive research because it causes
huge problems for every writer of an operating system. Most of the applications
are not as complex as operating systems, but as soon as you allow foreign
code to run in your application, you basically have to fight with the
same set of problems.
In spite of the huge research efforts, there was no simple answer /
solution found. We know that there are
four necessary and also sufficient
conditions for a deadlock to be created:
- Mutual exclusion condition - there has to be a resource (lock, execution queue, etc.)
that can be owned by just one thread
- Non-preemptive scheduling condition - it is not possible to
take away or release a resource already assigned by anyone else
than its owner
- Hold and wait condition - a thread can wait for a resource
indefinitely and can hold it indefinitely
- Resources can be acquired incrementally - one can
ask for a new resource (lock, execution queue), while already
holding another one
But we do not know what code that prevents at least one of these conditions
from arising should look like, and we definitely do not know how to do
a static analysis over source code to check whether a deadlock can or
cannot appear.
The basic and in fact very promising advice for a programmer in a language
with threads and locks like Java is: do not hold any lock
while calling foreign code. By following this rule one eliminates the
fourth
condition and, as all four must be satisfied to create a deadlock, we may
believe we
found the ultimate solution to deadlocks. But in fact, it
is sometimes very hard to satisfy such a restriction. Can the following code deadlock?
private HashSet allCreated = new HashSet ();
public synchronized JLabel createLabel () {
  JLabel l = new JLabel ();
  allCreated.add (l);
  return l;
}
It feels safe, as the only real call is to HashSet.add and HashSet does not use
synchronized at all. But in fact there is a lot of room for failure. The first
problem is that JLabel extends JComponent and somewhere in its
constructor it acquires the AWT tree lock (JComponent.getTreeLock()). And
if someone writes a component that overrides:
public Dimension getPreferredSize () {
  JLabel sampleLabel = createLabel ();
  return sampleLabel.getPreferredSize ();
}
we are in danger of a deadlock, as getPreferredSize is
often called when a component is painted and while the AWT tree lock
is held. So even though we tried really hard not to call foreign code, we
did it. The second and even less visible problem is the implementation
of HashSet. It uses Object.hashCode() and Object.equals(),
which again can call virtually anywhere (any object can override them), and
if the implementation acquires another lock, we can get into similar, but
even less expected, problems.
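How easily HashSet.add ends up in foreign code can be seen in a minimal sketch. The names HashSetCallsForeignCode and Key are hypothetical and exist only for this illustration; the counter stands in for whatever an overriding hashCode might do, including acquiring a lock:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration: HashSet.add invokes code the caller does not control.
class HashSetCallsForeignCode {
    /** Counts how many times the set called back into Key.hashCode(). */
    static int hashCodeCalls;

    /** A key whose hashCode() could do anything, for example take a lock. */
    static final class Key {
        @Override
        public int hashCode() {
            hashCodeCalls++;   // stands in for arbitrary foreign code
            return 42;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Key;
        }
    }

    /** Adds one key to a fresh set and reports whether hashCode() was called. */
    static boolean addTriggersCallback() {
        hashCodeCalls = 0;
        Set<Object> set = new HashSet<Object>();
        set.add(new Key());
        return hashCodeCalls > 0;
    }
}
```

Since addTriggersCallback() reports that the callback happened, a caller that holds a lock around HashSet.add is already calling code it does not control.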
Talking about possible solutions for deadlocks would provide enough
material for an article of its own, so let us return to the topic of this
one - writing tests.
In Java, the solution to a deadlock is often easy. Whenever the application
freezes, the user can produce a thread dump and from that we can get the
description of the problem. From there it is just a step to a fix:
just lock on another lock, or use SwingUtilities.invokeLater
to reschedule the code in question from the dangerous section to sometime
later. We used this style for a few years and the result was that
our code started to be unpredictable and we had not really fixed many
of the deadlocks: when we modified the code to fix one, we often created
a new one. My favourite example is the pair of changes made in our classes
on Jun 26, 2000 and
Feb 2, 2004. Both tried to fix a deadlock and the second one
effectively returned the code to its state prior to the first integration.
That means we successfully shifted the amoeba
shape of our application code to fix a deadlock in the year 2000, and
four years later we just shifted it once more to improve in one part,
but regress with respect to the fix from 2000. This would never have
happened if, together with the first fix, we had also integrated a test!
A test for a deadlock!? Yes, a test for a deadlock. However surprising
that may sound, it is possible and often not that hard (we often write
a test for a deadlock in about two hours, and we have never needed more than a day).
Beyond the automated nature of such a test, it also gives the developer
confidence that the fix really fixed something, which is usually not obvious due
to the esoteric nature of deadlock fixes, as they usually cannot be
reproduced and thus verified by anyone. Also, when there is a test, one can choose a simpler
solution that fixes the problem rather than invent an intellectually elegant,
but in fact complicated, one. The result is that the art of deadlock
fixing turns into regular engineering work. And we all want our
applications to be developed by engineers, do we not?
Writing a test for a deadlock is not that hard. In our imaginary situation
with createLabel we could do that by writing a component,
overriding getPreferredSize, stopping the thread and waiting
while another one locks the resources in the opposite way:
public class CreateLabelTest extends TestCase {
  public void testSimulateTheDeadlock () {
    MyComponent c = new MyComponent ();
    c.validate ();
  }
  // createLabel () below stands for the method under test
  private static class MyComponent extends JComponent
  implements Runnable {
    public synchronized Dimension getPreferredSize () {
      JLabel sampleLabel = createLabel ();
      new Thread (this).start ();
      try {
        // releases the monitor of this component so run () can notify us
        wait (1000);
      } catch (InterruptedException ex) {
        fail ("Interrupted while waiting for the other thread");
      }
      assertNotNull ("Also can create label", createLabel ());
      return sampleLabel.getPreferredSize ();
    }
    public void run () {
      assertNotNull ("We can create the label", createLabel ());
      synchronized (this) {
        notifyAll ();
      }
    }
  }
}
The test works with two threads. One creates a component and validates
it, which results in a callback to getPreferredSize
under the AWT tree lock. At this moment we
start another thread and wait a while for it to acquire the createLabel
lock. Under the current implementation the second thread blocks in the JLabel constructor,
and as soon as our thread continues (after 1000ms) we create the
deadlock.
There can be a lot of fixes, but the simplest one is very likely
to synchronize on the same lock as the JLabel constructor does:
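public JLabel createLabel () {
  JLabel l = new JLabel ();
  // getTreeLock () is an instance method inherited from java.awt.Component;
  // all components return the same shared AWT tree lock
  synchronized (l.getTreeLock ()) {
    allCreated.add (l);
  }
  return l;
}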
The fix is simple - much simpler than the test - but without the test,
we would not fix the shape of our amoeba. So the
time spent writing the test is likely to get paid back.
Often the test can be written by using an already existing API, like in
our case the getPreferredSize method (as in
our
test). Only in special situations does one need to introduce a special method
that helps the test to simulate the problem (we used that in our
howToReproduceDeadlock40766(boolean) called from PositionRef.java).
Anyway, deadlock tests are pure regression tests: one
writes them when a bug is reported, nobody is going to write them in
advance. At the beginning it is much wiser to invest in good design, but
as we explained earlier, there is no really universal theory on how to
prevent deadlocks, so one should know what to do when a deadlock
appears. For that we suggest that testing is the best way with respect
to our amoeba shape.
Testing Race Conditions
While certain problems with multiple threads and their synchronization are
hard to anticipate, such as the deadlocks mentioned earlier, sometimes it is possible
and useful to write a test to verify that various problems with parallel execution
are correctly handled.
We faced such a problem when we were asked to write a
startup lock
for NetBeans. The goal was to handle the
situation when a user starts the NetBeans IDE
for the second time: warn him that another instance of the program is
already running and then exit. This is similar to the behaviour of Mozilla or
Open Office. We decided to allocate a socket server and
create a file in a well known location with the port number written to it.
Then each newly started NetBeans IDE
could verify whether a previously running instance is active or not (by reading
the port number and trying to communicate with it).
The major problem we had to optimize for was the situation when the
user starts several NetBeans IDEs at once.
This can happen by extra clicks on the icon on the desktop or by dragging and dropping
several files on the desktop icon of our application. Then several processes are
started and they start to compete for the file and its content. The sequence
of one process looks like this:
if (lockFile.exists ()) {
  // read the port number and connect to it
  if (alive) {
    // exit
    return;
  }
}
// otherwise try to create the file yourself
lockFile.createNewFile();
DataOutputStream os = new DataOutputStream(new FileOutputStream(lockFile));
// a java.net.ServerSocket bound to port 0 gets a free port from the system
ServerSocket server = new ServerSocket(0);
int p = server.getLocalPort();
os.writeInt(p);
but it can be interrupted at any time
by the system, and instead of
executing all of this as an atomic operation, control can be passed to a
competing process which performs the same actions. What happens when one
process creates the file and another tries to read it in the meantime, before the
port number is written to it? What if there is a file left over from a
previous (killed) execution? What happens when the test for file existence
fails, but the file already exists by the time we try to create it?
All these questions have to be asked when one wants to have really good
confidence in the application code. In order to get the confidence we
wanted, we inserted a lot of check points
into our
implementation of locking
so the code became a modified version of the previous snippet:
enterState(10, block);
if (lockFile.exists ()) {
  enterState(11, block);
  // read the port number and connect to it
  if (alive) {
    // exit
    return;
  }
}
// otherwise try to create the file yourself
enterState(20, block);
lockFile.createNewFile();
DataOutputStream os = new DataOutputStream(new FileOutputStream(lockFile));
// a java.net.ServerSocket bound to port 0 gets a free port from the system
ServerSocket server = new ServerSocket(0);
enterState(21, block);
int p = server.getLocalPort();
enterState(22, block);
os.writeInt(p);
enterState(23, block);
where the enterState method does nothing in a real production
environment, but in a test it can be instructed to block at a specific
check point. So we can write a test in which we start two threads, instruct
one of them to stop at 22,
then let the second one run
and observe how it handles the case when the file already exists but the
port is not yet written in.
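One possible shape of such a check point helper is sketched below. It is an assumption for illustration only, not the actual NetBeans implementation: the CheckPoints and Block names, the latch based blocking and the five second safety timeout are all hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a check point helper; not the actual NetBeans code.
class CheckPoints {
    /** Describes where a thread should stop and lets the test resume it. */
    static final class Block {
        final int state;                                      // check point to stop at
        final CountDownLatch reached = new CountDownLatch(1); // thread is parked
        final CountDownLatch resume = new CountDownLatch(1);  // test lets it go on
        Block(int state) { this.state = state; }
    }

    /** Does nothing when block is null (production) or names another state. */
    static void enterState(int state, Block block) {
        if (block == null || block.state != state) {
            return;
        }
        block.reached.countDown();
        try {
            // bounded wait, so a broken test cannot hang forever
            block.resume.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
    }

    /** Parks a worker at state 22 and reports what it had done by then. */
    static String demo() {
        final Block block = new Block(22);
        final StringBuffer trace = new StringBuffer();
        Thread worker = new Thread(new Runnable() {
            public void run() {
                enterState(21, block);
                trace.append("socket opened;");
                enterState(22, block);
                trace.append("port written;");
            }
        });
        worker.start();
        try {
            block.reached.await(5, TimeUnit.SECONDS);
            String whileBlocked = trace.toString();
            block.resume.countDown();
            worker.join(5000);
            return whileBlocked + "|" + trace;
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }
}
```

While the worker is parked at 22, the test thread is free to start a second competitor and observe how it copes with a lock file that exists but does not yet contain a port number.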
This approach worked pretty well, and despite the skeptical opinions we
heard when we tried to solve this problem, we got about 90% of the behaviour
right before we integrated the first version. Yes, there was still more
work to do and bugs to be fixed, but because we had really good automated
tests for the behaviour we really implemented, our amoeba
edge was well stiffened and we had enough confidence that we could fix all
outstanding problems.
Analyzing Random Failures
Those 10% of random failures mentioned in the previous part turned, as
usual, into more work than just the next 10% of additional tests and a few
fixes. They inspired this whole new part, as dealing with failures
that happen just from time to time, and usually on a computer that you do
not own, requires more sophisticated tracking techniques.
The problem with parallel execution is that there is really not much
help available to anyone who wants to use it correctly.
The methodology is either weak, missing, or
just too concentrated on specific cases, and the debuggers are not really ready
to push the debugged applications to their parallel limits.
So in
order to really move somewhere, people resort to the oldest solution:
println and logging. The old approach is pretty simple - add
log messages into your code, run it a few times, wait until it starts to misbehave
and then try to figure out from the log file what went wrong and fix it.
In the case of automated tests a similar approach can be used. Enhance the
application code and also the tests with logging, and if the test fails,
output all the collected log messages as part of the failure report.
We have achieved this by writing our own implementation of ErrorManager
(which is a NetBeans class used for
logging and error reporting), but one can do this in any test by
using java.util.logging and implementing its
Handler. The implementation has to be registered
at the beginning of the test and has to capture all
logged messages and in the case of failure make them part of
the failure message:
import junit.framework.AssertionFailedError;
import org.netbeans.junit.NbTestCase;
import org.openide.ErrorManager;

public class MyTest extends NbTestCase {
  static {
    System.setProperty ("org.openide.util.Lookup", "MyTest$Lkp");
  }
  public MyTest (String name) {
    super (name);
  }
  protected void runTest () throws Throwable {
    // StringBuffer has no clear (), so reset its length instead
    ErrManager.messages.setLength (0);
    try {
      super.runTest ();
    } catch (AssertionFailedError err) {
      throw new AssertionFailedError (err.getMessage() + " Log:\n" + ErrManager.messages);
    }
  }
  public void testYourTest() throws Exception {
    // invoke some code
    ErrorManager.getDefault().log ("Do some logging");
    // another code
    ErrorManager.getDefault().log ("Yet another logging");
  }
  public static final class Lkp extends org.openide.util.lookup.AbstractLookup {
    // the lookup is instantiated by name, so it needs a public no-argument constructor
    public Lkp () {
      this (new org.openide.util.lookup.InstanceContent ());
    }
    private Lkp (org.openide.util.lookup.InstanceContent ic) {
      super (ic);
      ic.add (new ErrManager ());
    }
  }
  private static final class ErrManager extends org.openide.ErrorManager {
    public static final StringBuffer messages = new StringBuffer ();
    public void log (int severity, String s) {
      messages.append (s);
      messages.append ('\n');
    }
    public void notify (int severity, Throwable t) {
      messages.append (t.getMessage ());
      messages.append ('\n');
    }
    // the remaining abstract methods of ErrorManager are omitted for brevity
  }
}
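For readers who use the standard Java logging, the same idea might look roughly like the following sketch. It assumes nothing NetBeans specific; CollectingHandler, runWithLog and demo are hypothetical names, and AssertionError is used here as a stand-in for the JUnit failure type:

```java
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical sketch using only java.util.logging; all names are illustrative.
class CollectingHandler extends Handler {
    private final StringBuffer messages = new StringBuffer();

    /** Remembers every record as "[logger] message". */
    public void publish(LogRecord record) {
        messages.append('[').append(record.getLoggerName()).append("] ");
        messages.append(record.getMessage()).append('\n');
    }
    public void flush() {}
    public void close() {}

    String collected() {
        return messages.toString();
    }

    /** Runs the body with a collecting handler and appends the log to any failure. */
    static void runWithLog(Logger logger, Runnable testBody) {
        CollectingHandler handler = new CollectingHandler();
        Level oldLevel = logger.getLevel();
        logger.setLevel(Level.ALL);          // let every message through
        logger.addHandler(handler);
        try {
            testBody.run();
        } catch (AssertionError err) {
            throw new AssertionError(err.getMessage() + " Log:\n" + handler.collected());
        } finally {
            logger.removeHandler(handler);
            logger.setLevel(oldLevel);
        }
    }

    /** Runs a failing body that logs one message and returns the failure text. */
    static String demo() {
        final Logger logger = Logger.getLogger("sketch.collect");
        try {
            runWithLog(logger, new Runnable() {
                public void run() {
                    logger.info("step one");
                    throw new AssertionError("boom");
                }
            });
            return "no failure";
        } catch (AssertionError err) {
            return err.getMessage();
        }
    }
}
```

In a JUnit 3 style test the call to runWithLog would typically sit in an overridden runTest method, in the same place where the NetBeans variant wraps super.runTest ().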
The logging can be done by the test to mark important sections of its
progress, but the main advantage is that your code should be full of
ErrorManager.log or (if you use the standard Java logging)
java.util.logging.Logger.log calls. The test then collects
messages from all places and, in the case of failure, provides a
complete and detailed (well, depending on
how frequent and how useful the log messages are) description
of the failure, which can then be analyzed and either fixed or
the logging made more detailed to help track the problem down
(as we did for example in
ChildrenKeysIssue30907Test.java).
Sometimes people are reluctant to analyze random failures in tests,
treating them as something that does not affect the production code. In fact, it may turn
out that the problem is in the test and the code is ok, but relying on
this is usually false hope. Without a deeper understanding of the problem,
it may just as well be in the application code, and even if it is not reproducible all the
time, when it occurs it can have huge consequences. If we want to have
enough trust in the behaviour of our application and make its
amoeba shape less amoebic, logging in application code and
logging-friendly tests turn out to be very useful tools.
Advanced usage of Logging
The NetBeans project started to introduce tests that collect logged
output slowly and only in the most exposed places, where the failures
were often due to overly complicated usage of multiple threads. However
one should remember that every application written in Java is
non-deterministic, even if it consciously uses just one thread. There
is always the garbage collector, which starts at random moments depending
on conditions external to the tested program, and when dealing with
visual components there is at least the AWT event dispatch thread, which
can also receive events in a not strictly defined order. As a result
it turned out that the logging support is pretty useful in almost any
test. Because of that NetBeans decided to support it
in its NbJUnit extensions.
The simplest add-on is support for collecting the messages logged during
the run of a test. To get this, one can just override the logLevel
method in any subclass of NbTestCase. Let's look at the following
example:
public class LogCollectingTest extends NbTestCase {
public LogCollectingTest(String n) { super(n); }
protected Level logLevel() {
return Level.ALL;
}
public void testMethodOne() { /* .... */ }
public void testMethodTwo() { /* .... */ }
public void testMethodThree() { /* .... */ }
}
It says that messages of all levels shall be collected, and so
if any of the test methods fails, the reported exception is going to
contain all messages logged during execution of the program
in a format like:
[name.of.the.logger] THREAD: threadName MSG: The message
Of course the text in the exception message is truncated to be of
reasonable length, but the full log report is also available. The
directory returned by NbTestCase.getWorkDir(), which would usually be
/tmp/tests/testLogCollectingTest/testMethodOne, contains a file called
testMethodOne.log with the last 1 MB of logged output.
That should be enough for cases when one needs deeper analysis of some
complex logging code, as most
of the recently logged information should be there. It should also be mentioned that
if one is using xtest to execute
the tests, the generated HTML report includes a copy of the workdir
including the output log file.
Sometimes the logged messages can get pretty long, and especially
in the case of random failures it may not be easy to analyse
what is going on in the test. The best strategy to fight
such a situation that we have discovered so
far is to add a fail("Ok"); line at the place in the
test where the execution randomly fails. This is going to generate
a log file as well, and the two log files - the one from the real failure
and the one from the correct run - can then be compared and diffed against
each other. Of course it is wise to replace all the @543hac5f
messages in the output with something more neutral, as each run is going
to have different memory locations. After eliminating this
difference, it is generally possible to get a reasonable diff and understand
what the difference between those two runs is and what the root
cause of the failure can be.
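Neutralizing such identity suffixes can be automated before diffing. The following is a minimal sketch, assuming the suffixes come from the default Object.toString(), which prints '@' followed by the hexadecimal identity hash code; LogNormalizer and the @<id> placeholder are hypothetical choices:

```java
import java.util.regex.Pattern;

// Hypothetical helper that makes log lines from different runs comparable.
class LogNormalizer {
    // '@' followed by one to eight lowercase hex digits that end at a word
    // boundary, which is the shape produced by the default Object.toString()
    private static final Pattern IDENTITY = Pattern.compile("@[0-9a-f]{1,8}\\b");

    /** Replaces every identity suffix with the neutral text "@<id>". */
    static String normalize(String logLine) {
        return IDENTITY.matcher(logLine).replaceAll("@<id>");
    }
}
```

Both log files would be passed through normalize line by line before they are handed to a diff tool. The pattern is deliberately narrow; messages that print objects in another format need their own rule.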
There may be two reasons why such an understanding of the differences
between log files might not be possible. The first is that there may not
be enough logs. The information recorded in the log file is just too coarse
to provide a meaningful picture of what has been happening during the
test run. Indeed, the fix is to insert more logging messages into the
test code. Using the NbTestCase.logLevel method one can
add more logging on various levels like Level.FINE,
Level.FINER and Level.FINEST and in different
test cases enable different test messages. In the case of a very suspicious
random failure which can potentially influence a critical piece of code,
it may even make sense to log information about each line of the code
that gets executed, possibly with the arguments sent to various methods and
the state of local variables. That should generate enough information to
allow a detailed understanding of the reasons for random failures.
On the
other hand, there is a hidden catch: the more logging is put into the
application, the more its behaviour is going to differ
from the execution of the application without the logging enabled. Every
call to Logger.log is going to add additional delays,
formatting of recorded messages is going to take a while, etc.
That is why a failure that might occur in one third of cases may nearly
disappear due to all the logging code around it. Indeed such a failure
gets harder and harder to repeat and thus evaluate. One can nearly give
up on trying to reproduce it in a debugger, or in the worst case, one cannot
even encounter such a failure on one's own computer. However from time to time the
failure appears on someone's machine. In NetBeans we have a farm of testing
machines running tests every day in various configurations, and this
really helps to track such rare errors down. Although we may not even
encounter the bug on the developers' workstations, from time to time we get
a report from the testing infrastructure with full logging information
and we can start to hunt the bug down.
The other side of the problem is that there are already too many logging
messages and it is hard to tell where they come from. For
example if the test executes a series of steps which result in repeated
calls to the same place in the application code, which of course prints the
same log messages, one can easily get lost. This is especially hard if the
application does some computation in a loop, repeating the same or at
least similar messages on and on and on. In such a case the really helpful
advice is to put logging messages into the testing code itself. Those
messages are the anchors that provide basic orientation in the log
output. When comparing the differences between the successful and unsuccessful
run, one can first of all locate the test messages and then compare the
differences in the application messages inserted among them. Basically the
best advice for fighting random failures is to do a lot of logging,
capture it in tests, and print enough logging messages in the application
and also in the test code itself. Then the likelihood of understanding
the root cause of the failure increases dramatically.
A slightly different use of logging is testing that something really
has been logged. For example, when a new method is added to one's API that
for some reason should be overridden by subclasses, it might be desirable to
let already existing classes compiled against the previous version of the
class know about it at runtime:
protected boolean overideMePlease() {
Logger.getLogger("my.logger").warning("subclasses are supposed to override overideMePlease() method!");
// some default
return true;
}
Of course everyone who is dead set on testing wants to be sure that
the warning really gets printed in the right situations. To do that
one can register one's own logger or, in NetBeans prior to 5.0, one's own
ErrorManager (like in the
AsynchronousTest.java)
and make sure that its warning method is called. However,
as this was a way too common pattern in NetBeans, there is a special
utility method in NbJUnit that handles everything itself.
So one can just do:
CharSequence log = Log.enable("my.logger", Level.ALL);
// do the necessary actions
if (log.toString().indexOf("supposed to override") == -1) {
  fail("Warning shall be printed: " + log);
}
This is just a little utility method, but it helps to easily
access and analyze what gets logged from inside the test case.
Together with the rest of the support in org.netbeans.junit.NbTestCase
and org.netbeans.junit.Log classes this should
provide good enough support for fighting with the unwanted
shaking of the amoeba's edge using logging.
Execution Flow Control using Logging
This section builds on the
previous description of logging support and on the sections
that discussed testing for race conditions and tests
that try to simulate deadlocks.
Both of these kinds of tests need some kind of hooks in the code that allow
the testing environment to simulate some obscure situations and either
corrupt some internal data structure or stop the thread in a place that
is going to cause a deadlock with another executing thread. The previous
parts suggested adding special methods accessible from the testing code
that allow such kinds of execution control: either to have
code full of inserted statements like enterState(10, block)
or to provide some overridable method like
howToReproduceDeadlock40766(boolean) and make it do something
insane in the test. This is possible, but actually there is a small
improvement which is not easy to find, but once discovered seems
so natural and easy to use that one just has to wonder why it did
not come to mind earlier: instead of putting special methods
in the code, one can use logging!
Logging? Yes, logging! The beauty of such a solution is in the fact
that logging is, or at least should be, a natural part of every program that is at least
a bit complicated anyway,
so one does not have to obscure the code with enterState
and howToReproduceDeadlock40766(boolean); instead one
can just use:
logger.finest("Enter state 20");
// or
logger.finest("reproduceDeadlock40766now");
The first log is fully natural; the second is a bit suspicious because
of the name of its message, but it is still not so strange as to be seen
as an alien piece of code not related to the program at all.
Logging just belongs to programs.
Now the testing code can register its own Handler and in
its publish method do all the wild things:
class Reproduce40766 extends Handler {
  public void publish(LogRecord rec) {
    if ("reproduceDeadlock40766now".equals(rec.getMessage())) {
      // block and let other thread run as shown
      // in Deadlock40766Test.howToReproduceDeadlock40766(boolean)
    }
  }
  // Handler also declares these two abstract methods
  public void flush() {}
  public void close() {}
}
Logger.getLogger("").addHandler(new Reproduce40766());
So instead of littering the application code with special hacks, one
can achieve the same by adding a logging handler and analyzing the messages
passed to it
during execution of the test. The application code stays clean and
the test gets as powerful as the number of messages logged in the
application code allows, because each logged message is a chance for the
test to influence the behaviour of the application.
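What the body of such a publish method might do can be sketched with two latches. This is only an illustration under stated assumptions: ParkOnMessage, its fields and the five second timeout are hypothetical names and values, not part of NetBeans or NbJUnit. Note that the logger level has to allow FINEST, otherwise the message never reaches the handler:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical sketch: a handler that parks the thread logging a given message.
class ParkOnMessage extends Handler {
    private final String trigger;
    final CountDownLatch parked = new CountDownLatch(1);   // logging thread arrived
    final CountDownLatch release = new CountDownLatch(1);  // test lets it continue

    ParkOnMessage(String trigger) {
        this.trigger = trigger;
    }

    public void publish(LogRecord rec) {
        if (!trigger.equals(rec.getMessage())) {
            return;
        }
        parked.countDown();
        try {
            // bounded wait, so a broken test cannot hang forever
            release.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
    }
    public void flush() {}
    public void close() {}

    /** Parks a worker inside logger.finest and reports what it did before and after. */
    static String demo() {
        final Logger logger = Logger.getLogger("sketch.flow");
        logger.setLevel(Level.FINEST);   // otherwise finest() is filtered out
        final ParkOnMessage handler = new ParkOnMessage("Enter state 20");
        logger.addHandler(handler);
        final StringBuffer trace = new StringBuffer();
        Thread worker = new Thread(new Runnable() {
            public void run() {
                trace.append("before;");
                logger.finest("Enter state 20");
                trace.append("after;");
            }
        });
        worker.start();
        try {
            handler.parked.await(5, TimeUnit.SECONDS);
            String whileParked = trace.toString();
            handler.release.countDown();
            worker.join(5000);
            return whileParked + "|" + trace;
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            return "interrupted";
        } finally {
            logger.removeHandler(handler);
        }
    }
}
```

The application side of this sketch contains nothing but an ordinary logger.finest call; all the blocking logic lives in the test.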
Taken to an extreme, it should be possible to fully control the
behaviour of a multithreaded program by suspending all threads other than
the one that should be executing. Imagine the following program:
class Parael implements Runnable {
public void run() {
Random r = new Random();
for (int i = 0; i < 10; i++) {
try {
Thread.sleep(r.nextInt(100));
} catch (InterruptedException ex) {}
Logger.global.log(Level.WARNING, "cnt: {0}", new Integer(i));
}
}
public static void main(String[] args) throws InterruptedException {
Thread t1 = new Thread(new Parael(), "1st");
Thread t2 = new Thread(new Parael(), "2nd");
t1.start(); t2.start();
t1.join(); t2.join();
}
}
The program runs two threads, each of them counting from 0 to 9
while pausing a random number of milliseconds.
Obviously the threads are going to run in parallel and the speed of
their counting is going to be random. This can be easily verified by a simple
NbTestCase with logging enabled:
public class ParaelTest extends NbTestCase {
public ParaelTest(String testName) { super(testName); }
protected Level logLevel() { return Level.WARNING; }
public void testMain() throws Exception {
Parael.main(null);
fail("Ok, just print logged messages");
}
}
When executed once, the output can be for example like the one
shown in the left, next time it can look like
the one shown on right:
[global] THREAD: 2nd MSG: cnt: 0
[global] THREAD: 1st MSG: cnt: 0
[global] THREAD: 2nd MSG: cnt: 1
[global] THREAD: 2nd MSG: cnt: 2
[global] THREAD: 2nd MSG: cnt: 3
[global] THREAD: 2nd MSG: cnt: 4
[global] THREAD: 1st MSG: cnt: 1
[global] THREAD: 1st MSG: cnt: 2
[global] THREAD: 2nd MSG: cnt: 5
[global] THREAD: 2nd MSG: cnt: 6
[global] THREAD: 1st MSG: cnt: 3
[global] THREAD: 1st MSG: cnt: 4
[global] THREAD: 2nd MSG: cnt: 7
[global] THREAD: 1st MSG: cnt: 5
[global] THREAD: 2nd MSG: cnt: 8
[global] THREAD: 2nd MSG: cnt: 9
[global] THREAD: 1st MSG: cnt: 6
[global] THREAD: 1st MSG: cnt: 7
[global] THREAD: 1st MSG: cnt: 8
[global] THREAD: 1st MSG: cnt: 9
|
[global] THREAD: 2nd MSG: cnt: 0
[global] THREAD: 1st MSG: cnt: 0
[global] THREAD: 2nd MSG: cnt: 1
[global] THREAD: 1st MSG: cnt: 1
[global] THREAD: 2nd MSG: cnt: 2
[global] THREAD: 1st MSG: cnt: 2
[global] THREAD: 2nd MSG: cnt: 3
[global] THREAD: 2nd MSG: cnt: 4
[global] THREAD: 1st MSG: cnt: 3
[global] THREAD: 2nd MSG: cnt: 5
[global] THREAD: 2nd MSG: cnt: 6
[global] THREAD: 1st MSG: cnt: 4
[global] THREAD: 1st MSG: cnt: 5
[global] THREAD: 2nd MSG: cnt: 7
[global] THREAD: 1st MSG: cnt: 6
[global] THREAD: 2nd MSG: cnt: 8
[global] THREAD: 1st MSG: cnt: 7
[global] THREAD: 2nd MSG: cnt: 9
[global] THREAD: 1st MSG: cnt: 8
[global] THREAD: 1st MSG: cnt: 9
|
Of course, when the program is executed again and again the
output is unlikely to look exactly the same as the previous one.
That is fine: when two threads run in parallel, the result is
obviously non-deterministic. But imagine that one of the execution
orders is somehow special.
For example, that it is known
to cause some race condition or deadlock. Indeed such wrong behaviour
should be fixed, but in line with the intention of this whole paper,
it should also be simulated, otherwise the fight with the amoeba is
never going to end. That is why one should try to write
a test that simulates the order of execution that is known to be broken.
For example let us
try to generate a highly unlikely sequence where each thread increments and
prints one number, goes to sleep and lets the other thread run. It is
highly improbable that such output would happen randomly, so
it is reasonable to question whether such a test can be written at all.
But it can! With a little help from logging, here
is the code that forces the threads to
behave deterministically and always output just one number from the
sequence:
public class ParaelTest extends NbTestCase {
public ParaelTest(String testName) { super(testName); }
protected Level logLevel() { return Level.WARNING; }
public void testMain() throws Exception {
Logger.global.addHandler(new BlockingHandler());
Parael.main(null);
fail("Ok, just print the logged output");
}
private static final class BlockingHandler extends Handler {
boolean runSecond;
public synchronized void publish(LogRecord record) {
if (!record.getMessage().startsWith("cnt")) return;
if (runSecond == Thread.currentThread().getName().equals("2nd")) {
notify();
runSecond = !runSecond;
}
try {
wait(500);
} catch (InterruptedException ex) {}
}
public void flush() {}
public void close() {}
}
}
When the test is executed it can really be seen that each thread
adds one to its counter and gives execution control to the other thread. And
this behaves more or less deterministically, i.e. nearly every execution
of the test yields:
[global] THREAD: 1st MSG: cnt: 0
[global] THREAD: 2nd MSG: cnt: 0
[global] THREAD: 1st MSG: cnt: 1
[global] THREAD: 2nd MSG: cnt: 1
[global] THREAD: 1st MSG: cnt: 2
[global] THREAD: 2nd MSG: cnt: 2
[global] THREAD: 1st MSG: cnt: 3
[global] THREAD: 2nd MSG: cnt: 3
[global] THREAD: 1st MSG: cnt: 4
[global] THREAD: 2nd MSG: cnt: 4
[global] THREAD: 1st MSG: cnt: 5
[global] THREAD: 2nd MSG: cnt: 5
[global] THREAD: 1st MSG: cnt: 6
[global] THREAD: 2nd MSG: cnt: 6
[global] THREAD: 1st MSG: cnt: 7
[global] THREAD: 2nd MSG: cnt: 7
[global] THREAD: 1st MSG: cnt: 8
[global] THREAD: 2nd MSG: cnt: 8
[global] THREAD: 1st MSG: cnt: 9
[global] THREAD: 2nd MSG: cnt: 9
The basic trick that allows such a miracle to happen is the interception of the
output messages. The special BlockingHandler registered by
the test before the application code gets started analyzes the messages
sent to it by the application threads and suspends and resumes them
to simulate the execution order where each thread adds one number
and then gets suspended by the operating system in favor of the other thread.
The beauty of such a solution is that the application code does not really
know that its execution flow is going to be controlled by the test. If
one looked at the application code itself, one could not guess that there
is a test that does such wild things to the code, because the actual
application code looks natural. Just due to the possibility of intercepting
logging, the test gets the chance to influence the execution flow and
as a result simulate even the wildest and most improbable execution order.
Indeed writing the BlockingHandler correctly may not be
easy, especially if there are more than two threads that need to interact
and if the messages that need to be analyzed are not that simple. That
is why the NetBeans JUnit extensions library contains a support method
called Log.controlFlow which registers the handler
and does all the thread manipulation itself. The only thing that is needed
is to specify the expected order of messages. A very nice thing is that
the format of expected messages is THREAD: name MSG: message
which exactly matches the output reported by the NbTestCase
when capturing of log messages is enabled (by overriding the logLevel method).
So one can just copy the output
and feed it to the controlFlow method, possibly without any
modifications. However, as the real world messages sent to a logger often
contain some content specific to each run (like @af52h442, which
identifies the location of the object in memory), it is possible to use regular
expressions to describe the expected messages. So here is one possible
rewrite of our test that
uses the Log.controlFlow method to simulate the
one by one
order of execution:
public class ParaelTest extends NbTestCase {
public ParaelTest(String testName) { super(testName); }
protected Level logLevel() { return Level.WARNING; }
public void testMain() throws Exception {
org.netbeans.junit.Log.controlFlow(Logger.global, null,
"THREAD: 1st MSG: cnt: 0" +
"THREAD: 2nd MSG: .*0" +
"THREAD: 1st MSG: ...: 1" +
"THREAD: 2nd MSG: cnt: 1" +
"THREAD: 1st MSG: cnt: 2" +
"THREAD: 2nd MSG: cnt: 2" +
"THREAD: 1st MSG: cnt: 3" +
"THREAD: 2nd MSG: cnt: 3" +
"THREAD: 1st MSG: cnt: 4" +
"THREAD: 2nd MSG: cnt: 4" +
"THREAD: 1st MSG: cnt: 5" +
"THREAD: 2nd MSG: cnt: 5" +
"THREAD: 1st MSG: cnt: 6" +
"THREAD: 2nd MSG: cnt: 6" +
"THREAD: 1st MSG: cnt: 7" +
"THREAD: 2nd MSG: cnt: 7" +
"THREAD: 1st MSG: cnt: 8" +
"THREAD: 2nd MSG: cnt: 8" +
"THREAD: 1st MSG: cnt: 9" +
"THREAD: 2nd MSG: cnt: 9",
500
);
Parael.main(null);
fail("Ok, just print the logged output");
}
}
Indeed this may not look like a big simplification, but usually the important
order of execution just affects some local place, and then the size of the script
to replay the execution flow can be reduced considerably. For example imagine that
we just want the thread 1st to print 5 first, then thread 2nd
to print 2,
and then the thread 1st to continue with 6. Here would be the script:
public class ParaelTest extends NbTestCase {
public ParaelTest(String testName) { super(testName); }
protected Level logLevel() { return Level.WARNING; }
public void testMain() throws Exception {
org.netbeans.junit.Log.controlFlow(Logger.global, null,
"THREAD: 1st MS