Datasystems problems in detail
The central classes in the Datasystems API contain some of the
hardest-to-understand and perhaps buggiest code in the NetBeans
Platform. The code is rather old, and its basics have not changed much in the
past several years, but small workarounds and modifications have
accreted since then and are hard to separate from the basic
functionality. This complexity adversely impacts reliability, and
some very desirable performance optimizations are too hard to do in
the current code. Even the developers assigned to maintain the API
are unwilling to make many changes for fear of introducing
regressions.
Datasystems uses a complex threading model that is not well
understood. While folder
recognition is basically single-threaded, modules can start or
intercept the process from any thread, meaning that fine-grained
locking is necessary. Some modules, and even the
core window system, are unable to guarantee that certain
operations will always work, though they succeed almost all of the
time. Other problems
can cause data corruption and deadlocks.
The API is also difficult to understand for a programmer getting
his or her feet wet in NetBeans. There are too many options, many
of which have been unused for years. Naming conventions are not
always logical. Numerous assumptions are not documented
anywhere.
Clustering several files into one data object has caused problems for version
control integration that need to be thought through carefully. Coupling visualisation with the data layer is itself harmful.
Finally, there are high-level architectural concerns about
Datasystems' place in the platform. The current API directly depends on a wide
variety of other code, making it difficult to test in isolation.
Some of the more advanced features are unwanted overhead in
applications based on the NetBeans Platform, and this may even be
true of a couple of features in the NetBeans IDE if the new
Projects system is implemented.
Settings system problems
Although settings might look like a separate area to solve, they are not.
Most aspects of settings currently use Datasystems internally, which is an
unwanted dependency. The settings system was never designed as a whole;
individual aspects of it accreted during each historical
NetBeans release cycle, subject to compatibility and resource
constraints. The lack of Settings APIs that are independent of Datasystems needs to be addressed as part of the Datasystems redesign.
The settings system itself has a number of problems. One is coding
complexity: few beginning NetBeans programmers really figure out how
all of these pieces work together. While most application frameworks
let you write a few lines of configuration to a *.ini file or the
like, there is nothing comparably straightforward in NetBeans.
Even if you buckle down, copy the boilerplate from a sample module
and make your own customizations, any minor error will lead to
runtime failures that are very hard to diagnose without expertise in
NetBeans internals.
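For comparison, this is roughly what the plain key=value approach looks like with the JDK's own java.util.Properties. It is an illustration only, not a NetBeans API: the class name PlainSettings, its helper methods and the keys tabSize and showLineNumbers are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Illustration only: plain key=value settings read with java.util.Properties.
 *  PlainSettings and the keys used below are hypothetical, not a NetBeans API. */
final class PlainSettings {
    /** Parses settings text in the java.util.Properties format. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // A StringReader only throws once closed, which never happens here.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Reads an int setting, falling back to a default when the key is absent. */
    static int intSetting(Properties props, String key, int defaultValue) {
        String value = props.getProperty(key);
        return value == null ? defaultValue : Integer.parseInt(value.trim());
    }

    public static void main(String[] args) {
        Properties props = parse("# editor options\ntabSize=4\nshowLineNumbers=true\n");
        int tabSize = intSetting(props, "tabSize", 8);
        boolean lineNumbers = Boolean.parseBoolean(props.getProperty("showLineNumbers", "false"));
        System.out.println(tabSize + " " + lineNumbers); // prints "4 true"
    }
}
```

A mistyped key here simply yields the default value instead of a runtime failure deep inside the framework, which is the kind of forgiving behaviour the paragraph above contrasts with the current system.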
Furthermore, all of the existing coding styles require at least
some Java classes to be written for each distinct kind of setting.
While this is sometimes appropriate, it can also be overkill, and
just serves to increase JVM memory consumption. Registration of
options and services also ties into the Lookup subsystem
in such a way that care must be taken to avoid settings being
loaded before anyone asks for them: all such objects are placed in
one global area, although in practice they are only needed in
isolation. There is some inherent overhead in how settings are
stored, some of which has been optimized away in NetBeans 3.5 at
the expense of added internal complexity.
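One way to keep settings from being loaded before anyone asks for them, and to keep each area isolated rather than in one global registry, is the JVM's lazy class initialization: a class's static initializer runs only on its first active use. The sketch assumes each settings area is its own class; SettingsAreas, EditorSettings and FormSettings are hypothetical names, and the values are hard-coded where a real implementation would read persisted data.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Sketch: each settings area is a separate class whose static INSTANCE is
 * created by the JVM only when that class is first used (lazy class
 * initialization). All names are hypothetical, not a NetBeans API.
 */
final class SettingsAreas {
    /** How many settings areas have actually been loaded so far. */
    static final AtomicInteger LOADED = new AtomicInteger();

    /** Hypothetical editor settings area. */
    static final class EditorSettings {
        // Created when EditorSettings is initialized, i.e. on the first get().
        private static final EditorSettings INSTANCE = new EditorSettings();
        final int tabSize;

        private EditorSettings() {
            tabSize = 4; // placeholder: a real implementation would read persisted values
            LOADED.incrementAndGet();
        }

        static EditorSettings get() {
            return INSTANCE;
        }
    }

    /** Hypothetical form-editor settings area, independent of the editor area. */
    static final class FormSettings {
        private static final FormSettings INSTANCE = new FormSettings();
        final boolean snapToGrid;

        private FormSettings() {
            snapToGrid = true; // placeholder value
            LOADED.incrementAndGet();
        }

        static FormSettings get() {
            return INSTANCE;
        }
    }

    public static void main(String[] args) {
        System.out.println(LOADED.get());                 // 0 in a fresh JVM: nothing loaded yet
        System.out.println(EditorSettings.get().tabSize); // 4
        System.out.println(LOADED.get());                 // 1: FormSettings was never touched
    }
}
```

The JVM guarantees that class initialization runs once and is thread-safe, so no explicit locking is needed. The trade-off is that every settings area still needs its own Java class, which is the per-setting class overhead criticized above.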
Project-specific settings are possible using one of several
semi-documented tricks. It is likely that the NetBeans 4.0 Projects
infrastructure will not use the current system of making settings
project-specific.
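Use of Java serialization for options and services is also a problem in the
current system: serialization does not work well in practice for long-term
persistence of data, because the serialized form is tied to the exact
structure of the classes, so later changes to those classes can make stored
settings unreadable.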
Concrete known Datasystems issues
- During recognition of a FileObject, all loaders are asked to recognize it. A more structured hierarchy is needed so that
loaders are not initialized in memory until they are needed, and so that the amount of client code that is called is minimized
- There has to be a clear division between recognizing the (MIME) type of a file and creating its DataObject
- It is necessary to allow other modules to declaratively extend the capabilities of foreign loaders (add cookies to them)
(Issue 20191)
- It is necessary to allow other modules to declaratively add actions to foreign loaders
- Loaders are still serialized; they should be handled by the settings infrastructure instead
- The package org.openide.loaders is a jumble of API, SPI, support classes and deprecations, and should be redesigned
- DataLoader should no longer extend SharedClassObject
- The SPI should be separated and simplified (it should not be necessary to subclass both
MultiFileLoader and MultiDataObject)
- Datasystems should be separated from the layers above it: the
Nodes API and any windowing framework.
The Datasystems API should be usable standalone, without the presentation
layer. Node.Cookie has to be replaced, and because Lookup is becoming the standard
mechanism in a variety of APIs (Looks, Actions, etc.), Lookup
should be reused in the loaders package too.
- Problems with recognizing DataObjects (see the hack in DataFolder.handleMove)
(Issue 8705)
- The problem with the 500 ms timeout for recognizing DataObjects
(Issue 20022)
- The implementation and interface hierarchies are mixed together in the
loaders package. Nearly every implementation
of DataObject is a subclass of MultiDataObject, but two public classes, DataShadow
and (formerly) DataFolder, subclass DataObject directly. As a result it is very inconvenient
to create your own subclasses of those objects, because they cannot use the support provided by MultiDataObject.
- Recognizing the list of templates should not initialize loaders from all modules; otherwise we will not be able to improve
performance.
- The Filesystems API supports a direct implementation of the move operation. However, the current semantics of
DataObject.move, and especially DataFolder.move, prevent them from using this
implementation if they are to honor modifications of the file content made during the
operation. See
dev@openide for details.
- When a module wants to create a data object consisting of more than one
file, it starts writing the individual files to disk (to the
filesystem layer). While only some of the files exist, another
loader can grab any of them. There should be a way to
suspend recognition temporarily, let the module finish its work, and
then start the recognition process.
- When a data object or loader recognizes that the set of files belonging to
the data object has changed, it should fire an event to notify modules about the change.
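Several items above (the first two, and the one about templates) describe splitting recognition into a cheap type-detection phase and a separate object-creation phase, so that loader code only runs for the type actually found. A minimal sketch of that split follows, assuming a declarative extension-to-MIME-type table and factories registered as Suppliers so that each is created only on first use. TwoPhaseRecognizer, DataObjectFactory and the MIME types shown are hypothetical; this is not the org.openide.loaders API, the "data object" is just a String, and the class is not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/**
 * Hypothetical two-phase recognizer (not the org.openide.loaders API).
 * Phase 1 maps a file name to a MIME type using only a declarative table.
 * Phase 2 asks the single factory registered for that type to create the
 * object, instantiating the factory lazily. Not thread-safe.
 */
final class TwoPhaseRecognizer {
    /** Hypothetical stand-in for a loader that creates a data object. */
    interface DataObjectFactory {
        String create(String fileName);
    }

    private final Map<String, String> mimeByExtension = new HashMap<>();
    private final Map<String, Supplier<DataObjectFactory>> registered = new HashMap<>();
    private final Map<String, DataObjectFactory> instantiated = new HashMap<>();

    /** Declarative registration: file extension to MIME type. */
    void mapExtension(String extension, String mimeType) {
        mimeByExtension.put(extension, mimeType);
    }

    /** The supplier is stored, not called, so the factory does not exist yet. */
    void registerFactory(String mimeType, Supplier<DataObjectFactory> supplier) {
        registered.put(mimeType, supplier);
    }

    /** Phase 1: type detection only; no factory code runs here. */
    String mimeTypeOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1);
        return mimeByExtension.getOrDefault(ext, "content/unknown");
    }

    /** Phase 2: creation, delegated to the one factory for the detected type. */
    String recognize(String fileName) {
        String mime = mimeTypeOf(fileName);
        Supplier<DataObjectFactory> supplier = registered.get(mime);
        if (supplier == null) {
            return "default:" + fileName; // fallback for types nobody registered
        }
        return instantiated.computeIfAbsent(mime, m -> supplier.get()).create(fileName);
    }

    /** Number of factories that have actually been created. */
    int instantiatedFactories() {
        return instantiated.size();
    }

    public static void main(String[] args) {
        TwoPhaseRecognizer r = new TwoPhaseRecognizer();
        r.mapExtension("java", "text/x-java");
        r.mapExtension("form", "text/x-form");
        r.registerFactory("text/x-java", () -> name -> "java:" + name);
        r.registerFactory("text/x-form", () -> name -> "form:" + name);

        System.out.println(r.recognize("Main.java"));  // java:Main.java
        System.out.println(r.instantiatedFactories()); // 1: the form factory was never created
    }
}
```

Because phase 1 consults only data, recognizing a .java file never instantiates the factory registered for .form files. A real design would still need the multi-file grouping, suspension of recognition and change notification described in the list, which this sketch leaves out.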