Clustering the NetBeans Family

$Revision: 1.2 $
Changes: available in CVS
Status:

Abstract:

Since its introduction in NetBeans 4.0, the clustering concept (as outlined in the installation proposal) has won the hearts of every engineer working on NetBeans or on applications built on top of NetBeans. The concept of putting a set of modules into a dedicated directory and calling it a cluster seems natural, obvious and self-explanatory. This definitely contributed to the ease with which the cluster concept has been adopted. However, there is a hidden catch: clusters may not be what people think they are. Like a lot of things in software, the cluster is a djinn that serves well, but only if it is used the correct way. This document describes the clustering situation as of the 5.0 release of NetBeans IDE, summarizes the problems that various groups would like to address, and outlines a solution that addresses most of them. While doing the analysis, the document repeatedly recalls and explains the purpose of clusters and the way in which they are intended to be used.


Problem

The ideX cluster which is distributed as part of NetBeans IDE 5.0 is too big.

Complaints

  1. We are building a non-java language support and our customers are unhappy that they need to download and have installed a cluster which contains a lot of things only related to java. We'd like to have a smaller cluster which we could install and use, which would be completely java independent, but would provide support for usual things an IDE needs - projects, version control, debugger.

  2. I think the API support should be separated from the IDE into its own cluster as it is mostly independent - no other part of the system depends on the API support modules - and it is also being developed on a different schedule than the rest of the ideX cluster - at least we already had an update1 release for 5.0 while the rest of ideX remained unchanged. Things would just be simpler if we had our own cluster.

  3. I am trying to get a NetBeans Platform based product into some Linux distribution. It drives me crazy to see all the binary code license libraries inside ideX. I mean, it is fine that NetBeans own code is open source, but most of the distributions do insist on having full open source stack and inclusion of non-open source code is not really an option. I somehow handled the javahelp in platform, however it would help me a lot if we could split the ideX into meaningful parts that would be clear of proprietary libraries. That way opensource developers could easily use at least something from the NetBeans Platform.

  4. We are working on support for BlueJ, which is an IDE for teaching the basics of Java and OOP in general. We need to provide a modified NetBeans IDE environment with a simplified UI. We currently do this by including the whole ideX cluster and disabling the modules we do not need in our cluster, but for the sake of a smaller download and install size it would be better if there was a smaller cluster than ideX containing what we need. Moreover, what if BlueJ depends on part of a module? OK, let's allow masking of elements in layers, but there shall not be too many hidden files. This does not work well with a new release, as new elements can appear and need to be masked once again.

  5. I've been working hard on making a lot of modules webstartable in NetBeans 5.0, but I've stopped at the level of the ant and jpda debugger modules, because they were too complex for the default JNLP convertor. So in fact the ide6 cluster in 5.0 is broken into two parts - those modules that are webstartable and those that are not. Indeed we can increase the number of webstartable modules, but for the next version it would also be good if we could claim that some clusters are webstartable, and then this would apply to all modules in the cluster.

  6. Once in a while people decide they need a smart installer. I mean an installer that can install anything based on NetBeans, shows the user a selection dialog first and, after the choice is made, installs and possibly also downloads the needed components. As a result users get a system fully tailored to their needs with just a few clicks. It would be nice if any changes to the cluster system reflected this goal and supported, or at least did not block, implementation of such an installer.

  7. Sometimes, it's convenient for me to edit web pages--pure HTML--in NetBeans simply because I already happen to have it open when I need to tweak a webpage. Of course, there's very little in the way of project maintenance for websites, but then, I don't expect there to be because that's not really what NetBeans is for. But... it could be. It could even be useful to someone who /only/ wanted to edit webpages, and wouldn't know a Java class if it were eating his spleen.

  8. I want complete freedom for everyone! I want to compose everything with anything, without dependencies, for example I might want half of the html module.

  9. I have a feeling that a cluster is something people understand. They can understand that clusters seem to have some API. Clusters also help with high-level descriptions of the architecture of the system. Using just single modules for this is too low level and hard to grasp. On the other hand I have a feeling that marketing and others are mixing up clusters and feature packs. This can create dangerous confusion and shall be cleared up.

  10. We need to get friend dependencies between modules inside of one cluster under control. Otherwise, if one masks a module, the same masking may no longer be possible in the next release due to changed dependencies.

  11. Where do db modules go? New cluster? I'd bet that native tools do not want database.

  12. XML: Is it its own area of functionality? Is there a basic XML for the ant module, validation, and some more advanced XML? Surprisingly it also depends on Java - probably to generate a SAX parser. This shall be cleared up; this dependency shall be weak.

  13. I really like shallow dependencies between clusters. Best of all would be if the clusters do not depend on each other like profiler and mobility, etc.

Criticals

This section gives the complaints from the previous section structure and form. It also assigns each item a priority (High - 9, Medium - 3, Low - 1), so that any proposed solution can be evaluated against these critical aspects.

Id | Description of a Critical Aspect of the Problem | Priority
nojava | There has to be a cluster which contains project related APIs (project API, project UI API) and another cluster which contains Java related modules like JavaModel and java. These two clusters have to be distinct. Due to the nature of Java support the second one has to depend on modules from the first one. The opposite dependency has to be prohibited. | High
dbgcore | There is a need for the debugger core and its UI API to be in some other cluster than the Java one. Such a cluster cannot depend on the one containing the Java support. | High
apisupport | Support for development of NBM projects is an independent part and no other cluster or its module shall depend on it. To prevent such an accidental dependency, the apisupport module(s) shall be in a special cluster, which contains no other APIs of interest to enterprise or native or v-builder modules. | Medium
userneed | Make sure that the size of clusters exactly matches user needs. The primary concern here is the size of the download: end users want to be sure that they download just what they wish and what they will really use. | Low
no3rd | Create a cluster which contains the useful NetBeans utilities, like ant, version control, editor, etc. but no jmi.jar, mof.jar, java-parser.jar, jaxrpc and gjast.jar. Such a set of modules will not be enough to implement a full featured Java IDE, but is still going to be useful for other projects like C/C++/Fortran, LaTeX, HTML editor, etc. | Medium
grain | Provide a solution that allows more fine grained dependencies than just on the level of individual modules. | Low
treedep | Arrange clusters into a non-linear ordering, so we minimize the number of interdependencies and isolate some of them completely, as in the case of the mobility and profiler clusters. | Medium
check | Create a supporting infrastructure that allows intra-cluster dependency checks and control, in case one wants two or more logical units of modules inside of one cluster. | Medium

The goal of separating the clusters to match end users' needs is Low priority because clusters are compatibility units. If they work for something else, good; if they can be modified to work better for something else, good as well; but we cannot change clusters every release according to the current market expectations or needs, e.g. target end user groups. That is why there should be other means of achieving the "right size" for end users. Marketing should be informed that clusters are not what they want to think about, and that what they want are "feature packs" and the like.

The other Low goal is about finer dependencies than just on module level. It is a nice wish, but not only is it out of scope from the point of view of clustering, it is also pretty impractical. We have trouble controlling dependencies among modules, and checking dependencies among individual classes would not make the work simpler. The correct solution is to work with the owner of the module to split it into the necessary parts.

Technical Analysis

  1. Reasons to Separate
  2. Relation between kinds of separation

Reasons to Separate

There is more than one reason why we want to separate the whole product into smaller pieces and each of them has its own preferences on the size and amount of separated units. It is usually hard to try to satisfy more than one requirement by one solution. It can work for a while, but it is for sure that a conflict will occur after a while. That is why we try to present all known reasons for separation and then discuss techniques to satisfy them and also their mutual relations.
  • runtime separation - the goal is to separate the product into pieces that will represent components in the system with limited dependencies and separate address spaces (classloaders), and that can be independently enabled and disabled either by hand or by some automatic criteria (syntax coloring for JSP enabled when JSP is enabled and the editor is enabled).

  • compatibility separation - the goal is to define compatibility domains that will be used to compose different products and will be able to co-exist in multiple versions as discussed in versioning section of installation document.

  • user separation - the primary goal is to present the structure of the product to the end user: offering downloads (just download one file for a given piece of functionality) and turning some pieces of functionality on or off (show just a list of the functionality categories).

    While the runtime separation and compatibility separation requirements were discussed in their own document, the requirements from the user point of view have not yet been summarized. Everything up to now is hidden under the vague goal of having functional packs of the right size. That is not much to go on. That is why this part of the document tries to define actual user requirements:

    1. Functional packs shall be composed from one or more modules - the implementation details (aka individual modules) are currently usually too small and reflect more design goals than something that shall be presented to regular user as a set of functionality.

    2. One file per pack - there should be a way for an offline installation to download the functional pack as one file (zip, rpm, etc.).

    3. Disable/Enable a pack - user should be able to disable the pack as a whole. Then a support for a certain functionality disappears from his eyes (menu, toolbars, components, workspaces) while the rest of the system continues to work correctly.

    4. Uninstall a pack - there should be a way to remove a pack completely, i.e. from the disk. Then it won't be listed in the set of installed packs. Of course in the case of packs installed in a shared location this may work only for system administrators.

    5. Restore a pack - there should be support for backing up the pack when installing a newer version. The user should be able to update to a new version and, if not satisfied, restore the previous one.

    6. Extensibility - the functional packs should be extensible by multiple parties. Example: one group provides XML support (editing, validating, executing XSLT) and another group (working on a completely different product, but one based on the work of the previous group) does XSLT debugging. They would like the user to see just one pack - XML Support - that contains the functionality provided by both groups.

    It is important to note that it will probably be hard to satisfy all these requirements at once. Each solution presented below supports some of them, but none supports all.

Relation between kinds of separation

  • Runtime vs. User - The initial and easy observation concerns the relation between runtime and user separation. As the module is the atomic piece of functionality, the basic building block, it is natural to expect users to operate on chunks larger than it. That is why a unit of user visible functionality will be equal to or greater than one module.
  • User vs. Compatibility - It is not clear what the relation is between the size of user functional packs and compatibility units. If we were to satisfy one of the user level requirements - to be able to download a set of functionality as one file (Solaris package, RPM or NBM) - we would have to keep the functional packs always smaller than the compatibility ones, because of the restriction in the conclusions of the installation proposal saying that a compatibility cluster can be composed from the content of one or more RPMs. But if we drop this requirement, this relation between user and compatibility separation could be dropped as well. That means there is no real restriction on the relation between separation for user and separation for compatibility.
  • Runtime vs. Compatibility - Obviously the runtime unit, i.e. the module, is the smallest building piece within the constraints of the NetBeans architecture. Regardless of what we do, we always end up upgrading JAR files, i.e. modules. That is why the compatibility unit has to be at least the size of a module. However, experience has shown that certain modules are closer to each other than others and as such they include private mutual contracts. It does not make sense to try to stabilize such contracts, as that could be a huge amount of work. For such situations it makes sense, on the other hand, to make the compatibility unit larger. By doing that the unit is guaranteed to be built, maintained and distributed at once, and correct cooperation of its individual pieces can then be guaranteed. That is why the compatibility separation is usually larger than the runtime one.

Proposed Split of ideX cluster for NetBeans 6.0

To satisfy most of the requirements stated above, it seems sufficient to split the ideX cluster into three parts: one for non-Java support, one for Java support and the rest for apisupport. This split satisfies most of the "criticals" mentioned above and as such it seems sufficient. To get a visual grip on the split, see the applet that accompanies this document.

The applet's first matrix shows four groups of modules, each in its own color. As can be seen, the modules are grouped by their expected clusters. There is the platform at the bottom, ide above it, followed by java support and then apisupport. As can be noticed, there are certain dependency problems, as some of the java modules are inserted in the middle of the ide group. These problems need to be resolved prior to the split of the ideX cluster, as our build system does not allow cyclic dependencies:

  • lib/jsch - depends on the Ant module. The dependency needs to be removed. The obvious fix is to split the module into two parts, but probably a better one is to let this module do some kind of declarative registration (maybe a library for Ant?) and let some other module find such a declaration. This would on one hand create a soft dependency, on the other it would keep the declaration next to the library and could be useful in other places as well. Fixed[77489]
  • xml/tools - provides support for generating a skeleton for a SAX parser, etc. This indeed needs to be part of the java cluster. The solution is to split the module into two parts and make the one with the tool eager, depending on the xml/tools module and the java infrastructure. [77491]
  • apisupport/project - has an implementation dependency on xml/tax. Needs to change to friend, at least. [77492]
  • httpserver - by the way, the httpserver contains some old and obsolete copy of tomcat, I guess tomcat 3. This is an independent issue, but it should be fixed; if possible the httpserver could be moved to the enterprise cluster, where it could be integrated well with a newer version of tomcat. The problem however is that the XSL module currently depends on httpserver, so it would probably be necessary to split the module into api and impl. [77493]

The visualization shown in the applet is based on artificial commit verification files generated by special build script which defines the actual groups of modules by use of Ant's selectors. This build script generates golden files into a directory which is then read by the applet. The most important files there include the list of modules and clusters they belong to and also the list of dependencies between individual modules.

Why is this split good?

This split is good as it satisfies most of the criticals. Let's assign the value 9 to every critical which is High, 3 to those that are Medium and 1 to all Low. If the proposed solution solves the problem stated by the critical, we get the points; otherwise we get the same number of points as a negative value. Here is a table evaluating the state:

Id | Comment | Points
nojava | The cluster is there. | 9
dbgcore | Debugger core is in the Java independent cluster. | 9
apisupport | API support is in its own cluster. | 3
userneed | Not solved. | -1
no3rd | The new ide cluster contains no non-open-source 3rd party libraries. | 3
grain | No. | -1
treedep | Solved, the enterprise cluster does not need to depend on the apisupport one. | 3
check | As demonstrated by the build script that generates the sample ideX split, it is possible to create golden files for any set of modules, even when they are part of one cluster. | 3
Total | | 28

As can be seen from the total sum, the current proposal gets 28 points out of 32 possible, and the only criticals it does not satisfy are the Low priority ones. That is why this proposal is believed to be good enough to be implemented.
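The arithmetic behind the total can be reproduced with a few lines of Python. This is only an illustrative sketch: the weights and the solved/unsolved flags are copied from the two tables in this document, while the names WEIGHTS, CRITICALS, score and max_score are made up for the example.

```python
# Priority weights as defined in the Criticals section.
WEIGHTS = {"High": 9, "Medium": 3, "Low": 1}

# (id, priority, solved?) as stated in the evaluation table.
CRITICALS = [
    ("nojava", "High", True),
    ("dbgcore", "High", True),
    ("apisupport", "Medium", True),
    ("userneed", "Low", False),
    ("no3rd", "Medium", True),
    ("grain", "Low", False),
    ("treedep", "Medium", True),
    ("check", "Medium", True),
]


def score(criticals):
    """Add the weight of every solved critical, subtract it for every unsolved one."""
    return sum(WEIGHTS[prio] if solved else -WEIGHTS[prio]
               for _, prio, solved in criticals)


def max_score(criticals):
    """Score that a proposal solving every critical would reach."""
    return sum(WEIGHTS[prio] for _, prio, _ in criticals)


print(score(CRITICALS), "of", max_score(CRITICALS))  # prints: 28 of 32
```

Because an unsolved critical costs its weight instead of merely earning nothing, the two unsolved Low items account for the whole gap of 4 points.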

Clusterization of NetBeans IDE

The whole installation document talks about imaginary cluster examples like nbplatform, nbjava, nbxml, nbnative and nbweb. They are partially based on reality but in order to define the actual clusters for future releases of NetBeans project more careful analysis is needed.

From the set of currently available modules we generate a set of golden files describing dependencies and other important global characteristics. See CVS for the text format of those files, or visualize the current dependencies with the accompanying applet.


The applet provides interactive layout and visualization of modules and their dependencies. There are two views. The dynamic one presents each module as a colored box. Modules in the same cluster share the same color. Dependencies between modules are drawn as lines between modules and cause the modules to be attracted together. One can drag and drop module boxes to rearrange the layout and make it more reasonable. The matrix one shows individual modules as lines, sorted by default according to their dependencies and the clusters they belong to, and allows various inspections and manipulations on top of the data. Both views are connected, and changes made in one shall be visible in the other as well.

Warning: Reading the rest of document may be interesting but in fact it is OUT OF DATE so you'd better stop here
The clusterization described here leads to a few interesting observations. The natural expectation would be to have all or nothing clusters: each product either takes a whole cluster or does not include it at all, and the whole cluster is either enabled or not. That would also mean that there is some kind of functional dependency between clusters. That would be a wrong expectation.

Functional dependencies between different parts should be expressed on the level of individual modules. For example, the Java Debugger depends on Java Support and on Debugger Core, but is itself included in a Debugger cluster which may work in a product without Java support (aka C/C++ studio). This is fine, as a cluster's main goal is to group chunks of related functionality that will be developed and maintained as a whole and will be kept compatible as a whole. For that reason there may be mutual interdependencies between clusters - a module in one cluster can depend on a module in another one while there is another pair of modules with the reverse dependency. This is fine and probably unavoidable; the only necessary thing is to remember the transitivity problem and avoid it.

Solution A: RPM as functional pack

Let's investigate what implications making functional packs exactly match native packages (RPM, Solaris packages) would have. A sample clusterization of Java related functionality is shown in the picture on the right. Each box represents one runtime module and arrows show the linking dependencies between them. The whole set of modules forms a compatibility cluster because it is expected that all of these are mutually connected and will share the same compatibility cycles. The red, green, yellow and blue circles show a possible RPM separation (basic Java, visual Java, enterprise Java and internationalization support).

Please note that, based on the discussion in the installation proposal document, it is not possible for an RPM package to span multiple (compatibility) clusters. E.g. an RPM containing Java cannot include part of XML support. This immediately implies that we cannot satisfy the extensibility requirement. On the other hand it is easy to package one functional pack into one downloadable file, as RPMs are already archives.

Also, from the nature of RPMs, it seems that they should more likely be managed by a system administrator using the native operating system management tool (not sure what that should be on Windows). That delegates the responsibility for (un)installation and restoring to the operating system support. Restorable updates could probably be done via Solaris patches, which keep the previous version and thus support restore. Enabling/disabling of some functionality would probably need to be supported by NetBeans directly, even though it is not clear whether it would not be better to abandon it completely in favor of (operating system supported) uninstallation.

This kind of solution seems to reuse a lot of implementation from the operating system and thus saves us a lot of work, based on the assumption that there will be some kind of packaging system available for Windows. It nearly removes the need to have AutoUpdate support in the product at all. On the other hand it limits the possibilities for choosing a functional pack (it cannot be bigger than a compatibility cluster) and does not cope well with temporary disabling of a pack.

Solution B: Module based approach

The basic motto is taken from a mailing list conversation on nbui@netbeans.org in Spring of 2002: The module system and UI permit the user to make autoload or eager modules into regular modules, and perhaps also to make regular modules into autoload or eager modules. Almost all modules will be autoloads. All modules that are autoloads or eager modules now will stay that way. All or most modules that are regular modules now will become autoloads. What will a functional pack then be? A regular module! Usually it will be a completely empty module providing just a name, a description and a set of dependencies listing everything which should be "in" the functional pack. However, modules that it makes no sense to join with others, and that look reasonable by themselves, can stay that way.

The UI shall show only regular modules (e.g. functional packs) and hide autoload and eager ones, as they will be an implementation detail. There should probably be an expert mode allowing manipulation of the (usually invisible) modules, letting power users turn autoload modules into regular ones and completely configure their system.

The beauty of this system is that it is completely orthogonal to the compatibility clusters and gives absolute configuration power not only to skilled users, but also to product assemblers, as they can preconfigure their own regular modules in groups they like and change the status of others provided to them by other groups, which was not possible in the previous proposal.

This proposal easily satisfies the requirement of multiple files in a pack (by declaring dependencies on them), and it has no problems with enabling/disabling such packs. It does not, however, easily satisfy the one downloadable file requirement, as the actual module JARs are packaged in various RPMs (remember the orthogonality). The only thing one could do is to ZIP all the RPMs that form the functional pack and let the user unzip them (but of course they would very likely contain a lot of other unrelated stuff), so this is not really a good solution. The uninstallation and restoring of a feature would probably be better left to the operating system and its administrators. Only they could uninstall or restore RPMs with certain functionality using native tools.

Extensibility is where this proposal excels. Not only does it give absolute freedom to product assemblers to create functional packs of their own, it also allows third parties to extend such defined packs. For example: if there is a module defining XMLSupport (which depends on the xml/tree, xml/edit, etc. autoloads), one can define an eager module depending on XMLSupport and enabling the third party functional modules. That way, whenever a user enables XMLSupport, not only the things defined originally but also any other extension that wishes so will be enabled. This seems like very desirable behaviour for a dynamic and modular system like NetBeans.
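The XMLSupport example can be traced with a small Python model of the enabling rules. This is a deliberately simplified sketch, not the NetBeans module system: it assumes that an enabled module pulls in everything it depends on, and that an eager module switches on as soon as all of its non-autoload dependencies are enabled. The module names xslt-debug and xslt-bridge and the function resolve_enabled are invented for the illustration.

```python
def resolve_enabled(deps, kinds, user_enabled):
    """Return the set of modules that end up enabled (simplified model)."""
    enabled = set()

    def pull(module):
        # Enabling a module also enables everything it depends on.
        if module in enabled:
            return
        enabled.add(module)
        for dep in deps.get(module, ()):
            pull(dep)

    for module in user_enabled:
        pull(module)

    # Eager modules turn on once all their non-autoload dependencies are on;
    # repeat until nothing changes, since one eager module may unlock another.
    changed = True
    while changed:
        changed = False
        for module, kind in kinds.items():
            if kind != "eager" or module in enabled:
                continue
            hard = [d for d in deps.get(module, ()) if kinds.get(d) != "autoload"]
            if all(d in enabled for d in hard):
                pull(module)
                changed = True
    return enabled


# XMLSupport is the functional pack; xslt-bridge is a third party eager module.
KINDS = {"XMLSupport": "regular", "xml/tree": "autoload", "xml/edit": "autoload",
         "xslt-debug": "autoload", "xslt-bridge": "eager"}
DEPS = {"XMLSupport": ["xml/tree", "xml/edit"],
        "xslt-debug": ["xml/tree"],
        "xslt-bridge": ["XMLSupport", "xslt-debug"]}

print(sorted(resolve_enabled(DEPS, KINDS, {"XMLSupport"})))
```

With XMLSupport enabled, the third party pieces come along without the user seeing them as a separate pack; with nothing enabled, the eager bridge stays off because its regular dependency is missing.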

Verification Framework

When modules get separated into clusters, or when the dependencies between them get cleaned up, we need to make sure that they really stay separated. We need to know what is happening with respect to module dependencies. We need to be notified how they change and evolve, and if there is a step in an unwanted direction, we need to take immediate action and resolve the situation. For that we need a verification framework.

Goals

A set of goals without any assigned priority:
  • [lessimpldep] We need to minimize the number of implementation dependencies - they are hard to increment after a release (because they are always done in some specific way)
  • [noimpldepbetweenclusters] We need to prevent implementation dependencies between clusters - they limit the number of combinations we can make with clusters, and as clusters are supposed to be compiled separately, we cannot have such restrictions
  • [nocyclesbetweenclusters] We need to prevent cyclic dependencies between clusters - otherwise we will not know how to compile them (the module system can handle this, but the build system does not and probably should not)
  • [comparegraphs] We need to compare two dependency graphs - otherwise we will not find out that something has changed
  • [fileslayout] We need to be informed about changes in files layout - a change of layout is usually a sign of some bigger changes somewhere else and is one of the most useful inputs for any review
  • [dailytest] We need to run the comparing test daily - probably as part of the daily build, and we need some kind of notification that it failed
  • [personaltest] We need to run the test as part of every developer build - this can catch unwanted changes early, and committers cannot complain that they did not know
  • [allowchanges] We need to allow changes of module dependencies - of course certain changes are sometimes acceptable, and there should be a way to enable them
  • [education] We all need to understand that dependencies are important - awareness of the importance of dependencies between modules does not seem to be very high and this restricts the usability of our product (html editing requires mdr), so we need to help everyone to understand how important this is
  • [simpleincrement] We need to simplify the update of dependencies - currently the dependencies take a significant amount of time to update when a new release branch is created. This is due to the non-standardized way implementation dependencies are used and also due to the non-trivial way of increasing a certain digit in the specification version of each module in the release.
  • [listpublicpackages] We need to check the list of public-packages - if we decide to eliminate implementation dependencies, we will need to replace them with something. That something means a real API, and this leads to opening certain module packages and declaring them public. This shall not get out of control, as it is an API and needs careful review.
  • [publicpackagesapis] We need to check API changes in public-packages - right now we do sigtests just on official namespaces, but APIs in public packages are important as well (not as important, but important). We especially need to prevent accidental changes there.
  • [genereatedocs] Architecture documentation needs to be in sync with reality. Documentation is an important part of an API and it would help a lot to automate generation of module dependency information into arch documents. Otherwise we are reviewing what people tell us they did, not what they have really implemented.

Proposed Solution

The general idea is to implement a special ant task in the nbbuild CVS module that will be configurable using its properties and will generate output that can then be compared using regular text tools to find out whether or not there are some differences.

The input of the task will be a set of module JAR files or possibly also a list of projects from which the task will read its dependency information. The list of JAR files is useful for checking the dependencies of the produced build and will have to be implemented from the beginning; reading of project.xml can be useful for the genereatedocs goal, but as that is not the highest priority, it can wait for a while.

The task will be configurable to produce various output formats and write them into given output file. The possible outputs will include list of all public packages (useful for listpublicpackages and publicpackagesapis, if the list is then fed to a signature processing tool), list of all modules and their versions (this will notify us that there is a new module added), list of all modules and their dependencies including implementation ones (it will not help directly to have lessimpldep but at least the amount will not grow, and also will notify us about changes in dependencies).

The task will have to understand clusters in order to check for implementation dependencies between clusters (noimpldepbetweenclusters) and to prevent cycles between them (nocyclesbetweenclusters). The task should probably be able to output dependencies on the level of clusters and not only modules.
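What "understanding clusters" could mean in practice is sketched below in Python rather than as the proposed Ant task: module dependencies are lifted to cluster level and the resulting graph is searched for a cycle. The module and cluster names in the sample data are hypothetical, as are the function names cluster_edges and find_cycle.

```python
def cluster_edges(module_deps, cluster_of):
    """Lift module -> module dependencies to cluster -> cluster edges."""
    edges = set()
    for module, deps in module_deps.items():
        for dep in deps:
            src, dst = cluster_of[module], cluster_of[dep]
            if src != dst:  # dependencies inside one cluster are not cluster edges
                edges.add((src, dst))
    return edges


def find_cycle(edges):
    """Return one cycle as a list of clusters (first == last), or None."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())
    state = {}   # cluster -> "visiting" or "done"
    path = []    # clusters on the current depth-first path

    def visit(node):
        state[node] = "visiting"
        path.append(node)
        for nxt in sorted(graph[node]):
            if state.get(nxt) == "visiting":
                return path[path.index(nxt):] + [nxt]
            if nxt not in state:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        state[node] = "done"
        return None

    for node in sorted(graph):
        if node not in state:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Hypothetical sample: the last dependency points from ide back into java.
CLUSTER_OF = {"projects": "ide", "editor": "ide",
              "java-model": "java", "nbm-project": "apisupport"}
GOOD = {"java-model": ["projects"], "nbm-project": ["java-model"]}
BAD = dict(GOOD, editor=["java-model"])

print(find_cycle(cluster_edges(GOOD, CLUSTER_OF)))
print(find_cycle(cluster_edges(BAD, CLUSTER_OF)))
```

Sorting the nodes before visiting them keeps the reported cycle the same from run to run, which matters if the report itself is compared against a golden file.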

The output format will have to be stable between runs, so we can use plain diff to check for differences and satisfy comparegraphs. This means that the tool will need to sort the elements by something immutable, for example by module name.
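A minimal sketch of such a stable output, assuming a made-up line format (MODULE and REQUIRES are not the real golden file syntax): the modules and their dependencies are sorted by name before being written, so two runs over the same graph produce identical text and any difference reported by diff reflects a real change. Python's difflib stands in for plain diff here.

```python
import difflib


def render(module_deps):
    """Render a dependency graph as text lines in a stable, sorted order."""
    lines = []
    for module in sorted(module_deps):
        lines.append("MODULE " + module)
        for dep in sorted(module_deps[module]):
            lines.append("  REQUIRES " + dep)
    return lines


def compare_to_golden(golden_lines, module_deps):
    """Return unified diff lines; an empty list means nothing has changed."""
    return list(difflib.unified_diff(golden_lines, render(module_deps),
                                     fromfile="golden", tofile="current",
                                     lineterm=""))


golden = render({"b": ["a"], "a": []})

# Same graph, different insertion order: no differences are reported.
print(compare_to_golden(golden, {"a": [], "b": ["a"]}))

# A new dependency shows up as an added line in the diff.
for line in compare_to_golden(golden, {"a": [], "b": ["a", "c"]}):
    print(line)
```

A build step could fail whenever the returned list is non-empty and print the location of the golden file, which is the behaviour the personaltest goal asks for.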

In order to enable personaltest the test will be performed as part of the build (or commit validation) and the results will be compared to golden files stored in CVS. If a difference is found, the build will fail to warn the developer that he is doing something important. But as we need to allowchanges, the location of the golden file will be printed and the developer allowed to modify it as well. This will help with education and improve the understanding that modification of module dependencies is an important part of the API.

For the version that does the dailytest, a special property overriding the location of golden files will be given, and instead of CVS the test will compare the changes to the result of the previous daily build. If the test fails, the report will be sent to api-changes@netbeans.org, so everyone interested is notified about the changes made during the day.

The check for fileslayout is similar but does not need the special task. It can be composed from standard ant tasks that will generate the file with the layout and compare it to the golden file in ide/golden/files-layout.txt.

The simpleincrement goal is desirable, but is addressed just indirectly by preventing the number of implementation dependencies from growing. In order to simplify the increment after branching, it is suggested to include in each manifest a special postfix of the specification version. So the version would be composed from a regular part 1.2 and a special one ${specversion.postfix}. This would be empty by default. After creating a branch we would change nbbuild/build.properties to define the value as specversion.postfix=.1 and this would immediately increment all spec versions in all modules on the branch. However the increment in trunk would still have to be done by hand.
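The effect of the postfix can be illustrated with a few lines of Python that expand the property reference by hand. This is only a model of the idea, not of how the build actually substitutes properties; the function expand and the two property dictionaries are assumptions made for the example.

```python
import re


def expand(value, properties):
    """Replace every ${name} in value with properties[name]."""
    return re.sub(r"\$\{([^}]+)\}", lambda m: properties[m.group(1)], value)


# Value as it would be written once in a module manifest.
MANIFEST_VERSION = "1.2${specversion.postfix}"

TRUNK = {"specversion.postfix": ""}     # default: empty postfix
BRANCH = {"specversion.postfix": ".1"}  # set once after branching

print(expand(MANIFEST_VERSION, TRUNK))   # prints: 1.2
print(expand(MANIFEST_VERSION, BRANCH))  # prints: 1.2.1
```

The manifests stay untouched on the branch; only the single property definition changes.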


Comments to nbui@netbeans.org please.
