
MasterFilesystem redesign

$Revision: 1.11 $
Changes: available in CVS
Status: issue 51551 and opinion document

Problem:

Experience with NB 4.0 has revealed several problems in the MasterFileSystem design.

Problem overview

  1. First of all, there is poor support for resources located on remote hosts within a LAN (JDK, libraries, sources etc.)
    • 45963 - masterfs needlessly gets children. The symptoms of this issue are poor performance and responsiveness when accessing network resources. This problem is especially severe on Solaris, where automounters seem to be configured by default to browse and fetch all accessible hosts of /net via NFS.
    • 46813 - accessing files via UNC path. As a result of this issue it isn't possible to address resources via a UNC path on Windows (e.g. by entering a UNC path into FileChooser).
  2. High memory consumption by FileObject's internal structures
    • 35414 - FileObjects occupy too much memory.
  3. Problems with symlinks on Unix.
    • The same file accessed through two different paths produces distinct FileObjects, distinct editor tabs and so on. The worst possible result of this issue is data loss.

Issues digest

Issue: 45963
Description/Analysis:
Remote calls for network resources would negatively affect performance even if the NB code were implemented as optimally as possible, because responsiveness depends on a combination of many factors: network configuration, the user's workflow, the NB implementation etc. Not all of these factors can be influenced by an NB developer.
The problem in the current implementation is an inefficiency in the AbstractFileSystem implementation: all siblings are created whenever a FileObject is requested for a single child of a parent.
This problem could be called an API bug because there is no way to implement AbstractFileSystem and avoid it. The AbstractFileSystem.List interface, which is responsible for providing children, has only one method, and that method returns all children.
MasterFileSystem currently delegates to LocalFileSystem and the VCS filesystem, which are both subclasses of AbstractFileSystem. That is why MasterFileSystem is involved. But a VCS filesystem will probably never have a "mount point" above an autofs map point, which is the source of the problem.
Notes:
This problem isn't platform specific, but its impact on users is extraordinarily severe on Solaris.
The same problem existed in the 3.x filesystem implementation. The difference is that in 3.x you would not be likely to create a filesystem whose "mount point" (root) was *above* an autofs map point. E.g. you might have mounted "/net/some.thing/some/path" and all filesystem operations would be inside that. However, in 4.0 with the switch to masterfs, it is equivalent to having mounted just "/" (on Unix at least) and then expanding subfolders "net", "some.thing", "some", and "path" before getting to the rest; FileObjects are created for these intermediate folders.
Workaround:
1/ Alter /etc/auto_master and use the option -nobrowse on Solaris.
2/ Create and use symlinks pointing to the existing files or folders inside /net (or elsewhere, depending on configuration) that caused these problems. This workaround simulates NB 3.6 behaviour.

Issue: 46813
Description/Analysis:
It is impossible to use a UNC path in NB at all, which means that it isn't possible to convert Files that represent a UNC path into FileObjects. Files representing UNC paths have their own specifics that weren't taken into consideration when MasterFileSystem was designed. The current implementation of MasterFileSystem assumes that if a java.io.File claims to exist then the same is true for its parent, which isn't true for UNC paths.
Masterfs is vulnerable to platform-specific oddities near disk roots. This problem is especially obvious on non-Linux platforms (Windows, OS/2, OpenVMS). It isn't always easy to map Files to FileObjects in MasterFileSystem because existing Files don't all share one common root, whereas MasterFileSystem has one because it is implemented as a singleton. This mapping is currently ensured by providing virtual FileObjects (e.g. for the virtual root on Windows, which is above the individual drives C:\, D:\ ...).
There are also some serious bugs in the JDK that can't easily be worked around without an impact on performance:
  • #4723726 URI.normalize () ruins URI built from UNC file
  • #4241259 java.io.File.getParentFile () return non directory
  • #5086147 File,URI,URL conversions are strange for UNC path
Notes:
Regression against NB 3.6 - in 3.6 you could explicitly mount whatever root dir you wanted and it would not play any tricks with that path prefix.
This is a Windows-specific problem.
Workaround:
Map network drives and use them instead of UNC paths.

Issue: 35414
Description/Analysis:
This issue is just about implementation internals. Lower memory consumption is required.

   

Goals and scope:

Goal ID: G1
Goal: Improve performance and responsiveness when accessing network resources.
Scope: Access and refresh only those children that are or were requested.
Priority: High

Goal ID: G2
Goal: Eradicate the barriers to mapping a UNC File to a FileObject in MasterFileSystem code.
Scope: Preferably work around the JDK bugs and support UNC paths. At worst, if the workarounds ruin performance or any other problems appear, be ready to support UNC with minimum effort as soon as #4723726, #4241259 and #5086147 are fixed in the JDK.
Priority: High

Goal ID: G3
Goal: Handle symlinks more safely in the MasterFileSystem impl.
Scope: Provide support for preventing simultaneous modification of two versions of the same file. But preferably XXX what??
Priority: High

Goal ID: G4
Goal: Reduce memory consumption.
Scope: The whole redesign will respect the requirement to reduce memory consumption, and memory consumption and performance will be carefully balanced.
Priority: Medium

Goal ID: G5
Goal: The implementation should reflect the changes in requirements that have gradually evolved.
MasterFileSystem was originally designed to be general, and it was required to allow plugging in arbitrary filesystem implementations. Currently MasterFileSystem's API is defined as a friendly contract, and it is expected that only the VCS filesystem will be plugged in, and only temporarily, since VCS support is assumed to change radically by getting rid of the filesystem-based implementation altogether. MasterFileSystem should reflect these changes.
Scope: Take the shape of things to come into account and design with respect to it.
Priority: Medium

Design view:

G1

The problem is that the AbstractFileSystem API isn't sufficient to satisfy this goal. The possible options are:
  1. change the AbstractFileSystem API (AbstractFileSystem.ExList with method boolean existChild(String name)) and adjust the code in AbstractFolder/AbstractFileObject to use it
  2. avoid plugging LocalFileSystem into MasterFileSystem at all
If we avoid plugging LocalFileSystem into MasterFileSystem we may:
  1. implement a new filesystem from scratch and plug it into MasterFS instead of LocalFileSystem
  2. change the MasterFileSystem impl. to delegate directly to java.io.File if there isn't an appropriate VCS delegate
1. If the AbstractFileSystem API is changed, both LocalFileSystem and VCSFileSystem must implement the new interface. The disadvantage is that many significant changes must also be made (rethought and checked) in the filesystems package (AbstractFolder, AbstractFileObject) because, for example, events are fired only when children were requested, the same is true for refreshFolder, and so on. This is risky because the fixes must go into relatively fragile parts of the code.
2. It is naturally also possible to combine both top-level options, because adding the new ExList interface doesn't mean that MasterFileSystem must use the fixed LocalFileSystem.
3. A new filesystem might be implemented either in the filesystems package in openide or directly in the masterfs module. Implementation in openide means an API change, but on the other hand such an implementation would have access to some handy package-private classes that make the implementation easier (StreamPool, ListenerList). Implementation in the masterfs module doesn't require any API change and isolates all changes in the masterfs module; the fix would then be placed completely in MasterFS and would be applicable to the Promo D / NB 4.0 code line.
4. Delegating directly to java.io.File means radically changing the implementation of MasterFileSystem so that it delegates in turn to the VCS FileObject and to java.io.File. This makes the implementation more complicated because the nature of a VCS FileObject and of java.io.File is different (events, synchronization).
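To make the first option more concrete, here is a minimal sketch of what an ExList-style extension could look like. It is only an illustration: ChildList stands in for the existing AbstractFileSystem.List, LocalExList is a hypothetical implementation, and the extra folder parameter of existChild is an assumption of this sketch (the proposal above names only a single String argument).

```java
import java.io.File;

/** Stand-in for the existing AbstractFileSystem.List: it can only return all children at once. */
interface ChildList {
    String[] children(String folder);
}

/** Hypothetical extension along the lines of the proposed ExList. */
interface ExList extends ChildList {
    /** Answers whether one named child exists without listing its siblings. */
    boolean existChild(String folder, String name);
}

/** Hypothetical java.io.File based implementation rooted at a directory. */
final class LocalExList implements ExList {
    private final File root;

    LocalExList(File root) {
        this.root = root;
    }

    @Override
    public String[] children(String folder) {
        // Lists the whole folder; this is the expensive call on automounted directories.
        String[] names = new File(root, folder).list();
        return names == null ? new String[0] : names;
    }

    @Override
    public boolean existChild(String folder, String name) {
        // Touches only the requested child, never its siblings.
        return new File(new File(root, folder), name).exists();
    }
}
```

With such a method, resolving "/net/some.thing/some/path" would need one existence check per path segment instead of a full listing of /net.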

G2

The problem in the MasterFileSystem code is that some virtual FileObjects must be introduced as glue to enforce a completely artificial single-tree model. Virtual FileObjects have already been identified as evil with respect to the VCS filesystem (and experience proves it: there were many bugs related to virtual FileObjects). MasterFileSystem, which is currently a singleton, could easily be reimplemented so that several instances could exist at the same time. For example, c:\foo.txt, d:\foo.txt and \\remote\share\foo.txt don't need to be in the same FileSystem. This makes masterfs less vulnerable to platform-specific oddities near disk roots and simplifies the implementation.
But this split should be followed by an analysis of why API users use FileSystem instances and whether the current API is sufficient, because the only way to get a FileSystem instance is FileUtil.toFileObject (File).getFileSystem (). Moreover, this code is copied and pasted many times in different places in different modules.
Until now there was just one instance, so whatever the API user intended to do there was no problem, because that instance definitely included all FileObjects. So the question is whether a Repository-like list of FS instances should be provided, or rather some specialized methods like refreshLocalHost that would refresh all live instances. These API changes should go into the openide codebase, but the implementation of MasterFileSystem is coupled with them.

G3

Symlinks generally mean that one file can have more than one parent, which doesn't correspond with the current filesystem API. A one-File-to-one-FileObject mapping therefore can't be ensured. Probably the best we can do is prevent simultaneous modification of two versions of the same file.
All non-canonical FileObjects could delegate *some* method calls to the canonical FileObject (at least some methods must be implemented differently: getParent (), getPath () and so on). Then all these FileObjects could share e.g. one FileLock. This solution could also ensure that FileEvents are fired from all relevant FileObject instances. It would be easy to refresh one instance and update the children caches of all of them. Another problem that this approach could solve is writing to an output stream via one FileObject instance while simultaneously reading its incomplete content via another instance; this could be prevented.
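The idea of sharing one lock between all aliases of a file can be sketched by keying the lock on the canonical java.io.File. This is only an illustration under assumptions: CanonicalLockRegistry and its methods are hypothetical names, and a real FileObject implementation would also have to handle events and caches.

```java
import java.io.File;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical registry: one lock owner per canonical file, however the file was reached. */
final class CanonicalLockRegistry {
    private final Map<File, Object> owners = new HashMap<>();

    /** Returns true if the lock was acquired, false if any alias of the file is already locked. */
    synchronized boolean tryLock(File path, Object owner) throws IOException {
        // getCanonicalFile() resolves symlinks and "." / ".." segments to a single key.
        File key = path.getCanonicalFile();
        if (owners.containsKey(key)) {
            return false;
        }
        owners.put(key, owner);
        return true;
    }

    /** Releases the lock if it is held by the given owner; the path may be any alias. */
    synchronized void unlock(File path, Object owner) throws IOException {
        File key = path.getCanonicalFile();
        if (owners.get(key) == owner) {
            owners.remove(key);
        }
    }
}
```

Two editors opened through different paths to the same file would then compete for the same entry, so only one of them could modify the file at a time.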

G4

There are a few approaches to decreasing memory consumption, but all of them must be well analysed, measured and balanced for every individual usage:
  • final boolean variables can be replaced by individual classes (e.g. one for folders and one for files)
  • if a variable is used only rarely and only on some instances, it can be kept outside the instance, e.g. in a static weak map <FileObject, variable>
  • the path is where a lot can be gained, because currently the whole path is kept for every FileObject. It could be possible to compute the path and combine this with caches, to take advantage of the fact that there are fewer folders than files, and so on
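The second approach, keeping a rarely used variable in a static weak map, might look like the sketch below. Node is a hypothetical stand-in for FileObject and the "note" attribute is invented purely for illustration.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

/** Hypothetical stand-in for FileObject: it carries no field for the rarely used attribute. */
final class Node {
    final String name;

    Node(String name) {
        this.name = name;
    }
}

/** Side table for an attribute that only a few instances ever set. */
final class RareAttributes {
    // Weak keys: an entry becomes collectable once its Node is no longer strongly referenced.
    private static final Map<Node, String> NOTES =
            Collections.synchronizedMap(new WeakHashMap<Node, String>());

    private RareAttributes() {
    }

    static void setNote(Node node, String note) {
        if (note == null) {
            NOTES.remove(node);
        } else {
            NOTES.put(node, note);
        }
    }

    static String getNote(Node node) {
        return NOTES.get(node);
    }
}
```

Instances that never use the attribute pay nothing for it; the cost is a map lookup for those that do, which is why each such change needs measuring.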

G5

MasterFileSystem in its current form probably won't be necessary once the VCS filesystem disappears. The masterfs module could then provide only a URLMapper implementation and some implementation of a local filesystem. No arbiter that switches between delegates, which is MasterFileSystem's current role, would be necessary. I think that from this perspective it is better to minimize the effort spent on the MasterFileSystem class and rather concentrate on the impl. of a local filesystem that can easily be plugged into MasterFileSystem as it is now (see G1). Naturally, MasterFileSystem can't be completely omitted from the redesign, because reducing memory consumption and the transition away from the singleton concept can't be implemented anywhere but in the MasterFileSystem class. But safer symlink support could already be implemented in the local filesystem (see G3).

Implementation - current status:

The whole of openide was branched; the branch is called masterfs51551. A new filesystem for local files, independent of AbstractFileSystem, was implemented and is called FileBasedFileSystem. This filesystem is plugged into MasterFileSystem instead of the original LocalFileSystem. The implementation is covered by tests written originally for MasterFileSystem, and new special tests for FileBasedFileSystem have been written that will be polished and committed to the branch. Stability according to all tests looks satisfactory (about 95%) in this phase. The tests were run on the following platforms: Solaris, Linux, Windows XP, Windows 2K. The results were identical on all these platforms, so there doesn't seem to be any known platform-specific problem. Additionally, the functional tests (IDE commit validation and IDE validation) were 100% successful on all platforms. No performance measurement has been done yet (the performance team is going to be involved and has promised to help with testing). No serious testing was done on MacOS, but at least it was possible to start the NB IDE, open a few projects, build and clean.
This filesystem was designed  and implemented with TCA's in mind. Just a few notes:
  • File.listFiles is called only when the method FileObject.getChildren is invoked or FileObject.refresh is called
  • The FileLock implementation creates a new temporary lock file that disappears after the lock is released. The presence of this lock file alone isn't enough to consider the file locked (this is necessary to be able to recover from a crash); two other conditions must be satisfied. First, an nio.FileLock is also requested for that lock file. This prevents the same file from being edited in two running NB IDEs. All issued locks are kept in a Collection for as long as they are valid. So the second condition is that the lock mustn't be found in that Collection.
  • UNC support is currently implemented in FileBasedFileSystem, but MasterFileSystem must be adjusted to take advantage of it (this hasn't been done yet). It is also important to mention that JDK bug #5086147 can't be fully worked around in the masterfs module; the workaround must also be implemented in the FileUtil class. All File to URL conversions (and vice versa) should then go through FileUtil methods.
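One possible reading of the locking scheme in the notes above is sketched here. It is not the FileBasedFileSystem code: the class name LockFileGuard, the lock-file naming and the exact order of checks are assumptions. The sketch treats a file as locked when the in-process collection already holds its lock file or when another process holds the nio lock on it; a lock file left behind by a crash is simply reused.

```java
import java.io.File;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.StandardOpenOption;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical lock: temporary lock file + nio FileLock + in-process bookkeeping. */
final class LockFileGuard implements AutoCloseable {
    // Locks issued by this JVM for as long as they are valid.
    private static final Set<File> HELD = new HashSet<>();

    private final File lockFile;
    private final FileChannel channel;
    private final FileLock osLock;

    private LockFileGuard(File lockFile, FileChannel channel, FileLock osLock) {
        this.lockFile = lockFile;
        this.channel = channel;
        this.osLock = osLock;
    }

    /** Returns a guard, or null if the target is locked in this JVM or by another process. */
    static LockFileGuard tryAcquire(File target) throws IOException {
        File parent = target.getAbsoluteFile().getParentFile();
        // The ".name.lck" naming is an assumption of this sketch.
        File lockFile = new File(parent, "." + target.getName() + ".lck").getCanonicalFile();
        synchronized (HELD) {
            if (HELD.contains(lockFile)) {
                return null; // already locked inside this JVM
            }
            // CREATE reuses a stale lock file left by a crash instead of failing on it.
            FileChannel ch = FileChannel.open(lockFile.toPath(),
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE);
            FileLock lock;
            try {
                lock = ch.tryLock(); // null when another process holds the lock
            } catch (OverlappingFileLockException e) {
                lock = null;
            }
            if (lock == null) {
                ch.close();
                return null;
            }
            HELD.add(lockFile);
            return new LockFileGuard(lockFile, ch, lock);
        }
    }

    @Override
    public void close() throws IOException {
        synchronized (HELD) {
            osLock.release();
            channel.close();
            lockFile.delete(); // the lock file disappears once the lock is released
            HELD.remove(lockFile);
        }
    }
}
```

Because the mere presence of the lock file never blocks acquisition, an IDE that crashed while holding a lock does not leave the file permanently locked.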
The implementation is mostly encapsulated in the masterfs module, with a few exceptions:
  • the workaround for #5086147 in FileUtil mentioned above (just implementation changes)
  • a tiny API change - FileObject.getFileObject was made non-final
  • org.openide.actions.FileSystemAction must be partly modified to provide a lookup containing the set of FileObjects when invoking createContextAwareInstance
Minority opinions weren't taken into account.
A few other FileBasedFileSystem implementation details:
  • FileBasedFileSystem is java.io.File centric and was implemented to be less vulnerable to platform-specific oddities near disk roots. There is no platform-specific code except the UNC workarounds
  • The implementation of FileObject was split into two classes, one for directories and one for plain files, which helped to keep just those fields that are really necessary
  • File paths are kept as a linked list of name fragments. So a file path is basically a set of linked name fragments called FileNames. Each FileName should be kept in memory just once. The whole path must be computed, but there is a special implementation for folders called FolderName which caches the path once it has been computed. There is one special implementation for UNC called UNCName
  • Children are cached to be able to keep status and fire events if a refresh finds any change. This children implementation is able to work with an incomplete set of children
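The linked name fragments described above can be pictured with the following sketch. The class names FileName and FolderName come from the text, but the fields, constructors and the use of "/" as separator are assumptions made for illustration; UNCName is left out.

```java
/** One path segment linked to its parent; the full path is computed on demand. */
class FileName {
    private final FileName parent; // null for a root
    private final String name;

    FileName(FileName parent, String name) {
        this.parent = parent;
        this.name = name;
    }

    FileName getParent() {
        return parent;
    }

    String getName() {
        return name;
    }

    /** Walks up the parent chain; plain files store only their own segment. */
    String getPath() {
        if (parent == null) {
            return name;
        }
        String parentPath = parent.getPath();
        return parentPath.endsWith("/") ? parentPath + name : parentPath + "/" + name;
    }
}

/** Folder variant: caches the path once computed, since there are fewer folders than files. */
final class FolderName extends FileName {
    private String cachedPath;

    FolderName(FileName parent, String name) {
        super(parent, name);
    }

    @Override
    String getPath() {
        if (cachedPath == null) {
            cachedPath = super.getPath();
        }
        return cachedPath;
    }
}
```

Computing the path of a file then costs one concatenation on top of its folder's cached path, while the many file instances hold only a short name and a parent reference.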
Comments to nbdev@netbeans.org please.