MasterFileSystem redesign
$Revision: 1.2 $
Changes: available in CVS
Status: issue 51551 and opinion document
Problem:
There are a couple of problems with the MasterFileSystem design, based on experience with NB 4.0.
Problem overview
- First of all, there is poor support for resources located on remote hosts within a LAN (JDK, libraries, sources etc.)
- 45963 - masterfs needlessly gets children. The symptoms of this issue are poor performance and responsiveness when accessing network resources. This problem is especially severe on Solaris, where automounters seem to be configured by default to browse and fetch all accessible hosts of /net via NFS.
- 46813 - accessing files via UNC path. As a result of this issue it is not possible to address resources via a UNC path on Windows (e.g. by entering a UNC path into FileChooser).
- High memory consumption by FileObject's internal structures
- 35414 - FileObjects occupy too much memory.
- Problems with symlinks on Unix.
- The same file accessed through two different paths produces distinct FileObjects, distinct editor tabs and so on. In the worst case this issue leads to data loss.
Issues digest
Goals and scope:
Provide support for preventing simultaneous modification of two versions of the same file. But preferably |
High
Medium
MasterFileSystem was originally designed to be general, with a requirement to allow plugging in arbitrary filesystem implementations. Currently the MasterFileSystem API is defined as a friendly contract, and only the VCS filesystem is expected to be plugged in, and only temporarily: VCS support is expected to change radically by getting rid of the filesystem-based implementation altogether. MasterFileSystem should reflect these changes.
Design view:
G1
The problem is that the AbstractFileSystem API isn't sufficient to satisfy this goal. The possible options are:
- change the AbstractFileSystem API (AbstractFileSystem.ExList with method boolean existChild(String name)) and adjust the code in AbstractFolder/AbstractFileObject to use it
- avoid plugging LocalFileSystem into MasterFileSystem at all
- implement a new filesystem from scratch and plug it into MasterFS instead of LocalFileSystem
- change the MasterFileSystem implementation to delegate directly to java.io.File if there is no appropriate VCS delegate
2. It is naturally also possible to combine 1. and 2., because adding the new ExList interface doesn't mean that MasterFileSystem must use a fixed LocalFileSystem.
3. The filesystem might be implemented either in the filesystems package in openide or directly in the masterfs module. An implementation in openide means an API change, but on the other hand such an implementation would have access to some handy package-private classes that make the implementation easier (StreamPool, ListenerList). An implementation in the masterfs module doesn't require any API change and isolates all changes in the masterfs module; the fix would then be placed completely in MasterFS and would be applicable to the Promo D / NB 4.0 code line.
4. This means radically changing the implementation of MasterFileSystem so that it delegates in turn to a VCS FileObject and to java.io.File. This makes the implementation more complicated because the nature of a VCS FileObject and of java.io.File is different (events, synchronization).
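The first option can be illustrated with a minimal sketch. It assumes that the proposed ExList is bound to one folder and that existChild receives the plain child name; the names ExistenceList and FileExistenceList are hypothetical and are not part of openide. The point is only that a single child can be checked without listing the whole folder, which matters for /net or UNC hosts.

```java
import java.io.File;

/** Hypothetical stand-in for the proposed AbstractFileSystem.ExList. */
interface ExistenceList {
    /** Returns true if a child with the given name exists. */
    boolean existChild(String name);
}

/** Answers for one folder with a single File.exists() call; never lists the folder. */
final class FileExistenceList implements ExistenceList {
    private final File folder;

    FileExistenceList(File folder) {
        this.folder = folder;
    }

    @Override
    public boolean existChild(String name) {
        // Only the requested child is touched; siblings are not fetched.
        return new File(folder, name).exists();
    }
}
```

A caller that resolves a path such as /net/host/share would ask existChild("host") on /net instead of calling File.list() there.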
G2
The MasterFileSystem code has to introduce virtual FileObjects as glue to enforce a completely artificial single-tree model. Virtual FileObjects were already identified as evil with respect to the VCS filesystem (and experience proves it: there were many bugs related to virtual FileObjects). MasterFileSystem, which is currently a singleton, could easily be reimplemented so that several instances exist at the same time. For example, c:\foo.txt, d:\foo.txt and \\remote\share\foo.txt don't need to be in the same FileSystem. This makes masterfs less vulnerable to platform-specific oddities near disk roots and simplifies the implementation.
But this split should be accompanied by an analysis of why API users use FileSystem instances and whether the current API is sufficient, because the only way to get a FileSystem instance is FileUtil.toFileObject(File).getFileSystem(). Moreover, this code is copied and pasted in many places in different modules.
Until now there was just one instance, so whatever the API user intended to do, there was no problem, because that instance definitely included all FileObjects. So the question is whether a Repository-like list of FS instances should be provided, or rather some specialized methods like refreshLocalHost that would refresh all instances that are alive. These API changes should go into the openide codebase, but the implementation of MasterFileSystem is coupled with them.
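To make the two API alternatives more concrete, here is a small sketch of a Repository-like list that also offers a refreshLocalHost method. Everything in it is hypothetical (RootFileSystem, LocalFileSystems and their methods do not exist in openide); it only shows that one call can reach all instances once there is more than one.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical minimal view of one filesystem instance (e.g. one per disk root). */
interface RootFileSystem {
    String rootName();
    void refresh();
}

/** Hypothetical Repository-like list of the instances that are alive. */
final class LocalFileSystems {
    private final List<RootFileSystem> alive = new CopyOnWriteArrayList<>();

    void register(RootFileSystem fs) {
        alive.add(fs);
    }

    void unregister(RootFileSystem fs) {
        alive.remove(fs);
    }

    /** Refreshes every registered instance and returns how many were refreshed. */
    int refreshLocalHost() {
        int count = 0;
        for (RootFileSystem fs : alive) {
            fs.refresh();
            count++;
        }
        return count;
    }
}
```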
G3
Symlinks generally mean that one file can have more than one parent, which doesn't fit the current filesystem API. A one-File-to-one-FileObject mapping therefore can't be ensured. Probably the best we can do is to prevent simultaneous modification of two versions of the same file.
All non-canonical FileObjects could delegate *some* method calls to the canonical FileObject (at least some methods must be implemented differently: getParent(), getPath() and so on). All these FileObjects could then share e.g. one FileLock. This solution could also ensure that FileEvents are fired from all relevant FileObject instances. It would be easy to refresh one instance and update the children caches of all of them. This approach could also prevent writing into an output stream via one FileObject instance while its incomplete content is read via another instance.
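The idea of sharing one lock between all aliases of a file can be sketched as follows. This is only an illustration, not the masterfs code: CanonicalLockRegistry is a hypothetical name, a plain ReentrantLock stands in for the filesystem's FileLock, and the canonical form is taken from java.io.File.getCanonicalFile().

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical registry: one lock per canonical file, shared by all aliases. */
final class CanonicalLockRegistry {
    private final Map<File, ReentrantLock> locks = new HashMap<>();

    /** Returns the lock shared by every path that resolves to the same file. */
    synchronized ReentrantLock lockFor(File path) {
        return locks.computeIfAbsent(canonical(path), key -> new ReentrantLock());
    }

    /** Resolves symlinks and "." / ".." segments; wraps the checked exception. */
    static File canonical(File path) {
        try {
            return path.getCanonicalFile();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Two paths that reach the same file through a symlink or a ".." segment get the same lock object, so only one of them can be modified at a time. The map in this sketch is never pruned; a real implementation would have to drop entries for files that are no longer in use.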
G4
There are a few approaches to decreasing memory consumption, but all of them must be well analysed, measured and balanced for every individual usage:
- final boolean variables can be replaced by individual classes (e.g. one for folders and one for files)
- if a variable is used only rarely and only on some instances, it can be kept outside the instance, e.g. in a static weak map <FileObject, variable>
- the path is the place where a lot can be gained, because currently the whole path is kept for every FileObject. The path could be computed on demand and combined with caches, taking advantage of the fact that there are fewer folders than files, and so on
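The second approach (keeping a rarely used variable outside the instance) could look roughly like this. SlimNode and its note attribute are hypothetical and only stand in for a FileObject and one of its rarely used fields.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

/** Hypothetical file node that has no field for a rarely used value. */
final class SlimNode {
    // Shared side table; an entry disappears once its key is garbage collected.
    // The stored value must not reference the key strongly, or it never will.
    private static final Map<SlimNode, String> RARE_NOTES =
            Collections.synchronizedMap(new WeakHashMap<>());

    final String name;

    SlimNode(String name) {
        this.name = name;
    }

    void setNote(String note) {
        if (note == null) {
            RARE_NOTES.remove(this);
        } else {
            RARE_NOTES.put(this, note);
        }
    }

    String getNote() {
        return RARE_NOTES.get(this);
    }
}
```

Instances that never use the value pay nothing for it; the price is a map lookup on the rare access, which is why such a change has to be measured per usage.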
G5
MasterFileSystem in its current form will probably not be necessary once the VCS filesystem disappears. The masterfs module could then provide only a URLMapper implementation and some implementation of a local filesystem. There would be no need for an arbiter that switches between delegates, which is MasterFileSystem's current role. From this perspective I think it is better to minimize the effort spent on the MasterFileSystem class and rather concentrate on an implementation of a local filesystem that can easily be plugged into MasterFileSystem as it is now (see G1). Naturally, MasterFileSystem can't be completely omitted from the redesign, because the decrease in memory consumption and the transition away from the singleton concept can be implemented nowhere but in the MasterFileSystem class. Safer symlink support, however, could already be implemented in the local filesystem (see G3).
Implementation - current status:
The whole openide was branched; the branch is called masterfs51551. A new filesystem for local files, independent of AbstractFileSystem, was implemented and is called FileBasedFileSystem. This filesystem is plugged into MasterFileSystem instead of the original LocalFileSystem. The implementation is covered by tests originally written for MasterFileSystem, and new dedicated tests for FileBasedFileSystem were written that will be polished and committed into the mentioned branch. Stability according to all tests looks satisfactory in this phase (about 95%). The tests were run on the following platforms: Solaris, Linux, Windows XP, Windows 2K. The results were identical on all of them, so there doesn't seem to be any known platform-specific problem. Additionally, the functional tests (IDE commit validation and IDE validation) were 100% successful on all platforms. No performance measurement has been done yet (the performance team is going to be involved and has promised to help with testing). There was no serious testing on MacOS, but it was at least possible to start the NB IDE, open a few projects, build and clean.
This filesystem was designed and implemented with TCA's in mind. Just a few notes:
- File.listFiles is called only when FileObject.getChildren or FileObject.refresh is invoked
- The FileLock implementation creates a new temporary locking file that disappears after the lock is released. The presence of this locking file isn't enough to consider the file locked (this is necessary to be able to recover from a crash); two other conditions must be satisfied. First, a nio FileLock is also requested for the locking file. This prevents editing the same file in two running NB IDEs. All issued locks are kept in a Collection as long as they are valid. So, the second condition is that the lock mustn't be found in that Collection.
- UNC support is currently implemented in FileBasedFileSystem, but MasterFileSystem must be adjusted to take advantage of it (this hasn't been done yet). It is also important to mention that JDK bug #5086147 can't be fully worked around in the masterfs module; the workaround must also be implemented in the FileUtil class. All File to URL conversions (and vice versa) should then go through FileUtil methods.
- the mentioned workaround for #5086147 in FileUtil (implementation changes only)
- tiny API change - FileObject.getFileObject was made non-final
- org.openide.actions.FileSystemAction must be partly modified to provide a lookup containing the set of FileObjects when invoking createContextAwareInstance
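The locking scheme from the notes above can be sketched as follows. This is one possible reading of the description, not the FileBasedFileSystem code: the class LockFileManager, the ".lck" naming of the locking file and the map of issued locks are assumptions. A new lock is granted only if no valid lock is registered in this JVM and the nio lock on the locking file can be obtained; a locking file left behind by a crash does not block anything.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical lock manager based on a temporary locking file. */
final class LockFileManager {
    /** Locks issued by this JVM and still valid, keyed by the locking file. */
    private final Map<File, FileLock> issued = new HashMap<>();

    /** Name of the locking file; the ".lck" pattern is an assumption. */
    static File lockFileFor(File target) {
        return new File(target.getParentFile(), "." + target.getName() + ".lck");
    }

    /** Returns true if the lock was obtained, false if it is held elsewhere. */
    synchronized boolean tryLock(File target) {
        File lockFile = lockFileFor(target);
        if (issued.containsKey(lockFile)) {
            return false; // a valid lock already exists inside this JVM
        }
        try {
            // A leftover locking file from a crash is simply reopened here.
            FileChannel channel = FileChannel.open(lockFile.toPath(),
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE);
            FileLock lock = null;
            try {
                lock = channel.tryLock(); // null if another process holds it
            } finally {
                if (lock == null) {
                    channel.close();
                }
            }
            if (lock == null) {
                return false;
            }
            issued.put(lockFile, lock);
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Releases the lock and removes the temporary locking file. */
    synchronized void release(File target) {
        File lockFile = lockFileFor(target);
        FileLock lock = issued.remove(lockFile);
        if (lock == null) {
            return;
        }
        try {
            lock.channel().close(); // closing the channel releases the nio lock
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        lockFile.delete();
    }
}
```

The in-JVM map is needed in addition to the nio lock because nio file locks are held on behalf of the whole Java virtual machine and are not suitable for coordinating threads within it.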
FileBasedFileSystem - a few other implementation details:
- FileBasedFileSystem is java.io.File centric and was implemented to be less vulnerable to platform-specific oddities near disk roots. There is no platform-specific code except the UNC workarounds
- The implementation of FileObject was split into two classes, one for directories and one for plain files, which helped to keep just those fields that are really necessary
- File paths are kept as a linked list of name fragments. A file path is basically a set of linked name fragments called FileNames. Each FileName should be kept in memory just once. The whole path must be computed, but there is a special implementation for folders called FolderName which caches the path once it has been computed. There is one special implementation for UNC called UNCName
- children are cached to be able to keep status and fire events if a refresh finds any change. This children implementation is able to work with an incomplete set of children
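The linked name fragments can be sketched like this. NameFragment and FolderFragment are hypothetical stand-ins for FileName and FolderName, and the "/" separator is an assumption; the real classes may differ in every detail.

```java
/** Hypothetical name fragment: stores only its own name and a link to its parent. */
class NameFragment {
    final NameFragment parent; // null for a root
    final String name;

    NameFragment(NameFragment parent, String name) {
        this.parent = parent;
        this.name = name;
    }

    /** Walks the parent chain on every call; no full path is stored per file. */
    String path() {
        return parent == null ? name : parent.path() + "/" + name;
    }
}

/** Folder variant that remembers its path once it has been computed. */
final class FolderFragment extends NameFragment {
    private String cachedPath;

    FolderFragment(NameFragment parent, String name) {
        super(parent, name);
    }

    @Override
    String path() {
        if (cachedPath == null) {
            cachedPath = super.path();
        }
        return cachedPath;
    }
}
```

Because there are fewer folders than files, caching only in the folder fragments keeps the per-file cost at one name and one reference, while computing a file's path needs just one concatenation on top of the cached folder path. The sketch does not invalidate the cache on rename or move.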