New inode, dcache and transname implementation for Linux.

This patch is based on 2.0.27; it may work with later versions, but I
have not tried. It is in alpha state - do not use it with valuable data.

Features:

- The old and new code for inode.c and dcache.c exist in parallel; the
  new code can be selected by enabling the new config options you will
  find in the "filesystems" section.

- The new dcache can hold "negative entries", i.e. names that are known
  to *not* exist. This saves unnecessary lookups for non-existing names
  and is particularly useful for the transname facility. (A small
  illustrative sketch of the idea follows these lists.)

- /proc/<pid>/fd/ now contains symlinks with the full absolute path to
  the inode, so you can see where the inode is.

- Deleted files are kept in a two-level basket: if a used inode is
  unlinked, the name foo is moved to .deleted-<nr>.foo and is kept in
  the dcache (not on disk!), so it remains accessible via
  /proc/<pid>/fd/ (and via normal lookup, of course). Try the shell
  script in the appendix to see what it does!

- If the fs supports it (currently only ext2), deleted files go to the
  second-level basket as soon as the i_count becomes zero. The
  second-level basket keeps up to a constant number (currently 100) of
  deleted files, but only as long as no umount() is done (e.g. until
  the next reboot) and as long as sufficient space remains on the disk.
  If the second-level basket needs to be reduced (e.g. because of space
  shortage), the files are freed in LRU order. (A second sketch of this
  policy also follows these lists.)

Bugs:

- MP-safety is not yet fully implemented (missing vfs_lock()s etc.).
- The omirr code is not tested.
- There is lots of debug code / debug messages that will finally
  disappear...
- Please report any other bugs.

Things that I do not want to implement, but that should rather be done
by the maintainers of the corresponding packages / kernel parts:

- arch/{sparc,alpha} contain readdir implementations that I did not
  update, since I have no such machines. Either make the same changes
  as I did in fs/readdir.c, or perhaps we should think about making
  less code arch-dependent (I suspect only the dirent format differs
  between architectures, while the algorithm can remain the same).

- Filesystems that want to support the second-level basket must use a
  callback to free_ibasket() if they get short of space. I only
  implemented a provisional version for ext2 that is not quite correct
  (it calls free_ibasket() if a cylinder group gets short of space),
  just to demonstrate the effects. This should be fixed/implemented by
  the maintainers of the respective filesystems.

- "rm -rf some_directory" does multiple scans of the directory. The
  problem is that the first rm pass moves all files to their
  .deleted-0001.* form, so the directory *appears* to be non-empty.
  However, the directory on the *filesystem* *is* empty; it is only
  readdir() which simulates the existence of the basket entries.
  rm -rf then sweeps over the directory a second time, thus clearing
  all recursive entries forever. This should be fixed to clear only
  non-.deleted-* entries, so that the whole directory tree stays
  intact in its .deleted-* form.

Things that I want to implement in the near future:

- Simplify the code (less state information) - please tell me if you
  have good ideas for this.
- Move the whole thing to the 2.1.* kernel series (perhaps to be done
  by David Miller).
- Introduce an interface for VFS extensions, in a most general form.
  Please report ideas to me if you have some.
- Read/write locks for directories (and perhaps other kernel
  structures): an untested prototype is already implemented.
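To make the negative-entry idea concrete, here is a minimal userspace
sketch. It is *not* the patch's dcache code and all names in it
(cache_entry, cache_lookup, the sample inode numbers) are made up; it
only shows how a cached name with no inode behind it lets a repeated
lookup fail without touching the filesystem again.

/* negative-entry sketch: a cached name with ino == 0 is known not to exist */
#include <stdio.h>
#include <string.h>

struct cache_entry {
    const char *name;
    long        ino;      /* 0 means "negative": name known not to exist */
};

/* a tiny fixed cache standing in for the real dcache */
static struct cache_entry cache[] = {
    { "passwd", 4711 },   /* positive entry: name -> inode number */
    { "shadow", 4712 },
    { "core",      0 },   /* negative entry: a lookup already failed once */
};

/* return 1 if the name is cached (positive or negative), fill *ino */
static int cache_lookup(const char *name, long *ino)
{
    size_t i;
    for (i = 0; i < sizeof(cache) / sizeof(cache[0]); i++) {
        if (strcmp(cache[i].name, name) == 0) {
            *ino = cache[i].ino;
            return 1;
        }
    }
    return 0;             /* not cached: a real lookup would be needed */
}

int main(void)
{
    const char *names[] = { "passwd", "core", "hosts" };
    size_t i;

    for (i = 0; i < 3; i++) {
        long ino;
        if (!cache_lookup(names[i], &ino))
            printf("%-8s: cache miss, ask the filesystem\n", names[i]);
        else if (ino == 0)
            printf("%-8s: negative entry, fail without disk access\n", names[i]);
        else
            printf("%-8s: positive entry, inode %ld\n", names[i], ino);
    }
    return 0;
}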
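And here is an equally hypothetical userspace sketch of the
second-level basket policy: deleted files are parked in a fixed-size
basket and only freed for good in LRU order, either because the basket
is full or because the fs asks for space back (free_ibasket() in
spirit; the actual kernel signature is not shown here). BASKET_SIZE
and all function names are invented for the sketch.

/* second-level basket sketch: fixed capacity, freed in LRU order */
#include <stdio.h>
#include <string.h>

#define BASKET_SIZE 4          /* the patch keeps up to 100 entries */

static char basket[BASKET_SIZE][32];
static int  nbasket;           /* number of parked names, oldest first */

/* really release the least recently deleted entry (here: just report it) */
static void free_oldest(void)
{
    if (nbasket == 0)
        return;
    printf("  freeing '%s' for good\n", basket[0]);
    memmove(basket[0], basket[1], (size_t)(nbasket - 1) * sizeof(basket[0]));
    nbasket--;
}

/* what a filesystem short of space would call (free_ibasket()-style) */
static void fs_short_of_space(void)
{
    printf("fs short of space:\n");
    free_oldest();
}

/* park a just-deleted name in the basket, evicting in LRU order if full */
static void basket_add(const char *name)
{
    if (nbasket == BASKET_SIZE)
        free_oldest();
    snprintf(basket[nbasket++], sizeof(basket[0]), "%s", name);
    printf("parked '%s' (%d in basket)\n", name, nbasket);
}

int main(void)
{
    const char *victims[] = { "a.txt", "b.txt", "c.txt", "d.txt", "e.txt" };
    size_t i;

    for (i = 0; i < sizeof(victims) / sizeof(victims[0]); i++)
        basket_add(victims[i]);   /* the fifth add evicts a.txt */

    fs_short_of_space();          /* space shortage evicts b.txt */
    return 0;
}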
Other extension proposals, perhaps to be done by others:

- Change the i_count policy globally, such that most of it is done in
  the VFS and the fs'es no longer have to maintain it. For example,
  _lookup() and others that get an inode as parameter should not touch
  the i_count any more.

- Move some more tests (e.g. in _unlink()) from the fs'es to the VFS,
  to make it more generic and redundancy-free.

- Use the rw_lock at VFS level to control concurrency on directory
  inodes; the fs'es can be simplified *much* after this (I'm thinking
  of all that retry/versioning stuff in ext2 etc.). At least, if people
  fear losing some concurrency (because the current behaviour is in
  essence an optimistic strategy), introduce a flag
  FS_CONTROL_CONCURRENCY (or similar) that says whether the particular
  fs is responsible for locking or not.

- Another alternative would be to do the optimistic strategy at VFS
  level, at least in those places where performance is essential, such
  as concurrent readdir(). The VFS could maintain all those version
  counters and re-call the fs routines in case of update conflicts.
  Please tell me your ideas on that. (A hypothetical sketch of such a
  retry loop follows the appendix.)

- Some caching functionality, currently spread over many parts of the
  kernel, could be made more generic.

- Centralize all kernel debugging options in one Config.in, so that
  developers have less effort if their changes affect other parts of
  the kernel.

Send bugs, comments, flames, fixes etc. to
schoebel@informatik.uni-stuttgart.de

Greetings,

-- Thomas

-----------------------------------------------------------------------------

#!/bin/sh
# unlink a file that is still open and inspect it via /proc/<pid>/fd
i=3
while [ $i -gt 0 ]
do
    echo "Hello #$i" > file
    sleep 20 < file &
    rm -f file
    ls -l /proc/$!/fd/0
    cat /proc/$!/fd/0
    i=`expr $i - 1`
done
echo After:
ls -a
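For the VFS-level optimistic strategy proposed above, a purely
hypothetical sketch of the retry loop follows. None of this is in the
patch; struct dir, fs_readdir() and vfs_readdir() are stand-in names,
and in this single-threaded demo the retry never actually fires.

/* optimistic retry sketch: re-call the fs routine on update conflicts */
#include <stdio.h>

struct dir {
    unsigned long version;      /* bumped on every directory update */
};

/* stand-in for an fs readdir routine; returns 0 on success */
static int fs_readdir(struct dir *d)
{
    printf("scanning directory at version %lu\n", d->version);
    return 0;
}

/* VFS wrapper: retry the fs call when the directory changed under us */
static int vfs_readdir(struct dir *d)
{
    unsigned long seen;
    int ret;

    do {
        seen = d->version;           /* sample before the call */
        ret = fs_readdir(d);
    } while (seen != d->version);    /* changed meanwhile: retry */

    return ret;
}

int main(void)
{
    struct dir d = { 1 };
    return vfs_readdir(&d);
}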