=== VSTa FS: an Extent-Based Filesystem ===

A file is structured as:

    | File header | Extents | Data... |

So the information UNIX would put in an inode is simply the first N bytes of the file. Each extent is just a <start,len> tuple. The sizes are chosen carefully, so that a 100-byte file occupies a single 512-byte sector, including the file data as well as its ownership, protection, and allocation information. A directory is just a special file, so smaller directories will also occupy a single sector.

As a file grows, its existing allocation is extended in place where possible. So if your file is currently {<3,1>} (i.e., sector three, one sector long), and you write enough data to push out beyond that initial sector, the filesystem will see how many blocks can be had starting at sector 4. Up to 64K is taken at once, in the hope that taking this much proactively will spare you having to "piecemeal" your file together as you write. Your file could then be {<3,129>}. The file's length is marked as the length of the whole allocation (129*512 = 66048 bytes). If you crash, the file thus correctly describes the storage it holds, and no fsck is required before filesystem operation resumes.

As you write the file, the filesystem keeps track of the highest offset you have written. On last close, blocks beyond this point are freed, and the file is "shrunk" down to the size actually written. If you wrote a K or so and then closed the file, the example file would end up as {<3,3>}, and sectors 6 through 131 would return to the free list.

The free list is a sorted linked list of free ranges, with coalescing. The ranges are held in sectors; each sector has a next pointer to another sector's worth of free-list ranges. A sector can hold zero or more entries; if one ever overflows, a global rebalance is done, leaving each sector exactly half full (except the last in the list, which may hold less). This makes contiguous ranges of free blocks easy to identify, and it is very storage efficient, even with small allocation units such as sectors. It is also easy to prove that the free list can always represent the total of free blocks, because a block on the free list can itself be consumed to provide another sector's worth of storage for the list.

Bitmaps are popular for free block management, but their size grows as filesystems get bigger and allocation units get smaller. Most implementations either take a larger allocation size--say, 8K--and waste disk space on smaller files, or add tremendous complexity to special-case sub-allocation-size units (i.e., "frags" in the BSD filesystem). VSTa is primarily an experimental platform, so I decided to try something different.

The most complex part of the filesystem is the block cache. The name is a misnomer: each entry in the buffering system is variably sized. For an extent of up to 64K, a buffer of exactly that size is kept. For larger extents, the extent is buffered in 64K chunks. Between the contiguous block allocation and the variable-size buffers, I hope to provide very efficient filesystem service. Consider a file with allocation {<10,8>} (i.e., a file starting at sector 10 and running for 8 sectors--4K). When this file is opened and read, a 4K buffer is allocated, and a single disk I/O fills the entire 4K. Thus most small files will have their entire contents buffered in a single disk I/O; larger files will have their contents buffered in chunks of 64K.
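To make the on-disk layout concrete, here is a rough C sketch of how the file header and its extent map might sit at the front of a file. The structure names, field names, and the number of extent slots are my own illustration, not VSTa's actual declarations.

    #include <stdint.h>

    #define SECSZ   512u        /* sector size */
    #define NEXTENT 16          /* extent slots in the header (illustrative) */

    /* One extent: a run of contiguous sectors */
    struct extent {
        uint32_t e_start;       /* first sector of the run */
        uint32_t e_len;         /* number of sectors in the run */
    };

    /*
     * The first bytes of every file.  What UNIX would keep in an inode
     * lives here instead: ownership, protection, length, and the extent
     * map, followed immediately by the file data.
     */
    struct file_header {
        uint32_t h_type;        /* plain file or directory */
        uint32_t h_len;         /* bytes covered by the whole allocation */
        uint32_t h_owner;       /* ownership */
        uint32_t h_prot;        /* protection */
        uint32_t h_nextent;     /* extent slots in use */
        struct extent h_ext[NEXTENT];
        /* file data follows, padding out the rest of the first sector */
    };

With these illustrative sizes the header takes 148 bytes, leaving well over 300 bytes of the first sector for data--which is how a 100-byte file, or a small directory, fits in a single 512-byte sector.

The grow-on-write and shrink-on-close behaviour might look roughly like this. For brevity the sketch models a single-extent file, and take_contig()/free_contig() are invented stand-ins for the real allocator.

    #include <stdint.h>

    #define SECSZ   512u
    #define MAXGRAB (65536u / SECSZ)    /* never take more than 64K (128 sectors) at once */

    /* Simplified single-extent file, enough to follow the {<3,1>} example */
    struct xfile {
        uint32_t start;         /* first sector of the allocation */
        uint32_t len;           /* sectors allocated */
        uint32_t bytes;         /* file length: covers the whole allocation */
        uint32_t hiwrite;       /* highest byte offset actually written */
    };

    /* Stand-ins for the allocator: grant/free contiguous sectors */
    extern uint32_t take_contig(uint32_t start, uint32_t want);
    extern void free_contig(uint32_t start, uint32_t len);

    /*
     * A write has run past the current allocation: try to grow the
     * existing run in place, taking up to 64K proactively.
     */
    static int
    extend(struct xfile *f)
    {
        uint32_t got = take_contig(f->start + f->len, MAXGRAB);

        if (got == 0)
            return -1;                  /* no room here; would need another extent */
        f->len += got;
        f->bytes = f->len * SECSZ;      /* length reflects the whole allocation */
        return 0;
    }

    /*
     * On last close, trim back to the highest offset written and
     * return the rest of the allocation to the free list.
     */
    static void
    shrink(struct xfile *f)
    {
        uint32_t need = (f->hiwrite + SECSZ - 1) / SECSZ;

        if (need == 0)
            need = 1;                   /* always keep the header sector */
        if (need < f->len) {
            free_contig(f->start + need, f->len - need);
            f->len = need;
            f->bytes = f->hiwrite;
        }
    }

Run against the example above: extend() turns {<3,1>} into {<3,129>} with a marked length of 66048 bytes, and shrink() after a write of a K or so frees sectors 6 through 131, leaving {<3,3>}.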
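The free list sectors could be laid out like this; again the names are invented. With 512-byte sectors this particular layout holds 63 ranges per sector.

    #include <stdint.h>

    #define SECSZ 512u

    /* One run of contiguous free sectors */
    struct free_range {
        uint32_t r_start;       /* first free sector in the run */
        uint32_t r_len;         /* number of free sectors */
    };

    /* Ranges that fit in a sector after the next pointer and count */
    #define NRANGE ((SECSZ - 2 * sizeof(uint32_t)) / sizeof(struct free_range))

    /*
     * One sector's worth of the free list.  Sectors form a linked list;
     * the ranges are kept sorted and adjacent ranges are coalesced.
     * When a sector overflows, the whole list is rebalanced so every
     * sector is half full (the last may hold less).
     */
    struct free_sector {
        uint32_t fs_next;                   /* next list sector (0 = end) */
        uint32_t fs_nfree;                  /* entries used in fs_range[] */
        struct free_range fs_range[NRANGE]; /* sorted, coalesced ranges */
    };

Note that any block on the free list can itself be pressed into service as a free_sector, which is why the list can always describe all of the free space.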
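The cache's sizing rule is easy to state as a helper; this is just the rule from the text above, not the actual cache code.

    #include <stddef.h>
    #include <stdint.h>

    #define SECSZ  512u
    #define MAXBUF 65536u       /* cache entries are at most 64K */

    /*
     * Bytes of buffer for one chunk of an extent: extents up to 64K
     * get a buffer of exactly their size; larger extents are buffered
     * 64K at a time.
     */
    static size_t
    buf_size(uint32_t ext_sectors)
    {
        size_t bytes = (size_t)ext_sectors * SECSZ;

        return bytes <= MAXBUF ? bytes : MAXBUF;
    }

For the {<10,8>} file, buf_size(8) is 4096, so one 4K read of sectors 10 through 17 brings the whole file into the cache.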
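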
Once a file's contents are in the filesystem's buffer, they may be parcelled out to clients without further I/O activity. This scheme should work well for text files, objects, and some executables. It will not work well for, say, databases; they would do better to talk directly to a disk partition, or to a similar but non-buffered version of this filesystem. My initial prototype will also not protect itself well against multiple, competing allocation requests. I hope to improve this incrementally as I learn more from actual use.
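To illustrate the "parcelled out without further I/O" point: once a chunk of an extent is resident in the cache, a client read is just a bounded copy out of that buffer. The function and argument names here are purely illustrative.

    #include <stddef.h>
    #include <string.h>

    /*
     * Hand part of a cached chunk to a client.  buf/buflen describe the
     * resident chunk, off is the client's offset within it, and up to
     * want bytes are copied to out.  No disk I/O is involved.
     */
    static size_t
    serve_read(const char *buf, size_t buflen, size_t off,
               char *out, size_t want)
    {
        size_t n;

        if (off >= buflen)
            return 0;               /* nothing resident at this offset */
        n = buflen - off;
        if (n > want)
            n = want;
        memcpy(out, buf + off, n);
        return n;
    }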