File folders: The carbon database filesystem

Walls of file cabinets can be found at any institution, containing thousands upon thousands of manually indexed documents. When someone's interested in using a folder within, they take it out, bring it to their desk, and work with it. Once done, it's put in a "to be filed" stack (or filed immediately). This is the most common way to structure a filesystem on today's computers as well: cabinets of folders of files, all carefully packed away on the wall of the office, waiting to be carefully taken out, worked with, and then carefully put back.

The translation of a carbon-based filing system to a silicon-based filesystem leaves out one key component: most people don't have someone to file their "to be filed" stack. The clearest sign of this gap is a directory filled with thousands of files, distinguishable only by filename. What silicon brings is the ability to manage a stack of thousands of papers with very little effort, and all that's asked of the user is to choose a unique title.

Over time, some of those who start out with a single folder of thousands of files will begin to create subfolders for things that need to be lifted out of the mess (urgent bills, closed cases, etc.). Given several years, many bookmark menus filled with links end up carefully organized and sorted in the same way; people who start off putting every file in the "to be filed" folder invent simple ontologies ("Work", "Bills", "Family") and begin to refile their documents.

The analogy of a filing system to a database filesystem is a tough one: paper filing systems are subject to physical limits (you can't have two million pages in a single folder), and there are all sorts of features that don't exist outside of silicon. It's still an effective way to convey what precisely this new "database filesystem" feature of the next OS upgrade is, though, since many people keep their documents (paper and digital) in a filing system and put documents in the "to be filed" folder when they're done.

Silicon brings a second advantage to the table: now people can work with tremendously large collections of objects with very little effort. Searching fifteen thousand songs on my laptop takes approximately one second; searching four billion web pages on Google takes approximately one second. This is where database filesystems can shine, and where the most confusion will lie. It only takes a few seconds to change the filing system; instead of hiring extra interns and spending a week reorganizing filing cabinets, the silicon shifts things around immediately.
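To make the "re-filing in seconds" point concrete, here is a minimal sketch, assuming each document carries a couple of hypothetical attributes (`category`, `year`). A folder hierarchy then becomes a grouping computed on demand: switching from filing by category to filing by year means recomputing a view, not moving any documents. The `file_by` helper is illustrative only, not any real filesystem API.

```python
from collections import defaultdict

# Hypothetical documents; the attribute names are assumptions for illustration.
documents = [
    {"name": "electric-bill.pdf", "category": "Bills", "year": 2003},
    {"name": "contract.doc", "category": "Work", "year": 2004},
    {"name": "reunion-photos.zip", "category": "Family", "year": 2003},
    {"name": "phone-bill.pdf", "category": "Bills", "year": 2004},
]

def file_by(docs, key):
    """Group document names into 'folders' keyed by the given attribute.

    Nothing in `docs` is modified; the folders are a fresh view each time.
    """
    folders = defaultdict(list)
    for doc in docs:
        folders[doc[key]].append(doc["name"])
    return dict(folders)

# "Reorganizing the cabinets" is just a second call with a different key.
by_category = file_by(documents, "category")
by_year = file_by(documents, "year")
print(by_category)
print(by_year)
```

The trade-off is that the grouping is recomputed each time; for fifteen thousand items that is still effectively instant, which is the point the paragraph above makes about silicon.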

In many commonly used filesystems, each folder holds a small database of files, and each file is assigned a "name" that must be unique within its folder. Files may have other properties, but with rare exception these are not used to uniquely identify files; a file's "extension" is considered part of the "name". NTFS stands apart by bringing a second unique identifier to the filesystem (a two-column primary key, in database terms), but few users or applications use or even recognize it.
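A toy model of the conventional arrangement, offered as a hedged sketch: each folder is treated as a table whose primary key is the file name, extension included, so a second file with an identical name is refused. The `Directory` class is hypothetical and ignores everything else a real filesystem does (permissions, metadata, NTFS's extra identifier); it only illustrates the uniqueness constraint.

```python
class Directory:
    """A folder modeled as a table keyed by file name.

    The extension is simply part of the name, so "report.txt" and
    "report.doc" are different keys.
    """

    def __init__(self):
        self._files = {}  # name -> contents

    def create(self, name, contents=b""):
        # Like most filesystems, refuse a second file with the same name.
        if name in self._files:
            raise FileExistsError(name)
        self._files[name] = contents

    def names(self):
        return sorted(self._files)


folder = Directory()
folder.create("report.txt")
folder.create("report.doc")      # allowed: the extension is part of the key
try:
    folder.create("report.txt")  # rejected: duplicate primary key
except FileExistsError as err:
    print("duplicate name:", err)
print(folder.names())
```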

As the filesystem becomes a collection of documents with a convenient selection of perspectives, the filing system metaphor becomes somewhat strained. Nobody would reorganize a wall of paper files every five minutes, given the manpower and logistics involved, yet the equivalent happens unnoticed on computers everywhere, hundreds of times a day. A stronger analogy is necessary to provide an easy path for harnessing the new possibilities.

Astronomers work with a collection of millions of objects every day, using different perspectives such as "color", "brightness", "position", or even "name". By pointing their telescope according to a given perspective, they can precisely locate a star; if their calculations (or assumptions) are incorrect, then further work is required. Eventually they get it within the viewfinder, work with it for a while, and then move on to the next perspective.

Bridging the analogies, astronomers work with a single file folder filled with all the objects they have (the "universe"); then by sorting through different perspectives (such as "name") they find what they seek and work with it. Imagine a planetarium with all your documents projected on the ceiling in small print, where you need only a pair of binoculars and a direction to look to find anything in your collection. Unlike office workers with a filing system, astronomers have no need to re-file things when they're done, since the only thing that changed was their perspective.
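The planetarium idea can be sketched as a single flat collection plus "perspectives" that filter and sort it without ever moving anything. This is a minimal illustration under stated assumptions: the `kind` and `modified` attributes and the `perspective` helper are hypothetical, and a real database filesystem would answer such queries from indexes rather than scanning a list.

```python
# One flat collection (the "universe"); the attributes are hypothetical.
universe = [
    {"name": "Jan's resume", "kind": "document", "modified": "2004-05-02"},
    {"name": "Budget", "kind": "spreadsheet", "modified": "2004-06-10"},
    {"name": "Annual report", "kind": "document", "modified": "2004-01-15"},
]

def perspective(items, where=lambda item: True, order_by="name"):
    """Return the names of matching items in sorted order.

    sorted() builds a new list, so the underlying collection is never
    reordered or "re-filed"; only the view changes.
    """
    matches = filter(where, items)
    return [item["name"] for item in sorted(matches, key=lambda i: i[order_by])]

by_name = perspective(universe)
documents_by_date = perspective(
    universe,
    where=lambda i: i["kind"] == "document",
    order_by="modified",
)
print(by_name)
print(documents_by_date)
```

Switching perspective is another call to `perspective`; the `universe` list keeps its original order throughout.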

A database filesystem, built properly, can let the user gather every document in a single place, with the power to search through the collection efficiently. A document's "name" need not be unique, as long as the files with a given "name" are linked in some manner (say, revisions of a contract). With the ability to search through all the documents at once, filenames to some extent become moot; it's more effective for many to search for "Jan's resume" than to scroll through thousands of files sorted into directories (witness the recent popularity of Google's search over Yahoo!'s browsable directory). This is where the true power of a database filesystem lies.
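A minimal in-memory sketch of that idea, assuming each document gets a hypothetical internal `doc_id` that serves as the real key, so two documents can share the name "Contract" as long as a `revision_of` link ties them together. Search here is a plain case-insensitive substring scan purely for illustration; a real database filesystem would consult an index instead of reading every document.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Document:
    doc_id: int                        # hypothetical internal key; always unique
    name: str                          # human-facing name; need not be unique
    text: str
    revision_of: Optional[int] = None  # doc_id of the version this one replaces

store = [
    Document(1, "Contract", "Draft terms"),
    Document(2, "Contract", "Revised terms", revision_of=1),
    Document(3, "Jan's resume", "Jan: five years of experience"),
]

def revisions(docs, doc_id):
    """Follow revision_of links from doc_id back to the original, newest first."""
    by_id = {d.doc_id: d for d in docs}
    chain = []
    current = by_id.get(doc_id)
    while current is not None:
        chain.append(current.doc_id)
        current = by_id.get(current.revision_of)  # .get(None) simply returns None
    return chain

def search(docs, query):
    """Return ids of documents whose name or text contains the query."""
    q = query.lower()
    return [d.doc_id for d in docs if q in d.name.lower() or q in d.text.lower()]

print(search(store, "jan's resume"))
print(revisions(store, 2))
```

Keeping identity in `doc_id` rather than in the name is the design choice that lets names repeat: the name becomes just another searchable property.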
