Thursday, January 16, 2014

Let's at least start to consider killing the plain old file system

Electronic files have been an integral part of how we use computers for a very long time. I think it is time to rethink how software developers use files and the file system. A file has exactly one state on disk at any moment. When the user chooses to ‘save’ the file, the old state is gone forever. There is no way to go back and look at an earlier state of a file without the help of additional tools. This design made sense at the dawn of computing. Disk space was expensive, and we could not afford to store every single change on our small, costly disk drives.

Version control systems are an acknowledgment that the ‘plain old file system’ does not work for programmers. The first ones were built decades ago, when disk space was still expensive, and as a result they are full of compromises. They are optimized to be disk efficient. What is very surprising to me is that modern version control systems are built with the same constraints. Git and Mercurial were created within the last ten years, but they are designed with 1980s disks in mind.

My big complaint about the file system is that, even though disk space has gotten incredibly cheap, we don't store more information about the changes to our files. Since the cost per bit of disk space is ridiculously low (and still falling), we should record all the changes to a set of files and make that data available to be manipulated.

Complex electronic creative artifacts like code, novels, and electronic art evolve over time. Using the file system alone makes it impossible to capture that evolution. Version control tools help capture some of the history, but they require that the user actively think about using them. Good luck getting an author or artist to use Git: the CLI is terrible, and only a software developer would put up with it. In addition, animating multiple consecutive changes in a version control system to show long-term evolution is difficult. It is going to be a challenge for future scholars to see how modern authors work if the authors are not vigilant about storing intermediate versions of their work. Compare this with handwritten manuscripts from one hundred years ago, where the edits are visible in line.

You might be thinking, “It is good that some of that file history is lost forever; it represents work that turned out to be not very good.” But I argue that there are lessons to be learned from those experiences. It is very difficult to see how others do creative work when it is done on a computer. Since we can't see how people work, we can't learn from their experiences. We need to open up those experiences so others can learn.

If we stored all of the changes all of the time, along with some additional data such as who made each change, when it was made, and where the author was when they made it, we would start to open up some learning opportunities. Google Docs and other cloud-based tools are recording more and more of this kind of historical information. They are not worried about disk space.
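To make the idea concrete, here is a minimal sketch of what "storing all of the changes" might look like. It is only an illustration, not how Storyteller or Google Docs actually work: the `Change` and `ChangeLog` names, the fields, and the free-form `location` string are all assumptions made up for this example. Each edit is appended to a log with its author and timestamp, and any past state of the document can be rebuilt by replaying the log.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Change:
    """One recorded edit: replace `deleted` characters at `position` with `inserted`."""
    position: int
    deleted: int
    inserted: str
    author: str
    timestamp: datetime
    location: Optional[str] = None  # hypothetical free-form field, e.g. a city name


@dataclass
class ChangeLog:
    """Append-only history of one document; earlier states are never overwritten."""
    changes: List[Change] = field(default_factory=list)

    def record(self, position, deleted, inserted, author, location=None):
        """Append a new change stamped with the current UTC time."""
        change = Change(position, deleted, inserted, author,
                        datetime.now(timezone.utc), location)
        self.changes.append(change)
        return change

    def state_at(self, n=None):
        """Rebuild the text after the first n changes (all changes if n is None)."""
        text = ""
        for change in self.changes[:n]:
            text = (text[:change.position] + change.inserted
                    + text[change.position + change.deleted:])
        return text


log = ChangeLog()
log.record(0, 0, "Hello wrld", author="alice", location="Chicago")
log.record(6, 4, "world", author="bob")  # bob fixes the typo

print(log.state_at(1))  # the text as alice first saved it
print(log.state_at())   # the text after every recorded change
```

Because the log is never rewritten, "saving" no longer destroys anything: the old state is just an earlier position in the list, and the who, when, and where of each change travel with it.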

The project I am working on, Storyteller, seamlessly records all file-based interactions for software developers by extending their IDEs. My hypothesis is that the most valuable metadata that can be stored is an author-supplied commentary on a set of changes. This commentary will follow the animated changes and offer an explanation, a hint, or a lesson learned from the author. These can be shared between developers to open up the programming process so that we can learn from each other.

The real issue is that some files are meant to be consumed left to right, top to bottom, but are rarely created that way. The compiler reads source code files in this way, but we all know that code is never written that way. The process of creating those artifacts is often lost because of our use of electronic files and supporting tools. I can envision a new abstraction over the file system that captures the creation process for all such files. Perhaps Storyteller will be the answer for software developers, or maybe it won't. Regardless, I believe we need to do something to move the electronic file into the 21st century.
