Mind posterity, author! Use a distributed version control system!

Most authors know they need to back up their work from time to time. They burn it to CD or store it on an external backup device, but very few use a version control system to keep track of their changes and have a backup at hand in case of hardware failure, bluescreens or human error. And there is something even better: a distributed version control system.


Jan Ulrich Hasecke

Mind posterity, author! Don’t you daydream that some day legions of scientists will analyze your writings to find out when you first had the idea of making a tiny alien called Glimpsy the hero of your novels? Sure you do. And you can make their research easier if you use a version control system.

WTF is a version control system?

A version control system (VCS) keeps track of changes in code. Developers use a VCS when they write their programs. If you have ever wondered what these guys are doing when they hack away at their keyboards, I’ll show you: they type in plain text like this:

def helloworld():
    print "Hello World!"

You see that this is nearly the same as what you do all day long, except that you use a more verbose style to express your thoughts:

Alice opened the door and looked into the blue eyes of something green and slimy.
"Hello World," said the strange thing.

When programmers write code, they »check it in« from time to time so that they have copies of their work from different stages of the project. Imagine the snippet of code above is part of the file glimpsy.py and was written today. After a hard day of coding the programmer checks it in, eats a tuna pizza, drinks some beers and goes to sleep. The next afternoon he rises again and changes the file like this:

def helloworld():
    print "Hello World!"
def goodbyeworld():
    print "Good bye, fair World, I have to leave"

And in the early morning, after a hard night’s work, when he craves a tuna pizza and some beer, he wants to know what he has accomplished and runs a diff on the file. A “diff” is a list of the changes between the current version of the file and the last version that was checked in. A diff looks like this:

=== modified file 'glimpsy.py'
--- glimpsy.py    2011-01-10 03:00:03 +0000
+++ glimpsy.py    2011-01-11 02:33:59 +0000
@@ -1,2 +1,4 @@
 def helloworld():
     print "Hello World!"
+def goodbyeworld():
+    print "Good bye, fair World, I have to leave"

So, biting into the tuna pizza, the programmer enjoys reading, white on black, that he was very productive. All lines with a preceding “+” were added after the last check-in. So he takes a good gulp of beer and checks in the file again. By doing this he adds a new version of the file to his version control system. Perhaps you did something similar when you wrote your last big novel, saving it under different names from time to time:

  • mygreatnovel-20101001.doc
  • mygreatnovel-20101025.doc

You have a copy of your great novel from October 1 and one from October 25, 2010. If you die and get famous, scientists can compare both versions and will see that sometime between these dates you invented the name “Glimpsy” for the slimy tiny alien.
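
If you are curious how such a comparison could be done without waiting for the scientists, here is a minimal sketch using the difflib module from Python's standard library. It is written for Python 3, so print is a function here, unlike in the older snippets above. The two text snippets and the function name novel_diff are made up for illustration; they are not the contents of any real file.

```python
import difflib

# Two hypothetical snapshots of the same passage, standing in for the
# copies saved on October 1 and October 25. The text is invented.
october_01 = [
    "Alice opened the door.\n",
    '"Hello World," said the slimy tiny alien.\n',
]
october_25 = [
    "Alice opened the door.\n",
    '"Hello World," said Glimpsy.\n',
]


def novel_diff(old_lines, new_lines):
    """Return a unified diff between two versions as a list of lines."""
    return list(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile="mygreatnovel-20101001",
            tofile="mygreatnovel-20101025",
        )
    )


if __name__ == "__main__":
    # Lines starting with "-" exist only in the older copy,
    # lines starting with "+" only in the newer one.
    print("".join(novel_diff(october_01, october_25)))
```

This only works on plain text, which is one reason why programmers have it easier than authors who save their novels as .doc files.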

With a VCS you get a lot more. For example, you can comment on your changes and later read a log of all of them:

revno: 2
committer: Jan Ulrich Hasecke
branch nick: mygreatnovel
timestamp: Tue 2011-01-11 12:42:49 +0100
 my cat always glimpses round the corner when I start to type,
 so I called the alien Glimpsy.
revno: 1
committer: Jan Ulrich Hasecke
branch nick: mygreatnovel
timestamp: Mon 2011-01-10 20:00:03 +0100
 first version of my great novel

Every revision has a message in which you can say something about your motives for calling the alien Glimpsy, and scientists can draw their conclusions. Of course, you can lie in your logs.
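
To make the idea of revisions with messages a bit more tangible, here is a toy sketch in Python 3. It is not how any real version control system stores its data; the names Revision, ToyHistory, check_in and log are invented for this illustration. It only shows the principle: every check-in gets a number, a committer, a timestamp and a message, and the log lists them newest first.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Revision:
    revno: int
    committer: str
    timestamp: datetime
    message: str
    text: str  # the full content of the work at this revision


@dataclass
class ToyHistory:
    """A purely in-memory stand-in for a VCS history (illustration only)."""

    revisions: list = field(default_factory=list)

    def check_in(self, text, message, committer, timestamp=None):
        """Store a new revision and return it."""
        if timestamp is None:
            timestamp = datetime.now(timezone.utc)
        rev = Revision(len(self.revisions) + 1, committer, timestamp, message, text)
        self.revisions.append(rev)
        return rev

    def log(self):
        """Return the history as text, newest revision first."""
        lines = []
        for rev in reversed(self.revisions):
            lines.append(f"revno: {rev.revno}")
            lines.append(f"committer: {rev.committer}")
            lines.append(f"timestamp: {rev.timestamp.isoformat()}")
            lines.append(f" {rev.message}")
        return "\n".join(lines)


if __name__ == "__main__":
    history = ToyHistory()
    history.check_in("A green alien.", "first version of my great novel", "An Author")
    history.check_in("Glimpsy, a green alien.", "called the alien Glimpsy", "An Author")
    print(history.log())
```

A real VCS does the same bookkeeping for you, stores it safely on disk and does not forget it when you switch off the computer.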

There is a lot more you can do with a version control system, but for now this may suffice.

If you want to set up a VCS, read on here.