Friday, 3 February 2012

Mercurial: Dropbox for geeks

I'll admit it, Dropbox is pretty cool. It's great for quickly sharing photos and docs with others (including collaborators), and for a while I was using it to keep much of my working directory in sync across computers. What's especially cool is the Packrat add-on, which keeps track of all your file changes so you can revert to an old version if you do something stupid (and boy, can I do some stupid things).

However, last year a few privacy issues were raised with Dropbox. The first is that there is no client-side encryption. That means that although your data is encrypted on their servers, Dropbox can also decrypt it. Their policy is not to do that, but they *can* be forced to "when legally required to do so". Then there are fond memories of the time that every single account was left accessible without a password for a good few hours. Ouch!

Even though the work I've been sharing on Dropbox isn't super-sensitive (just working papers, data analyses and modelling), this isn't really reassuring. One solution is to use services like SecretSync. However, this requires forking over money for a pretty basic service that Dropbox should be providing by default. So instead I've turned to a geeky option called Mercurial. I've actually been using this for a few years to keep stuff in sync with Steve Lewandowsky when we were writing our masterpiece, and it works really well.

Mercurial is technically known as a distributed version control system. We start by initializing a seed repository, which contains the initial state of all our files (if any). Then every user can clone that repository to their own computer so they have local copies of the files. One important thing is that the files in your working directory are kept separate from the repository itself. Accordingly, working practice is to edit as per usual, and then every now and then to "commit" those changes to the repository (e.g., at the end of the day, or when you've reached a milestone). However, even after you commit those changes, they'll still just be local to your computer. To share them, you need to push your changes to the central repository (which is usually hosted on a server, and talked to via ssh or the like) and pull any changes that other users have made.
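For the curious, here is roughly what that workflow looks like as commands. This is only a sketch: the server address in `CENTRAL` and the helper names are made up for illustration, and by default the script just builds and prints the `hg` commands rather than running them.

```python
import subprocess

# Hypothetical address of the central repository; substitute your own server.
CENTRAL = "ssh://me@example.org//srv/hg/paper"

def hg(*args, cwd=None, dry_run=True):
    """Build an hg command; actually run it only when dry_run is False."""
    cmd = ["hg", *args]
    if not dry_run:
        subprocess.run(cmd, cwd=cwd, check=True)  # assumes hg is on the PATH
    return cmd

def seed_repository():
    """Done once, in the directory holding the initial files."""
    return [hg("init"), hg("add"), hg("commit", "-m", "initial state")]

def clone_to_my_computer(local_dir="paper"):
    """Done once per computer: fetch a full local copy."""
    return [hg("clone", CENTRAL, local_dir)]

def end_of_day(message):
    """Record local edits, then exchange changes with the central repository."""
    return [
        hg("commit", "--addremove", "-m", message),  # still local only
        hg("pull", "--update"),                      # fetch others' changes
        hg("push"),                                  # publish mine
    ]

for cmd in end_of_day("finished the methods section"):
    print(" ".join(cmd))
```

The point to notice is that the commit step on its own never touches the server; nothing leaves your computer until the push.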

One nice thing is the merge feature. If I've made changes to a file, and you've made changes to the same file in the interim, then Mercurial can automatically merge those changes in an intelligent fashion. In cases where changes can't be merged (e.g., we both modified the same line in the file), we can use a program like kdiff3 to manually resolve the conflicts; in my experience this happens pretty rarely. Another cool thing is that we also have a record of all changes we've made to all files in the repository since the initialization. This has been a lifesaver in cases where, for example, some model simulations have been working really well, but then in overexuberant exploration I've made some silly edits and lost the working version of the model (yes, I can be that silly sometimes).
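To give a feel for how an automatic merge can work, here is a toy three-way merge. It is not Mercurial's actual algorithm (which aligns whole blocks of changed lines and copes with insertions and deletions); it assumes all three versions have the same number of lines, and the function name `merge3` is my own invention.

```python
def merge3(base, mine, theirs):
    """Line-by-line three-way merge.

    Simplifying assumption: all three versions have the same number of
    lines (edits only, no insertions or deletions).
    Returns (merged_lines, conflict_line_numbers).
    """
    if not (len(base) == len(mine) == len(theirs)):
        raise ValueError("this toy merge only handles same-length versions")
    merged, conflicts = [], []
    for number, (b, m, t) in enumerate(zip(base, mine, theirs), start=1):
        if m == t:        # both sides agree (or neither changed the line)
            merged.append(m)
        elif m == b:      # only they changed this line
            merged.append(t)
        elif t == b:      # only I changed this line
            merged.append(m)
        else:             # both changed it differently: needs a human
            merged.append(None)
            conflicts.append(number)
    return merged, conflicts

base = ["alpha = 0.1", "beta = 2.0", "n_sims = 100"]      # common ancestor
mine = ["alpha = 0.3", "beta = 2.0", "n_sims = 100"]      # I changed line 1
theirs = ["alpha = 0.1", "beta = 2.0", "n_sims = 500"]    # they changed line 3

merged, conflicts = merge3(base, mine, theirs)
print(merged, conflicts)
```

Because the two sets of edits touch different lines, both survive the merge and nothing is flagged. Had we both edited line 1 in different ways, that line number would come back in `conflicts`, which is the situation where a tool like kdiff3 earns its keep.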

By default, committing, pulling and pushing all need to be carried out manually. However, I've been using an extension called autosync that automatically runs a commit/pull/merge/push cycle at regular intervals. Mine fires off every 10 minutes, so my computers are more or less in constant sync. Syncing at intervals also avoids one problem with Dropbox, which is that it eagerly syncs all the crappy auxiliary files that are generated when compiling LaTeX documents. I've got my Mac set up to delete the auxiliary files once compilation finishes, so they are mostly gone before the next sync comes around. Deleting those auxiliary files does mean each compilation takes longer, but hey, compiling is really fast these days.
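I haven't looked inside autosync, so the following is not its code, just my guess at what one pass of such a cycle might do using ordinary `hg` commands. The names `run_hg` and `sync_cycle` are invented for the sketch, and it assumes `hg merge` exits with 0 only when a merge was made cleanly.

```python
import subprocess

def run_hg(args, cwd="."):
    """Run one hg command and return its exit code (assumes hg is on the PATH)."""
    return subprocess.run(["hg", *args], cwd=cwd).returncode

def sync_cycle(run=run_hg):
    """One commit/pull/merge/push pass; returns the steps that were attempted."""
    steps = []

    def step(*args):
        steps.append(args)
        return run(list(args))

    # Record any local edits (a non-zero exit here just means nothing changed).
    step("commit", "--addremove", "-m", "autosync: local changes")
    # Fetch what the other computers have pushed.
    step("pull", "--update")
    # Merge only matters if the pull brought in diverging work; commit the
    # result only when the merge went through without conflicts.
    if step("merge") == 0:
        step("commit", "-m", "autosync: merge")
    # Publish everything.
    step("push")
    return steps
```

Calling `sync_cycle()` from a scheduler every 600 seconds would give roughly the 10-minute rhythm described above. If the merge hits a conflict, this sketch simply skips the merge commit and leaves the mess for a human to sort out.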

So Dropbox, you're a great service, but I think we should just keep it casual.

2 comments:

  1. I agree! Dropbox is useful for general file sharing but, like you say, without source/version control features it's not the best tool for collaborative projects. Out of curiosity, is there any reason that you've opted for Mercurial over Git and GitHub? I was put off Mercurial when I was told that it doesn't allow you to work on a branch locally (i.e. without having to commit/pull/merge/push) or cherry-pick the changes during a merge.

    1. I think we fairly arbitrarily plumped for Mercurial when working on our modelling textbook. GitHub seems oriented to open collaborative projects, whereas a plus about Mercurial and Git is that you can run the server on your own PC--this is mostly relevant for confidential information that might be lying around and shouldn't be available externally.

      Mercurial is pretty flexible, and you can certainly work on your own without having to go through a push/pull cycle. It also allows you to create branches on your own PC; I've rarely used this myself, but it is useful when trying out things that might break some wonderful pristine code. Not sure what you mean by cherry-picking the changes; when merging you do need to process all conflicts for the merging to complete, but there is the facility to do 3-way merges using a tool like kdiff3.

      BTW, if you are a Git fan it seems that Sparkleshare is pretty close to being a Dropbox replacement. Not tried it, and it doesn't seem to be updated often, but may be an alternative solution.
