Friday, 3 February 2012
Computational modeling summer school
Hey, you should totally check out the 2012 SNF Summer School in Computational Modeling of Cognition!
Mercurial: Dropbox for geeks
I'll admit it, Dropbox is pretty cool. It's great for quickly sharing photos and docs with others (including collaborators), and for a while I was using it to keep much of my working directory in sync across computers. What's especially cool is the Packrat add-on, which keeps track of all your file changes so you can revert to an old version if you do something stupid (and boy, can I do some stupid things).
However, last year a few privacy issues were raised with Dropbox. The first is that there is no client-side encryption: although your data are encrypted on their servers, Dropbox can also decrypt them. Their policy is not to do that, but they *can* be compelled to "when legally required to do so". Then there are fond memories of the time that every single account was left accessible without a password for a good few hours. Ouch!
Even though the work I've been sharing on Dropbox isn't super-sensitive (just working papers, data analyses and modelling), this isn't really reassuring. One solution is to use a service like SecretSync, but that means forking over money for a pretty basic feature that Dropbox should be providing by default. So instead I've turned to a geeky option called Mercurial. I've actually been using it for a few years now, originally to keep things in sync with Steve Lewandowsky while we were writing our masterpiece, and it works really well.
Mercurial is technically known as a distributed version control system. We start off by initializing a seed repository, which contains the initial state of all our files (if any). Every user of the repository can then clone it to their own computer so they have local copies of the files. One important point is that the working copies of the files are kept separate from the repository's history (which Mercurial stores in a hidden .hg directory). Accordingly, working practice is to edit as per usual, and every now and then to "commit" those changes to the repository (e.g., at the end of the day, or when you've reached a milestone). Even after committing, though, the changes are still just local to your computer, so you need to push them to the central repository (which is usually hosted on a server and talked to via ssh or the like) and pull any changes that other users have made.
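To make this concrete, the basic day-to-day commands look something like the sketch below (the repository name, file names and server address are just placeholders):

  hg init paper                          # create a fresh (seed) repository in ./paper
  hg clone ssh://myserver/repos/paper    # each collaborator grabs their own local copy
  cd paper
  hg add draft.tex analysis.m            # tell Mercurial which files to track
  hg commit -m "end of day snapshot"     # record the changes locally
  hg pull -u                             # fetch collaborators' changesets and update
  hg push                                # send your own changesets to the server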
One nice thing is the merge feature. If I've made changes to a file, and you've made changes to the same file in the interim, then Mercurial can automatically merge those changes in an intelligent fashion. In cases where changes can't be merged automatically (e.g., we've both modified the same line in the file), we can use a program like kdiff3 to resolve the conflicts by hand; in my experience this happens pretty rarely. Another cool thing is that we also have a record of every change made to every file in the repository since initialization. This has been a lifesaver in cases where, for example, some model simulations have been working really well, but then in overexuberant exploration I've made some silly edits and lost the working version of the model (yes, I can be that silly sometimes).
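Again, a rough sketch of what that looks like in practice (file names and the revision number are placeholders; telling Mercurial to use kdiff3 is a one-line ui.merge = kdiff3 setting in your ~/.hgrc):

  hg pull                        # grab your collaborator's changesets
  hg merge                       # Mercurial merges what it can automatically
  hg resolve --list              # see which files, if any, still conflict
  hg commit -m "merged changes"  # record the merge
  hg log draft.tex               # the full history of a file since initialization
  hg revert -r 42 model.m        # resurrect the version of model.m from revision 42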
By default, committing, pulling and pushing all need to be carried out manually. However, I've been using an extension called autosync that regularly runs through a commit/pull/merge/push cycle automatically. Mine fires off every 10 minutes, so my computers are more or less in constant sync. This also avoids one annoyance with Dropbox, which eagerly syncs all the crappy auxiliary files that are generated when compiling LaTeX documents. I've got my Mac set up to delete those auxiliary files after each compilation, which mostly prevents that from happening. It does mean each compilation takes longer, but hey, compiling is really fast these days.
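For the curious, enabling the extension is a couple of lines in ~/.hgrc, and a .hgignore file keeps the LaTeX debris out of the repository entirely; something along these lines (the ignore patterns are just the usual suspects):

  # ~/.hgrc
  [extensions]
  # assumes the hg-autosync extension is installed
  autosync =
  [ui]
  # resolve merge conflicts with kdiff3
  merge = kdiff3

  # .hgignore, in the repository root
  syntax: glob
  *.aux
  *.log
  *.bbl
  *.blg
  *.out
  *.toc

You then leave hg autosync running inside the working copy; its interval and daemon options are described in the extension's documentation.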
So Dropbox, you're a great service, but I think we should just keep it casual.
Saturday, 10 December 2011
First!
Having read a post by Dorothy Bishop on academic blogging, I've been inspired to start up this blog. For a while now I've been meaning to start one simply to share code and hacks with others, as I've found others' posts on technical topics really useful. However, as that post points out, blogging is also a great way of communicating directly with the public, and of communicating research that wouldn't make it into a journal. So let's give it a go and see what happens...
SPSS to LaTeX (funk to funky)
Some bits of writing a paper can be fun---finding a good way to present an argument, or coming up with a theory that explains some weird results. Other bits are just plain dull, and some of the dullest must be the method and results sections. Transcribing F values, p values and degrees of freedom from your stats package into a paper really is not something researchers should be spending time on. What's more, these are the bits that we as psychologists are getting wrong: some recent and disturbing evidence comes from Bakker & Wicherts (2011), who showed that an alarming number of papers contain errors in their reporting of statistical results (and that these errors are biased towards finding a significant result).
One solution is to automate statistical reporting as much as possible, which in turn implies automating the analysis/modelling process itself. Indeed, a recent movement (see, e.g., http://www.sciencemag.org/content/334/6060/1226.full) calls for reproducible research, whereby data are made available to other researchers along with the precise code that was used to turn those data into the results reported in a scientific paper. To link the analyses with what actually appears in a published article, programs such as R have been integrated with LaTeX via packages such as Sweave. I use MATLAB to run many of my analyses, and its integration with Sweave hasn't really progressed. Furthermore, I actually call SPSS from MATLAB so I get some of the useful stuff that SPSS prints out (like effect sizes and contrasts), whilst using MATLAB to do the grunt work of scoring up performance. So instead I wrote this add-in, which takes Excel output exported from SPSS for a repeated measures ANOVA and turns it into LaTeX-ready output. It would be straightforward to do the same for between-subjects effects, regression, etc.---I just very rarely use these!
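The core idea is simple enough to sketch in a few lines of MATLAB. This isn't the add-in itself, just a hypothetical illustration of turning an F value, degrees of freedom, p value and partial eta-squared into a LaTeX-ready string; in the real thing the numbers would come from the Excel file exported by SPSS:

  % Hypothetical example: format one repeated measures ANOVA effect for LaTeX.
  F = 12.34; df1 = 2; df2 = 46; p = 0.002; etap2 = 0.35;
  if p < .001
      pstr = 'p < .001';
  else
      pstr = sprintf('p = %.3f', p);
  end
  tex = sprintf('$F(%d,%d) = %.2f$, $%s$, $\\eta_p^2 = %.2f$', ...
      df1, df2, F, pstr, etap2);
  % write it to a small .tex file so the paper can \input{} it directly
  fid = fopen('anova_effect.tex', 'w');
  fprintf(fid, '%s\n', tex);
  fclose(fid);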
Bakker, M. & Wicherts, J. M. (2011). The (mis)reporting of statistical results in psychology journals. Behavior Research Methods, 43, 666-678.