Leave One Subject Out Cross Validation - The Video

Due to the extraordinary popularity of the leave-one-subject-out (LOSO) post I wrote a couple of years ago, and seeing as how I've been using the method lately and want to remember how it's done, here is a short eight-minute video on doing it in SPM. While the method itself is straightforward enough to follow - a GLM is estimated for the group with each subject excluded in turn, and then estimates are extracted from the resulting ROIs for just the excluded subject - the major difficulty is batching it, especially if there are many subjects.

Unfortunately I haven't been able to figure this out satisfactorily; the only advice I can give is that once you have a script that can run your second-level analysis, loop over it, leaving out a different subject for each GLM. This will leave you with as many second-level GLMs as there are subjects, and each of these can be used to load up contrasts and observe the resulting clusters from that analysis. Then you extract data from your ROIs for the one subject who was left out of that GLM, build up a vector with one datapoint per subject across the GLMs, and do t-tests on it, put chocolate sauce on it, eat it, whatever you want. Seriously. Don't tell me I'm the only one who's thought of this.
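For what it's worth, here is a bare-bones sketch of that loop in MATLAB. It assumes a one-sample t-test at the second level, one con_0001.img per subject sitting under a made-up /data/myStudy directory, and SPM12-style batch field names (older releases name a few fields differently, e.g. convec instead of weights), so treat it as a starting point rather than the one true way:

% Sketch: build one second-level GLM per left-out subject.
% Subject names, paths, and the contrast number are placeholders.
subjects = {'subject_101', 'subject_102', 'subject_103'};   % ...and so on
studyDir = '/data/myStudy';

spm('defaults', 'fmri');
spm_jobman('initcfg');

for leftOut = 1:numel(subjects)
    % Everyone EXCEPT the left-out subject goes into this GLM
    included = subjects(setdiff(1:numel(subjects), leftOut));
    cons = cellfun(@(s) fullfile(studyDir, s, 'con_0001.img,1'), ...
                   included, 'UniformOutput', false)';

    outDir = fullfile(studyDir, 'LOSO', ['leaveOut_' subjects{leftOut}]);
    if ~exist(outDir, 'dir'), mkdir(outDir); end

    clear matlabbatch
    % One-sample t-test on the remaining subjects, then estimate and contrast
    matlabbatch{1}.spm.stats.factorial_design.dir = {outDir};
    matlabbatch{1}.spm.stats.factorial_design.des.t1.scans = cons;
    matlabbatch{2}.spm.stats.fmri_est.spmmat = {fullfile(outDir, 'SPM.mat')};
    matlabbatch{3}.spm.stats.con.spmmat = {fullfile(outDir, 'SPM.mat')};
    matlabbatch{3}.spm.stats.con.consess{1}.tcon.name    = 'group effect';
    matlabbatch{3}.spm.stats.con.consess{1}.tcon.weights = 1;
    spm_jobman('run', matlabbatch);
end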

Once you have your second-level GLM for each subject, I recommend using the following set of commands - sketched in MATLAB after the list - to get that subject's unbiased data (I feel slightly ridiculous just writing that: "unbiased data"; as though the data gives a rip about anything one way or the other, aside from maybe wanting to be left alone and just hang out with its friends):

1. Load up your contrast, selecting your uncorrected p-value and cluster size;
2. Click on your ROI and highlight the corresponding coordinates in the Results window;
3. Find the paths to each subject's contrast image for that second-level analysis by typing "SPM.xY.P"; these give you the template to alter so that it points to the left-out subject's data - for example, "/data/myStudy/subject_101/con_0001.img" - which you can then save to a variable, such as "subject_101_contrast";
4. Average that subject's data across the unbiased ROI (there it is again! I can't get away from it) using something like "mean(spm_get_data(subject_101_contrast, xSPM.XYZ), 2)";
5. Save the resulting value to a vector, and update this for each additional subject.
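And here, roughly, is what those five steps look like at the MATLAB prompt once the Results window for a given leave-one-out GLM is loaded (so that SPM and xSPM are sitting in the workspace). The path, the peak coordinate, and the variable names are placeholders of my own, not anything SPM hands you:

% Step 3: SPM.xY.P lists the con images that went into this GLM; use one as a
% template and point it at the left-out subject instead (made-up path below).
SPM.xY.P                                     % eyeball the path template
subject_101_contrast = '/data/myStudy/subject_101/con_0001.img';

% Steps 1-2: xSPM.XYZ holds every suprathreshold voxel at the chosen threshold.
% To keep only the cluster around a peak of interest (a made-up peak at
% [30 22 -8] mm here), label the clusters and take the one nearest that peak.
A      = spm_clusters(xSPM.XYZ);             % cluster index for each voxel
peak   = [30 22 -8]';                        % hypothetical peak, in mm
d      = sum((xSPM.XYZmm - repmat(peak, 1, size(xSPM.XYZmm, 2))).^2);
[~, i] = min(d);
roiXYZ = xSPM.XYZ(:, A == A(i));             % voxel coordinates for that cluster

% Step 4: average the left-out subject's contrast image over those voxels
roiMean = mean(spm_get_data(subject_101_contrast, roiXYZ), 2);

% Step 5: append to a vector holding one value per left-out subject
% (assumes losoValues = [] was initialised beforehand); then, e.g., ttest(losoValues)
losoValues(end+1) = roiMean;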



Unbiased FMRI Analysis: Leave One Subject Out

Neuroimaging researchers are incessantly bedeviled by the problem of biased region of interest (ROI) analysis. One is constantly lured by the siren song of significant results and large effect sizes radiating from the stygian depths of a non-independent ROI; and while one can at times point toward their use of independent ROIs from other studies, there is always the lurking suspicion that the researcher already knew where the activation was before the ROI was chosen. I have witnessed men, otherwise Samsons in the field and Solomons in counsel, who have had their heads shorn by the harlot of biased analysis.

The most straightforward and appropriate way to avoid this, of course, is with a region defined on a priori assumptions about where your quarry might lie, based on theory or on the results of other studies. This ensures that any results extracted from that region are uninfluenced by the model used to generate the statistical maps, thereby circumventing the issue of "double-dipping", or circular analysis (see Kriegeskorte et al., 2009). Another method is to use anatomical regions based on atlases, a choice which again should be motivated by theory.

However, there is yet another option that I was unaware of until recently: Leaving one subject out (LOSO). Under this procedure, non-independence can be mitigated by constructing a general linear model (GLM) with every subject in the study except for one; statistics such as beta weights, time courses, and so on can then be extracted from the resulting parametric map for the subject that was left out, as this subject no longer contributes to the signal observed in the given region. This process is then repeated, and the appropriate parameter extracted, for each subject. It is unlikely that the ROIs produced by each LOSO analysis will overlap perfectly, but if the effect is real and robust, it should survive across each of these slightly different regions.

One consideration with this procedure is what threshold to use for each LOSO analysis. One approach is to hold the p-value constant, in which case a higher t-threshold is used for each analysis because of the reduced degrees of freedom. The other approach is to hold the t-value constant, which leads to a slightly higher p-value. Both approaches are defensible, although if the ROI results vary widely between the two, one may want to reconsider the reliability of the finding.
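As a rough illustration, assume a one-sample t-test at the second level (degrees of freedom equal to the number of included subjects minus one), a hypothetical sample of twenty subjects, and an uncorrected voxelwise p of 0.001; tinv and tcdf below come from the Statistics Toolbox (SPM's own spm_invTcdf and spm_Tcdf would do the same job):

% Hypothetical numbers, just to show how the two thresholding options behave
nTotal = 20;                        % made-up full sample size
dfFull = nTotal - 1;                % df with every subject included
dfLOSO = (nTotal - 1) - 1;          % df with one subject left out

% Hold p constant: the t-threshold creeps up as the degrees of freedom drop
tFull = tinv(1 - 0.001, dfFull)     % roughly 3.58
tLOSO = tinv(1 - 0.001, dfLOSO)     % roughly 3.61

% Hold t constant: the p-value creeps up instead
pLOSO = 1 - tcdf(tFull, dfLOSO)     % a shade above 0.001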

More details can be found in the paper by Esterman et al. (2010); I hope this provides the necessary edification and enlightenment for those benighted souls wading about in the filth of their own squalor.