Does it seem to you that there is a new fMRI study every year? If so, you’re wrong - there are in fact hundreds, perhaps thousands. And the good men and women carrying out this research know that each study, by itself, is a small piece of the puzzle; it isn’t likely that any one of them exactly located where, for example, the brain lit up in surprise when O.J. was acquitted. (It was the limbic system.) Add to this the issue of statistical power: one study may detect an effect simply because it has enough subjects, while another study with fewer subjects fails to find the same effect.
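To make that power problem concrete, here is a minimal sketch using statsmodels; the medium effect size (Cohen’s d = 0.5) and the group sizes are hypothetical, chosen only to show how the chance of detecting the same effect changes with sample size.

```python
# A hedged illustration of statistical power: the effect size (Cohen's d = 0.5)
# and the group sizes below are hypothetical, chosen only to show how power
# changes with sample size.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

for n_per_group in (16, 40):
    power = analysis.power(effect_size=0.5, nobs1=n_per_group, alpha=0.05)
    print(f"n = {n_per_group} per group -> power = {power:.2f}")

# How many subjects per group would we need to reach the conventional 80% power?
n_needed = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(f"subjects per group needed for 80% power: {n_needed:.0f}")
```

With these made-up numbers, the smaller study detects the effect only a fraction of the time - which is exactly how two perfectly well-conducted studies can appear to disagree.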
There are other factors as well - smoothing, normalizing, and any other interpolation of the brain data will warp, distend, and reshape the brain. And although those operations are necessary to increase power and to place all of the brains into a standardized space, at the same time they create more uncertainty about where activations are located.
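As a rough sketch of what those operations look like in practice - not any particular study’s pipeline - nilearn can resample a statistical map onto an MNI template grid and smooth it. The input file name and the 8 mm kernel below are hypothetical, and resampling here stands in for full spatial normalization, which would also involve registration.

```python
# A minimal sketch of placing a map onto an MNI template grid and smoothing it
# with nilearn. The input file name and the 8 mm FWHM kernel are hypothetical;
# resample_to_img only interpolates onto the template's grid (a stand-in for the
# full normalization step), and every interpolation like this slightly blurs
# where the original activation was located.
from nilearn import datasets, image

template = datasets.load_mni152_template()             # standard MNI-space target
subject_map = image.load_img("sub-01_zstat.nii.gz")    # hypothetical subject-level map
in_mni = image.resample_to_img(subject_map, template)  # interpolate onto the template grid
smoothed = image.smooth_img(in_mni, fwhm=8)            # Gaussian smoothing, 8 mm FWHM
smoothed.to_filename("sub-01_zstat_mni_smoothed.nii.gz")
```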
Luckily, with the remarkable increase in the number of neuroimaging studies over the past two decades, and the decision by most researchers to normalize their data to the common space of the Montreal Neurological Institute brain templates, we can leverage the sheer volume of published results to visualize where there is significant overlap between them. If many studies report the same voxels as being significant or having strong effects, then we can be more confident that those voxels really are active. Instead of measuring the variability of individual subjects, we are measuring the variability of individual studies; a meta-analysis is, in effect, a group analysis in which the studies themselves play the role of the subjects.
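If you already have a handful of thresholded maps that sit in the same MNI space, a crude version of this overlap idea is easy to compute yourself: binarize each map and count how many studies agree at every voxel. The file names and the z > 3.1 threshold in this sketch are hypothetical.

```python
# A minimal sketch of counting study-level overlap, assuming several thresholded
# z-maps that are already in the same MNI space and on the same voxel grid.
# The file names and the z > 3.1 cutoff are hypothetical.
import nibabel as nib
import numpy as np

study_maps = ["study01_z.nii.gz", "study02_z.nii.gz", "study03_z.nii.gz"]

overlap = None
for path in study_maps:
    img = nib.load(path)
    significant = (img.get_fdata() > 3.1).astype(int)  # 1 = voxel passed threshold
    overlap = significant if overlap is None else overlap + significant

# Each voxel now holds the number of studies reporting it as significant.
nib.save(nib.Nifti1Image(overlap.astype(np.int16), img.affine), "overlap_count.nii.gz")
```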
The idea behind meta-analysis is simple, although, like most ideas that appear simple to us now, it took considerable insight to create. In one of the earliest instances of meta-analysis, the English statistician Karl Pearson combined the results from several different studies examining the effect of inoculation against enteric fever, also known as typhoid fever, in different parts of the British Empire. By calculating the correlation between inoculation and mortality (or rather, its flip side, survival) at each site, Pearson concluded that, on average, there was a noticeable positive correlation between inoculation and survival across the testing sites, and that this average was probably more informative than the result from any individual site.
To understand why, we should also look at Pearson’s calculation of the “probable error” within each site - a quantity closely related to what we would today call the standard error, or a measure of how uncertain each sample’s estimate is. Not only do the correlations differ from sample to sample, but so do their probable errors; and the Ladysmith Garrison sample even shows a negative correlation between inoculation and survival, against the trend of all of the other samples. In other words, if someone just looked at the Ladysmith Garrison sample and concluded that inoculation had a negligible or negative effect, he would be misled about the average trend across all of the other samples.
To explain these deviations (or “extreme irregularity”, as Pearson put it), Pearson judged that other factors were possibly at play, such as the environment in which the inoculations were administered - which could mean anything from the local climate’s effect on the durability of the vaccine before the advent of reliable refrigeration, to the competence of the doctors administering it - or the self-selection bias of the volunteers. In any case, if we consider that many of these confounds tend to cancel each other out across many studies, we can treat the average across the studies as a more reliable indicator of the true effect. (For more details on the history of meta-analysis, see this paper by Kevin O’Rourke, and for a more in-depth statistical overview of the Pearson paper, see this article by Harry Shannon.)
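Pearson essentially averaged the site-level correlations; the modern descendant of that idea is the fixed-effect, inverse-variance-weighted average, in which more precise studies count for more. Here is a generic sketch - the numbers in the example are made up purely for illustration and are not Pearson’s data.

```python
# A generic sketch of a fixed-effect meta-analytic average, in which each study's
# estimate is weighted by the inverse of its variance (precise studies count more).
# This is the modern descendant of Pearson's simple averaging, not his actual method.
import numpy as np

def fixed_effect_mean(estimates, standard_errors):
    """Return the inverse-variance-weighted mean and its standard error."""
    estimates = np.asarray(estimates, dtype=float)
    weights = 1.0 / np.asarray(standard_errors, dtype=float) ** 2
    pooled = np.sum(weights * estimates) / np.sum(weights)
    pooled_se = np.sqrt(1.0 / np.sum(weights))
    return pooled, pooled_se

# Illustrative values only (not Pearson's data): five studies, one against the trend.
correlations = [0.25, 0.30, 0.18, -0.05, 0.22]
standard_errors = [0.08, 0.05, 0.10, 0.12, 0.07]
pooled_r, pooled_se = fixed_effect_mean(correlations, standard_errors)
print(f"pooled correlation = {pooled_r:.2f} (SE = {pooled_se:.2f})")
```

The odd study out still contributes, but it is pulled toward the consensus in proportion to how noisy it is - which is exactly why the pooled estimate is more trustworthy than any single sample.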
Over a century later, we apply these same principles to fMRI studies. Now that we have access to hundreds or even tens of thousands of datasets for a given effect, we can average across the data in each voxel in standardized space to generate meta-analysis maps - and to do that, we can choose from several popular software packages and websites, including GingerALE and Neurosynth.org. GingerALE, for example, was designed to be used with brainmap.org, a neuroimaging database that curates datasets and allows the user to download a subset of them that focuses on a given cognitive process. Neurosynth, on the other hand, generates meta-analysis maps on the fly from the activation coordinates reported in thousands of published studies, requiring just a keyword or two corresponding to whichever effect you are interested in. And for those looking for a tool that can handle terms and phrases that are rare or even absent in the neuroimaging literature, Neuroquery uses a sophisticated semantic smoothing algorithm that borrows strength from semantically related terms.
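To give a feel for what a coordinate-based tool like GingerALE is doing under the hood, here is a toy sketch of the activation likelihood estimation (ALE) idea: blur each study’s reported peaks with a Gaussian, and combine the studies’ maps as a union of probabilities. The grid size, kernel width, and peak coordinates are hypothetical, and the real implementation differs in its kernel scaling and its permutation-based thresholding.

```python
# A toy sketch of the core idea behind activation likelihood estimation (ALE),
# NOT the GingerALE implementation: each reported peak is blurred with a Gaussian,
# each study contributes a "modeled activation" (MA) map, and the studies are
# combined as a union of probabilities. Grid size, FWHM, and coordinates are hypothetical.
import numpy as np

def gaussian_kernel_map(shape, center, fwhm_vox):
    """Probability-like map for a single focus: a 3D Gaussian centered on `center`."""
    sigma = fwhm_vox / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    grid = np.indices(shape)  # voxel index grids, shape (3, X, Y, Z)
    sq_dist = sum((grid[d] - center[d]) ** 2 for d in range(3))
    g = np.exp(-sq_dist / (2.0 * sigma ** 2))
    return g / g.max()  # scale so the peak voxel has value 1

def modeled_activation(shape, foci, fwhm_vox=5.0):
    """One study's MA map: the voxel-wise maximum over its reported foci."""
    ma = np.zeros(shape)
    for focus in foci:
        ma = np.maximum(ma, gaussian_kernel_map(shape, focus, fwhm_vox))
    return ma

def ale(studies, shape, fwhm_vox=5.0):
    """Combine MA maps across studies as a union of probabilities: 1 - prod(1 - MA)."""
    not_active = np.ones(shape)
    for foci in studies:
        not_active *= 1.0 - modeled_activation(shape, foci, fwhm_vox)
    return 1.0 - not_active

# Hypothetical example: three studies reporting nearby peaks on a small voxel grid.
studies = [
    [(20, 20, 20), (30, 25, 22)],
    [(21, 19, 20)],
    [(22, 21, 21), (10, 10, 10)],
]
ale_values = ale(studies, shape=(40, 40, 40))
print(ale_values.max())  # the highest values sit where the studies converge
```

The voxels with the highest ALE values are the ones where several studies’ blurred peaks pile up - which is exactly the kind of convergence the meta-analysis maps are meant to show.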
You can learn more about each of these tools by visiting their respective websites; I have also started to write tutorials on how to use them, which you can find here.