Bayesian Approaches to fMRI: Thoughts

Pictured: Reverend Thomas Bayes, Creator of Bayes' Theorem and Nerd Baller

This summer I have been diligently writing up my qualification exam questions, which will mark my entry into dissertation writing. As part of the qualification exam process I opted to perform a literature review on Bayesian approaches to fMRI, with a focus on spatial priors and parameter estimation at the voxel level. This necessarily included a thorough review of the background of Bayesian inference, over the course of which I was gradually converted to the view that Bayesian inference is indeed more useful and more sophisticated than traditional null hypothesis significance testing (NHST) techniques, and that every serious scientist should therefore adopt it as their statistical standard.

At first, I tended to regard practitioners of Bayesian inference as oddities, harmless lunatics so convinced of the superiority of their technique as to come across as almost condescending. Like all good proselytizers, Bayesian practitioners seem appalled by the putrid sea of misguided statistical inference in which their entire field has foundered, regarding their benighted colleagues as doomed unless injected with the appropriate Bayesian vaccine. And nowhere was this zeal more evident than in their continual attempts to sap the foundations of NHST and the assumptions on which it rests. At the time I considered the differences between the two approaches to be trivial, mostly because I had convinced myself that any overwhelmingly large effect obtained through NHST would be essentially equivalent to a parameter estimate calculated by the Bayesian approach.

Bayesian Superciliousness Expressed through Ironic T-shirt

However, the more I wrote, the more I began to think that proponents of Bayesian methods may be on to something. It finally dawned on me that rejecting the null hypothesis in favor of an alternative hypothesis, and actually being able to say something substantive about the alternative hypothesis itself and compare it with a range of other models, are two very different things. Consider the researcher attempting to make the case that shining light in someone's eyes produces activation in the visual cortex. (Also consider the fact that doing such a study in the good old days would get you a paper into Science, and despair.) The null hypothesis is that shining light into someone's eyes produces no activation. The experiment is carried out, and a significant 1.0% signal change is observed in the visual cortex, with a confidence interval of [0.95, 1.05]. The null hypothesis is rejected, and you accept the alternative hypothesis that shining light in someone's eyes elicits greater neural activity in this area than do periods of utter and complete darkness. So far, so good.

Then, suddenly, one of these harmless Bayesian lunatics pops out of the bushes and points out that, although a parameter value has been estimated and a confidence interval calculated stating what range of values would not be rejected by a two-tailed significance test, little has been said about the credibility of your parameter estimate. Furthermore, nothing has been said at all about the credibility of the alternative hypothesis, and how much more believable it should be as compared to the null hypothesis. These words shock you so deeply that you accidentally knock over a nearby jar of Nutella, creating a delicious mess all over your desk and reminding you that you really should screw the cap back on when you are done eating.

Bayesian inference allows the researcher to do everything mentioned in the previous paragraph, and more. First, it has the advantage of being uninfluenced by the intentions of the experimenter, which are inherently murky and unclear, but on which NHST "critical" values are based. (More on this aspect of Bayesian inference, as compared to NHST, can be found in a much more detailed post here.) Second, Bayesian analysis illuminates the concepts shared by both approaches while exposing the shortcomings of NHST and showing how those shortcomings are addressed in the Bayesian framework; the converse is not true. This stems from the fact that Bayesian inference is more mathematically and conceptually coherent, providing a single posterior distribution for each parameter and model estimate without falling back on faulty, overly conservative multiple-comparison corrections that punish scientific curiosity. Lastly, Bayesian inference is more intuitive: we should expect our prior beliefs to influence our interpretation of posterior estimates, since extraordinary claims require correspondingly extraordinary evidence.
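
To make that concrete, here is a minimal sketch of the kind of posterior statement Bayesian inference buys you, using a textbook conjugate normal-normal update in Python with made-up numbers loosely matching the 1.0% signal change example above. This is not how any fMRI package actually does it, and the prior values are assumptions I picked purely for illustration.

# Conjugate normal-normal posterior update; numbers are made up to loosely
# match the 1% signal change example above.
prior_mean, prior_sd = 0.0, 2.0     # skeptical prior: the effect is probably near zero
data_mean, data_sem = 1.0, 0.025    # observed % signal change and its standard error

prior_prec = 1.0 / prior_sd ** 2    # precision = 1 / variance
data_prec = 1.0 / data_sem ** 2

post_prec = prior_prec + data_prec  # precisions add under conjugacy
post_mean = (prior_prec * prior_mean + data_prec * data_mean) / post_prec
post_sd = post_prec ** -0.5

print(f"Posterior: {post_mean:.3f} +/- {post_sd:.3f} percent signal change")

With data this precise the prior barely matters, which is exactly the point: the prior only exerts much pull when the evidence is weak relative to the claim.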

Having listened to this rhapsody on the virtues and advantages of going Bayesian, the reader may wonder how many Bayesian tests I have ever performed on my own neuroimaging data. The answer is: None.

Why is this? First of all, considering that a typical fMRI dataset comprises hundreds of thousands of voxels, and given current computational capacity, Bayesian inference for a single neuroimaging session can take a prohibitively long time. Furthermore, the only fMRI analysis package I know of that allows for Markov chain Monte Carlo (MCMC) sampling at each voxel is FSL's FLAME 1+2, and this procedure can take on the order of days for a single analysis, with results that usually turn out more or less identical to those produced by traditional methods. Add on top of this models which combine several levels of priors and hyperparameters that mutually constrain each other, and the computational cost climbs even higher. One neuroimaging technique which uses Bayesian inference in the form of priors to constrain the strength and direction of connectivity - an approach known as dynamic causal modeling (DCM; Friston et al., 2003) - is relatively little used in the neuroimaging community, given the complexity of the approach (at least, outside of Friston's group). For these reasons, Bayesian inference has not gained much traction in the neuroimaging literature.
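
Some rough arithmetic shows why per-voxel MCMC gets ugly fast. All of the numbers below are illustrative guesses on my part (voxel count, chain settings, time per draw), not benchmarks of FLAME or any other package:

voxels = 200_000              # a typical whole-brain mask (my guess)
chains, samples = 4, 10_000   # a fairly standard MCMC setup
secs_per_draw = 1e-4          # optimistic guess at the cost of one posterior draw at one voxel
hours = voxels * chains * samples * secs_per_draw / 3600
print(f"roughly {hours:.0f} hours for a single session")  # ~220 hours under these assumptions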

However, some statistical packages do allow for the implementation of Bayesian-esque concepts, such as mutually constraining parameter estimates through a process known as shrinkage. While some Bayesian adherents may balk at such weak-willed, namby-pamby compromises, in my experience these compromises can capture some of the intuitive appeal of Bayesian methods while allowing for much more efficient computation. One example is AFNI's 3dMEMA, which estimates the precision of each subject's parameter estimate (i.e., the inverse of its variance) and weights that subject's estimate in proportion to its precision. For example, a subject with a low-variance estimate is weighted more heavily when taken to a group-level analysis, while a subject with a noisy parameter estimate is weighted less.
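
A toy example of precision weighting in the spirit of 3dMEMA (this is not its actual algorithm, which models within- and between-subject variance more carefully; it is just the inverse-variance idea on made-up numbers):

import numpy as np

betas = np.array([0.8, 1.2, 1.0, 3.5])       # per-subject parameter estimates
variances = np.array([0.1, 0.1, 0.1, 2.0])   # variance of each subject's estimate

weights = 1.0 / variances                    # precision = inverse variance
group_beta = np.sum(weights * betas) / np.sum(weights)
group_se = np.sqrt(1.0 / np.sum(weights))
print(group_beta, group_se)                  # the noisy fourth subject barely moves the group estimate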

Overall, while comprehensive Bayesian inference at the voxel level would be ideal, for right now it appears impractical. Some may take issue with this, but until computers get faster or cleverer methods allow for more efficient Bayesian inference, current approaches will likely continue to dominate.

AFNI Bootcamp: Day 3

Today, Bob began with an explanation of how AFNI's amplitude modulation (cf. SPM's parametric modulation) differs from other software approaches. For one, not only are estimates computed for each parametric modulator, but so is the estimate of the beta for the event itself. This yields estimates of the variance that can be explained by the parametric modulators above and beyond the event regressor. The beta estimates for those parametric modulators can then be carried to the second level, just like any other parameter estimate.

To give a concrete example, take an experiment that presents the subject with a gamble that varies on three specific dimensions: probability of win, magnitude of win, and the variance of the gamble. Let us say that the onset of the gamble occurred at 42 seconds into the run, and that the dimensions were 0.7, 10, and 23. In this case, the onset time of the gamble would be parametrically modulated by these dimensions, and would be represented in a timing file as 42*0.7,10,23.  [Insert AM_Example.jpg here]. Notice that the resulting parametric modulators are mean-centered, here resulting in negative values for probability and variance. The purpose of the amplitude modulation is to see what proportion of the variance in the BOLD response is due to these additional variables driving the amplitude of the BOLD signal; if it is a particularly good fit, then the resulting t-statistic for that beta weight will be relatively high.
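
For intuition, here is what the mean-centering step amounts to in numpy, using the gamble above plus two more made-up trials; 3dDeconvolve does this (and the convolution with the hemodynamic response) internally, so this is illustration only:

import numpy as np

onsets = np.array([42.0, 90.0, 131.0])   # onset times in seconds (the last two are made up)
prob   = np.array([0.7, 0.9, 0.8])       # probability of win
mag    = np.array([10.0, 5.0, 8.0])      # magnitude of win
var    = np.array([23.0, 60.0, 40.0])    # variance of the gamble

modulators = np.column_stack([prob, mag, var])
centered = modulators - modulators.mean(axis=0)   # each column now has zero mean
print(centered[0])   # for the trial at 42 s, probability and variance come out negative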

Regarding this, 3dREMLfit was mentioned yet again, as Bob pointed out how it takes into account both the beta estimate and the variance surrounding that estimate (i.e., the beta estimate's associated t-statistic). A high beta estimate does not necessarily imply a high t-statistic, and vice versa, which is why it would make sense to include this information at the group level. However, none of the AFNI developers I talked to definitively stated whether 3dMEMA or the least-squares method was preferable; that is entirely up to the user. I examined this with my own data, looking at a contrast of two regressors at the second level using both OLSQ and 3dMEMA. As the resulting pictures show, both methods show patterns of activation in the rostral ACC (similar to what I was getting with SPM), although 3dMEMA produces an activation map that passes cluster correction, while OLSQ does not. Which should you use? I don't know. I suppose you can try both, and whatever gives you the answer that you want, you should use. If you use 3dMEMA and it doesn't give you the result that you want, you can just claim that it's too unreliable to be used just yet, and so make yourself feel better about using a least-squares approach.

After a short break, Ziad discussed AFNI's way of dealing with resting-state data via a program called RETROICOR. I have no idea what that stands for, but it accounts for heart rate and respiration variability for each subject, which is critically important when interpreting a resting-state dataset. Because the relationship between physiological noise – especially heart rate – and the BOLD signal is poorly understood, it is reasonable to regress these signals out, in order to claim that what you are looking at reflects true differences in neural activation between conditions or groups or whatever you are investigating (although using the term "neural activation" is a bit of a stretch here). Apparently this is not done very often, and neither is accounting for motion, which can be a huge confound as well. All of the confounds listed above can produce small effects biased in a particular direction, but do so consistently, leading to significant activation at the group level that has nothing to do with the participant's true resting state (although again, the term "resting state" is a bit of a stretch, since you have no idea what the participant is doing, and there are no tasks to regress against to explain the timeseries data). In any case, this is an area I know very little about, but the potential pitfalls seem serious enough to warrant staying away from it unless I have a really good reason for doing it.
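
For reference, the cleanup step looks something like the sketch below. This is not RETROICOR itself – which, as I understand it, builds its regressors from the phase of the cardiac and respiratory cycles – just a generic least-squares residualization of a voxel timeseries against whatever nuisance signals you have, run here on fake data:

import numpy as np

n_trs = 200
rng = np.random.default_rng(0)
voxel = rng.standard_normal(n_trs)             # stand-in voxel timeseries
nuisance = rng.standard_normal((n_trs, 8))     # e.g., cardiac, respiration, six motion parameters

X = np.column_stack([np.ones(n_trs), nuisance])        # add an intercept
beta, *_ = np.linalg.lstsq(X, voxel, rcond=None)       # ordinary least squares
cleaned = voxel - X @ beta                             # keep what the nuisance terms cannot explain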

AFNI Bootcamp: Day 2


Today was a walkthrough of the AFNI interface, and a tutorial on how to view timeseries data and model fits. Daniel started off with a showcase of AFNI's ability to graph the timeseries data at each stage of preprocessing, and how it changes as a result of each step. For example, after scaling the raw MR signal to a percentage, the values at each TR in the timeseries graph cluster around 100. This number is totally arbitrary, but it allows one to make inferences about percent signal change, as opposed to simply reporting parameter estimates. Since this scaling is done separately for each voxel – as opposed to grand mean scaling in SPM, which divides every voxel's value by the mean signal intensity across the entire brain – it becomes more reasonable to talk in terms of percent signal change at each voxel.
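
The arithmetic behind the per-voxel scaling is simple enough to write out; AFNI's processing scripts do this with 3dcalc, so the numpy below (on fake data) is only to show the idea:

import numpy as np

rng = np.random.default_rng(1)
data = 800 + 10 * rng.standard_normal((16, 16, 10, 200))   # fake x, y, z, time

voxel_means = data.mean(axis=-1, keepdims=True)            # each voxel's own temporal mean
scaled = 100.0 * data / voxel_means                        # values now hover around 100
print(scaled.mean(), scaled.std())                         # deviations read directly as % signal change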

Another cool feature is the ability to overlay the model fits produced by 3dDeconvolve on top of the raw timeseries. This is especially useful when looking all over the brain to see how different voxels correlate with the model (although this may be easier to see with a block design as opposed to an event-related design). You can extract an ideal time series from the X matrix output by 3dDeconvolve by using 1dcat [X Matrix column] > [output 1D file], and then overlay one or more ideal timeseries by clicking on Graph -> 1dTrans -> Dataset#. (Insert picture of this here).
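
If you would rather do the overlay outside the GUI, something like the matplotlib sketch below works on the 1D file written by the 1dcat command above; the filenames are placeholders for whatever you actually extracted:

import numpy as np
import matplotlib.pyplot as plt

ideal = np.loadtxt("ideal_regressor.1D")     # one column pulled from the X matrix
voxel = np.loadtxt("voxel_timeseries.1D")    # a raw timeseries dumped from AFNI

plt.plot(voxel, label="raw voxel timeseries")
plt.plot(ideal * voxel.std() + voxel.mean(), label="ideal timeseries (rescaled to match)")
plt.xlabel("TR")
plt.legend()
plt.show()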

One issue that came up while Ziad was presenting was the fact that using dmBLOCK as a basis function – which convolves a boxcar of each event's duration with the hemodynamic response – does not take individual scaling into account. That is, if one event lasts six seconds and another lasts ten seconds, they will be scaled by the same amount, although in principle they should be different, since the BOLD response has not yet saturated. I asked if they would fix this, and they said that they would, soon. Fingers crossed!
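
You can see the issue for yourself by convolving boxcars of different durations with a hemodynamic response function; the sketch below uses a generic gamma-variate shape, not AFNI's exact dmBLOCK kernel, so treat the numbers as illustrative:

import numpy as np

dt = 0.1
t = np.arange(0, 30, dt)
hrf = t ** 8.6 * np.exp(-t / 0.547)   # gamma-variate HRF, peaks around 4.7 s
hrf /= hrf.max()

t_run = np.arange(0, 60, dt)
box6  = (t_run < 6).astype(float)     # a 6-second event starting at time zero
box10 = (t_run < 10).astype(float)    # a 10-second event starting at time zero

resp6  = np.convolve(box6, hrf)[: len(t_run)] * dt
resp10 = np.convolve(box10, hrf)[: len(t_run)] * dt
print(resp6.max(), resp10.max())      # the 10 s event peaks higher, so one scaling factor cannot fit both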

Outside of the lectures, Bob introduced me to AlphaSim's successor, ClustSim. For those who haven't used it, AlphaSim calculates how many contiguous voxels you need at a specified uncorrected threshold in order to pass a corrected cluster threshold. That is, AlphaSim runs several thousand simulations of white noise, and calculates the cluster extents of suprathreshold voxels that would appear at different levels of chance. ClustSim does the same thing, except that it is much faster and can calculate several different corrected thresholds simultaneously. The uber scripts call on ClustSim to make these calculations for the user, and then write this information into the header of the statistics datasets. You can see the corrected cluster thresholds under the "Rpt" button of the statistics screen.
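
Stripped of all the options, the Monte Carlo logic is roughly the following: simulate smoothed white noise, threshold it, and record the biggest cluster that appears by chance. The real programs handle masking, smoothness estimation, and neighbor definitions, so the numbers below are illustrative only:

import numpy as np
from scipy.ndimage import gaussian_filter, label

rng = np.random.default_rng(42)
shape, sigma_vox, z_thresh, n_iter = (32, 32, 20), 1.5, 2.58, 200
max_clusters = []

for _ in range(n_iter):
    noise = gaussian_filter(rng.standard_normal(shape), sigma_vox)
    noise /= noise.std()                      # re-standardize after smoothing
    labels, n = label(noise > z_thresh)       # find contiguous suprathreshold clusters
    sizes = np.bincount(labels.ravel())[1:] if n else [0]
    max_clusters.append(max(sizes))

print(np.percentile(max_clusters, 95))        # cluster size needed for roughly p < .05, corrected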

On a related note, ClustSim takes into account smoothing introduced by the scanner before any preprocessing. I had no idea this happened, but apparently the scanners are configured to introduce a small amount of smoothing (e.g., 2mm) into each image as it is output. Because of this, the uber scripts estimate the average smoothness across an entire subject in the x, y, and z directions, which are not always the same. Therefore, if you used a smoothing kernel of 4mm, your estimated smoothness may be closer to 6mm; this is the full width at half maximum that should be used when calculating cluster correction levels in AlphaSim or ClustSim. Another tidbit I learned is that Gaussian random field theory (SPM's method of calculating cluster correction) is "difficult" at smoothing kernels of less than 10mm. I have no idea why, but Bob told me so, so I treat it as gospel. Also, by "difficult", I mean that it has a hard time finding a true solution to the correct cluster correction level.
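
The 4mm-becomes-6mm effect follows from the usual rule of thumb that Gaussian smoothing widths add in quadrature (an approximation on my part; the intrinsic smoothness value below is a guess chosen to match the example, not something I have measured):

applied, intrinsic = 4.0, 4.5                  # mm FWHM; intrinsic value is a guess
combined = (applied ** 2 + intrinsic ** 2) ** 0.5
print(combined)                                # about 6 mm, in line with the example above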

I found out that, in order to smooth within a mask such as grey matter, AFNI has a tool named 3dBlurInMask for that purpose. This needs to be called at the smoothing step, and replaces 3dmerge or 3dFWHMx, whichever you are using for smoothing. This sounds great in theory, since most of the time we are smoothing both across white matter and a bunch of other crap from outside the brain which we don’t care about. At least, I don’t care about it. The only drawback is that it suffers from the same problem as conventional smoothing, i.e. that there is no assurance of good overlap between subjects, and the resulting activation may not be where it was at the individual level. Still, I think it is worth trying out.
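
I have not read the 3dBlurInMask source, so I cannot say this is exactly what it does, but the standard trick for smoothing only within a mask is to smooth the masked data and the binary mask separately, then divide, so that edge voxels are not dragged toward zero by whatever lies outside the mask:

import numpy as np
from scipy.ndimage import gaussian_filter

def blur_in_mask(data, mask, sigma_vox):
    """Smooth data only within a binary mask (generic sketch, not AFNI's code)."""
    data, mask = data.astype(float), mask.astype(float)
    smoothed = gaussian_filter(data * mask, sigma_vox)
    weight = gaussian_filter(mask, sigma_vox)             # how much of each kernel fell inside the mask
    out = np.zeros_like(data)
    np.divide(smoothed, weight, out=out, where=weight > 1e-6)
    return out * mask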

The last person I talked to was Gang Chen, the statistician. I asked him whether AFNI was going to implement a Bayesian inference application for parameter estimation anytime soon. He told me that such an approach is unfeasible at the voxel level, as calculating HDIs is extremely computationally intensive (just think of how many chains, samples, thinning steps, and so on are required, and then multiply that by tens of thousands of individual tests). Although I had heard that FSL uses a Bayesian approach, this isn't really Bayesian; it is essentially the same as what 3dMEMA does, which is to weight high-variability parameter estimates less than high-precision parameter estimates. Apparently a true-blue Bayesian approach can be done (at the second level), but this can take up to several days. Still, it is something I would like to investigate more, comparing results from AFNI to FSL's Bayesian method to see if there is any meaningful difference between the two.

AFNI Bootcamp: Day 1


Today was the first day of the AFNI bootcamp, and it served as an introduction to the software as well as the philosophy behind it. On the technical side there wasn't a whole lot that was new, as the material was targeted toward AFNI veterans and newcomers alike. However, the development team hinted at some new tools that they would be presenting later in the workshop.

First, I should introduce the men behind the AFNI software. They are, in no particular order:

-Bob Cox: Founder of AFNI back at the Medical College of Wisconsin in 1993/1994. Is the hive mind of the AFNI crew, and leads the development of new features and tools.

-Rick Reynolds: Specialist in developing “uber” scripts that create the nuts-and-bolts Unix scripts through graphical user interfaces (GUIs). Up until a few years ago, people still made their scripts from scratch, cobbling together different commands in ways that seemed reasonable. With the new uber scripts, users can point and click on their data and onsets files, and select different options for the preprocessing stream. I’ll be covering these more later.

-Ziad Saad: Developer of the Surface Mapper (SUMA) which talks to AFNI and projects 3D volumetric blobs onto a 2D surface. This allows a more detailed look at activation patterns along the banks of cortical gyri and within the sulci, and produces much sexier looking pictures. I will also discuss this more later.

-Gang Chen: Statistics specialist and creator of the 3dMEMA and 3dLME statistical programs. An excellent resource for statistics-related problems after you’ve screwed up or just can’t figure out how you should model your data.

-Daniel Glen: Registration expert and developer of AFNI’s alignment program, align_epi_anat.py.


As I mentioned, the lectures themselves were primarily an introduction to how fMRI data analysis works at the single-subject level. The philosophy driving the development of AFNI is that the user should be able to stay close to his or her data and check it easily. AFNI makes this incredibly easy, especially with the development of higher-level processing scripts, and the responsibility of the user is to understand both a) what is going on, and b) what is being processed at each step. The program uber_subject.py (to be discussed in detail later) generates a script called @ss_review_driver, which allows the user to easily check censored TRs, eyeball the registration, and review the design matrix. This takes only a couple of minutes per subject, and in my opinion is more efficient and more intuitive than clicking through SPM's options (although SPM's approach to viewing the design matrix, where one can point and click on each beta for each regressor, is still far better than any other interface I have used).

A couple of observations during the lectures:

-There is a new program called 3dREMLfit (Restricted Maximum Likelihood) that takes into account both the estimate of the beta for each regressor and its variance. This information is then taken to the second level for the group analysis, in which betas from subjects with high variance are weighted less than betas from subjects with a much tighter variance around each estimate. It is a concept akin to Bayesian "shrinkage", in which a parameter estimate is pulled toward where the bulk of the data lies, attenuating the effect of outliers (see the sketch after this list). The second-level program – a tool called 3dMEMA (Mixed-Effects Meta-Analysis) – uses the results from 3dREMLfit.

-Daniel Glen discussed some new features that will be implemented in AFNI’s registration methods, such as new atlases for monkey and rat populations. In addition, you can create your own atlas, and use that for determining where an activation occurred. Still in the works: Enabling AFNI’s built-in atlas searcher, whereami, to link up with web-based atlases, as well as display relevant literature and theories associated with the selected region / voxel. This is similar to Caret’s method of displaying what a selected brain region is hypothesized to do.
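
Regarding the shrinkage idea mentioned above, here is a toy illustration (my own sketch, not 3dMEMA's actual math, which is more involved): each subject's estimate is pulled toward the group mean, and noisier estimates are pulled harder.

import numpy as np

betas = np.array([0.9, 1.1, 1.0, 4.0])        # the fourth subject is an outlier
within_var = np.array([0.1, 0.1, 0.1, 3.0])   # ... and also very noisy
between_var = np.var(betas, ddof=1)           # crude estimate of between-subject spread

group_mean = betas.mean()
shrink = within_var / (within_var + between_var)   # 0 = trust the subject, 1 = trust the group
shrunk = (1 - shrink) * betas + shrink * group_mean
print(shrunk)   # the noisy outlier moves well toward the group mean; the clean subjects barely move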

That's about it for today. Tomorrow will cover the interactive features of AFNI, including anatomical-EPI registration and overlaying idealized timecourses (i.e., your model) on top of the raw data. Hot!