Establishing Causality Between Prediction Errors and Learning

You've just submitted a big grant, and you anxiously await the verdict on your proposal, due any day now. Finally, the email with the results arrives. Sweat drips from your brow and onto your hands and onto your pantlegs and soaks through your clothing until you look like some alien creature excavated from a marsh. You read the first line - and then read it again. You can't believe what you just saw - you got the grant!

Never in a million years did you think this would happen. The proposal was crap, you thought; and everyone else you sent it to for review thought it was crap, too. You can just imagine their faces now as they are barely able to restrain their choked-back venom while they congratulate you on getting the big grant while they have to go another year without funding and force their graduate students to work part-time at Kilroy's for the summer and get hit on by sleazy patrons with slicked-back ponytails and names like Tony and Butch and save money by moving into that rundown, cockroaches-on-your-miniwheats-infested, two-bedroom apartment downtown with five roommates and sewage backup problems on the regular.

This scenario illustrates a key component of reinforcement learning known as prediction error: organisms tend to associate outcomes with particular actions - sometimes randomly, at first - and over time come to form a cause-effect relationship between actions and results. Computational modeling and neuroimaging have implicated dopamine (DA) as a critical neurotransmitter for forming these associations, as shown in a landmark study by Schultz and colleagues in 1997. When you have no prediction about what is going to happen, but a reward - or punishment - appears out of the blue, DA tracks this occurrence with a burst of firing, usually originating from clusters of DA neurons in midbrain regions such as the ventral tegmental area (VTA). Over time, these outcomes become associated with particular stimuli or actions, and the DA firing drifts to the onset of the stimulus or action. Other prediction violations you may be familiar with include certain forms of humor, items failing to drop from the vending machine, and the Houdini.
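That drift of the DA signal from the reward back to the cue is exactly what a temporal-difference (TD) learner predicts. Here is a minimal sketch with toy parameters of my choosing (not the Schultz et al. model itself): a cue starts each trial, a reward arrives ten steps later, and the prediction error migrates from the reward to cue onset as the value estimates are learned.

```python
import numpy as np

# Toy TD(0) sketch of the DA story (illustrative parameters only).
# A cue starts each trial; a reward of 1.0 arrives REWARD_T steps later.
# The prediction error delta - the quantity DA firing is thought to track -
# starts out large at the reward and migrates back to cue onset.

T = 12                    # timesteps tracked from cue onset
REWARD_T = 10             # reward arrives 10 steps after the cue
alpha, gamma = 0.2, 1.0   # learning rate, discount factor
V = np.zeros(T + 1)       # V[t] = predicted future reward t steps after cue

def run_trial():
    deltas = np.zeros(T + 1)
    # Cue onset: transition from the inter-trial state (value clamped at 0,
    # since the cue itself arrives unpredictably) into the first cue state.
    deltas[0] = gamma * V[0] - 0.0
    for t in range(T):
        r = 1.0 if t == REWARD_T else 0.0
        deltas[t + 1] = r + gamma * V[t + 1] - V[t]   # TD prediction error
        V[t] += alpha * deltas[t + 1]
    return deltas

first = run_trial()
for _ in range(499):
    last = run_trial()

print(f"trial   1: delta at cue = {first[0]:.2f}, at reward = {first[REWARD_T + 1]:.2f}")
print(f"trial 500: delta at cue = {last[0]:.2f}, at reward = {last[REWARD_T + 1]:.2f}")
```

On the first trial the error spikes at the reward and is flat at the cue; after training, the pattern reverses - the same shift seen in the recorded DA neurons.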

Figure 1, reproduced from Schultz et al. (1997). Note that when a reward is predicted but no reward occurs, DA firing drops precipitously.

In spite of a large body of empirical results, most reinforcement learning experiments have difficulty establishing a causal link between DA firing and the learning process, often due to relatively poor temporal resolution. However, a recent study in Nature Neuroscience by Steinberg et al. (2013) used optogenetics - a technique for activating genetically targeted neurons with pulses of light - to stimulate DA neurons during critical time periods of learning. One aspect of learning, known as blocking, presented an opportunity to use the superior temporal resolution of optogenetics to test the role of DA in reinforcement learning.

To illustrate the concept of blocking, imagine that you are a rat. Life isn't terribly interesting, but you get to run around inside a box, run on a wheel, and push a lever to get pellets. One day you hear a tone, and a pellet falls down a nearby chute; and it turns out to be the juiciest, moistest, tastiest pellet you've ever had in your life since you were born about seven weeks ago. The same thing happens again and again, with the same tone and the same super-pellet delivered into your cage. Then, at some point, right after you hear the tone, a light begins to flash in your cage. The pellet is still delivered; all that has changed is that you now get a tone and a light instead of just the tone. At this point, you begin to get all hot and excited whenever you hear the tone; however, the light isn't really doing it for you, and you couldn't care less about it. Your learning toward the light has been blocked: all the ingredients are there to learn an association between the light and the uber-pellet, but since you've already been highly trained on the association between the tone and the pellet, the light doesn't add any predictive power to the situation.
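Blocking falls naturally out of the classic Rescorla-Wagner learning rule, in which every cue present on a trial shares a single prediction error. A minimal sketch, with made-up parameters:

```python
# Rescorla-Wagner sketch of blocking (parameters are illustrative).
# Key feature: all cues present on a trial share one prediction error,
# so once the tone fully predicts the pellet, no error is left over
# for the light to learn from.

alpha, reward = 0.3, 1.0
w = {"tone": 0.0, "light": 0.0}   # associative strengths

# Phase 1: tone alone -> pellet, 50 trials
for _ in range(50):
    delta = reward - w["tone"]
    w["tone"] += alpha * delta

# Phase 2: tone + light together -> pellet, 50 trials
for _ in range(50):
    delta = reward - (w["tone"] + w["light"])   # shared prediction error
    w["tone"] += alpha * delta
    w["light"] += alpha * delta

print(f"tone:  {w['tone']:.3f}")   # ~1: fully predictive
print(f"light: {w['light']:.3f}")  # ~0: blocked
```

By the start of phase 2 the tone already predicts the pellet almost perfectly, so the shared error is essentially zero and the light never acquires strength - the formal version of the rat's indifference.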

What Steinberg and colleagues did was to optogenetically stimulate DA neurons whenever rats were presented with the blocked stimulus - in the example above, the light. This induced a prediction error that became associated with the blocked stimulus, and rats later presented with that stimulus exhibited learning behavior similar to what they showed toward the originally trained cue - in the example above, the tone - lending direct support to the theory that DA serves as a prediction error signal, rather than a salience or surprise signal. Follow-up experiments showed that optogenetic stimulation of DA neurons could also interfere with extinction - the process by which a stimulus ceases to be associated with a reward - when the stimulation was timed to produce a prediction error. Taken together, these results are a solid contribution to reinforcement learning theory, and have prompted the FDA to recommend more dopamine as part of a healthy diet.
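In Rescorla-Wagner terms, the unblocking result can be sketched by injecting an artificial burst into the prediction error on compound trials, standing in for the optogenetic stimulation. All numbers here are illustrative assumptions of mine, not the paper's:

```python
# Toy Rescorla-Wagner sketch of unblocking (all numbers illustrative).
# Phases 1 and 2 set up standard blocking, except that on compound trials
# an artificial "DA burst" of assumed size `stim` is added to the
# prediction error, mimicking the optogenetic stimulation. The previously
# blocked light now acquires associative strength.

alpha, reward, stim = 0.3, 1.0, 0.5
w = {"tone": 0.0, "light": 0.0}

for _ in range(50):                # phase 1: tone -> pellet
    w["tone"] += alpha * (reward - w["tone"])

for _ in range(50):                # phase 2: tone + light -> pellet + DA burst
    delta = (reward + stim) - (w["tone"] + w["light"])
    w["tone"] += alpha * delta
    w["light"] += alpha * delta

print(f"light: {w['light']:.2f}")  # well above zero: unblocked
```

Without the `stim` term the light stays near zero; with it, the artificially induced error is shared by both cues and the light is learned after all - which is the behavioral signature the paper reports.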

And now, what you've all been waiting for - a gunfight scene from Django Unchained.



The Noose Tightens: Scientific Standards Being Raised

For those of you hoping to fly under the radar of reviewers and get your questionable studies published, I suggest that you do so with a quickness. A new editorial in Nature Neuroscience outlines the journal's updated criteria for methods reporting, which removes the length limit on methods sections, mandates reporting the data used to create figures, and requires statements on randomization and blinding. In addition, the editorial board takes a swipe at the current level of statistical proficiency in biology, asserting that

Too many biologists [and neuroscientists] still do not receive adequate training in statistics and other quantitative aspects of their subject. Mentoring of young scientists on matters of rigor and transparency is inconsistent at best. In academia, the ever-increasing pressures to publish and obtain the next level of funding provide little incentive to pursue and publish studies that contradict or confirm previously published results. Those who would put effort into documenting the validity or irreproducibility of a published piece of work have little prospect of seeing their efforts valued by journals and funders; meanwhile, funding and efforts are wasted on false assumptions.

What the editors are trying to say, I think, is that a significant number of academics, and particularly graduate students, are the most lazy, benighted, pernicious race of little odious vermin that nature ever suffered to crawl upon the surface of the earth; to which I might add: this is quite true, but although we may be shiftless, entitled, disgusting vermin, it is more accurate to say that we are shiftless, entitled, disgusting vermin who simply do not know where to start. While many of us learn the basics of statistics sometime during college, much is not retained, and remedial graduate courses do little to prepare one for understanding the details and nuances of experimental design that can influence the choice of statistics that one uses. One may argue that the onus is on the individual to teach himself what he needs to know in order to understand the papers that he reads, and to become informed enough to design and write up a study at the level of the journal for which he aims; however, this implies an unrealistic expectation of self-reliance and tenacity for today's average graduate student. Clearly, blame must be assigned: the statisticians have failed us.

Another disturbing trend in the literature is a recent rash of papers encouraging studies to include more subjects, to aid both statistical reliability and experimental reproducibility. Two articles in the last issue of Neuroimage - one by Michael Ingre, one by Lindquist et al. - as well as a recent Nature Neuroscience article by Button et al., take Karl Friston's 2012 Ten Ironic Rules article out to the woodshed, claiming that significant results from small samples are more likely to be false positives, and that larger samples should instead be recruited and effect sizes reported. More to the point, the underpowered studies that do get published tend to be biased toward effects that are inordinately large, as null results simply go unreported.
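The effect-size-inflation argument is easy to demonstrate by simulation (numbers below are illustrative choices of mine): with a small sample, only flukishly large observed effects clear the significance bar, so the effects that get published are biased upward.

```python
import numpy as np

# Simulation of effect-size inflation in underpowered studies.
# True effect: d = 0.3. We run many two-group "studies" with known unit
# variance, keep only those reaching significance (z-test at the
# conventional 1.96 cutoff), and average the observed effect among the
# survivors.

rng = np.random.default_rng(0)
true_d, n_sims = 0.3, 20000

def significant_studies(n):
    a = rng.normal(0.0, 1.0, size=(n_sims, n))     # control group
    b = rng.normal(true_d, 1.0, size=(n_sims, n))  # treatment group
    observed_d = b.mean(axis=1) - a.mean(axis=1)   # sd = 1, so this is d-hat
    se = np.sqrt(2.0 / n)                          # std. error of the difference
    sig = observed_d / se > 1.96                   # which studies "publish"
    return sig.mean(), observed_d[sig].mean()      # power, mean published d

results = {n: significant_studies(n) for n in (15, 100)}
for n, (power, d_hat) in results.items():
    print(f"n = {n:3d} per group: power = {power:.2f}, "
          f"mean significant effect = {d_hat:.2f}")
```

With 15 subjects per group, the significant studies report an average effect roughly triple the true one; with 100 per group, the published estimate sits much closer to reality. This is the selection bias the critics are pointing at.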

All of this is quite unnerving to the small-sample researcher, and I advise him to crank out as many of his underpowered studies as he can before larger sample sizes become the new normal, and one of the checklist criteria for any high-impact journal. For any new experiments, of course, recruit large sample sizes, and when reviewing, punish those who use smaller sample sizes, using the reasons outlined above; for then you will have still published your earlier results, but manage to remain on the right side of history. To some, this may smack of Tartufferie; I merely advise you to act in your best interests.