Philosophy, statistics, and behaviorism’s appeal to ignorance

I sometimes wonder why behaviorism still hasn’t moved on to using statistics. On paper, it would have many benefits:

  • We could look scientific to other scientists.
  • We could also communicate with other psychologists to have more impact.
  • We could show behavior analysis to not just be efficacious, but scalable.
  • We could show key findings to be generalizable.

Let’s unpack this a bit. What would this look like?

Let us suppose for a minute that we wanted to use a group design to test a typical behavioral procedure. You have a kid who won’t sit down. Because he won’t sit down, he cannot be at the desk long enough to cover the educational curriculum his teacher has to cover, and more importantly, he won’t learn important life skills as a result. So the objective is clear – we want to increase the amount of time this kid can tolerate sitting down at a desk for. Time Sitting is our dependent variable, measured in seconds.

At the moment, behaviorists have no problem shaping up sitting behavior for one child. To do this, all they have to do is positively reinforce sitting behavior (for example, with cake or access to a toy). Our research question is then, “does rewarding sitting increase sitting behavior?”

But this takes up resources. After all, behavior analysts typically have master’s degrees, and behavior plans need to be implemented consistently to be effective. That’s a lot of time you need to hire the behavior analyst for, and a lot of money that most families don’t have to spare. This is the scalability problem of an idiographic approach. We can be individually effective engineers if we can afford one-to-one support for every child who needs it (and we cannot), but that does not make us scientists.

Having shaped up sitting behavior with the one child, or even ten children(!) who had a similar problem, we might be able to say, beyond reasonable doubt, that these children can now sit. However, we are still faced with some inconvenient questions. “What if these children were going to start sitting more anyway? They are getting older, after all”. This is why we need a control group. This changes our initial research question to, “does rewarding sitting improve sitting behavior compared to leaving kids alone?”
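To make the control group point concrete, here is a rough simulation sketch. Every number in it (a 20-second gain from simply getting older, a 60-second reward effect, the noise level) is an assumption I made up for illustration, not data from any study.

```python
import random
from statistics import mean


def simulate_sitting_study(n_per_group=200, maturation=20.0,
                           true_effect=60.0, seed=1):
    """Return (naive pre-post gain, control-adjusted gain) in seconds.

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    gains_reward, gains_control = [], []
    for _ in range(n_per_group):
        # Control children improve a little just by getting older.
        gains_control.append(maturation + rng.gauss(0, 15))
        # Rewarded children get the same maturation plus the reward effect.
        gains_reward.append(maturation + true_effect + rng.gauss(0, 15))
    naive = mean(gains_reward)  # mixes maturation with the reward effect
    controlled = mean(gains_reward) - mean(gains_control)
    return naive, controlled


if __name__ == "__main__":
    naive, controlled = simulate_sitting_study()
    print(f"pre-post gain without a control group: {naive:.1f} s")
    print(f"gain relative to the control group:    {controlled:.1f} s")
```

Without the control group, the pre-post gain lumps "getting older" in with the effect of the reward. Subtracting the control group's gain is what separates the two.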

We are also faced with the question of generalizability. What if these ten kids are from ten different countries? What if some are verbal and some are not? What if they have different innate temperaments? Physical conditions? The list goes on, ad infinitum. This is why most psychologists use group designs. If they can get a reasonably large sample of any given population, this tends to cover the array of individual differences found within that population. As a result, we can make generalizations about that population based on the sample. The bigger the sample, the more likely it is that it represents the population from which it was drawn. This solves the generalizability problem.
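Here is a small sketch of the sample size point. It assumes children are sampled at random from the population (bigger samples only help if that holds), and the population itself is made up: sitting times with a mean of about 120 seconds.

```python
import random
from statistics import mean


def make_population(size=5000, seed=3):
    """Hypothetical population of sitting times in seconds (never negative)."""
    rng = random.Random(seed)
    return [max(0.0, rng.gauss(120, 60)) for _ in range(size)]


def average_sampling_error(population, sample_size, n_draws=500, seed=2):
    """Average gap between a random sample's mean and the population mean."""
    rng = random.Random(seed)
    true_mean = mean(population)
    errors = []
    for _ in range(n_draws):
        sample = rng.sample(population, sample_size)  # simple random sample
        errors.append(abs(mean(sample) - true_mean))
    return mean(errors)


if __name__ == "__main__":
    population = make_population()
    for n in (10, 50, 400):
        err = average_sampling_error(population, n)
        print(f"n = {n:>3}: typical error of the sample mean = {err:.1f} s")
```

The typical error shrinks as the sample grows, which is the sense in which a larger random sample "represents" the population better.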

“But what if there are other unknown factors, like having a sore bum or something else we didn’t even think of, that are unevenly distributed across the experimental and control groups?”, I hear you ask. Great question. This is why we use randomization. As a matter of probability, any other confounding variable, whether measured or not, is likely to be approximately evenly distributed across conditions as long as we have a large enough sample. This is why RCTs do take individual contexts into account. That is, they make them non-issues when assessing the effects of an independent variable on a dependent variable.
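The randomization argument can be checked with a quick simulation. This is only a sketch: the "discomfort" score below is a hypothetical stand-in for any confounder we never measured, and the function name is my own.

```python
import random
from statistics import mean


def mean_imbalance(n_children, n_trials=300, seed=4):
    """Average absolute gap in an unmeasured confounder between two
    randomly assigned groups, over many repeated randomizations."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_trials):
        # Hypothetical confounder (e.g. discomfort), in standard units.
        discomfort = [rng.gauss(0, 1) for _ in range(n_children)]
        # Random assignment: shuffle, then first half = reward group,
        # second half = control group.
        rng.shuffle(discomfort)
        half = n_children // 2
        gap = abs(mean(discomfort[:half]) - mean(discomfort[half:]))
        gaps.append(gap)
    return mean(gaps)


if __name__ == "__main__":
    for n in (20, 200, 1000):
        print(f"n = {n:>4}: average group imbalance = {mean_imbalance(n):.3f} SD")
```

The code never "controls for" discomfort. It just assigns children at random, and the imbalance between groups shrinks as the sample grows.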

We are also faced with the issue of why and for whom the intervention works. This is important to address because we want to be able to reliably tailor interventions to individual needs for them to be practically applicable. We also want to avoid naive realism, which is when an intervention that objectively doesn’t work when applied across contexts subjectively seems to work from the therapist’s perspective. The why and for whom issues are addressed by mediation and moderation, respectively.

Going back to our example, we might say that our dependent variable is “time sitting” (measured in seconds), and our independent variable is “reward” (providing cake or a toy for sitting versus not providing anything). Of course, rewards will not work equally well for everyone. Maybe rewards work better for people who are hungry? In this case, our moderator variable could be “hunger”. This would allow us to draw conclusions like “rewarding sitting increases sitting behavior compared to not rewarding sitting, as long as people are hungry”. Moderators are about for whom, and under what conditions, an intervention works. In this case, edible rewards promote sitting behavior better than no rewards when the child is hungry.
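In statistical terms, a moderator shows up as an interaction: the effect of the reward differs across levels of hunger. Below is a minimal sketch with a 2x2 design (reward yes/no by hungry yes/no). The effect sizes are assumptions I made up; the thing to look at is the difference of differences.

```python
import random
from statistics import mean


def simulate_moderation(n_per_cell=200, seed=5):
    """Return (reward effect if not hungry, reward effect if hungry,
    interaction), all in seconds of sitting. Values are hypothetical."""
    rng = random.Random(seed)
    cell_means = {}
    for reward in (0, 1):
        for hungry in (0, 1):
            # Assumed: baseline 60 s; reward adds 10 s when not hungry
            # and 50 s when hungry.
            effect = reward * (10 + 40 * hungry)
            times = [60 + effect + rng.gauss(0, 20) for _ in range(n_per_cell)]
            cell_means[(reward, hungry)] = mean(times)
    effect_not_hungry = cell_means[(1, 0)] - cell_means[(0, 0)]
    effect_hungry = cell_means[(1, 1)] - cell_means[(0, 1)]
    interaction = effect_hungry - effect_not_hungry  # difference of differences
    return effect_not_hungry, effect_hungry, interaction


if __name__ == "__main__":
    not_hungry, hungry, interaction = simulate_moderation()
    print(f"reward effect, not hungry: {not_hungry:.1f} s")
    print(f"reward effect, hungry:     {hungry:.1f} s")
    print(f"interaction (moderation):  {interaction:.1f} s")
```

An interaction clearly different from zero is what lets us say "this works better for hungry kids" rather than only "this works on average".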

Obviously we have a problem here – what if the food has low nutritional value? So, our mediator variable could be “nutritional value”, which can be quantified. Mediators explain why an intervention works. In this case, edible rewards are more effective than no rewards for promoting sitting behavior because they satisfy hunger via their nutritional properties.
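One common way to quantify mediation is the product-of-coefficients approach: estimate the path from the intervention to the mediator (a), the path from the mediator to the outcome while holding the intervention constant (b), and multiply them to get the indirect effect. The sketch below uses a generic, hypothetical mediator score and made-up path values. It is one simple way to do this, not the only one.

```python
import random
from statistics import mean


def slope(x, y):
    """Ordinary least squares slope of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def residuals(x, y):
    """What is left of y after removing its linear relation with x."""
    b = slope(x, y)
    a = mean(y) - b * mean(x)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def simulate_mediation(n=2000, a=2.0, b=3.0, direct=1.0, seed=6):
    """Return (total, indirect, direct) effect estimates.

    reward -> mediator -> sitting, plus a direct reward -> sitting path.
    All path values are hypothetical.
    """
    rng = random.Random(seed)
    reward = [rng.randint(0, 1) for _ in range(n)]
    mediator = [a * r + rng.gauss(0, 1) for r in reward]
    sitting = [direct * r + b * m + rng.gauss(0, 1)
               for r, m in zip(reward, mediator)]
    a_hat = slope(reward, mediator)
    # b path: mediator -> sitting with reward partialled out of both
    # (equivalent to the mediator's coefficient in a multiple regression).
    b_hat = slope(residuals(reward, mediator), residuals(reward, sitting))
    total = slope(reward, sitting)
    indirect = a_hat * b_hat
    return total, indirect, total - indirect


if __name__ == "__main__":
    total, indirect, direct_est = simulate_mediation()
    print(f"total effect:    {total:.2f}")
    print(f"indirect (a*b):  {indirect:.2f}")
    print(f"direct effect:   {direct_est:.2f}")
```

If most of the total effect runs through the indirect path, the mediator is doing the explanatory work. In practice you would also want a confidence interval on the indirect effect (bootstrapping is a common choice), which this sketch leaves out.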

(Don’t get me wrong – I have worked in ABA long enough to know that nutritional value is probably not a reliable mediator here; it’s just for the sake of example.)

Choosing a mediator versus a moderator versus a regular independent variable is a theoretical issue. Moderators are probably going to be of most interest to behavior analysts with reservations about group designs. “We don’t need to know what works for the average person”, they’ll say. “We need to address the needs of individuals”. This is well and good if you’re made of money and have all the time in the world. However, the moderator variables (i.e., individual differences – including biological and social context) are what tell us who the intervention works for, helping to address this commonly raised issue. If intervention X works better for people who are agreeable (for example), then it’s quite fast to measure that and find out whether the intervention is worth a shot in advance.

And does behavior analysis just have a different set of aims and standards? Well, the technical issues mentioned above are not solved by avoiding them, whether that avoidance is rationalised through armchair philosophy or not. These are real issues. Besides this, it’s just not a good look to try to redefine science itself rather than do something new/difficult. It’s like saying “everyone else is marching out of line except me”. As far as I can tell, there are only scientific advantages to using statistics in the science of behavior analysis more often. And if (and it’s a big if) we have faith in science as the thing that will ultimately produce better tools to help our clients, then there will surely be practical benefits too.

Can we use technical notation to help to achieve conceptual precision?

I was really happy to finally publish a paper first conceptualised in 2013, which I have worked on intermittently since. In some scientific fields, abstract ideas such as theorems, grammatical rules, and so on, are expressed using technical notation. In my own field of psychology, Relational Frame Theory (RFT) is a particularly useful approach to both explain and manipulate language and cognition with precision, scope, and depth. In the early days, researchers would use technical notation to describe the patterns of adaptive behaviour (called “relational framing”), but that has gotten lost as the field has become more practitioner-oriented and the experimental behaviour analysts have (alas, literally) been dying off. Having spoken to others who have gravitated to RFT over the years – especially the ones who aren’t so interested in ACT therapy – I noticed that a few of us have started to lament how efforts to achieve technical precision with basic experimentation have been sidelined by our colleagues and by other fields. We wanted to create a resource (i) for anyone who wants to revisit it, (ii) to remind people that there are still a few of us left who do basic experimental behaviour analysis, and (iii) to show our cognitive colleagues that the behaviour analytic tradition has indeed accounted for the complexity and generativity of language.

This paper is dedicated to my more senior friends, colleagues, and mentors who sacrifice sexy research in favour of doing more technical legwork.


We inherit anxious behavior. So what?

People who are high in trait Neuroticism are stirred up emotionally more easily and may be more likely to withdraw from challenge/threat situations. With early twin studies, we learned beyond reasonable doubt that this susceptibility to negative emotion is highly heritable (Viken, Rose, Kaprio, & Koskenvuo, 1994), with modern genetics research corroborating early findings in this respect (Goodman et al., 2018; Segerstrom & Smith, 2019).

Recent research has suggested that anxiety and attentional biases to threat stimuli are also highly heritable (Aktar, Bockstaele, Perez-Edgar, Wiers, & Bögels, 2018). That is, we inherit a certain degree of likelihood of quickly directing our attention towards threatening stimuli in preparation for a fight (anger/outrage etc.), flight (running away/ignoring the problem etc.) or freeze (staying very still so the T-Rex doesn’t eat you). However, we don’t just display attentional biases towards actual threat stimuli. My colleagues and I have recently demonstrated that we also have a biased orientation towards stimuli that suggest a possibility of future threat (Gladwin, Möbius, McLoughlin, & Tyndall, 2018).


All in all, we’re a very “touchy” species. That can be adaptive in that it helps us to stay out of danger, or maladaptive as we’re paralysed from moving forward in the world and doing things that are scary but worthwhile.

Due to the high heritability of our susceptibility to negative emotion, it seems that some people will naturally experience more negative emotion than others, independent of their individual circumstances. Therefore, when someone is negatively affected (e.g., anxious, offended, depressed etc.), it does not necessarily mean, in and of itself, that there is something wrong with the world around them. The solution to experiencing negative affect might be to change your environment to make it less scary, or it might be to be braver and meet the challenge head-on. The latter is what behavior therapy trains people to do.


Aktar, E., Bockstaele, B. Van, Perez-Edgar, K., Wiers, R. W., & Bögels, S. M. (2018). Intergenerational Transmission of Attentional Bias and Anxiety. Developmental Science, e12772.

Gladwin, T. E., Möbius, M., McLoughlin, S., & Tyndall, I. (2018). Anticipatory versus reactive spatial attentional bias to threat. British Journal of Psychology.

Goodman, S. J., Roubinov, D. S., Bush, N. R., Park, M., Farré, P., Emberly, E., … Boyce, W. T. (2018). Children’s biobehavioral reactivity to challenge predicts DNA methylation in adolescence and emerging adulthood. Developmental Science, e12739.

Segerstrom, S. C., & Smith, G. T. (2019). Personality and Coping: Individual Differences in Responses to Emotion. Annual Review of Psychology, 70(1), annurev-psych-010418-102917.

Viken, R. J., Rose, R. J., Kaprio, J., & Koskenvuo, M. (1994). A Developmental Genetic Analysis of Adult Personality: Extraversion and Neuroticism From 18 to 59 Years of Age. Journal of Personality and Social Psychology, 66(4), 722–730.


Can psychologists raise intelligence?


Here’s a blog post I wrote for the Association for Behavior Analysis International, the world’s leading organisation for behavioral psychology, with 35,000 members across all its sub-chapters. In this blog, I put my PhD work in context in an attempt to bridge the divide between behavioral psychology and cognitive neuroscience.

“Nonetheless, those smaller studies yielded large effects, with improvements in IQ in the 15-30 point range. Considering that 15 points represents one whole standard deviation in many IQ tests, this seems remarkable. In the context of a broader literature saying that this was virtually impossible, it seemed that behavior analysts might have the tools to climb a little further, even if those effects were to diminish substantially in more stringent tests.”
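To put the 15-30 point range into standard units, here is a small worked example. It takes the 15-point standard deviation from the quote above and additionally assumes the usual convention of a population mean of 100 and a normal distribution of scores; the function names are my own.

```python
from statistics import NormalDist


def iq_gain_in_sd(points, sd=15.0):
    """Express a gain in IQ points as a number of standard deviations."""
    return points / sd


def percentile_after_gain(start_iq, points, mean_iq=100.0, sd=15.0):
    """Percentile rank after the gain, assuming normally distributed scores
    with the given (assumed) mean and standard deviation."""
    return NormalDist(mean_iq, sd).cdf(start_iq + points) * 100


if __name__ == "__main__":
    for gain in (15, 30):
        sds = iq_gain_in_sd(gain)
        pct = percentile_after_gain(100, gain)
        print(f"+{gain} points = {sds:.1f} SD; "
              f"from the 50th to about the {pct:.0f}th percentile")
```

Under those assumptions, a 15-point gain moves someone at the average to roughly the 84th percentile, and a 30-point gain to roughly the 98th.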