I sometimes wonder why behaviorism still hasn’t moved on to using statistics. On paper, it would have many benefits:
- We could look scientific to other scientists.
- We could also communicate with other psychologists to have more impact.
- We could show behavior analysis to not just be efficacious, but scalable.
- We could show key findings to be generalizable.
Let’s unpack this a bit. What would this look like?
Let us suppose for a minute that we wanted to use a group design to test a typical behavioral procedure. You have a kid who won’t sit down. Because he won’t sit down, he cannot be at the desk long enough to cover the educational curriculum his teacher has to cover, and more importantly, he won’t learn important life skills as a result. So the objective is clear – we want to increase the amount of time this kid can tolerate sitting at a desk. Time Sitting is our dependent variable, measured in seconds.
At the moment, behaviorists have no problem shaping up sitting behavior for one child. To do this, all they have to do is positively reinforce sitting behavior (for example, with cake or access to a toy). Our research question is then, “does rewarding sitting increase sitting behavior?”
But this takes up resources. After all, behavior analysts typically have master’s degrees, and behavior plans need to be implemented consistently to be effective. That’s a lot of time you need to hire the behavior analyst for, and a lot of money that most families don’t have to spare. This is the scalability problem of an idiographic approach. We can be individually effective engineers if we can afford one-to-one support for every child who needs it (and we cannot), but that does not make us scientists.
Having shaped up sitting behavior with the one child, or even ten children(!) who had a similar problem, we might be able to say, beyond reasonable doubt, that these children can now sit. However, we are still faced with some inconvenient questions. “What if these children were going to start sitting more anyway? They are getting older, after all”. This is why we need a control group. This changes our initial research question from “does rewarding sitting increase sitting behavior?” to “does rewarding sitting improve sitting behavior compared to leaving kids alone?”
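To make the "they are getting older" objection concrete, here is a minimal simulation sketch. Everything in it is made up for illustration: the maturation gain, the reward effect, the noise levels, and the group sizes are assumptions, not data. It only shows why a pre/post gain in the rewarded group mixes two things together, while the comparison against a control group separates them.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the sketch is reproducible

MATURATION = 20.0      # assumed: seconds gained simply by getting older
REWARD_EFFECT = 40.0   # assumed: extra seconds gained from rewarding sitting


def simulate_child(rewarded):
    """Return (baseline, follow_up) sitting time in seconds for one child."""
    baseline = rng.gauss(60, 15)
    gain = MATURATION + (REWARD_EFFECT if rewarded else 0.0)
    follow_up = baseline + gain + rng.gauss(0, 10)
    return baseline, follow_up


treated = [simulate_child(True) for _ in range(500)]
control = [simulate_child(False) for _ in range(500)]


def mean_gain(group):
    """Average follow-up minus baseline across a group."""
    return mean(after - before for before, after in group)


# Pre/post gain in the rewarded group alone: reward and maturation combined.
naive_estimate = mean_gain(treated)
# Gain relative to the control group: maturation is subtracted out.
controlled_estimate = mean_gain(treated) - mean_gain(control)

print(f"pre/post gain, rewarded group only: {naive_estimate:.1f} s")
print(f"gain relative to control group:     {controlled_estimate:.1f} s")
```

In this toy setup the first number should land near the sum of both assumed effects and the second near the assumed reward effect alone, which is the whole point of the control group.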
We are also faced with the question of generalizability. What if these ten kids are from ten different countries? What if some are verbal and some are not? What if they have different innate temperaments? Physical conditions? The list goes on, ad infinitum. This is why most psychologists use group designs. If they can get a reasonably large sample of any given population, this tends to cover the array of individual differences found within that population. As a result, we can make generalizations about that population based on the sample. The bigger the sample, the more likely it is that it represents the population from which it was drawn. This solves the generalizability problem.
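The claim about sample size can be sketched in a few lines. The population below is hypothetical (20,000 invented sitting times), and the sketch assumes the sample is drawn at random from that population, which is what makes the "bigger is more representative" logic hold. It measures how far a sample mean typically lands from the population mean at different sample sizes.

```python
import random
from statistics import mean

rng = random.Random(1)  # fixed seed so the sketch is reproducible

# Hypothetical population of sitting times in seconds (invented numbers).
population = [max(0.0, rng.gauss(90, 40)) for _ in range(20_000)]
true_mean = mean(population)


def typical_error(sample_size, repeats=200):
    """Average absolute gap between a random sample's mean and the true mean."""
    errors = []
    for _ in range(repeats):
        sample = rng.sample(population, sample_size)
        errors.append(abs(mean(sample) - true_mean))
    return mean(errors)


errors_by_n = {n: typical_error(n) for n in (10, 100, 1000)}

for n, err in errors_by_n.items():
    print(f"n={n:>4}: typical error = {err:.1f} s")
```

The typical error should shrink as the sample grows, which is the sense in which a larger random sample better represents the population it was drawn from.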
“But what if there are other unknown factors, like having a sore bum or something else we didn’t even think of, that are unevenly distributed across the experimental and control group?”, I hear you ask. Great question. This is why we use randomization. As a matter of probability, any other confounding variable, whether measured or not, is likely to be approximately evenly distributed across conditions as long as we have a large enough sample. This is why RCTs do take individual contexts into account. That is, they make them non-issues when assessing the effects of an independent variable on a dependent variable.
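Here is a rough sketch of that probability argument. The hidden trait and its 30% prevalence are assumptions made up for the example. Nobody in the simulated study measures the trait; children are simply shuffled into two groups, and we check how unevenly the trait ends up being split.

```python
import random

rng = random.Random(2)  # fixed seed so the sketch is reproducible

TRAIT_PREVALENCE = 0.3  # assumed share of children with the unmeasured trait


def imbalance(sample_size):
    """Absolute between-group difference in the share of children with the trait."""
    has_trait = [rng.random() < TRAIT_PREVALENCE for _ in range(sample_size)]

    # Random assignment: shuffle the children, then split them in half.
    order = list(range(sample_size))
    rng.shuffle(order)
    half = sample_size // 2
    treated, control = order[:half], order[half:]

    def share(group):
        return sum(has_trait[i] for i in group) / len(group)

    return abs(share(treated) - share(control))


def average_imbalance(sample_size, repeats=300):
    """Average imbalance over many simulated randomizations."""
    return sum(imbalance(sample_size) for _ in range(repeats)) / repeats


imbalance_by_n = {n: average_imbalance(n) for n in (20, 200, 2000)}

for n, gap in imbalance_by_n.items():
    print(f"n={n:>4}: average imbalance = {gap:.3f}")
```

With small samples the two groups can differ noticeably on the hidden trait by chance; with larger samples the gap should shrink toward zero, even though the trait was never measured.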
We are also faced with the issue of why and for whom the intervention works. This is important to address because we want to be able to reliably tailor interventions to individual needs for them to be practically applicable. We also want to avoid naive realism, which is when an intervention that objectively doesn’t work when applied across contexts subjectively seems to work from the therapist’s perspective. The why and for-whom issues are addressed by mediation and moderation, respectively.
Going back to our example, we might say that our dependent variable is “time sitting” (measured in seconds), and our independent variable is “reward” (providing cake or a toy for sitting versus not providing anything). Of course, rewards will not work equally well for everyone. Maybe rewards work better for people who are hungry? In this case, our moderator variable could be “hunger”. This would allow us to draw conclusions like “rewarding sitting increases sitting behavior compared to not rewarding sitting, as long as people are hungry”. Moderators are about who an intervention works for, and under what conditions. In this case, edible rewards are more effective than no rewards for promoting sitting behavior when the child is hungry.
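A moderator shows up statistically as an interaction: the effect of the reward differs depending on the level of the moderator. The sketch below simulates that with invented effect sizes (a large reward effect for hungry children and a small one otherwise) and estimates the reward effect separately within each hunger level. Treating hunger as a simple yes/no variable is a simplifying assumption for the example.

```python
import random
from statistics import mean

rng = random.Random(3)  # fixed seed so the sketch is reproducible

EFFECT_IF_HUNGRY = 60.0      # assumed reward effect (seconds) for hungry children
EFFECT_IF_NOT_HUNGRY = 10.0  # assumed reward effect (seconds) otherwise


def sitting_time(rewarded, hungry):
    """Simulated sitting time in seconds for one child."""
    effect = 0.0
    if rewarded:
        effect = EFFECT_IF_HUNGRY if hungry else EFFECT_IF_NOT_HUNGRY
    return 60.0 + effect + rng.gauss(0, 15)


children = []
for _ in range(2000):
    rewarded = rng.random() < 0.5  # randomized independent variable
    hungry = rng.random() < 0.5    # measured moderator
    children.append((rewarded, hungry, sitting_time(rewarded, hungry)))


def cell_mean(rewarded, hungry):
    """Mean sitting time for one combination of reward and hunger."""
    return mean(t for r, h, t in children if r == rewarded and h == hungry)


effect_if_hungry = cell_mean(True, True) - cell_mean(False, True)
effect_if_not_hungry = cell_mean(True, False) - cell_mean(False, False)
interaction = effect_if_hungry - effect_if_not_hungry  # the moderation effect

print(f"reward effect when hungry:     {effect_if_hungry:.1f} s")
print(f"reward effect when not hungry: {effect_if_not_hungry:.1f} s")
print(f"interaction (difference):      {interaction:.1f} s")
```

An interaction near zero would mean hunger does not moderate the reward effect; a clearly non-zero one is what lets us say who the intervention works for. In practice this would usually be fitted as a regression with an interaction term rather than by comparing cell means, but the logic is the same.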
Obviously we have a problem here – what if the food has low nutritional value? So, our mediator variable could be “nutritional value”, which can be quantified. Mediators explain why an intervention works. In this case, edible rewards are more effective than no rewards for promoting sitting behavior because they satisfy hunger via their nutritional properties.
(Don’t get me wrong – I have worked in ABA long enough to know that nutritional value is probably not a reliable mediator here; it’s just for the sake of example.)
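In the same for-the-sake-of-example spirit, here is a sketch of what a simple mediation analysis computes. It assumes, purely for illustration, that the reward raises a quantified "nutrition" score and that nutrition in turn raises sitting time, with some direct effect of reward left over. All coefficients are invented. The code splits the total effect of reward into an indirect part (through the mediator) and a direct part, using ordinary least squares written out by hand so that nothing beyond the standard library is needed.

```python
import random
from statistics import mean

rng = random.Random(4)  # fixed seed so the sketch is reproducible

rows = []
for _ in range(2000):
    reward = 1.0 if rng.random() < 0.5 else 0.0
    # Assumed path 1: reward raises the (hypothetical) nutrition score.
    nutrition = 50.0 * reward + rng.gauss(0, 10)
    # Assumed path 2: nutrition raises sitting time; reward also acts directly.
    sitting = 60.0 + 0.4 * nutrition + 15.0 * reward + rng.gauss(0, 15)
    rows.append((reward, nutrition, sitting))


def centered(values):
    """Subtract the mean so intercepts drop out of the regressions."""
    mu = mean(values)
    return [v - mu for v in values]


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


x = centered([r[0] for r in rows])  # reward
m = centered([r[1] for r in rows])  # mediator
y = centered([r[2] for r in rows])  # sitting time

sxx, smm, sxm = dot(x, x), dot(m, m), dot(x, m)
sxy, smy = dot(x, y), dot(m, y)

a = sxm / sxx      # slope of mediator on reward
total = sxy / sxx  # slope of sitting on reward alone

# Two-predictor least squares: sitting on reward and mediator together.
det = sxx * smm - sxm ** 2
direct = (sxy * smm - smy * sxm) / det  # reward slope, mediator held fixed
b = (smy * sxx - sxy * sxm) / det       # mediator slope, reward held fixed

indirect = a * b  # the part of the effect that runs through the mediator

print(f"total effect:    {total:.1f} s")
print(f"direct effect:   {direct:.1f} s")
print(f"indirect effect: {indirect:.1f} s")
```

For linear least-squares models like this one, the direct and indirect parts add up to the total effect. A real analysis would also need a confidence interval for the indirect effect (bootstrapping is a common choice) and a theoretical argument for why the mediator sits on the causal path at all.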
Choosing a mediator versus a moderator versus a regular independent variable is a theoretical issue. Moderators are probably going to be of most interest to behavior analysts with reservations about group designs. “We don’t need to know what works for the average person”, they’ll say. “We need to address the needs of individuals”. This is well and good if you’re made of money and have all the time in the world. However, the moderator variables (i.e., individual differences – including biological and social context) are what tell us who the intervention works for, helping to address this commonly raised issue. If intervention X works better for people who are agreeable (for example), then it’s quite fast to measure that and find out in advance whether the intervention is worth a shot.
And does behavior analysis just have a different set of aims and standards? Well, the technical issues mentioned above are not solved by avoiding them, whether that avoidance is rationalized through armchair philosophy or not. These are real issues. Besides this, it’s just not a good look to try to redefine science itself rather than do something new/difficult. It’s like saying “everyone else is marching out of line except me”. As far as I can tell, there are only scientific advantages to using statistics in the science of behavior analysis more often. And if (and it’s a big if) we have faith in science as the thing that will ultimately produce better tools to help our clients, then there will surely be practical benefits too.