The concept of "stereotype threat" is a vastly popular explanation for The Gap. In 2004, I offered what struck me as the most plausible explanation for the studies finding that if you tell a Designated Victim Group they are expected to score lower on a low-stakes test requiring mental effort in return for no reward, they will indeed score lower:
Of course, to me as a former marketing executive, there's an obvious alternative explanation of [Claude] Steele's findings: the students figured out what this prominent professor wanted to see, and, being nice kids, they delivered the results he longed for. This happens all the time in market research. After all, this was just a meaningless little test, unlike a real SAT where the students would all want to do as well as possible.
However, an even more cynical interpretation has been floating around on the fringes of public discourse for a number of years: publication bias. Studies that find stereotype threat get published, while studies that don't find it don't get published: the File Drawer Effect.
From an interview with John List, the Homer J. Livingston Professor of Economics at the University of Chicago:
RF: Your paper with Roland Fryer and Steven Levitt came to a somewhat ambiguous conclusion about whether stereotype threat exists. But do you have a hunch regarding the answer to that question based on the results of your experiment?
List: I believe in priming. Psychologists have shown us the power of priming, and stereotype threat is an interesting type of priming. Claude Steele, a psychologist at Stanford, popularized the term stereotype threat. He had people taking a math exam, for example, jot down whether they were male or female on top of their exams, and he found that when you wrote down that you were female, you performed less well than if you did not write down that you were female. They call this the stereotype threat. My first instinct was that effect probably does happen, but you could use incentives to make it go away. And what I mean by that is, if the test is important enough or if you overlaid monetary incentives on that test, then the stereotype threat would largely disappear, or become economically irrelevant.
So we designed the experiment to test that, and we found that we could not even induce stereotype threat. We did everything we could to try to get it. We announced to them, “Women do not perform as well as men on this test and we want you now to put your gender on the top of the test.” And other social scientists would say, that’s crazy — if you do that, you will get stereotype threat every time. But we still didn’t get it. What that led me to believe is that, while I think that priming works, I think that stereotype threat has a lot of important boundaries that severely limit its generalizability. I think what has happened is, a few people found this result early on and now there’s publication bias. But when you talk behind the scenes to people in the profession, they have a hard time finding it. So what do they do in that case? A lot of people just shelve that experiment; they say it must be wrong because there are 10 papers in the literature that find it. Well, if there have been 200 studies that try to find it, 10 should find it, right?
This is a Type II error but people still believe in the theory of stereotype threat. I think that there are a lot of reasons why it does not occur. So while I believe in priming, I am not convinced that stereotype threat is important.
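List's back-of-the-envelope arithmetic is just the conventional 5 percent false-positive rate at work: if the true effect is zero and 200 studies test for it at the p < .05 threshold, roughly 10 will "find" stereotype threat by chance, and if only those ten escape the file drawer, the published literature looks unanimous. Here is a minimal simulation sketch of that logic; the per-group sample size, the normal score distribution, and the assumption of a zero true effect are illustrative choices of mine, not details from any of the studies discussed:

```python
import random
import statistics

# Sketch of the "200 studies, 10 find it" arithmetic: if the true effect of the
# gender prime is zero, about 5% of studies still cross p < .05 by chance.
# N_PER_GROUP and the score distribution are assumptions for illustration.

random.seed(1)

N_STUDIES = 200       # studies attempted (List's hypothetical)
N_PER_GROUP = 50      # participants per group in each study (assumed)
TRUE_EFFECT = 0.0     # assume the prime has no real effect on scores

def run_study():
    """Simulate one study: two groups drawn from the same score distribution."""
    control = [random.gauss(0, 1) for _ in range(N_PER_GROUP)]
    primed = [random.gauss(TRUE_EFFECT, 1) for _ in range(N_PER_GROUP)]
    # Two-sample test statistic (normal approximation for simplicity)
    diff = statistics.mean(control) - statistics.mean(primed)
    se = (statistics.variance(control) / N_PER_GROUP +
          statistics.variance(primed) / N_PER_GROUP) ** 0.5
    return diff / se

# |z| > 1.96 corresponds roughly to p < .05, two-sided
significant = sum(abs(run_study()) > 1.96 for _ in range(N_STUDIES))

print(f"{significant} of {N_STUDIES} studies 'find' an effect that isn't there "
      f"(expected about {0.05 * N_STUDIES:.0f})")
```

If only the handful of "significant" studies get written up and published while the rest are shelved, the journals end up stocked with chance findings, which is the File Drawer Effect List is describing.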