Sex, and other -isms of science

It is tempting to think that serious sexism died in the 1970s. These days, overt gender discrimination is unusual, and slightly risky for its perpetrators. But a classic study, conducted in Sweden in 1995, found that sexism (and perhaps less surprisingly, nepotism) had retained a preeminent role in the allocation of scientific jobs and the making-or-breaking of scientific careers.

Christine Wennerås and Agnes Wold analysed the application process for Swedish post-doctoral medical research fellowships that year (they had to make freedom of information requests to get their data). They observed a field of 114 applicants competing for 20 jobs. 46% of the applicants were female, but only 4 of them won positions.
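To make the gap concrete, here is a rough back-of-the-envelope calculation of the success rates implied by those figures. It is only a sketch: it assumes that the 46% is a rounded share (so the head counts are approximate) and that all 20 positions were filled, with the remaining 16 going to men.

```python
# Figures quoted in the text.
applicants = 114
positions = 20
female_share = 0.46
female_winners = 4

# Assumption: 46% is a rounded figure, so these head counts are approximate.
female_applicants = round(applicants * female_share)
male_applicants = applicants - female_applicants

# Assumption: every position was filled, so the rest went to men.
male_winners = positions - female_winners

female_rate = female_winners / female_applicants
male_rate = male_winners / male_applicants

print(f"Women: {female_winners}/{female_applicants} = {female_rate:.1%}")
print(f"Men:   {male_winners}/{male_applicants} = {male_rate:.1%}")
print(f"Men succeeded about {male_rate / female_rate:.1f} times as often")
```

Under those assumptions, a male applicant was a little over three times as likely to win a position as a female applicant.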

Using multiple regressions, the authors estimated which characteristics of candidates led to high “scientific competence” ratings from the reviewers: what was the relative importance of educational background, publication and citation records,1 the applicant’s gender, the presence of relationships to assessors and other factors?2

The results are shocking. Being female was a major liability: a candidate would need to have 3 extra articles in Nature or Science (or 20 in decent specialist journals) just to counteract the disadvantages she faced for being a woman. There were two women in the pool so prolific that they won post-doc jobs this way, but for most good female scientists there was only one hope for getting a position: knowing someone on the review committee.

Wennerås and Wold measured these personal connections by observing whether a member of the committee recused himself from reviewing the application because he knew the applicant. The presence of such a relationship conferred an advantage of similar size to the advantage of being male.

Sweden has a reputation for some of the most progressive attitudes and policies on gender relations in the world. It is disturbing that despite this, the patriarchy (accidental or otherwise) was still firmly in place in 1995. The most hopeful explanatory hypothesis that Wennerås and Wold offered for their results was the fact that 90% of the application reviewers were male. But it’s hard to say when that will change.


  1. Variables were included for total number of publications, total number of publications weighted by impact factor, first-author publications weighted by impact factor, total citations, and total citations to the candidate’s first-author publications. 

  2. The other factors were letters of recommendation, field of research, foreign nationality, and overseas experience. 

Can humans act utilitarian?

One of the most important schools of ethical thought is consequentialism, which holds that the best actions (or rules, or ways of making decisions) are simply the ones that lead to the best outcomes. When acts are bad (hitting someone with a stick, for instance) it is not because of the deed itself but because of the results that follow — pain, injury, lost friendships. Failing to intercede to prevent something bad from happening to someone else is almost as bad as taking the action yourself.1

The largest branch of consequentialism is utilitarianism. Utilitarians hold that the “best” outcomes are those which are the best for people collectively: “the greatest good for the greatest number”, as Bentham put it.

Utilitarianism calls for two things: altruism and calculation. It tells us, “if you know that the benefit that you would get from this hundred dollar note is less than the benefit that your impoverished friend Susan would get from a second-hand bicycle, you should buy her the bicycle.” And it tells us, “if you know your $100 could save a life in Darfur, you should send it to a humanitarian organisation there instead”. In fact, if lives can be saved for such small amounts, maybe we should be sending more than $100.

A recent study by Deborah Small, George Loewenstein and Paul Slovic demonstrates that, although human beings are capable of altruism, our altruism is in some sense psychologically incompatible with the kind of rational calculation we’d need to perform to be good act-utilitarians.

The experiment by Small et al. shows clearly that human beings2 donate significantly more money to help the victims of catastrophes when two conditions hold: (a) the victim is an identifiable individual, rather than an undetermined individual or a large group in need; and (b) the donor is reasoning emotionally.3

When the experimental subjects were told about the human tendency to donate to identified individuals in need (rather than large groups in need), they stopped reasoning emotionally. That change halved donations to identified individuals, but did not affect the already-low donations to groups!

When the authors “primed” some experimental subjects with emotion-based tasks (‘how does the word “baby” make you feel?’), and others with mathematical tasks, they observed that the emotionally-primed subjects gave twice as much to identified individuals. Both groups gave similar, low amounts to groups in need.

There are some powerful logical arguments in favour of act-utilitarianism and similar ethical positions. But until we find a way to train, trick, or teach ourselves to live by them, these philosophies will remain incomplete.

Thanks to Toby Ord for suggesting this paper.


  1. From a consequentialist perspective, the main difference between sins of commission (hitting someone with a stick) and sins of omission (failing to stop a branch falling on someone) is that we can’t usually predict events precisely when we aren’t causing them, and we can’t be sure of our ability to prevent them. There are psychological differences too: we might lose a friend for the first action but not the second. 

  2. The results apply to human beings or, at least, to students sitting on their own in a cafeteria at a “University in Pennsylvania”. It would be worth repeating the experiment with other demographics, especially those with more experience of philanthropy. 

  3. Both (a) and (b) were already in the preceding literature; Small et al. show that altruism increases only when they both hold. 

Who truly governs America’s cities?

Who Governs? is a widely-hailed classic in the field of political science; it was the book that basically made the career of “the Dean of American political scientists”: Robert A. Dahl. In it, Dahl attempts to discover how government really works in America. To do this, he decides to study decision-making in a typical American city — namely, the one outside his office at Yale University: New Haven, Connecticut.

Who Governs? argues that New Haven worked according to Dahl’s theory of “pluralism”: elite political groups exist, but they aren’t very powerful. Instead, they balance each other out, leaving politicians (and thus their voters) firmly in control.

Fifteen years later, the political sociologist G. William Domhoff went over the issues covered in the book (including Dahl’s notes and sources, which Dahl was honest enough to share) only to find that Dahl had badly bungled the research. Upon closer review, Dahl’s own notes, plus a few new sources, revealed exactly the opposite story. This is the story of how Dahl got things so badly wrong.

Finding the elite

Dahl begins by claiming that there’s little overlap between the city’s social and economic elite, part of his argument that different groups of elites balance each other out. So he counts company presidents, individuals with significant property in the city, directors of multiple sizable city firms, and any director of a bank in the city. Then he takes this list and sees how many of them attended the New Haven Lawn Club debutante ball. He doesn’t find much overlap.

Domhoff points out that this is kind of an odd metric. For one thing, not all the elites go to the debutante ball, while many people from out of town do. So instead he says any member of one of New Haven’s three elite social clubs is a social elite, while anyone who’s a director of one of New Haven’s ten most interlocked firms is an economic elite. (Firms are interlocked when they share members of their board of directors.) He finds incredible overlap — of the entire corporate network, 55% are in a social club; of those on two boards, 80% are. So much for that.
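The idea of interlocking can be sketched in a few lines of code. Everything in the example below is hypothetical: the firm names, director names, and club membership are invented for illustration and are not Domhoff's data, and the functions are only one plausible way to count shared directors.

```python
from itertools import combinations

# Hypothetical boards: firm and director names are invented for illustration.
boards = {
    "Bank A": {"Adams", "Baker", "Clark"},
    "Utility B": {"Baker", "Davis"},
    "Insurer C": {"Clark", "Davis", "Evans"},
    "Retailer D": {"Frank"},
}

# Hypothetical membership list for an elite social club.
club_members = {"Baker", "Clark", "Evans"}


def interlocks(boards):
    """Map each pair of firms that share a director to the shared directors."""
    links = {}
    for (firm1, dirs1), (firm2, dirs2) in combinations(sorted(boards.items()), 2):
        shared = dirs1 & dirs2
        if shared:
            links[(firm1, firm2)] = shared
    return links


def interlock_counts(boards):
    """Count how many other firms each firm is interlocked with."""
    counts = {firm: 0 for firm in boards}
    for firm1, firm2 in interlocks(boards):
        counts[firm1] += 1
        counts[firm2] += 1
    return counts


links = interlocks(boards)
counts = interlock_counts(boards)

# Share of all directors who also belong to the social club.
directors = set().union(*boards.values())
overlap = len(directors & club_members) / len(directors)

print(counts)
print(f"{overlap:.0%} of directors are club members")
```

Ranking firms by their interlock count gives a "most interlocked" list, and the final ratio is the kind of overlap figure quoted above.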

Deciding urban renewal

But the bulk of Dahl’s study is his attempt to see who actually makes decisions on three important issues. He picks (arbitrarily) political nominations, public education, and urban renewal. For each, he interviews the major players to find out how the relevant decisions got made. Domhoff points out that political nominations are rather uninteresting, since they’re internal party disputes, and that elites don’t care about public education, since they all live in the suburbs or send their kids to private schools. Which leaves urban renewal.

New Haven went through a massive urban renewal shortly before Dahl’s study, and Dahl claims it was orchestrated by the city’s mayor, who heroically fought resistance-to-change on all fronts, selflessly ensuring what was best for New Haven. (The urban renewal project in fact ended up completely destroying New Haven’s downtown, but that’s a separate story.) As Dahl quotes the mayor: “Redevelopment in New Haven began in February of ’55. We had to start from scratch and assemble a team and start to file all the papers and get the whole program launched.” But Dahl omits a key piece of context.

Urban renewal had in fact been in the works for years, at the insistence of the town’s Chamber of Commerce. When the new mayor took office, the Chamber of Commerce quickly organized a meeting with him at which “the entire program [of urban renewal] would be explained to him and he would be urged to get action started on the program” (as their own minutes described it). A representative met with Mr. Lee at one of the elite social clubs and reported back that “Mr. Lee said he was in entire agreement with [our] program for action.”

So why did Lee claim that he had to start from scratch? Turns out, the city was having trouble getting some of its filings approved, so it decided to try a new strategy and assemble a new team, which began by refiling all the relevant permits. But this was just a technical detail: the urban renewal plans themselves had long been in the works.

Normally in science, you refute someone’s results by conducting the same form of research yourself under different circumstances. But Domhoff went much further: he reexamined the very same research that Dahl conducted, even using Dahl’s own notes and transcripts, and the conclusions he came to were wildly different. It’s hard to think of a more stunning refutation. Not that political science was interested in hearing it. Dahl remains the field’s idol, while Domhoff is an obscure professor at UC Santa Cruz.

What’s the best way to fight drugs?

In 1994, the RAND Corporation, a major US military think tank, conducted a massive study (with funding from the Office of National Drug Control Policy, the US Army, and the Ford Foundation) to measure the effectiveness of various forms of preventing the use of illegal drugs, particularly cocaine.

They analyzed a variety of popular methods and calculated how much it would cost to use each method to reduce cocaine consumption in the US by 1%. Source-country control (military programs to destroy drug production in countries like Peru, Bolivia, and Colombia) is not just devastating to poor third-world citizens; it is also the least effective, costing $783 million for a 1% reduction. Interdiction (seizing the drugs at the border) is a much better deal, costing only $366 million. Domestic law enforcement (arresting drug dealers and such) is even better, at $246 million. But all of those are blown completely out of the water by the final option: funding treatment programs for drug addicts would reduce drug use by 1% at a cost of only $34 million.

In other words, for every dollar spent on trying to stop drugs through source-country control, we could get the equivalent of more than twenty dollars’ worth of benefit by spending the same money on treatment. This isn’t a bunch of hippy liberals saying this. This is a government think tank, sponsored by the US Army.
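The ratios behind that comparison can be checked with a few lines of arithmetic. This is a simple sketch using only the four cost figures quoted above; it assumes, as the comparison itself does, that the cost per 1% reduction stays roughly constant as money is moved between programs.

```python
# Cost, in millions of dollars, of a 1% cut in US cocaine consumption (figures from the text).
cost_per_1pct = {
    "source-country control": 783,
    "interdiction": 366,
    "domestic enforcement": 246,
    "treatment": 34,
}

treatment_cost = cost_per_1pct["treatment"]

# Assumption: costs scale linearly, so a cost ratio is also a benefit-per-dollar ratio.
ratios = {method: cost / treatment_cost for method, cost in cost_per_1pct.items()}

# List the methods from most to least expensive.
for method, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    cost = cost_per_1pct[method]
    print(f"{method:<24} ${cost:>3}M  {ratio:4.1f}x the cost of treatment")
```

On these numbers, source-country control costs about 23 times as much as treatment for the same reduction, interdiction about 11 times, and domestic enforcement about 7 times.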

Evo psych error roundup

An influential group of biologists, psychologists, and other busybodies has for decades promoted the idea that the social sciences should be grounded in the ideas of evolution, that human behavior should be predicted from estimates of what evolution would do. The idea has been heavily promoted from the 1970s, when it was called sociobiology, until today, when it’s called evolutionary psychology (evo psych for short), but little in the way of compelling evidence has been produced. Today, we’ll focus on some less than compelling evidence.

Exhibit A: One common (and characteristically offensive) claim among evopsychers is that your mother’s mother will spend more time caring for you than your father’s mother because, naturally enough, your father’s mother isn’t evolutionarily certain that you have her DNA, since your mother could have been impregnated by any one of tons of guys. The data does indeed seem to bear this out, but sadly this is no win for the evopsychers, since there are some perfectly competent alternative explanations: kids are usually primarily raised by their mothers, and it’s not surprising that those mothers will look to their own mothers for help. (via Jeremy Freese)

Exhibit B: In 1995, Christenfeld and Hill argued that since fathers were so unsure if kids were really theirs, evolution would ensure that kids looked more like their fathers than their mothers, so that they wouldn’t be abandoned by deadbeat dads. And, sure enough, they had some students rate whether kids looked more like their father or mother and found that they looked more like their father. Robert French later redid the study, only to find that he couldn’t replicate the results. Oops. (via Mark-Jason Dominus)

Exhibit C: In 1993, Devendra Singh spent months poring over old copies of Playboy (for science, of course). He set about measuring the waist-to-hip ratios of Playboy models and Miss America winners, concluding that they had remained relatively constant (approximately 0.70) even as the models had gotten thinner over the years. He argued that men were evolutionarily wired to find this “hourglass shape” attractive. The result has been quoted in just about every evopsych textbook and news article since. Well, Jeremy Freese and Sheri Meland checked the numbers and found, once again, that none of it was true. There have actually been statistically significant changes in waist-to-hip ratios over time. (original article)

The power of hope

The experiment was stunning in its simplicity. A group of teachers at a low-income South San Francisco elementary school were asked to begin the year by administering the “Harvard Test of Inflected Acquisition” to their students. The results were processed and the teachers were given back a list of students whose intellectual abilities were expected to “bloom” that year. At the end of the school year the test was administered again and, sure enough, the bloomers were found to have bloomed, surpassing the other students. But there was just one catch: the test was actually a simple IQ test and the “bloomers” were actually chosen randomly.

The result was called the “Pygmalion effect”: teachers who expected their students to do better actually caused their students to do better. It was a classic self-fulfilling prophecy. The study (published as Pygmalion in the Classroom) was widely hailed. It made the front page of the Times, The Today Show, the New Yorker, and Time, among others. Teacher workshops in avoiding the effect spread from Puerto Rico to Saudi Arabia. LA banned IQ tests in its elementary schools. Presidents, textbooks, and Wikipedia articles repeat the notions to this day, over 30 years later.

Except none of it was true. The original study was conducted in first through sixth grades. The results were only statistically significant in grades one and two (where the alleged bloomers started with a 4-point advantage). The study was repeated in two Midwestern schools, where a statistically significant advantage was found in favor of the kids who weren’t expected to bloom. Psychologists who reviewed the analysis of the IQ test results found something was badly wrong. Some kids got lower IQ scores than they would have had they just filled out the test randomly. (It turned out the kids just didn’t fill out the test.)

All of this data was available before the Pygmalion book was ever published or promoted. Yet it was glossed over or otherwise ignored by the authors. And even when critics published articles spelling out the details, their critiques have been largely ignored by the public. Harvard’s Robert Rosenthal, the author of the original study, tried four more times to reproduce the effect, failing each time. A handful of studies did reproduce the effect, but they had incredibly small sample sizes. A meta-analysis found no overall effect when sample size was taken into account and showed that nearly half the replications had results that went in the opposite direction.

It might be nice if this Pygmalion effect were real, if students could do better on IQ tests simply by having their teachers think more highly of them. But, as best as we can tell, wishing doesn’t make it so.

What do we learn from lectures?

In 1972, Dr. Myron L. Fox, an authority on the application of mathematics to human behavior, gave a lecture to a group of educators — psychiatrists, psychologists, social workers, education students, and administrators — on the topic of “Mathematical Game Theory as Applied to Physician Education.” He spoke for an hour and took another half hour of questions. According to feedback forms distributed after the lecture, the talk was very well received. “Excellent presentation, enjoyed listening,” commented one. “Has warm manner,” added another. “Good flow, seems enthusiastic.” Not everyone was so positive, though. “Too intellectual a presentation,” complained one. “My orientation is more pragmatic.” Still, the majority of the responses were broadly favorable.

There was, however, a more serious problem. “Dr. Myron L. Fox” was actually an actor trained to give a speech consisting largely of “double talk, neologisms, non sequiturs, and contradictory statements … interspersed with parenthetical humor and meaningless references to unrelated topics.” The speech was actually an experiment conducted by a group of professors of medical education. They summarized the results by noting that “no respondents saw through the hoax of the lecture, [] all respondents had significantly more favorable than unfavorable responses, and [] one even believed he read Dr. Fox’s publications.”

“Given a sufficiently impressive lecture paradigm,” they concluded, “an experienced group of educators participating in a new learning situation can feel satisfied that they have learned despite irrelevant, conflicting, and meaningless content conveyed by the lecturer.”