Women faculty warn of bias in student evaluations

The Women’s Faculty Cabinet says implicit bias in end-of-semester Student Rating of Teaching forms could harm women and instructors of color.

The University’s Student Rating of Teaching form, which students complete to evaluate their instructors at the end of each semester.

Ellen Schmidt

Jake Steinberg

When students fill out the end-of-semester Student Rating of Teaching form, it usually signals their stress is almost over. But for the instructors, it can be just the beginning.

Some instructors have to wonder if their physical appearance or skin tone might make a difference when tenure, promotion or a job itself is on the line.

With a growing body of research demonstrating that women and people of color receive lower ratings on student evaluations, the Women’s Faculty Cabinet is calling on the University of Minnesota to use additional measures for assessing a professor’s performance.

The cabinet plans to send a proposal to Executive Vice President and Provost Karen Hanson this week calling for the formation of a diverse task force to come up with what the cabinet calls a more holistic, less biased way to judge an instructor’s teaching.

Ole Gram, assistant vice provost for Faculty and Academic Affairs, said students’ implicit biases can influence how they perceive a teacher.

“It’s a kind of error in judgment. It means that when we interact with the world, we have certain schemas in our head,” he said. “For women, often it will be around appearance, clothes, things that are completely irrelevant.”

University policy mandates instructors’ teaching be assessed by students via the SRT and by instructors’ peers through in-class reviews. The WFC says the feedback is helpful for instructors wanting to improve their teaching, but the cabinet doesn’t think it should be used in tenure and promotion reviews.

“This bias is not benign, because SRTs are used in a ‘high stakes’ way,” the proposal says. “Biased SRTs may lead to increased faculty turnover, ultimately harming student learning and the University at-large.”

The WFC is also recommending colleges and departments determine the “gender gap” in their SRT scores and improve and expand their peer review system.

Research demonstrating the potential for bias in teaching evaluations goes back to the 1970s, and recent studies have found similar patterns. A paper published last year analyzed nearly 20,000 student evaluations and concluded women instructors systematically receive lower scores than their male counterparts.

“Our findings have worrying implications for the progression of junior women in academic careers,” the authors wrote. A 2011 study found black faculty receive lower scores than other racial groups, and a study published earlier this year found students give lower scores to non-native English speakers.

University policy says department heads should consider many factors when it comes to making decisions of merit and promotion. “The SRTs are one piece of the puzzle, but they’re not the only piece of the puzzle,” said Vice Provost for Faculty and Academic Affairs Rebecca Ropers.

After three years of research, the WFC found that instructors’ experiences vary widely depending on the college in which they teach; some colleges rely far more on SRT results than others.

“The policy in place is not what’s actually happening,” said WFC co-chair Amy Simon.

This can have a disparate impact on the most vulnerable instructors, like adjunct instructors who typically work on short-term contracts, she said.

The policy on teaching evaluations says racist or sexist comments shouldn’t be used in assessments. Those judgments can be obvious in written comments, but they are harder to detect in numerical ratings from prompts like “I would recommend this instructor to other students.”

In March, Faculty and Academic Affairs sent a guide to department heads and tenure review committees that advises against focusing on minute differences in ratings. According to the guide, the difference between 5.6 and 5.3 could be the result of bias, but the difference between 5.0 and 3.0 is more representative of an instructor’s performance.

“Awareness of our own biases is one of the number one ways to mitigate the bias,” said Stephanie Klein, assistant director of the Office of Measurement Services, which administers the SRT. “Accepting that we have biases gives us that perspective of ‘am I rating the actual educational experience or am I rating my actual dislike for this professor because he sounds like my great uncle Joe who used to pinch my cheeks?’”