Given the many concerns with focusing only on student ratings of instruction, what are other ways that you and your institution can assess teaching effectiveness and student learning?

What are the concerns with only using student evaluations to assess teaching?

Student evaluations of teaching appear to have limited (if any) correlation with student learning.

  1. A comprehensive and conservative 2013 review of student evaluations of teaching (SET), published in Review of Educational Research, concludes: “This review of the state of the art in the literature has shown that the utility and validity ascribed to SET should continue to be called into question. … many types of validity of SET remain at stake. Because conclusive evidence has not been found yet, such evaluations should be considered fragile, as important stakeholders (i.e., the subjects of evaluations and their educational performance) are often judged according to indicators of effective teaching (in some cases, a single indicator), the value of which continues to be contested in the research literature.”

  2. A 2017 meta-analysis found zero correlation between student learning and student evaluations of instruction.

  3. A 2009 meta-analysis reached similar but weaker conclusions; it is critiqued in the 2017 meta-analysis above.

  4. 2017 Chronicle of Higher Education: Students don’t always recognize good teaching.

Studies increasingly suggest that student evaluations are biased against female and minority instructors, among others.

  1. Student evaluations of teaching (mostly) do not measure teaching effectiveness

  2. 2016 Inside Higher Education: Bias against female instructors

  3. A striking study, with a small sample but large differences, found that students rated online instructors differently depending only on whether they were told the instructor was male or female: “In promptness, for example, the instructors matched their grading schedules so that students in all groups received feedback at about the same rate. The instructor whom students thought was male was graded a 4.35 out of 5 for promptness, while the instructor perceived to be female received a 3.55.”

  4. 2016 Inside Higher Education (Sprague): Working to Reduce the Harm of Bias in Student Course Evaluations

As these problems gain recognition, some institutions are reconsidering how they use student evaluations. A few examples are listed below.

  1. At the Provost’s request, Stanford has revised its course evaluations to focus on learning outcomes.

  2. Purdue’s Senate Faculty Affairs committee made the following recommendation: “Academic units are strongly encouraged not to use student responses to these questions for summative evaluation purposes, i.e. for promotion and tenure decisions.”

  3. The University of Michigan strongly recommends teaching portfolios; it also recommends putting student evaluation numbers into context and not relying on individual student letters.

What are other ways to evaluate teaching effectiveness?

Interpret and use student rating data appropriately

Linse (2017) summarizes the steps that administrators and faculty evaluation committees should take when using student rating data as one component of teaching evaluations.

  1. Student ratings should be only one of multiple measures of teaching: The most common additional sources of data about a faculty member’s teaching include written student feedback, peer and administrator observations, internal or external reviews of course materials, and, more recently, teaching portfolios and teaching scholarship (the instructor’s own assessment of teaching effectiveness). Data collection for each of these additional sources should be systematic rather than informal.
  2. A faculty member’s complete history of student ratings should be considered, rather than a single composite score.
  3. Small differences in mean (average) ratings are common and not necessarily meaningful: Variations of up to 0.4 points within a course are not unusual, although the typical range of variation depends on the rating scale.
  4. Examine the distribution of scores across the entire scale, as well as the mean: The median or the mode is a better measure of central tendency in skewed distributions.
  5. Avoid comparing faculty to each other or to a unit average in personnel decisions: Student ratings instruments are not designed to gather comparative data about faculty. The faculty who are most likely to be negatively impacted by faculty-faculty comparisons are those who do not fit common stereotypes about the professoriate—typically women and faculty of color.
  6. Focus on the most common ratings and comments rather than emphasizing one or a few outlier ratings or comments. Too often, faculty and administrators seem to focus their attention on rare comments, possibly because they are typically the most vehement or the most negative. Evaluators need to be particularly vigilant and self-aware when they are reading or summarizing students’ comments. One of the best ways to ensure that summaries of comments represent students’ views is to sort student comments into groups based on similarity and label the group with a theme, then rank the themes based on the frequency of comments in each. Some common themes include: Labs, Homework, Teamwork, Lecture, Availability, Textbook, and Exams.
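Steps 4 and 6 are simple enough to compute directly. The sketch below is a minimal Python illustration, not a tool described by Linse: it assumes integer ratings on a hypothetical 1 to 5 scale and uses made-up example data. It reports the full distribution alongside the mean, median, and mode(s), and ranks comment themes by frequency once a human reader has already sorted and labeled the comments. The function names `summarize_ratings` and `rank_themes` are hypothetical.

```python
from collections import Counter
from statistics import mean, median, multimode


def summarize_ratings(ratings, scale=range(1, 6)):
    """Report the full distribution of ratings plus mean, median, and mode(s).

    Assumes integer ratings on a hypothetical 1-5 scale; pass a different
    `scale` for other rating instruments.
    """
    counts = Counter(ratings)
    # Include every point on the scale, even points no student chose.
    distribution = {point: counts.get(point, 0) for point in scale}
    return {
        "n": len(ratings),
        "distribution": distribution,
        "mean": round(mean(ratings), 2),
        "median": median(ratings),
        # multimode returns every most-common value, so ties are not hidden.
        "mode": multimode(ratings),
    }


def rank_themes(coded_comments):
    """Rank themes by how many comments a human reader assigned to each.

    `coded_comments` is a list of (comment, theme) pairs; the theme labels
    come from sorting similar comments into groups by hand.
    """
    counts = Counter(theme for _comment, theme in coded_comments)
    return counts.most_common()


if __name__ == "__main__":
    # Hypothetical ratings from one course section.
    ratings = [5, 5, 5, 5, 5, 5, 4, 4, 4, 1]
    print(summarize_ratings(ratings))

    # Hypothetical comments, already grouped and labeled by a reviewer.
    coded = [
        ("Labs were the best part", "Labs"),
        ("Needed more lab time", "Labs"),
        ("Lab write-ups were unclear", "Labs"),
        ("Exams felt rushed", "Exams"),
        ("Final exam was fair", "Exams"),
        ("Hard to reach outside class", "Availability"),
    ]
    print(rank_themes(coded))
```

With these hypothetical ratings, a single rating of 1 pulls the mean down to 4.3 while the median and mode stay at 5, the kind of skew that a mean alone would hide. The theme ranking only counts labels; deciding which comments belong together remains a human judgment.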

Include and give weight to additional component(s) for Teaching Review as part of Promotion and Tenure

The aim is to document a faculty member’s evidence-based efforts to improve teaching, for example through:

  1. Syllabus with clear learning outcomes and indication of formative (mid-quarter) and summative (final) assessment
  2. Mid-quarter feedback survey
  3. Use of active learning, group work, or other evidence-based teaching practices (ideally with references indicating effectiveness of such practices)
  4. Any other efforts or evidence to indicate improvement in teaching based on education research.

Reward Use of Faculty Development to Support Teaching Improvement

  1. CEILS offers a variety of workshops, journal clubs, and meetings, along with individual consultations.
  2. UCLA’s Office of Instructional Development offers Instructional Improvement Program grants to support faculty in improving their teaching.

Reward getting and responding to feedback from faculty peers using an evidence-based observation form

Faculty are more likely to improve teaching practices when they use peer feedback for reflection and improvement and/or when peer feedback is based on evidence-based teaching practices.

Analyze shifts in student learning or attitudes during the course, or in subsequent course performance

  1. Pre- and post-tests or surveys help assess learning gains and attitude shifts associated with the course.
  2. Institutional data (contact CEILS or OID for more information) can be used to assess subsequent course performance.
  3. A decrease in drop rates or in performance disparities among students from underrepresented groups can indicate improvement. (Other measures should be used alongside these to confirm that learning stayed consistent or improved.)
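One widely used way to turn pre- and post-test scores into a single learning-gain number, especially in physics education research, is Hake's normalized gain, g = (post - pre) / (max - pre): the fraction of the possible improvement that students actually achieved. This measure is not prescribed here; the sketch below is a minimal Python illustration that assumes both tests are scored on the same 0 to 100 scale, and the student groups and scores are hypothetical. Computing the gain separately for each group is one way to check whether a performance disparity is shrinking.

```python
from statistics import mean


def normalized_gain(pre, post, max_score=100.0):
    """Fraction of the possible improvement achieved: (post - pre) / (max - pre)."""
    if pre >= max_score:
        raise ValueError("pre-test score is already at the maximum; gain is undefined")
    return (post - pre) / (max_score - pre)


def class_gain(pre_scores, post_scores, max_score=100.0):
    """Hake-style class gain, computed from class-average pre and post scores."""
    return normalized_gain(mean(pre_scores), mean(post_scores), max_score)


def gain_by_group(records, max_score=100.0):
    """Class gain for each student group; `records` holds (group, pre, post) tuples."""
    scores = {}
    for group, pre, post in records:
        # Collect matched pre/post scores per group.
        pres, posts = scores.setdefault(group, ([], []))
        pres.append(pre)
        posts.append(post)
    return {
        group: class_gain(pres, posts, max_score)
        for group, (pres, posts) in scores.items()
    }


if __name__ == "__main__":
    # Hypothetical matched pre/post scores (out of 100) for two student groups.
    records = [
        ("Group A", 40, 70),
        ("Group A", 60, 90),
        ("Group B", 50, 65),
        ("Group B", 50, 85),
    ]
    for group, g in gain_by_group(records).items():
        print(f"{group}: normalized gain = {g:.2f}")
```

A gain near 1 means students closed most of the gap between their starting score and the maximum. Because a lower drop rate could also come from lowering standards, a learning measure like this one can help confirm that learning held steady or improved.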

Adjust questions in Student Course Evaluations to be evidence-based

  1. As noted above, Stanford has revised its course evaluations, at the Provost’s request, to focus on learning outcomes.
  2. Consider putting scores into context by comparing them with scores from similar courses taught by other instructors.
  3. Consider noting the faculty member’s demographics so that reviewers can account for possible biases.