
What are the concerns with only using student evaluations to assess teaching?

  • Here is a quote from a comprehensive and conservative review of Student Evaluations of Teaching (SET), published in Review of Educational Research in 2013: “This review of the state of the art in the literature has shown that the utility and validity ascribed to SET should continue to be called into question. … many types of validity of SET remain at stake. Because conclusive evidence has not been found yet, such evaluations should be considered fragile, as important stakeholders (i.e., the subjects of evaluations and their educational performance) are often judged according to indicators of effective teaching (in some cases, a single indicator), the value of which continues to be contested in the research literature.”

  • Toftness et al. (2017) found that “Instructor fluency leads to higher confidence in learning, but not better learning” due to an “illusion of learning” associated with lecture-based learning.
  • Kornell & Hausman (2016) found an inverse correlation between student ratings and subsequent course performance.
  • Here is a link to a 2017 meta-analysis finding zero correlation between learning and student evaluations of instruction.

  • Here is a 2009 meta-analysis that reached similar but less strong conclusions and is critiqued in the previous link.

  • 2017 Chronicle of Higher Education: Students don’t always recognize good teaching.

  1. Student evaluations of teaching (mostly) do not measure teaching effectiveness

  2. 2016 Inside Higher Education: Bias against female instructors

  3. Here is a striking study with a small ‘n’ but big differences in how students rated online instructors, depending simply on whether they were told the instructor was male or female: “In promptness, for example, the instructors matched their grading schedules so that students in all groups received feedback at about the same rate. The instructor whom students thought was male was graded a 4.35 out of 5 for promptness, while the instructor perceived to be female received a 3.55.”

  4. To explore “gendered language” in teaching reviews, this interactive chart lets you see the frequency of various words used to describe male and female teachers in about 14 million reviews from
  5. Sprague 2016: Inside Higher Education: Working to Reduce the Harm of Bias in Student Course Evaluations

  1. Stanford has just created revised course evaluations, due to a call from the Provost, focusing on learning outcomes.

  2. Purdue’s Senate Faculty Affairs committee made the following recommendation: “Academic units are strongly encouraged not to use student responses to these questions for summative evaluation purposes, i.e. for promotion and tenure decisions.”

  3. University of Michigan strongly recommends using teaching portfolios; they also recommend putting student evaluation numbers into context and not using individual student letters.

CEILS Teaching Evaluation Symposium

Recap: CEILS Teaching Evaluation Symposium

CEILS hosted a symposium at UCLA on June 12, 2018, called “Exploring Practical Ways to Inspire and Reward Teaching Effectiveness and Instructional Innovation”. The event details can be found here. Several visiting speakers, including Emily Miller, Associate Vice President for Policy at AAU, Sierra Dawson, Associate Vice Provost for Academic Affairs at the University of Oregon, and Diane O’Dowd, Vice Provost for Academic Personnel at UC Irvine, shared resources on student ratings of instruction, peer teaching observations, and self-assessment of teaching practices, among others. Many thought leaders from the UCLA community also took part as panelists, moderators, and attendees throughout the day. Please explore the resources shared by our colleagues.

  • Click here to access the UCLA Box folder with handouts, rubrics, guidelines, and other materials shared during the symposium. A password is required to access the Box folder. Please email us to request the password.
  • Click here to view the spreadsheet with a list of the documents and Box folder locations.

CEILS also hosted visiting Scientific Teaching Scholar Philip Stark, Professor of Statistics and Associate Dean of Mathematical and Physical Sciences at UC Berkeley, who gave a talk on November 2, 2018, entitled “Student Evaluations of Teaching: Managing Bias and Increasing Utility”. Resources shared at this event can be downloaded from the event page found here; these include slides from his talk, UC Berkeley’s guide for documenting teaching effectiveness, and their guide to peer review of course instruction. We encourage you to check out these and our growing list of resources.

Additional resources

  • This handout from Georgia Tech summarizes the key ways one can evaluate and document teaching effectiveness, with an accompanying report to provide more background.
  • UC Berkeley has a similar set of recommendations in terms of more comprehensively documenting teaching effectiveness.

Our self-assessment guide provides an array of tools faculty can use to self-assess their teaching techniques, get ideas for growth, and then track improvement in the use of evidence-based approaches.


Linse (2017) summarizes the steps that administrators and faculty evaluation committees should take when using student rating data as a component of teaching evaluations.

  1. Student ratings should be only one of multiple measures of teaching: The most common additional sources of data about the faculty member’s teaching include written student feedback, peer and administrator observations, internal or external reviews of course materials, and more recently, teaching portfolios and teaching scholarship (instructor assessment of teaching effectiveness).  Data collection for each of these additional data sources should be systematic rather than informal.
  2. A faculty member’s complete history of student ratings should be considered, rather than a single composite score.
  3. Small differences in mean (average) ratings are common and not necessarily meaningful: Variations of up to 0.4 points within a course are not unusual, although what counts as a small difference depends on the rating scale.
  4. Examine the distribution of scores across the entire scale, as well as the mean: The median or the mode is a better measure of central tendency in skewed distributions.
  5. Avoid comparing faculty to each other or to a unit average in personnel decisions: Student ratings instruments are not designed to gather comparative data about faculty. The faculty who are most likely to be negatively impacted by faculty-faculty comparisons are those who do not fit common stereotypes about the professoriate—typically women and faculty of color.
  6. Focus on the most common ratings and comments rather than emphasizing one or a few outlier ratings or comments. Too often, faculty and administrators seem to focus their attention on rare comments, possibly because they are typically the most vehement or the most negative. Evaluators need to be particularly vigilant and self-aware when they are reading or summarizing students’ comments. One of the best ways to ensure that summaries of comments represent students’ views is to sort student comments into groups based on similarity and label the group with a theme, then rank the themes based on the frequency of comments in each. Some common themes include: Labs, Homework, Teamwork, Lecture, Availability, Textbook, and Exams.
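
As a rough illustration of steps 4 and 6 above, the sketch below reports the full distribution of ratings next to the mean, median, and mode, and then ranks comment themes by frequency. It is a minimal example, not a method prescribed by Linse (2017): the 1-5 scale, the sample ratings, the hand-labeled comments, and the function names `summarize_ratings` and `rank_themes` are all assumptions made for illustration.

```python
from collections import Counter
from statistics import mean, median, multimode

# Hypothetical ratings on an assumed 1-5 scale (illustrative data only).
ratings = [5, 5, 5, 4, 5, 4, 5, 1, 5, 4]


def summarize_ratings(scores, scale=range(1, 6)):
    """Return the distribution across the whole scale plus mean, median, and mode(s)."""
    counts = Counter(scores)
    return {
        "distribution": {value: counts.get(value, 0) for value in scale},
        "mean": round(mean(scores), 2),
        "median": median(scores),
        "modes": multimode(scores),
    }


# Hypothetical comments that an evaluator has already sorted into themes by hand.
labeled_comments = [
    ("Exams", "The exams matched the practice problems."),
    ("Homework", "Problem sets took too long."),
    ("Exams", "Exam questions felt fair."),
    ("Lecture", "Lectures were engaging."),
    ("Exams", "The midterm was too long."),
    ("Homework", "Homework helped me prepare."),
]


def rank_themes(comments):
    """Count comments per theme and list themes from most to least frequent."""
    theme_counts = Counter(theme for theme, _ in comments)
    return theme_counts.most_common()


summary = summarize_ratings(ratings)
print(summary)
print(rank_themes(labeled_comments))
```

In this made-up sample a single rating of 1 pulls the mean down to 4.3 while the median and mode stay at 5, which is the kind of skew that makes the distribution worth reading alongside the average.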


  1. CEILS offers a variety of workshops, journal clubs, and meetings, along with individual consultations.
  2. UCLA’s Office of Instructional Development offers an Instructional Improvement Program set of grants to support faculty in improving teaching.

Faculty are more likely to improve teaching practices when they use peer feedback for reflection and improvement and/or when peer feedback is based on evidence-based teaching practices.

The aim here could be to demonstrate efforts to improve teaching in an evidence-based way.

  1. Syllabus with clear learning outcomes and indication of formative (mid-quarter) and summative (final) assessment
  2. Center for Education Innovation and Learning in the Sciences (CEILS) Mid-quarter Course Evaluation
  3. Use of active learning, group work, or other evidence-based teaching practices (ideally with references indicating effectiveness of such practices)
  4. Any other efforts or evidence to indicate improvement in teaching based on education research.
  1. Pre- and post-tests/surveys help assess learning gains and attitude shifts associated with the course.
  2. Institutional data (contact CEILS or OID for more information) can be used to assess subsequent course performance.
  3. A decrease in drop rates, or in performance disparities for students from underrepresented groups, can indicate improvement. (Other measures can be used to ensure consistent or improved learning.)
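
For the pre- and post-test item above, one simple way to express a learning gain is the per-student difference between post-test and pre-test scores, averaged over the class. The sketch below assumes paired percent-correct scores for the same students in the same order; the data and the function name `paired_gains` are hypothetical, and other gain measures may suit a given course better.

```python
from statistics import mean

# Hypothetical paired scores (percent correct), same students in the same order.
pre_scores = [40, 55, 60, 35]
post_scores = [70, 75, 85, 65]


def paired_gains(pre, post):
    """Return each student's raw gain (post-test minus pre-test)."""
    if len(pre) != len(post):
        raise ValueError("pre and post must contain one score per student")
    return [after - before for before, after in zip(pre, post)]


gains = paired_gains(pre_scores, post_scores)
mean_gain = mean(gains)
print(gains, mean_gain)
```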
  1. Stanford has just created revised course evaluations, due to a call from the Provost, focusing on learning outcomes.
  2. Consider putting scores into context, comparing to similar courses taught by other instructors.
  3. Consider indicating the faculty member’s demographics to help account for possible biases.