Stars and numbers: the question of evaluation

The question of marking schemes is a recurring topic in academic debates, where the key word is evaluation. In this exam period, a brief reflection on the art of grading.

By: Daniel Jutras

Date: December 20, 2022

There was a little dustup in the cultural pages of La Presse+ a couple of weeks ago. Theatre director René Richard Cyr railed against the rating scale the major daily had adopted for its art critics and denounced a “paternalistic,” “laughable,” and “infantilizing” system. The reason: La Presse+ had abandoned its five-star rating grid in favour of a decimal scale. Films, novels, plays, and albums are now rated from 1 to 10. Cyr, who was not particularly fond of the five-star system to begin with, reminds us that his job is to “invent worlds where we want to believe that everything is still possible and where sixteen divided by three equals a thousand suns.” Instead, he invites critics to describe works “using analysis, intelligence and sensitivity, with words, impressions and ideas.” To hell with rating on a scale of 10!

You’d swear you were at a departmental assembly. The question of grading schemes is indeed a recurring topic in academic debates, where the key word is evaluation: evaluation of exams, assignments, manuscripts for publication, promotion dossiers, programs, teaching, and so on.

Let’s look at the evaluation of exams and assignments. The rest would detract from my point.

René Richard Cyr is quite right: all grading grids are reductive. But they still have a meaning, which may vary depending on the recipient. So here’s a first question: What’s the meaning conveyed by the grade? In the academic world, the awarding of a grade, whether it’s a percentage, a letter, or an honour, serves two distinct but related purposes.

A grade is primarily feedback (definitive, monolithic, crude) on the achievement of certain learning objectives. Apart from some specific contexts, the mark evaluates the result rather than the effort (a nuance that is not always well understood). The scale can be more or less precise, ranging from a binary statement (success/failure) to a percentage, with everything in between (more or fewer letters, more or fewer pluses and minuses).

The other purpose of grading is comparison. The mark places each “performance” on a scale that situates it in relation to others, with a desirable but variable degree of accuracy and objectivity. In an ideal world, the grading grid would allow each individual to see where they stand in relation to the group, which is useful information in a learning journey, without having their position on the scale shared with others. Nobody likes bad grades: not artists, not restaurants, not students. But bad grades hurt even more when they serve as the basis for decisions made by people other than the one being evaluated: a potential client or viewer, a graduate school admissions committee, or a potential employer.

There’s a clear tension between these two aims of evaluation. Grades and their distribution on a curve provide clear, simple, and immediately usable information for third parties. As a professor, am I accountable to these third parties? Do I have to worry about how they receive and use this information? Conversely, when the grade is intended for the person being assessed, it does not on its own provide sufficiently informative feedback. In an environment where multiple-choice exams were essentially unheard of and essay questions were therefore commonplace, I spent a lot of time as a young professor constructing grading grids that made it possible to distinguish a B from a B-. It was a waste of time. Students still filed into my office to find out more. The conclusion, not surprisingly, was that feedback in “words, impressions and ideas,” to borrow René Richard Cyr’s expression, is more telling when it explains what’s wrong with an essay. But this requires time and resources that aren’t always available, especially when the group consists of dozens of people, all wanting to know exactly why they didn’t do well.

Over the years, I have come to the conclusion that feedback “in words” is more important than marks, though my students didn’t always agree with me. I’m aware that grades have consequences and should not be given carelessly or cavalierly. But I chose to pay greater attention to the close relationship between evaluation and learning: making sure that I assessed the skills and knowledge actually and explicitly used in my course, disclosing my evaluation grid and the relative weight given to each element ahead of time, and giving each person a paper version of this grid, annotated with the contents of their own examination booklet. My group sizes varied, from about 15 people in a seminar to close to 200 in a required course. This cost me many hours and days around Christmas, as well as beautiful days in May. I didn’t please everyone, but it was worth the effort. Through trial and error, I usually, though not always, managed to fulfil my fundamental responsibility: explaining their successes and failures to each person I taught.

In responding to René Richard Cyr, some argued that the rating out of 10 was not intended for the artists but rather for all those people looking for a way to choose from among the many shows on offer. This, in my opinion, is where the jobs of professor and theatre critic part ways. Teachers should avoid worrying too much about the other “users” of the information conveyed by transcripts. To those of you who devote many hours to reading, evaluating, and commenting on papers, theses, and exams, I raise my hat. This will always be the most difficult aspect of your chosen career.

P.S. The play directed by René Richard Cyr received a rating of 8.5 out of 10. If it’s restaged, don’t miss it!

Daniel Jutras

If you’d like to continue the conversation, please drop me a line.