Daisy Christodoulou on improving assessment: the key to education reform. I’m clearly going heavy on the assessment today. Delighted to get my first seat of the conference: it’s either very busy or I’m picking the most popular sessions. There are a lot of people sitting on floors, in the gallery, on the steps…
Daisy says it’s a big claim, but ultimately, not just in education but across a whole range of fields, better measurement leads to improvements and innovations, while bad measurement leads to distortion and unintended consequences. Here are four assessment practices that are being influenced by theory that isn’t quite right.
One. Using prose descriptors to grade work.
These are so entrenched that people think this is assessment. It can be difficult to think outside of this structure. But it is important, because they are neither accurate nor helpful. They don’t give an accurate summative judgment. Daisy demonstrates this with a fraction problem that also appears in her book, showing that ‘can identify which fraction is larger’ is not as simple as it sounds: questions have different difficulties and even small changes have an impact on this. Prose descriptors are not precise enough.
Are they still a useful structure, eg for formative feedback? Daisy thinks no. They are not helpful for that either. Written comments become generic. They can be accurate, but not helpful.
Daisy talks about well-designed multiple choice questions that are designed to tease out common misconceptions. Unambiguously wrong, but plausible distractors. This is a better way of giving feedback.
Took this picture too early: she also recommends Michael Polanyi.
Two. Marking essays using absolute judgment.
Daisy uses the same crimes activity she includes in her book, from Mozer et al: poisoning a barking dog appears on two lists, once alongside lesser crimes and once alongside greater crimes, and people are asked to score each crime on a scale of 1-10.
The alternative to absolute judgment is comparative judgment. Teachers judge pairs of essays, deciding each time which of the two is better, and an algorithm combines those decisions into a ranking of all the essays. Technology is used to enhance human judgment. Daisy mentions that No More Marking will do this. I missed her summary slide.
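To make the idea of turning paired decisions into a ranking more concrete, here is a minimal sketch of my own, not something from the talk. It assumes a Bradley-Terry model, which is a common way of scoring paired comparisons; I don’t know that this is what No More Marking actually uses. The essay labels, the judgments and the function name are all made up for illustration.

```python
from collections import defaultdict


def bradley_terry(judgments, iterations=200):
    """Estimate a relative quality score per essay from (winner, loser) pairs.

    Assumption: a Bradley-Terry model fitted with a simple iterative update.
    This is an illustrative sketch, not any particular product's algorithm.
    """
    essays = sorted({essay for pair in judgments for essay in pair})
    wins = defaultdict(int)         # how many comparisons each essay won
    pair_counts = defaultdict(int)  # how often each pair was compared
    for winner, loser in judgments:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1

    strength = {essay: 1.0 for essay in essays}  # start everyone equal
    for _ in range(iterations):
        updated = {}
        for essay in essays:
            denom = 0.0
            for other in essays:
                if other == essay:
                    continue
                n = pair_counts.get(frozenset((essay, other)), 0)
                if n:
                    denom += n / (strength[essay] + strength[other])
            updated[essay] = wins[essay] / denom if denom else strength[essay]
        # Rescale so the scores average 1.0 and stay comparable between rounds.
        total = sum(updated.values())
        strength = {e: s * len(essays) / total for e, s in updated.items()}
    return strength


# Hypothetical judgments: each tuple is (essay judged better, essay judged worse).
judgments = [
    ("A", "B"), ("A", "B"), ("B", "A"),
    ("B", "C"), ("B", "C"), ("C", "B"),
    ("A", "C"), ("A", "C"), ("C", "A"),
]
scores = bradley_terry(judgments)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

No teacher ever gives an essay a mark out of 10 here. Each judgment is only “this one is better than that one”, and the scale comes out of the combined decisions.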
Three. Thinking of grades as discrete categories.
Someone at the top of a grade will be closer to someone at the bottom of the next grade than they are to someone else at the bottom of their own grade, but this is not reflected in the letter on the page. Daniel Koretz has attacked this in ‘Measuring Up’.
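A quick worked example of the point about grade boundaries. The boundaries and marks below are invented purely for illustration; they are not from the talk or from any real exam.

```python
# Hypothetical grade boundaries: (minimum mark, letter), highest first.
GRADE_BOUNDARIES = [(70, "A"), (60, "B"), (50, "C"), (0, "D")]


def grade(mark):
    """Return the letter grade for a mark under the made-up boundaries above."""
    for minimum, letter in GRADE_BOUNDARIES:
        if mark >= minimum:
            return letter
    raise ValueError("mark must not be negative")


top_of_b, bottom_of_a, bottom_of_b = 69, 70, 60

# One mark apart, but different letters on the page.
gap_across_boundary = bottom_of_a - top_of_b
# Nine marks apart, but the same letter on the page.
gap_within_grade = top_of_b - bottom_of_b

print(grade(top_of_b), grade(bottom_of_a), gap_across_boundary)
print(grade(top_of_b), grade(bottom_of_b), gap_within_grade)
```

The letter hides the fact that the 69 and the 70 are almost identical, while the 69 and the 60 are not.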
Four. Thinking test scores matter.
In fact, it is the inferences we can make from them that matter. What can we say from the evidence about what the student can do in other contexts? (Wiliam). Test scores are only samples of a wider domain. When the link between the test and the domain breaks, the inferences made are no longer valid. Any exam is only as good as its link to its domain, but this has become distorted over time, in large part due to the high stakes attached to the grades. ‘When a measure becomes a target, it ceases to be a good measure’: Goodhart’s Law.
This lecture was a good overview of Daisy’s book, ‘Making Good Progress?’ so I can recommend reading that if you liked this.