As the battle moves ahead over SB 191, the teacher evaluation bill, I want to weigh in with some reasons to support it, particularly with the added amendments that lengthen the time frame for implementation. While the bill may not provide the perfect legislative design, it moves us well beyond the current system of, basically, “non-evaluation” of teachers.
It has never been fully clear to me why teachers earn the equivalent of career tenure after the most minimal showing of competence/continuance over 3 years (and especially so when the research shows that many teachers don’t reach their best level of achievement until 5 years or more). While tenure in higher education has some problems, for the most part it is a rigorous, 7-year process where a fairly high bar is set – and plenty of people are turned down, but only after very careful review of all aspects of a candidate’s record by multiple parties. And much of the argument for tenure, of course, is based upon the potentially controversial research that professors might engage in, not the teaching part of their job.
The teachers unions have argued that SB 191 moves too fast, too far. The longer implementation time frame should help address the legitimate elements of those concerns – that is, we don’t now have valid and reliable assessments of student learning for a number of subjects and grades. There is a kind of “chicken v. egg” quality to these arguments – supporters saying we won’t generate and test the valid and reliable assessments until we have high stakes decisions ready to be made, and opponents arguing that we need the assessments fully vetted beyond a shadow of a doubt before making any decisions based upon them. While I’m more in the camp of passing the legislation and figuring out how to do the assessments right, it does take time, and money, to do it right.
Economists look at evaluating teachers as partly a statistical “sampling” problem – that is, teachers have a “true type” (good, bad, average, etc.) that districts can’t easily perceive, so we sample their performance. We can sample inputs (resume, college degree, prior training, etc.), outputs (lesson plans, what they can be seen doing in the classroom, how they interact with peer teachers, etc.), and outcomes (ideally longitudinal student achievement, but also student advancement, etc.).
While outcomes such as student growth are the place most reformers want to get to, these are still only imperfect samples of what students have learned – students have good and bad days, some students are sick on test day, mobility may leave only a small number of the same students taking the test in both September and May, the tests may not be closely aligned to the actual curriculum, and other factors also influence the outcomes. As Rona Wilensky noted in these pages, a singular focus on test scores leads to real cheating, perhaps curriculum narrowing, and extreme teaching to the test (these problems are all reduced if the assessment tests are really, really good).
Similarly, observing teachers in the classroom is sampling – a “poor true type” teacher could become adept at putting on a good show for announced and unannounced observations by a principal or a peer group, but the more samples that are taken, the more likely the evaluations are to be accurate.
If I were a “good” teacher I would want fair evaluations, with a balance of outcome and output assessments, that give me the best chance to show that I am effective. The more good sampling that is done (for example, the eight principal visits to teachers’ classrooms in Mike Miles’ plan), the smaller the “confidence interval” around the evaluation and the more likely it is fair. Especially if the pay and tenure implications become higher, we want teachers to feel that their evaluation is fair (and we want to prevent truly “bad” teachers from being able to somehow “game” the system to their advantage).
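To make the sampling point concrete, here is a rough sketch of how the uncertainty around an observation-based rating might shrink as visits are added. It rests on simplifying assumptions that are mine, not part of any actual evaluation plan: each visit yields an independent score with the same spread, and a normal approximation is good enough. The function name `ci_half_width` and the parameter `score_sd` are hypothetical, chosen only for illustration.

```python
import math


def ci_half_width(score_sd: float, n_visits: int, z: float = 1.96) -> float:
    """Half-width of an approximate 95% confidence interval for the
    average of n_visits independent observation scores.

    score_sd is an assumed visit-to-visit standard deviation of the
    scores; z = 1.96 is the usual normal multiplier for 95% coverage.
    """
    if n_visits < 1:
        raise ValueError("n_visits must be at least 1")
    return z * score_sd / math.sqrt(n_visits)


if __name__ == "__main__":
    # Assumed spread of one rating point between visits (illustrative only).
    for n in (1, 2, 4, 8, 16):
        print(f"{n:2d} visits: +/- {ci_half_width(1.0, n):.2f} points")
```

Under these assumptions the interval narrows with the square root of the number of visits, so going from two visits to eight halves the uncertainty, while each additional visit after that buys a little less. Real classroom observations are unlikely to be fully independent (the same observer, the same class, announced visits), so the actual gain is probably smaller than this sketch suggests.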
We should also recognize that there are exciting experiments going on in Colorado from which we should draw knowledge soon, including ProComp in DPS, Mike Miles’ evaluation and pay reforms in Harrison 2, Eagle County’s TAP program, the Gates MET studies being done in DPS that explicitly link new assessments and videotaped performance of teachers, and others. With the longer implementation time frame, the right lessons can be learned, and applied, from these experiments.
At the same time, this will cost money. I am often struck by the battle lines in Colorado of reformers, who believe we can make radical changes even with the same low (below national average) funding, versus those who advocate for more, and more fair, funding (and are often derided as somehow being “anti-reform”). These things should go together more.
Reform done badly, often because it is done on the cheap, is likely to backfire and slow ultimate progress. For example, teacher pay for performance is not a new idea – it had a major push after 1983’s A Nation at Risk report – but it was done poorly, and even in recent experiments in Houston and Hillsborough County, Florida, it was abandoned because notably poor teachers were being paid excellence bonuses (plus a few deceased teachers were paid, as well).
So, how about more money and reform working together? “Gifts, donations, and private philanthropy” are probably not enough to produce a very good evaluation system for 50,000 teachers statewide, especially given the need for better assessments and evaluation training (and given that the CAP4K curriculum changes will be implemented at the same time, also without enough money).
While Alex Ooms provides some interesting ideas in his recent blog post about freeing up current money spent on paying for advanced degrees, ideas that Marguerite Roza and CRPE have championed, it just isn’t realistic to think that those sources of money will be available soon, as they would require enormous changes to the current system, and probably changes to legal and contractual obligations.
Finally, while questioning the realism of these funding sources, I do agree with much of Ooms’ recent blog post – this bill is important and worth a fight, but we shouldn’t delude ourselves too much about the impact, even if it is implemented well and with some real resources. My sense, from the somewhat mixed extant research, is that we will have the ability to distinguish maybe 3 groups of teachers through better evaluations – a small percentage of poor teachers who should not be in the system, a small percentage of really excellent teachers who deserve more autonomy, career paths, and pay, and a very large middle group of solid teachers that is nearly average.
Rewarding the excellent and culling the poor teachers will help our system somewhat – this is by no means trivial. But, we need to figure out how to improve the very large number of more-or-less average teachers to get a major boost to student outcomes.