Complaints about some evaluators not fully understanding our true value … a sense that points were taken away unfairly, despite reviewer training in the appropriate rubrics … evaluators not understanding, and not crediting us for, the things we do well … a sense that someone in a higher position should reverse the injustice. It all feels unfair.
Yes, but most of these Colorado complaints about the round-two R2T scoring could also be applied to premature teacher evaluation based on the inappropriate use of faulty test-score data.
Isn’t there some irony in the fact that some of the folks complaining about unfair R2T scoring of Colorado’s application are also among the ones who turned a deaf ear to, or brushed aside, some of the legitimate concerns about using current test scores to evaluate teachers?
My colleague Robert Reichardt made a similar point in April, after Colorado lost round 1 of R2T. Now we feel twice the pain.
Let me be clear: I support better teacher evaluation, and we need to move in that direction, using multiple measures: better and more frequent principal and peer evaluation, plus some appropriate use of student test scores.
There are certainly some individuals and groups who have looked for any reason not to advance real teacher evaluation, because they want to preserve the status quo (which is basically no useful teacher evaluation), and I don’t want to support that position. At the same time, there are lots of others who see legitimate problems with the current technology that ties student test results to specific teacher evaluations, and want to proceed carefully, in order to do this right. I was surprised how little attention policy makers gave to that latter group this spring.
As SB 191 moves into the implementation stage, now without federal funding to support it, we should keep these concerns in mind.
There are at least four reasons why we cannot now validly and reliably link teacher evaluations to student test scores. Only when we address these problems will we be able to evaluate teachers more fairly and more effectively.
First, we don’t have good value-added tests. An annual March CSAP test is not good enough (you need valid beginning-of-year and end-of-year tests for the same students whose gain you want to assess), and more than half of Colorado grades/subjects don’t even have the annual CSAP available anyway.
Second, students are probably not randomly assigned to teachers, as this evaluation process requires. If teacher Jane is known by her principal to be good at teaching students with serious family problems, and thus gets assigned a group of difficult students, and moves their knowledge forward by 0.75 grade levels, while teacher Joan is known to not be good with difficult students, and gets all of the easier ones, and advances their knowledge by 1.0 grade level, who has done a better job? (It isn’t clear that we can, or want to, “fix” this, but it is a reality that skews the data.)
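The Jane-and-Joan problem can be made concrete with a back-of-the-envelope simulation. All the numbers below are hypothetical assumptions for illustration (not real CSAP data): both teachers add the same true value, but one is assigned students whose out-of-school circumstances depress measured gains.

```python
# Hypothetical sketch: two equally effective teachers, but students
# are assigned by difficulty rather than at random.
import random

random.seed(0)

def simulate_gains(n_students, student_handicap, teacher_effect):
    """Observed gain = teacher's true effect, minus a drag from
    out-of-school factors, plus noise (all in grade levels)."""
    return [teacher_effect - student_handicap + random.gauss(0, 0.2)
            for _ in range(n_students)]

# Assumed: both teachers add exactly 1.0 grade levels of true value.
jane_gains = simulate_gains(26, student_handicap=0.25, teacher_effect=1.0)
joan_gains = simulate_gains(26, student_handicap=0.0,  teacher_effect=1.0)

def mean(xs):
    return sum(xs) / len(xs)

print(f"Jane (difficult students): mean gain {mean(jane_gains):.2f}")
print(f"Joan (easier students):    mean gain {mean(joan_gains):.2f}")
# The raw gain comparison favors Joan even though, by construction,
# both teachers are equally effective -- the assignment differs,
# not the teaching.
```

The simulated "handicap" stands in for everything non-random assignment bundles into a classroom; a raw gain comparison attributes it to the teacher.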
Third, one year of data is not a large enough sample to use for a teacher – you probably need three. Classes of 26 students, with the 50% mobility levels that are not uncommon in urban areas, leave 13 students with a particular teacher all year – that is not enough data to make a reliable judgment about teacher quality.
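A quick sketch shows why 13 students is thin evidence. The spread of individual student gains (0.5 grade levels here) is an assumed number for illustration, but the arithmetic is just the standard error of a mean:

```python
# Hypothetical numbers: how precisely does one class-year pin down
# a teacher's average student gain?
import math

noise_sd = 0.5  # assumed spread of individual student gains (grade levels)

for label, n in [("one year, 13 stable students", 13),
                 ("three years, ~39 students", 39)]:
    se = noise_sd / math.sqrt(n)   # standard error of the mean gain
    ci = 1.96 * se                 # rough 95% margin of error
    print(f"{label}: mean gain known only to within +/-{ci:.2f} grade levels")
```

With these assumptions, one year of 13 students pins the average gain down only to within about a quarter of a grade level either way – wide enough to swamp the difference between a good and a mediocre teacher – while pooling three years cuts that margin substantially.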
Fourth, lots of good teaching is joint and collaborative, especially at the secondary level. The social science teacher may be as responsible for improved student writing as is the English teacher. We don’t want teaching to only be a solitary practice with no sharing and collaboration.
Added to these concerns, making student test scores very high-stakes will greatly increase the likelihood of outright cheating, as well as more subtle “teaching to the test” (and not the good kind, where people teach the subjects they are supposed to teach, but the overly narrowing kind where you only ask the types of questions known to be on the test).
I won’t try to make this post double-ironic, but one of the virtues of Denver’s own ProComp is that it was put together by and with teachers, advanced by a teacher vote, and built on multiple measures, recognizing that we can’t really nail down a single dimension of teaching to assess and reward. It is disappointing that we couldn’t summon that kind of process at the state level.
To see a different way of handling this issue, Chad Aldeman of The Quick and the Ed blog (a strongly pro-reform voice) recently contrasted LA’s handling of teacher data with Tennessee’s approach:
“In contrast, Tennessee has been using a value-added model since the late 1980’s, and every year since the mid-1990’s every single eligible teacher has received a report on their results. When these results were first introduced, teachers were explicitly told their results would never be published in newspapers and that the data may be used in evaluations. In reality, they had never really been used in evaluations until the state passed a law last January requiring the data to make up 35 percent of a teacher’s evaluation. This bill, and 100% teacher support for the state’s Race to the Top application that included it, was a key reason the state won a $500 million grant in the first round.”