Can Good Teaching Be Measured By a Formula?
What is a good teacher? I think most of us could probably list the qualities possessed by the teachers who have touched our lives most profoundly. My own Top 10 List of the characteristics of a good teacher looks something like this:
- Inspiring and motivating
- Accepts and values each student as a unique learner
- Makes teaching relevant
- Creates a classroom community of learners
- Makes learning both understandable and fun
The larger question is, how do we know? Is there a useful way to observe or quantify what actually makes a good teacher?
That question was the central topic of a lecture I recently attended, given by C. Kirabo Jackson, Northwestern University professor of human development and social policy. Jackson made the case for evaluating teachers based on outputs, much as other professions measure the effectiveness of their members. In other words, what’s “good” is what produces data that proves success. Good doctors have patients who live longer. Good lawyers win the most cases. Good businessmen make the most money.
Though my experience as an educator puts me at odds with his analysis, Jackson's basic premise holds that there are indeed measurable outputs that define good teachers, including:
- How much of the content have their students mastered?
- How well prepared are their students for life situations?
- How much have they increased their students' stock of human capital (the skills needed to be a productive member of our labor market and society)?
He also contends that there are measurable outputs that define a student’s success in life:
- Criminality (or lack thereof)
- Mental and physical health
Since the audience was largely composed of retired educators, I could hear a collective sigh and even an undercurrent of irritation as Jackson presented his initial argument. How did Jackson think teachers alone could impact all of these outcomes, many of which lie decades away from the students in a given teacher’s classroom? How can one suggest that elements like home environment, poverty, violence, homelessness, and hunger do not have a greater impact on these students?
To his credit, Jackson accepts there are many factors beyond a teacher’s control that determine student outcomes. But he is also a data-driven academic who fundamentally believes that student test scores, combined with some fancy math, can correct for these other factors to produce a fair and accurate way to evaluate teachers. In short, he believes in the efficacy of value-added measure (VAM) scores.
How Does VAM Work?
In case you’re not familiar with this model of teacher evaluation, here’s some shorthand: VAM combines test scores with other measurable data to determine who is a good teacher. According to the site VAMboozled, today “40 states and the District of Columbia require objective measures of student learning to be included in educator evaluations—a sea change from just five years ago (Doherty & Jacobs/National Council on Teacher Quality, 2013). Most states use either some type of value-added model (VAM) or student growth percentile (SGP) model to calculate a teacher’s contribution to student score changes.”
Here’s what one VAM model looks like:
VAM uses a mathematical formula to project how students will perform on their year-end tests. Other data such as ethnicity, gender, parental level of education, socio-economic status based on zip codes or qualifying for free/reduced price lunches, home language, and special needs are also somehow factored into the formula. If most of the students perform better or worse than projected by the formula, that supposedly tells us how effective (or not) the teacher is.
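To make the idea concrete, here is a minimal sketch of the "project, then compare" logic described above. It is not any state's actual formula: it assumes the projection uses only last year's score (real models fold in the other factors listed above), and every name and number in it is invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (teacher, prior-year score, year-end score).
# All names and numbers are invented for illustration.
records = [
    ("Teacher A", 60, 68),
    ("Teacher A", 70, 79),
    ("Teacher A", 80, 86),
    ("Teacher B", 60, 62),
    ("Teacher B", 70, 71),
    ("Teacher B", 80, 84),
]


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return y_bar - slope * x_bar, slope


def value_added(records):
    """Average (actual - projected) year-end score for each teacher."""
    priors = [prior for _, prior, _ in records]
    actuals = [actual for _, _, actual in records]
    intercept, slope = fit_line(priors, actuals)

    residuals = defaultdict(list)
    for teacher, prior, actual in records:
        projected = intercept + slope * prior  # the formula's projection
        residuals[teacher].append(actual - projected)
    return {teacher: mean(values) for teacher, values in residuals.items()}


scores = value_added(records)
for teacher, score in scores.items():
    print(f"{teacher}: {score:+.2f}")
```

In this toy data, Teacher A's students land above their projections on average and Teacher B's land below, so A gets a positive score and B a negative one. Nothing in the calculation knows why a student beat or missed the projection, which is exactly where the trouble starts.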
But here’s the thing: a teacher’s VAM scores “tend to correlate” year after year—but they don’t always. There have been numerous stories about award-winning teachers whose VAM scores the next year pegged them as bad teachers. There are teachers whose VAM scores are low because their gifted students were projected via the formula to score higher than the highest score possible on the test. There are many teachers who are penalized with low VAM scores because they are more open to having children with challenges in their classrooms.
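The gifted-student problem is easy to see with a little arithmetic. The sketch below assumes a hypothetical test with a maximum score of 100 and a made-up projection rule (last year's score plus a fixed expected gain); real formulas are more elaborate, but the ceiling works the same way.

```python
MAX_SCORE = 100  # hypothetical test ceiling


def projected_score(prior, expected_gain=5):
    """Hypothetical projection rule: last year's score plus a fixed expected gain."""
    return prior + expected_gain


def residual(prior, actual):
    """Actual minus projected; a negative value counts against the teacher."""
    return actual - projected_score(prior)


# A student who scored 98 last year and earns a perfect 100 this year.
gifted = residual(prior=98, actual=MAX_SCORE)

# A student who scored 60 last year and earns 68 this year.
typical = residual(prior=60, actual=68)

print(f"Gifted student residual:  {gifted:+d}")
print(f"Typical student residual: {typical:+d}")
```

Under these assumptions the student with a perfect score is projected above the test's maximum, so that student still shows up as a shortfall, while the student who scored 68 counts as a gain.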
According to Jackson, the final report of the Measures of Effective Teaching (MET) Project, produced by the Bill & Melinda Gates Foundation, found correlations between VAM scores, classroom observations by principals, and parent or student feedback. If that’s the case, why bother with VAM? Turns out it’s cheaper to let the computers do the judging once the infrastructure is in place: states have access to reams of data and their computers can easily be programmed to crank out VAM scores for teachers. This is a less subjective and much easier task than training personnel to observe and rate teachers.
In a study published last year by Chetty, Friedman, and Rockoff, data from 2.5 million students, collected from 1991 to 2009, was used to show how teachers who raise test scores also improve life outcomes for kids. The researchers looked at earnings, college attendance, teen births, and quality of neighborhood. They concluded that teachers with VAM scores at the 85th percentile and above added value to the economy by raising each classroom’s lifetime earnings by $266,000.
Diane Ravitch calls this study “nonsense,” and questions the basic premise that VAM scores and tests can measure such things in a meaningful way. And yet, the study is cited again and again as an argument for merit pay and firing teachers with low VAM scores.
Other researchers have looked at data on suspensions, absences, course grades, on-time grade progression, and drop-out rates, and found that non-cognitive factors (like motivation, adaptability, self-control, conscientiousness and creativity) are far more predictive of success in school and in life than test scores. That finding is further evidence that VAM scores provide only one part of the picture. And yet, because VAM scores are easier to produce using computers programmed with a mathematical formula, and because they are cheaper than other methods of identifying good and bad teachers, they have become widely used to evaluate teachers.
My Own Case Against VAM
As a long-time educator, I have grave concerns about using student test scores to measure good teaching. Here are a few reasons I think the approach is a mistake:
- The premise that all other professionals are best judged by their output statistics is deeply flawed. I used this approach in choosing the “best” surgeon to operate on my back, but found that his lack of empathy and inability to listen actually impeded my recovery. A lawyer who wins the most cases but is disrespectful to his clients is not the best choice in most situations. Businesses that are successful in terms of profit but mistreat employees and customers (think airlines and Comcast) are not necessarily “good.”
- VAM scores can be highly variable and unstable. Remember the “worst” teacher in New York City (and this was printed in the newspapers to compound her humiliation)? She was a teacher of new immigrants who left her class as they became proficient in English. Her principal believed she was an excellent teacher, but the math just didn’t add up. Not to mention the teacher whose VAM score was low because of students who tested in the 99th percentile and thus could not meet their projected improvement because they had topped out on the test already.
- VAM punishes teachers who have more English language learners and children with special needs in their classes. There are actually good teachers who are a better fit for these students and welcome having more of them in their classroom. They just won’t look like good teachers because many of their students don’t test well.
- Using zip codes to determine economic success is highly flawed. My own zip code is a perfect example, as it includes a huge economic spectrum from very wealthy families to families living in poverty.
- The data mining used for these studies should frighten us all. Researchers have no problem finding out my income, education level, zip code, employment record, criminal record, etc., and making assumptions about who I am based on this data. They can access similar data and make similar assumptions for any given student.
- None of the qualities of a good teacher on my initial top 10 list are easy to measure. Does that mean we should forget about these things and only care about the things that are easy to measure?
There’s got to be a better way.
Back in November 2013, I asked, “Is There a Fair Way to Evaluate Teachers?” Then, as now, I believed that test scores are neither the best nor most accurate indicator of quality teaching.
I'm not alone in this opinion. Researchers back me up. Stanford University professor Linda Darling-Hammond and other leading education research experts argue that tying the evaluation of teachers to standardized testing is highly unfair and inaccurate. Diane Ravitch concurs in her book, Reign of Error, pointing out that because students who have special needs or are English language learners bring down test scores, teachers who choose to teach our most vulnerable children are punished.
A report from the National Education Policy Center, "Can We Reverse the Wrong Course on Data and Accountability," concludes:
“Expertise has no algorithm. Wisdom does not manifest itself on a spreadsheet. Numbers must be the servant of professional knowledge, not its master. Educators can and should be guided and informed by data systems; but never driven by them.”
After being a teacher for seven years and supervising teachers for 25 years, I still firmly believe that teaching is a calling and good teaching is an art. As wonderful as it may seem to find an objective mathematical formula for evaluating educators, people who have worked with teachers over a period of time will tell you they can identify good ones by watching them with children. Parents and children can always identify them—they just know.
People don’t enter the teaching profession to become wealthy, gain prestige, or come home feeling serene at the end of the day. They arrive long before the first bell rings to prepare, and leave much later in the day than anyone imagines. They spend countless hours at home planning curriculum and looking over children’s work. If a child or their classroom is without something, they buy it. The good ones possess an intangible quality that cannot be measured by student test results or complicated evaluation systems.
The good ones know how to toss out that carefully planned lesson that bombs with the children and to follow the children’s interests and ideas. The good ones know when a child who acts out needs a hug rather than a punishment, and when to send a positive message to the parents of a child with special needs who usually hear only about the problems. A teacher can look good on paper and produce good test scores, but still may lack the empathy, patience and love of children that are at the heart of good teaching.
According to C. Kirabo Jackson, the best and worst teachers’ VAM scores correlate highly with teacher ratings by principals and student or parent survey ratings. In a perfect world with unlimited time and resources, evaluating teachers would be a much richer and more meaningful process, one that more thoroughly trusted our human instincts about what makes for good teaching. In the meantime, we have to find a better way to enable teachers to reach their full potential, to recognize the good work they do, and to weed out those who are not suited to the work.