Wayne State University

DETROIT — A Wayne State University researcher is working on a national initiative aimed at improving ways to measure the quality of teachers.

Ben Kelcey, Ph.D., assistant professor of education, recently received a one-year, $25,000 subcontract from the University of Michigan, which is working in collaboration with Educational Testing Service and the RAND Corporation on a project funded by the Bill and Melinda Gates Foundation. The project will develop statistical models that will help researchers understand the bias and precision of several instruments measuring teacher quality through classroom observation.

Current quality measurement tools include direct observations of classroom teaching; analysis of classroom assignments and student work; paper-and-pencil measures of teachers’ pedagogical and content knowledge for teaching in mathematics and language arts; and measures based on standardized student achievement test scores. A national study, Understanding Teacher Quality (UTQ), is comparing all of those to better understand the potential of each measure to support the improvement of teaching and learning.

Kelcey’s current role in the study involves developing a set of statistical models that use multidimensional item response theory to relate classroom observations to dimensions of teacher quality. Teachers typically are observed on several days per year, with each day broken into a number of different segments, such as lessons or time periods, and then hierarchically organized. Up to 30 scores per day and thousands per year can be recorded.

The process can be problematic for several reasons, Kelcey said. First, different instruments score teaching in disparate ways. Second, many instruments have high margins of error, which makes it difficult to account for uncertainty caused by factors such as rater severity.

Preliminary evidence suggests that instruments and statistical models matter, Kelcey said, and that teachers’ scores are sensitive to differences in both the instrument and the statistical model. Simpler statistical models tend to ignore the multidimensionality and uncertainties present in classroom observations (e.g., rater severity, observation of an atypical lesson), and so overestimate the precision with which teacher quality can be indexed.

“People are making decisions using artificially precise and biased quantities,” he said.

Teacher observations are becoming more important, he said, as researchers and states increasingly view them as a way to augment standardized student test scores in measuring teacher quality. Though many states already have adopted classroom observations for teacher evaluation, Michigan legislators are only now beginning to consider them.

But even states that have implemented the process have done so without a reasonable understanding of the uncertainty and imprecision of the protocols and measures, Kelcey said.

“These scores are just horribly unreliable,” he said. “We’re making decisions about teachers’ livelihoods and futures based on really poor guesses with a lot of uncertainty.”

While the dimensions of a piece of paper are directly observable and fairly easy to measure, Kelcey said, teacher knowledge and quality are not. Even with fixed and observed data points, he said, different instruments and models yield different answers.

His statistical model will help to make sense of observation data and create a reliable and valid gauge of how teachers are instructing their students. Kelcey’s team is looking at implications of the models used to score those observations, as well as at the precision and sensitivity of teacher scores resulting from a given instrument and statistical model.

“We want an approach that differentiates between high-quality teaching and simply successful teaching,” he said. “Value-added models that only rely on test scores primarily measure successful teaching and can be heavily influenced by teaching to the test. In contrast, we might view high-quality teaching as instruction that is planned for and responsive to students’ needs, and situated within a specific context.

“We want to develop an approach that gives reliable evaluations about the quality of teaching and does so in ways that are robust to changes in observation protocol and statistical model.”

The overall purpose of the UTQ project is to provide a solid foundation for developing robust teaching evaluation systems that attend to central characteristics of teaching practice, including teachers’ knowledge of content and how to teach it, effective instructional practices and how to engage students in effective classroom assignments.

The UTQ project is a collaboration of Educational Testing Service, RAND Corp. and the Institute for Social Research at the University of Michigan.


Wayne State University is one of the nation’s pre-eminent public research universities in an urban setting. Through its multidisciplinary approach to research and education, and its ongoing collaboration with government, industry and other institutions, the university seeks to enhance economic growth and improve the quality of life in the city of Detroit, state of Michigan and throughout the world. For more information about research at Wayne State University, visit http://www.research.wayne.edu.