Links for Portland Parents of Talented and Gifted Children




Summary and Comments on the studies produced by the Tennessee Value Added Assessment System (TVAAS)

Margaret DeLacy

December 11, 1999 (readings/links updated through 2010)


[Below is my summary and comments on a set of studies that were sent by the office of the TVAAS. I apologize for any mistakes or misrepresentations. I sent this summary to the TVAAS director, William Sanders, for corrections, and his comment is at the end. My own comments appear in italics within [square brackets].]


Contents of this page:


TVAAS Articles in order of publication

What is the TVAAS and how did it begin?

What has the TVAAS found?


Postscript: Dr. Sanders' reply to my comments

Further reading



TVAAS Articles in order of publication:


1994(a): William L. Sanders and Sandra P. Horn, "The Tennessee Value-Added Assessment System (TVAAS): Mixed-Model Methodology in Educational Assessment," Journal of Personnel Evaluation in Education, Vol. 8, 299-311

1994(b): William L. Sanders, Arnold M. Saxton, and others, "Effects of Building Change on Indicators of Student Academic Growth," Evaluation Perspectives, p.3.

1995: William L. Sanders and Sandra P. Horn, "Educational Assessment Reassessed: The Usefulness of Standardized and Alternative Measures of Student Achievement as Indicators for the Assessment of Educational Outcomes" Education Policy Analysis Archives Vol. 3, no. 6

1996 (a): Samuel E. Bratton, Jr., Sandra P. Horn, S. Paul Wright, "Using and Interpreting Tennessee's Value-Added Assessment System: A Primer for Teachers and Principals" (TVAAS)

1996(b): William L. Sanders and June C. Rivers, "Cumulative and Residual Effects of Teachers on Future Student Academic Achievement" (TVAAS)

1997(a): [no author], "Graphical Summary of Educational Findings from the Tennessee Value-added Assessment System (TVAAS)"

1997(b): S. Paul Wright, Sandra P. Horn and William L. Sanders, "Teacher and Classroom Context Effects on Student Achievement: Implications for Teacher Evaluation", Journal of Personnel Evaluation in Education, vol. 11, 57-67

1998: William L. Sanders and Sandra P. Horn, "Research Findings from the Tennessee Value-Added Assessment System (TVAAS) Database: Implications for Educational Evaluation and Research" Journal of Personnel Evaluation in Education, vol. 12, 247-256

1999: W. L. Sanders and K.J. Topping, "Teacher Effectiveness and Computer Assessment of Reading: Relating Value Added and Learning Information System Data." (TVAAS)

2004: W. L. Sanders, "A summary of conclusions drawn from longitudinal analysis of student achievement data over the past 22 years." Paper presented to Governors Education Symposium, Asheville, NC.

2009: William L. Sanders, S. Paul Wright, June C. Rivers, Jill G. Leandro, "A Response to Criticisms of SAS® EVAAS®," Nov. 13, 2009 (with additional references to articles online)



What is the Tennessee Value Added Assessment System (TVAAS) and how did it begin?


In 1992, following a suit filed by small school districts, the state of Tennessee passed an Education Improvement Act. This Act equalized funding across the state, and increased the total amount of school funding. The large increase in funding was paid for by increased sales taxes. Along with the funding came a demand for more "accountability" and a system was set up to measure student achievement, dropout rates, attendance, and promotion. The TVAAS was set up to measure the effectiveness of schools in increasing student achievement.(1998, p. 248)

Tennessee students in grades 3 through 8 take an achievement test, the Tennessee Comprehensive Assessment Program (TCAP) every year in five subjects: reading, language arts, math, science, and social studies. This test is written by McGraw/Hill which also provides tests to several other states. It was chosen because the test covers material that is taught in Tennessee schools. In addition, high school students are now being tested in five different areas of mathematics.(1998, p. 249)

The TCAP is a "norm-referenced" rather than a "criterion-referenced" test: that is, students are measured against the actual test scores of students in the same grades across the country, not against a fixed standard curriculum that every student is expected to master (1996(a), p. 25). The 1995 article argues that these tests are more reliable and much less expensive than alternative tests; they also take much less class time and cover a much broader range of topics than other assessment methods.(1995, pp.7-8 and 5-6). Costs for certain tests can be as high as $150 per pupil per test.(1995, p. 7). The cost for the TCAP in 1995 was $3.59 per student, and the cost of the TVAAS reports added $0.60 per student.(1996(a), p. 30). In Britain, when "standard assessment tasks," (individual performance tests) were introduced, it was estimated that the assessments took 2 to 5 weeks of class time.(1995, p.6)

TVAAS studies are based on a huge database that now contains more than five million records of student test results (1998, p. 250) and analysis requires a very powerful computer system.

The achievement scores of every student are saved over several years to form a continuous (longitudinal) record. Every student's record is also linked to the school district and school that the student attended, and to the individual student's teachers. Conclusions are based not only on each student's growth over the previous year, but also on averages of the student's growth over a three year period.(1998, p. 250)

The TVAAS system takes every student at his or her own starting level and measures how effectively the teacher/school/district increases what the student knows. This is the "value added" part of the system. Teachers and schools are held accountable for making sure that their students improve in scores from one test to the next, not for having their students meet some fixed standard minimum score (1998, p. 250).
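The "value added" idea can be illustrated with a small sketch. This is only a simple gain-score calculation, not the mixed-model methodology described in 1994(a); the student records, the scale scores, and the function name `mean_gain_by_teacher` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical records (not TVAAS data):
# (student, teacher, last year's scale score, this year's scale score)
records = [
    ("s1", "Teacher A", 610, 640),
    ("s2", "Teacher A", 700, 722),
    ("s3", "Teacher B", 605, 615),
    ("s4", "Teacher B", 710, 716),
]


def mean_gain_by_teacher(rows):
    """Average each teacher's student gains (this year's score minus last year's)."""
    gains = defaultdict(list)
    for _student, teacher, prior, current in rows:
        gains[teacher].append(current - prior)
    return {teacher: sum(g) / len(g) for teacher, g in gains.items()}


result = mean_gain_by_teacher(records)
print(result)
```

In this made-up data both teachers have one lower-scoring and one higher-scoring student, and the raw scores look similar, but the students' average gains differ. That difference in gains, rather than the level of the scores, is what a gain-based report focuses on.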

TVAAS tracks how much time students spend with each teacher. Teachers are responsible for every student who spends 150 days in their classes, and reports of teachers are based on the average growth of their students over the past three to five years (1994(a), p. 303). Reports from the TVAAS must be included in teacher evaluations, but cannot be the only information used in teacher evaluation.(1998, p. 249.)

Every teacher receives a report on the achievement gains of the students in his/her class.

Teachers and schools receive information about the average achievement gains of their low, average, and high achieving students and can compare these to the growth of students of similar ability from earlier years in the same class/school. They can also compare their students' improvement with the national average. These reports help teachers and schools to pinpoint problems in particular grades or subjects, or with particular sorts of students (1998, p. 250; 1997(a), pp. 61-2).


What has the TVAAS found?


The TVAAS has developed several statistical models that enable it to study the importance of various factors on student learning. Again, "student learning" is understood as the increase in achievement test scores from one year to the next. For example, they have looked at the effect of small classes compared to large classes, of changing from one school to another compared to staying in the same school, and of being in one school district compared to another.

Here are some of the most important findings and conclusions of the TVAAS studies. These apply only to the state of Tennessee, but it seems likely, because of the extremely large database, that they would also apply in other states.


Teachers are the most important factor in student success.


The studies found that the single most important factor in student achievement gain was the student's teacher. Two other important factors were the achievement level of the student, and the school system itself.

The 1996(b) study divided elementary school mathematics teachers (grades 3 to 5) in two different urban school systems into "quintiles," or five groups of teachers. Teachers were assigned to quintiles according to how much academic growth their students showed during the school year, from the lowest gain (first quintile) through low, average, and above average, to the highest gains (fifth quintile). The study tracked students to see what sequences of "low," "average" or "high" teachers they had, and then compared the scores of the students.
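The grouping into quintiles can be sketched as follows. This is a minimal illustration that ranks teachers on a single number (a mean student gain) and cuts the ranking into five equal groups; the way the 1996(b) study actually estimated teacher effects is not reproduced here, and the teacher labels, gains, and the function name `assign_quintiles` are hypothetical.

```python
def assign_quintiles(mean_gains):
    """Rank teachers by mean student gain and split them into five equal-sized groups.

    Returns {teacher: quintile}, where 1 = lowest gains and 5 = highest gains.
    """
    ranked = sorted(mean_gains, key=mean_gains.get)  # lowest gain first
    n = len(ranked)
    return {teacher: (rank * 5) // n + 1 for rank, teacher in enumerate(ranked)}


# Hypothetical mean gains for ten teachers, T1 (lowest) to T10 (highest).
gains = [4, 9, 12, 15, 18, 20, 23, 27, 31, 36]
mean_gains = {f"T{i}": gain for i, gain in enumerate(gains, start=1)}

quintiles = assign_quintiles(mean_gains)
print(quintiles)
```

With ten teachers, each quintile holds two of them. A student's "teacher sequence" over three years would then be a string of three such quintile labels.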

"Differences in student achievement of 50 percentile points were observed as a result of teacher sequences after only three years. The effects of teachers on student achievement are both additive and cumulative with little evidence of compensatory effects." (1996(b) Summary of Findings).

A "low" teacher lowered the scores for a student for the year that the student was in that teacher's classroom, and even if the student had a "high" teacher the following year, the student did not catch up--a student who had a "high" teacher for both years, or even an "average" teacher followed by a "high" teacher, would still have higher test scores. Good teachers could help their students make progress during the year they had them, but they couldn't completely erase the effect of lower growth the year before. The negative effect of poor teachers could still be seen two years later. The more "low" teachers a student had, the lower the student's final scores were likely to be.

"Average" teachers were able to do a good job with "average" students. However, the top fifth of students did not make the same progress UNLESS they had outstanding teachers.


High scoring students in mathematics need better teachers to show progress than do other mathematics students.


"As teacher effectiveness increases, lower achieving students are the first to benefit. The top quintile of teachers facilitate appropriate to excellent gains for students of all achievement levels." (1996(b) Summary of Findings).




Poor/Minority students make as much progress as other students with the same teachers


"African American students and white students with the same level of prior achievement make comparable academic progress when they are assigned to teachers of comparable effectiveness." (1998, p. 254)

If two students, one black and one white, started a school year with similar achievement scores, they were likely to make the same growth during the year if they had equally able teachers. Similarly, if two students, one from a rich family/neighborhood and another from a poor family/neighborhood started with similar scores, they would also make the same growth during the year-- as long as they had equally able teachers.

However, if one of the students was starting with a higher test score, that student was likely to show less growth. Also, if one of the students had a less able teacher, that student was also less likely to succeed.

In one district where the black students made up 38% of the total, they were somewhat more likely to have the worst teachers. Ten percent more black students than expected were assigned to the least effective teachers. The authors cite another study by E.M. Bridges that found that when parents and students complained about poor teachers, the teachers were likely to be transferred to schools with large numbers of transient, poor, or minority students.(1997(b), p.5)

[My conclusion is that even in an average school, a math student in the top fifth of the class has a one in five chance of finding a teacher who is skilled enough to make it likely that that student will learn.

A very high-ability math student in a struggling school is under a double disadvantage. That student has a greater need than classmates for an excellent teacher, but is less likely to find one than a student in an average school.

If we assume that the pattern continues, then "TAG" students, who are in the top tenth may have even greater disadvantages than students in the top fifth. These students are very unlikely to succeed without intervention.--Margaret]


Schools in Poor/Minority neighborhoods are as effective as other schools in fostering student achievement


"The effectiveness of a school cannot be predicted from a knowledge of the racial composition of the school population. ... Although sometimes schools with high proportions of minority students show lower average raw scale scores, the gains their students make are comparable to those of schools with a minimal proportion of minority students." (1997(a), p. 26)

When all subject areas were averaged over three year periods, schools with high levels of minority students showed the same overall gains in student performance as other schools (1997(a), p. 26). Students in minority neighborhoods might START school with lower scores, but after that they showed the same amount of GROWTH in learning as students from other neighborhoods, even if the actual scores remained lower. This was also true for schools with many students in the free/reduced price lunch program (1997(a), p. 32). The actual scores might be lower but the growth was the same.

Schools and teachers in poor neighborhoods often excuse poor scores by their students by pointing to the environment the students come from. This study shows that every school can fairly be held responsible for making sure that its students show a year's growth in their test results, even when the students start with low scores.

[My conclusion is that, if we focused our effort on early childhood/early primary so students didn't enter school with low scores, we could expect all schools to do well --Margaret].


The best students learn the least


Student achievement level was the second most important predictor of student learning. The higher the achievement level, the less growth a student was likely to have. "Only the most effective teachers--the top 20 per cent--are providing instruction that produces adequate gain in high-achieving students, while students in the lower achievement levels profit from all but the least effective teachers. Therefore, the majority of the brightest students fail to achieve to their potential year after year"(1998, p. 254). This happens in school systems in different parts of the state with different levels of poverty and of minority students.

"Possible explanations include lack of opportunity for high-scoring students to proceed at their own pace, lack of challenging materials, lack of accelerated course offerings, and concentration of instruction on the average or below-average student. This finding indicates that it cannot be assumed that higher-achieving students will 'make it on their own.'" (1997(b), p. 66)

The actual school "system" [district] a student was in was the third most important factor.


Moving to middle schools reduces achievement


Transferring from one school to another didn't make much difference as long as a student transferred to any grade except the lowest. However, when whole groups of students moved to the bottom grade of a new school, for example, when students moved from elementary to middle school, there was a very serious loss in achievement. This drop was worst in grades 6 and 7--the usual grades for moving to middle school.(1994(b), p. 3).


Class size doesn't matter


Class size was not found to be a significant factor. [This one really bothered me until I thought about it, and now I wonder if it was a result of the way they did the study. The researchers divided classes into only two sizes--10 to 19 students and 20 to 32 students. Perhaps most classes are right around the dividing point. I can see that if nearly all Tennessee elementary classrooms have from 18 to 22 students and you divide those into two groups, you won't see much difference between them. I find it hard to believe that if you compared the bottom and top--classes with 10 students vs. classes with 32--you wouldn't get a significant difference in outcomes--Margaret].
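The arithmetic behind this concern can be sketched with made-up numbers. The class sizes below and the assumed relationship between class size and gain are purely hypothetical and say nothing about the actual Tennessee data (Dr. Sanders addresses this point in his reply at the end of this page). The sketch only shows that when most classes sit near the dividing point, the difference between the two bins can be much smaller than the difference between the extremes.

```python
def gain_for(size):
    """Hypothetical relationship, chosen only for illustration: gain falls as size rises."""
    return 40 - 0.5 * size


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Hypothetical class sizes, mostly clustered around the 19/20 dividing point.
sizes = [10, 18, 19, 19, 20, 20, 21, 21, 22, 32]
classes = [(s, gain_for(s)) for s in sizes]

small = mean(g for s, g in classes if 10 <= s <= 19)  # the "10 to 19" group
large = mean(g for s, g in classes if 20 <= s <= 32)  # the "20 to 32" group

bin_difference = small - large                        # two-group comparison
extreme_difference = gain_for(10) - gain_for(32)      # smallest vs. largest class

print(round(bin_difference, 2), extreme_difference)
```

Under these assumptions the two-group comparison shows well under a third of the gap that separates a class of 10 from a class of 32.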


Diverse classes are as successful as less diverse classes


Classes with students of a wide range of ability (heterogeneous) were as successful as classes with a smaller range (homogeneous).

[At first sight, this finding seems to contradict many other studies of ability grouping--we now have more than 700 grouping studies. Most studies of gifted students have found very significant benefits for these students. Other studies have found that grouping does not harm the students who are not gifted. Robert Slavin's work, which is quoted in 1997(b) (p. 65), should be used with caution. Slavin did not include programs for gifted students in his study. Even Slavin now says that gifted students benefit from accelerated instruction or "a markedly different curriculum" (See his article, "Ability Grouping, Cooperative Learning and the Gifted," Journal for the Education of the Gifted, vol 14, (1990) p.4.)]

[When we take another look at the TVAAS data, it is easy to understand how they came to the conclusion that grouping did not make a difference.

Most writers also stress the importance of teachers who are trained to teach gifted students. Since the study shows that gifted students have an even greater need of good teachers than average students, it suggests that if gifted students are grouped together and put in a classroom with a dreadful teacher they will do very badly--even worse than an average student. Even if they are grouped together and put in a classroom with an average teacher, they will not do very well. To succeed, gifted students need grouping AND an appropriate curriculum AND a good teacher! Otherwise they may not show even as much growth as average students with an average teacher. Therefore, it is not surprising that in school districts across the country, the best students are making the lowest gains.

For further discussion of the ability grouping issue, see James A. Kulik, An Analysis of the Research on Ability Grouping: Historical and Contemporary Perspectives (Storrs, Connecticut: National Research Center on the Gifted and Talented, 1992) and Tom Loveless, "The Tracking and Ability Grouping Debate" (1998), available online.]




[The actual findings of the TVAAS should be approached with caution. A lack of information about how the Tennessee school system works, or about how particular findings were calculated can lead to unjustified assumptions about what the findings mean. However, it seems incontestable that the quality of teachers is the single most important factor in student success.

More important than the actual findings is the potential value of the METHOD. Using norm-referenced tests and measuring student growth seems to be a very effective tool for analyzing whether a school district is doing a good job with students of most abilities, and for finding places where improvements can be made. For very gifted students, however, the norm-referenced tests usually used have too low a "ceiling" to give satisfactory results, and out-of-level testing would yield more useful information.

The "value added" approach seems to me to be more consistent with the Portland Public School District's "core objectives" than the use of "criterion-referenced standards" and single test scores required by the State.--Margaret]


Postscript: Dr. Sanders' reply to my comments:

[I e-mailed the summary above to Dr. Sanders, and his reply took issue with my comments. His reply is printed below with his permission.--Margaret DeLacy]

"Over-all I think you have done a good job with your summary. There are some of your conclusions with which I would quibble.

You have exaggerated the ceiling effect problem. This is one area that I have monitored very closely. The only place that we have found a problem is for measuring the teacher effects for teachers teaching "real" algebra in the 8th grade. The elementary tests do not adequately measure this progress. That is why we use our high school end-of-course tests for these classes.

Another point, often people incorrectly perceive that a test has a ceiling effect just because a rather large number of students have previously scored at the 97 percentile and higher. What is often failed to be considered is the error of measurement of these tests. The higher up on the distribution the more error of measurement resulting in the often mistaken view that there will be a ceiling bias the following year. If however, one considers each student's entire previous academic history (we use up to five years), then it can be demonstrated that progress for groups of the higher end students can be measured with out bias. This is not to say that all other tests would enable the above to be true. However, I do have data from several different achievement tests and have found consistency with the above.

I don't agree with your class size conclusion. To my knowledge about the only study that has found a class size effect was an earlier Tennessee study, Project Star (this study was done by other researchers at another institution.) In that study they lowered the class size below 15 to 1. In our study, we wanted to assess the relationship as it existed within the regions of the study. The decision to make this variable discrete was for statistical convenience so that we could express the potential interactions with other variable more conveniently. If we had included class size as a continuous variable the same conclusion would have been reached.

As to the issue of grouping or not grouping of students, I make the following argument. Each school should enable appropriate levels of academic growth for all students regardless of the entering level of the kid. How this is to be provided can take many different forms. The small rural school with one teacher per chronological age may have to have a different strategy than a suburban school with 15 teachers per grade. I don't think that we should get "hung up" on to group or not group, rather we should focus on sustained academic growth for all kids...."

"...I am especially concerned about the ceiling effect issue. I have heard this so much over the years and have had to demonstrate this usual non-problem that I would want to dispel this concern for parents of very high achieving kids."

Bill Sanders

Since this article was written, a number of Sanders's articles have been made available online. -- Margaret DeLacy


Further reading:




Ohio is one of ten states that have obtained approval from the U.S. Department of Education to employ a value-added assessment model for No Child Left Behind compliance. For several years, Ohio educators and businesses have been working with Battelle for Kids to implement this model.


The Ohio Association for Gifted Children (OAGC) has a page on its website featuring links to a variety of value-added resources.


Michael Petrilli and Aaron Churchill of the Fordham Institute published a column "Why states should use student growth, and not proficiency rates, when gauging school effectiveness" using growth-based data from Ohio on October 13, 2016, on the Fordham website.

"A More Accurate Growth Model: Using Multigrade Adaptive Assessments to Measure Student Growth", a report from the Steering Committee of the Delaware Statewide Academic Growth Assessment Pilot. Delaware recently requested permission to use multi-grade computer-adaptive tests in order to document the performance of both very high-performing and very low-performing students.


Sean F. Reardon at the Center for Education Policy Analysis (CEPA) has carried out a "big data" analysis comparing achievement test results among third graders with subsequent growth and has shown that the two are not very closely correlated. (2017)

"Educational Opportunity in Early and Middle Childhood: Variation by Place and Age"


His results have been extensively covered by the media, especially his findings that Rochester students had the lowest growth of US students in his study whereas Chicago students had the highest.


Sean Reardon discussing SEDA with Dan Schwartz and Denise Pope
December 05, 2017. Stanford Graduate School of Education
How Effective Is Your School District? A New Measure Shows Where Students Learn the Most
December 05, 2017. The New York Times
Stanford University study: Rochester schools last in U.S. in growth
December 05, 2017. Rochester Democrat and Chronicle
Students' early test scores do not predict academic growth over time, Stanford research finds
December 05, 2017. Stanford Report
CPS students are learning and growing faster than 96 percent of students in the United States
November 10, 2017.
New Analysis by Leading Education Expert: CPS Students Are Learning and Growing Faster Than 96% of Students in the United States
November 02, 2017. City of Chicago - Mayor Rahm Emanuel
CPS student scores show equivalent of 6 years of learning in 5 years
November 02, 2017. Chicago Sun Times






The District of Columbia is currently using value-added data in combination with teacher observations as a tool for evaluating teacher performance. In 2010, Chancellor Michelle Rhee announced that 165 teachers would be fired for poor performance. A discussion of the data that underlay the evaluations can be found in opposing articles that appeared in commentary from the Washington Post and Education Week in July of 2010. Rick Hess added a follow-up in August, and his comment generated further responses from readers.


Aaron Pallas: "Were some D.C. teachers fired based on flawed calculations?"


Rick Hess: "Professor Pallas's Inept, Irresponsible Attack on DCPS"


Rick Hess, "Value Added: the Devil's in the Details"







(alphabetically by author)

"Sanders 101" (1999) Jeff Archer, Education Week

"Unfinished Business: More Measured Approaches in Standards-Based Reform" by Paul E. Barton of the Policy Information Center, Educational Testing Service (2004). Another very informative summary of the issues raised by the use of achievement test scores to evaluate school performance. Longer, more detailed and a bit denser than the NWEA report listed below. Recommends a mixed approach to evaluation including ensuring that testing reflects actual curriculum goals, repeated testing during the school year, and the use of both gains-based and status-based reporting.


"'Failing' or 'Succeeding' Schools: How Can We Tell?" by Paul E. Barton, published by the American Federation of Teachers (2006)


Value-Added and Experimental Studies of the Effect of Charter Schools on Student Achievement: A Literature Review, Julian Betts, Y. Emily Tang, (December 2008) from the Center on Reinventing Public Education at the University of Washington, Bothell.  Argues that traditional aggregate or "snapshot" score reports do not provide a good picture of the success of charter schools because of the variability of their student bodies.


For a recommendation that a value-added approach be adopted in California, see "Putting Education to the Test: A Value-Added Model for California" by Harold C. Doran and Lance T. Izumi (June 2004).  This article focuses on developing a score that will show whether a student is on  track to reach proficiency by the date required by the state standards and does not consider what happens with students who are proficient for many years in advance of those grade levels.


Measuring Teacher Effectiveness, Preliminary Findings, by the Gates Foundation, (no date, circa December, 2010, too many "authors" to list) Ably summarized in an article by Jason Felch of the Los Angeles Times (see below). See also the website for the MET project, listed below under "Research and Conferences"


False Performance Gains: A Critique of Successive Cohort Indicators by Steven M. Glazerman and Liz Potamites, December 201, Mathematica Policy Research, Working Paper

Argues that cohort-based accountability measures provide very misleading information when compared to average gain indicators and value-added measures.




"Passing Muster: Evaluating Teacher Evaluation Systems" by Steven Glazerman, Mathematica Policy Research, Dan Goldhaber, University of Washington, Susanna Loeb, Stanford University and others, published by the Brookings Institution, May 3, 2011

Advice on the best way to use value-added assessment to evaluate teacher performance.


Evaluating Teachers: The Important Role of Value-Added Education, by Steven Glazerman, Mathematica Policy Research; Susanna Loeb, Stanford University; Dan Goldhaber, University of Washington; Douglas Staiger, Dartmouth College; Stephen Raudenbush, University of Chicago; Grover J. "Russ" Whitehurst, for The Brookings Institution, The Brookings Brown Center Task Group on Teacher Quality, November 17, 2010

"When the Stakes are High, can we Rely on Value-Added? Exploring the Use of Value-Added Models to Inform Teacher Workforce Decisions", Dan Goldhaber, Center for American Progress, December 2010

"Developing Value-Added Measures for Teachers and Schools" Eric A. Hanushek and Caroline M. Hoxby  a chapter from REFORMING EDUCATION IN ARKANSAS: Recommendations from the Koret Task Force (2005)  A short and readable outline of the main issues with recommendations.

Value-Added Measures of Education Performance: Clearing Away the Smoke and Mirrors, Douglas N. Harris, University of Wisconsin (2010), a Policy Brief from the Pace School of Education at Stanford University. Summary of a book that discusses the limitations and best uses of value-added measurement.

"Inside IMPACT: D.C.’s Model Teacher Evaluation System" by Susan Headden (Education Sector, 2011). IMPACT includes the use of value-added data on student assessment as well as expert evaluations of teachers' classroom performance. Many commentators are describing this as one of the best national accountability models.

"Value-Added Assessment and Systemic Reform: a Response to the Challenge of Human Capital Development" (2005) Theodore Hershberg, in the Phi Delta Kappan


Indispensable Tests: How a Value-Added Approach to School Testing Could Identify and Bolster Exceptional Teaching by Robert Holland (December 2001) Lexington Institute


"Individual Growth and School Success." Northwest Evaluation Association (NWEA). A very clear and entertaining explanation of why student gain data is important and should be used to correct the Adequate Yearly Progress evaluation required by the No Child Left Behind Act. Both the executive summary and the full report are available at the link below, but the full report requires free registration.


"Value Lessons" by Lynn Olson, Education Week May 5, 2004.  A long article about the implementation of Value-Added Assessment in Great Britain including both praise and criticism.


Is Your Child’s School Effective? (2006) Paul E. Peterson and Martin R. West  Education Next


Incorporating Student Performance Measures into Teacher Evaluation Systems, by Jennifer Steele, Laura S. Hamilton, Brian M. Stecher, Center for American Progress/ Rand Corporation, December 1, 2010


"Value-Added Assessment: An Accountability Revolution" by J. E. Stone in the compilation "Better Teachers, Better Schools" by the Fordham Foundation.


"No Child Left Behind Act: States Face Challenges Measuring Academic Growth That Education’s Initiatives May Help Address." United States Government Accountability Office, GAO Report to Congressional Requesters, July 2006


"Growth Measures: Don’t Call ’em ‘Value Added’" February 2003 (recommended)




Newspaper/media articles:


Value-Added Evaluation & Those Pesky Collateralized Debt Obligations by Rick Hess for the Education Week blog, May 3, 2011

Comment on the Brookings study "Passing Muster: Evaluating Teacher Evaluation Systems"



Study backs 'value-added' analysis of teacher effectiveness By Jason Felch for the Los Angeles Times, Dec. 11, 2010
"Teachers' effectiveness can be reliably estimated by gauging their students' progress on standardized tests, according to the preliminary findings of a large-scale study released Friday by leading education researchers. The study, funded by the Bill and Melinda Gates Foundation, provides some of the strongest evidence to date of the validity of "value-added" analysis, whose accuracy has been hotly contested by teachers unions and some education experts who question the use of test scores to evaluate teachers."


The Los Angeles Times compiled value-added data for individual teachers and released the results on August 14, 2010.  The results were surprising--there was more variation within schools than between them; students' economic and ethnic status did not play an important role, and neither did class size.  Strict teachers seemed to be more successful.  Below are two stories about the report. The final link is to an online discussion of these articles that was initiated by a blog post by Kevin Karplus:


The Los Angeles Times released a follow-up report in May, 2011: "Times updates and expands value-added ratings for Los Angeles elementary school teachers: New data include ratings for about 11,500 teachers, nearly double the number covered last August. School and civic leaders had sought to halt release of the data".

May 7, 2011, 11:29 p.m.

FAQ about the analysis:

List of articles about the series:





New York Measuring Teachers by Test Scores by Jennifer Medina for the New York Times, Jan 21, 2008

"New York City has embarked on an ambitious experiment, yet to be announced, in which some 2,500 teachers are being measured on how much their students improve on annual standardized tests. The move is so contentious that principals in some of the 140 schools participating have not told their teachers that they are being scrutinized based on student performance and improvement. ".....



"Tennessee seeks to use student tests to show teacher quality" From Education Week 05/07/03 an article about an initiative to use the TVAAS data to show that teachers are "high performing" if their students make good progress


"Tennessee Reconsiders Value-Added Assessment System" by Lynn Olson, Education Week, March 3, 2004:  The headline exaggerates a bit, what is being reconsidered is an adjustment that is made to the raw data before it is plugged into the Value-Added equations


Education Scholars Finding New 'Value' In Student Test Data, Education Week, 11/20/02 discusses some problems in collecting and analyzing value added data as well as benefits:


"Leaders in Ohio and Pennsylvania are making better sense of their school data"  December 2004

The state of Ohio recently adopted a value-added assessment approach as part of its compliance with the No Child Left Behind Act and has received a $10m grant from the Battelle institute to create a database.  As a result, there is a continuing series of articles from Ohio charting the implementation of the system. A follow-up discussing initiatives in Ohio and several other states and districts is a series of articles by Brett Schaeffer in The School Administrator (Web Edition).







For critiques of the traditional TVAAS model methodology and/or findings see:


Gates' Measures of Effective Teaching Study: More Value-Added Madness by

"If anything, this first MET report provides good evidence that simply asking students about their teachers is a much better idea than going through both statistical and logical gymnastics to obtain a VAM score."


Eva Baker et al:  Problems with the Use of Student Test Scores to Evaluate Teachers

Co-authored by scholars convened by the Economic Policy Institute: Eva L. Baker, Paul E. Barton, Linda Darling-Hammond, Edward Haertel, Helen F. Ladd, Robert L. Linn, Diane Ravitch, Richard Rothstein, Richard J. Shavelson, and Lorrie A. Shepard, August 29, 2010

This policy document argues that Value-Added Measurement systems hold benefits over other ways to evaluate test scores, but should not be used as a major component of evaluations of individual teachers.


"Are we there yet? What Policymakers Can Learn From Tennessee's Growth Model" by Charles Barone (March 10, 2009) Education Sector Technical Reports

This is a critique of Sanders' proposal to use projections of student growth towards benchmarks for measuring whether schools are meeting the Adequate Yearly Progress (AYP) standard of the Elementary and Secondary Education Act (ESEA), known as the No Child Left Behind Act (NCLB).  Barone believes this model is too complex and lacks transparency. The article includes links to a rebuttal by Sanders and a rejoinder by Barone. They agree that NCLB doesn't address student learning above state standards: "No one has as yet offered a clear accountability solution to address the criticism that any NCLB accountability model (i.e., one targeted only at "proficient") does not provide any incentives to raise performance once students have met the proficient benchmark. This is likely due to the fact the federal government has traditionally focused its efforts on the most at-risk students." (Barone, n. 6).


Damian Betebenner "An Analysis of School District Data Using Value-added Methodology" CSE Report 622, CRESST/University of Colorado at Boulder (March 2004) Center for the Study of Evaluation, National Center for Research on Evaluation, Standards, and Student Testing  (note: this is a very technical report).  I haven't been able to figure out why Betebenner finds that teachers in a g/t program are at a relative advantage under traditional VA models (p.21) when many studies have found that these students usually make much lower achievement test gains. Further enlightenment would be appreciated.


Daniel Fallon, "Clarifying How We Think about Teaching and Learning" (2004) This is an editorial discussion presented at a conference, not a peer-reviewed article, but it provides a useful and readable account of the immediate history of TVAAS and relates it to the debate over "teacher effects" and teacher education.

    "Because the promise of value-added research is so great, and the competing arguments about it so confusing, Carnegie Corporation of New York awarded a grant to a team of outstanding statisticians at the RAND Corporation, an independent nonprofit national organization dedicated to research. We asked the research team to review all of the currently competing statistical value-added assessment models. We wanted RAND’s opinion on responsible uses of the method and we wanted their advice on what we can trust. The research team’s conclusions, published in a little book earlier this year, pointed to a number of reasons to be cautious when using value-added analysis.
    First and foremost, the researchers point out that our ability to draw sensible conclusions depends ultimately on the quality of the tests that are administered. Many of the tests in use today are in fact poorly aligned with state standards and also are not calibrated against the developmental level of the students at each grade. Some tests do not provide accurate or reliable measures of what we seek to discover.
    Second, the researchers warn that all of the statistical models are relatively new and have not been broadly applied by large numbers of researchers. Although each model tries to control for many variables in order to get a clear reading of a teacher effect, we know that every model currently in use contains some amount of unplanned statistical bias, and these imperfections are not well understood.
    The RAND study concludes that since value-added methods have known shortcomings they should not be used by themselves for high-stakes policy decisions."


Harcourt Assessment, Inc. "Value-added assessment systems" (2004) Another easy-to-read summary of the RAND report


Thomas J. Kane, Douglas O. Staiger, David Grissmer, Helen F. Ladd, "Volatility in School Test Scores: Implications for Test-Based Accountability Systems" Source: Brookings Papers on Education Policy, No. 5 (2002), pp. 235-283

Not an easy-to-read summary, but important reading nonetheless.  Unfortunately, it doesn't seem to be available in full text on the web (it can be found on JSTOR through the wonderful Multnomah County Library system).  The authors argue that test scores often vary for reasons that have nothing to do with teacher inputs and that variability increases as the number of students tested falls.  As a result, very small schools, small classes, and populations that have been disaggregated by ethnicity or income into smaller groups have test results that are more variable than do bigger schools, classes, and groups.  This increased variability means that small schools have a much easier time floating to the top of the charts for one year than do large schools, which are more likely to be closer to average.  (Small schools are also more likely to sink to the bottom temporarily.)  Programs that reward "top" schools are biased towards small schools.  Disaggregating results by student ethnicity has the perverse effect of rewarding segregated schools.  Using student gains in place of aggregate scores tends to increase volatility, and averaging scores over a small number of years doesn't necessarily help.  The authors propose several ways to improve accountability, including introducing some statistical "filters" into the accountability measures.
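
[The point about size can be illustrated with a small simulation. This sketch is mine, not the authors' method. It assumes that every school has exactly the same true quality, that student scores are drawn from one normal distribution (mean 50, standard deviation 10, both arbitrary), and that "small" and "large" schools test 25 and 400 students. All names and numbers are hypothetical.]

```python
import random
import statistics


def school_means(n_students, n_schools, rng, true_mean=50.0, student_sd=10.0):
    """Simulate one year's average score for n_schools schools.

    Every school has the same true quality: each student's score is drawn
    from the same normal distribution, so any difference between school
    averages is pure sampling noise.
    """
    means = []
    for _ in range(n_schools):
        scores = [rng.gauss(true_mean, student_sd) for _ in range(n_students)]
        means.append(statistics.fmean(scores))
    return means


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical sizes: 200 small schools (25 tested students each)
# and 200 large schools (400 tested students each).
small = school_means(25, 200, rng)
large = school_means(400, 200, rng)

# How much do school averages vary within each size group?
small_spread = statistics.stdev(small)
large_spread = statistics.stdev(large)

# Rank all 400 schools together, then inspect the top and bottom 10 percent.
ranked = sorted(
    [(m, "small") for m in small] + [(m, "large") for m in large],
    reverse=True,
)
cut = len(ranked) // 10
top, bottom = ranked[:cut], ranked[-cut:]
small_share_top = sum(1 for _, kind in top if kind == "small") / cut
small_share_bottom = sum(1 for _, kind in bottom if kind == "small") / cut

print("spread of averages, small schools:", round(small_spread, 2))
print("spread of averages, large schools:", round(large_spread, 2))
print("share of top 10% that are small:", small_share_top)
print("share of bottom 10% that are small:", small_share_bottom)
```

[Under these assumptions the spread of a school's average shrinks with the square root of the number of students tested (10/5 = 2 points for the small schools, 10/20 = 0.5 points for the large ones), so small schools crowd both ends of the ranking even though no school is actually better than another.]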


Haggai Kupermintz, "Teacher Effects as a Measure of Teacher Effectiveness: Construct Validity Considerations in TVAAS (Tennessee Value-Added Assessment System)" (March 2004) CSE Technical Report 563 CRESST/University of Colorado at Boulder Center for the Study of Evaluation, National Center for Research on Evaluation, Standards, and Student Testing

Abstract: "This report examines the validity of measures of teacher effectiveness from the Tennessee Value-Added Assessment System (TVAAS). Specifically, the report considers the following claims regarding teacher effects: that they adequately capture teachers' unique contributions to student learning; that they reflect adequate standards of excellence for comparing teachers; that they provide useful diagnostic information to guide instructional practice; and that student test scores adequately capture desired outcomes of teaching. Our analyses of the TVAAS model highlight potential weaknesses and identify gaps in the current record of empirical evidence bearing on its validity."


Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz, and Laura S. Hamilton, "Evaluating Value-Added Models for Teacher Accountability" (2003)
An important and much-cited summary of the recent debates on TVAAS (called Value-Added Models or VAM) commissioned by the RAND Corporation.  "The research base is currently insufficient to support the use of VAM for high-stakes decisions. We have identified numerous possible sources of error in teacher effects and any attempt to use VAM estimates for high-stakes decisions must be informed by an understanding of these potential errors. However, it is not clear that VAM estimates would be more harmful than the alternative methods currently being used for test-based accountability."


Northwest Evaluation Association reports:

The Impact of the No Child Left Behind Act on Student Achievement and Growth.  This report found that after the implementation of NCLB, student achievement test scores rose slightly but student growth decreased, and that high-achieving students made lower gains.


Press Release

Executive Summary

Full report


Individual Growth and School Success.  This report explains the benefits of gains-based assessment systems ("Value-Added Assessment") when compared with status-based methods and provides results for 22 states.

The short version is here:

and other reports from the same group are here:






Research and Conferences


CALDER publications: From the National Center for the Analysis of Longitudinal Data in Education Research, includes some value-added research


National Conference on Value-Added Modeling, convened by  CALDER, April 22-24, 2008

Contact information, papers, audio and PowerPoint slides.


Measures of Effective Teaching, A research project funded by the Gates Foundation


Value Added Methods, Implications for Policy and Practice, May 2008. Convened by the Urban Institute.  Audio and PowerPoint.  Focuses more on politics and policies, less on equations.


VARC--The Value Added Research Center at the University of Wisconsin


WCER Projects: Research Projects at the Wisconsin Center for Education Research


