Alternatives to Standardized Testing

Part 1: In Defense of Standardized Testing
Part 2: Alternatives to Standardized Testing
Part 3: Standardized Tests: NAEP, PIRLS, TIMSS, PARCC, PISA, ITBS, and CLT

Standardized testing comes with a sordid history of intentional discrimination, perverse incentives, suspicious discrepancies in scores, and outright cheating. What are the alternatives?

In my research for this blog series, I repeatedly came across references to a 2015 NPR article about alternatives to standardized testing. It describes four main alternatives.
1. Sampling
Summary: This is essentially the same as standardized testing, but instead of testing all students, it would test a statistically representative group of students. This is what the NAEP and PISA do.
My Thoughts: I am not completely against this approach. It could be a decent compromise. But I would want my child to be assessed each year. I think it is valuable to see where my child stands in relation to other children in the school, district, state, and nation. This isn’t an attempt to boast about the score. The tests give parents a reference point that goes beyond classroom grades and is comparable across locations. Does the test score roughly match my child’s grades? This ERIC Digest provides an excellent summary of how to use and interpret the results of a standardized test.
2. Stealth Assessment
Summary: This is basically gamification: students are assessed through their performance in a computer program.
My Thoughts: Technology can be amazing. But I don’t think this would be a wise direction to move in. I have not seen any data on the validity of stealth assessment (I don’t think there is much research here yet). It would also raise even more equity issues than the current set of standardized tests.
3. Multiple Measures
Summary: Instead of relying on one assessment (the test), schools could use social and emotional skills surveys, game-based assessments (stealth assessment), and performance or portfolio-based assessments.
My Thoughts: There is important data here that would help parents, teachers, administrators, and policy makers, and it would seem obvious to me that we should assess schools and teachers on multiple measures. But wouldn’t the same accusations of bias involved in standardized testing be there for the surveys as well? And, since they are about social and emotional skills/norms, wouldn’t that be even more controversial than standardized academic tests?
Portfolio assessments should not be considered a replacement for standardized tests because, by their nature, they cannot be standardized. They can be great tools at the teacher/school level though.
I’ll spend some space talking about performance assessments later. They are the most promising alternative.
4. Inspections
Summary: An inspector will come and assess a variety of factors in the school.
My Thoughts: Even with observations, we cannot reliably assess individual teachers because there are so many variables (Wiliam, Leadership for Teacher Learning, Ch 2). Evaluating an entire school or school system in this manner would be exponentially more difficult.
Using inspections would give us good data (we should have some sort of inspection data as part of a multiple measures approach), but it would be much more expensive than standardized testing due to the required man hours and would be a very different type of data. It would not tell us much about what students are or are not learning.
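To make the sampling idea (alternative 1 above) concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the population of scores is simulated, the function name is hypothetical, and it uses a simple random sample, whereas real programs use more elaborate sampling designs. It only shows that a modest sample can estimate a system-wide average closely, while telling you nothing about any individual child.

```python
import random
import statistics

def estimate_mean_from_sample(scores, sample_size, seed=0):
    """Estimate the population mean score from a simple random sample."""
    rng = random.Random(seed)
    sample = rng.sample(scores, sample_size)
    mean = statistics.mean(sample)
    # Standard error of the mean: sample standard deviation / sqrt(n)
    std_error = statistics.stdev(sample) / sample_size ** 0.5
    return mean, std_error

# Simulated (invented) population: 100,000 students, scores centered near 250.
rng = random.Random(42)
population = [rng.gauss(250, 35) for _ in range(100_000)]

true_mean = statistics.mean(population)
sample_mean, std_error = estimate_mean_from_sample(population, sample_size=2_000)

print(f"Mean from testing everyone: {true_mean:.1f}")
print(f"Mean from a 2,000-student sample: {sample_mean:.1f} "
      f"(+/- {1.96 * std_error:.1f})")
```

The trade-off is the one described above: the sample answers "how is the system doing?" at a fraction of the testing burden, but the 98,000 students who were not sampled get no score at all.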

The Most Promising Alternative

The specific alternative to standardized tests I find most promising is a type of performance-based assessment. However, performance assessments will have to clear some very significant hurdles before I would be willing to consider replacing standardized tests with them.

The performance assessment would have to be externally imposed on schools, much as standardized tests currently are. The assessment would also have to be standardized. The purpose here is two-fold: standardization allows for comparisons between different groups of students, and it helps control bias.

If the assessment is not standardized and given in a standardized manner, then the data generated will not be very useful for anything broader than the context the assessment was given in. There would be too many variables. The performance assessment should also be externally imposed because these assessments should function as a type of audit on the system. Is it working? Are all students being educated?

The last hurdle may be the largest. There is a paucity of research on performance assessments, and on alternatives to standardized tests in general (Garcia & Pearson, 1994). I was not able to find anything more recent. It could be that I just don’t know the right search terms. If you are aware of more recent research on possible replacements for standardized tests, please send it my way either in the comments below or on Twitter (@Teacher_Fulton). We should not replace standardized tests with performance assessments until performance assessments have developed a track record at least as reliable as that of standardized tests.

The next post in this series will give an overview of several common standardized tests. (coming soon)

Wiliam, D. (2016). Leadership for Teacher Learning: Creating a Culture Where All Teachers Improve So That All Students Succeed. Learning Sciences International.

In Defense of Standardized Testing

This series of articles is primarily concerned with standardized tests in compulsory education (Iowa Test of Basic Skills, PISA, TIMSS, PIRLS, NAEP). These tests differ from college entrance exams (ACT, SAT) in that, except for some state achievement tests, the tests tend to be low or no stakes for both the students and schools. 

Many educators have an aversion to standardized testing, and this is not without reason. Teachers spend an inordinate amount of time preparing their students for many of these tests, and beyond that, these tests have led to a narrowing of the curriculum. This happens in a misguided attempt to focus on reading and math by reducing (sometimes drastically!) the time spent on science, social studies, art, etc. It is misguided because, while it seems sensible that you could increase these scores by spending more time on said subjects, doing so at the expense of the other subjects actually reduces background knowledge, which, after decoding, is the key to comprehension.

But It Gets Worse

Standardized tests have been intentionally used by educators to exclude minorities. For one example, you can look into the case of Larry P., a black student in California who was wrongly placed in special education. You can also read this article from Time Magazine for an overview of the negatives.

Other times, the blind spots of the test writers caused them to discriminate against girls, as Garcia and Pearson (1994) note:

“When girls outscored boys on the 1916 version of the test designers, apparently operating under the assumption that girls could not be more intelligent than boys, concluded that the test had serious faults. When they revised the 1937 version, they eliminated those items on which girls outperformed boys. By contrast, they did not revise or eliminate items that favored urban over rural children or children of professional fathers over children of day laborers (Mercer, 1989); these cultural differences apparently matched developers’ expectations of how intelligence and achievement ought to be distributed across groups (Kamin, 1974; Karier, 1973a, 1973b; Mercer, 1989).”

Whether these blind spots are willful or simply ignorant is irrelevant for our purposes. What is important is that we acknowledge that this type of discriminatory bias is still a possibility in standardized tests today. 

Content Bias

This is the type of bias that is most often pointed out in standardized tests. Content bias is simply when the content of the test favors one particular culture over another, typically favoring the majority culture. This, by default, disadvantages minorities and so it is important to be able to counter content bias if we want standardized tests to be meaningful.

Thankfully, modern standardized test creators take bias seriously.

They “have used a variety of techniques to create unbiased tests (Cole & Moss, 1989; Linn, 1983; Oakland & Matuszek, 1977). Among others, they have examined item selection procedures, examiner characteristics, and language used on the tests as possible sources of bias. One of the most common methods used to control for test bias is that of examining the concurrent or predictive validity of individual tests for different groups through correlational or regression analysis.” (Garcia and Pearson, 1994).

For more detail on what this looks like in practice, read this EdSurge article. Managing content bias will always be a challenge, even with knowledge of history, advanced statistical tools, and a good heart.
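The regression check mentioned in the quote above can be sketched in a few lines of Python. This is a simplified illustration, not how any particular test maker does it: the scores, the later outcome (a GPA), the group labels, and the function name are all invented. The idea is to fit the same kind of prediction line separately for each group and see whether a given test score predicts the same later outcome in both.

```python
def fit_line(xs, ys):
    """Least-squares line and Pearson correlation for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r

# Invented data: the same five test scores for each group, paired with a
# hypothetical later outcome (GPA) for students in that group.
test_scores = [400, 450, 500, 550, 600]
later_gpa = {
    "group_a": [2.0, 2.5, 2.7, 3.3, 3.6],
    "group_b": [2.4, 2.9, 3.1, 3.7, 4.0],
}

results = {}
for group, gpas in later_gpa.items():
    slope, intercept, r = fit_line(test_scores, gpas)
    results[group] = {
        "slope": slope,
        "intercept": intercept,
        "r": r,
        # What the fitted line predicts for a student scoring 500.
        "predicted_at_500": intercept + slope * 500,
    }
    print(f"{group}: r={r:.2f}, slope={slope:.4f}, "
          f"predicted GPA at a score of 500 = {results[group]['predicted_at_500']:.2f}")
```

In this made-up data the test correlates equally well with the outcome in both groups, but the same score predicts a higher outcome for group_b. A single prediction line for everyone would therefore underpredict group_b, which is the kind of pattern an analyst would flag for a closer look.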

Perverse Incentives

Many standardized tests also suffer from the Campbell effect. This simply means that when tests are important (high-stakes) for students or teachers, the results are more likely to be corrupted by any number of means.

Think about it: when teachers and schools are assessed based on their students’ performance, they will do what they can to look good. And when your job is on the line, you may be driven to take certain “shortcuts”.

This often leads to the aforementioned narrowing of the curriculum, which disproportionately affects students in impoverished areas. 

On top of this, there are numerous cases of outright illegal behavior. Schools have engaged in the practice of scrubbing: unenrolling students or encouraging temporary truancy. There have also been cases of students being held back in grade 9 and then, after repeating that year, jumping up to grade 11, conveniently skipping the standardized tests (Koretz, The Testing Charade, Ch 5).

And then there are the cases of traditional cheating. The most famous is the disaster in Atlanta, where 11 educators were given felony convictions and 22 other teachers reached plea agreements.

Unfortunately, we know that cheating is not an isolated problem. It has been estimated that, on the low end, at least 5% of these high-stakes standardized tests involve cheating in some fashion (Jacob & Levitt, 2003).

Discrepancies in Test Scores

Poor students tend to score lower than wealthy students. Minority students tend to score lower than white students. This certainly should raise some red flags because it shows that there are real problems somewhere, though not necessarily with the test itself. Once we reduce the variables and compare students of different ethnicities who share a similar socioeconomic status and language level, the achievement gap is greatly decreased, but still significant (Garcia & Pearson, 1994). That remaining gap shows that there is at least one other significant problem, and likely several, somewhere.

The challenge is working out where. Is the primary problem with the standardized tests themselves, or with unequal schools, differing home situations, etc.? Both?
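For readers who want to see what "comparing students who share a similar socioeconomic status" does to a gap, here is a small Python sketch. The records, group labels, and scores are invented purely to show the arithmetic, and the sketch controls for only one variable, where real analyses control for several.

```python
from collections import defaultdict
from statistics import mean

# Invented records for illustration only: (group, ses_level, score).
records = [
    ("group_1", "high", 280), ("group_1", "high", 280), ("group_1", "high", 280),
    ("group_1", "low", 240),
    ("group_2", "high", 270),
    ("group_2", "low", 230), ("group_2", "low", 230), ("group_2", "low", 230),
]

def mean_score(rows, group):
    """Average score of one group within the given rows."""
    return mean(score for g, _, score in rows if g == group)

# Raw gap: compare the groups without accounting for anything else.
raw_gap = mean_score(records, "group_1") - mean_score(records, "group_2")

# Gap within each socioeconomic level: compare like with like.
by_ses = defaultdict(list)
for row in records:
    by_ses[row[1]].append(row)

gap_within_ses = {
    ses: mean_score(rows, "group_1") - mean_score(rows, "group_2")
    for ses, rows in by_ses.items()
}

print(f"Raw gap: {raw_gap} points")
for ses, gap in gap_within_ses.items():
    print(f"Gap among {ses}-SES students: {gap} points")
```

In this toy data most of the raw gap comes from the two groups having different mixes of high- and low-SES students. The gap shrinks once that is accounted for, but it does not vanish, which mirrors the pattern described above.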

The Importance of Standardization

In America, 80% of teachers are white (NCES, 2019). Even if you choose to assume the best, it is foolish to assume that the average teacher is knowledgeable about every culture and can adequately adjust for content bias.

Standardization allows for a level of control over the bias because you only need to provide oversight to one group, not millions of teachers. In addition, the makers of standardized tests are specifically trained to create them and to analyze them for bias. This doesn’t mean they are perfect, but they are certainly better at making tests and adjusting for bias than the average teacher.

The main value provided by standardized tests is that they give data. Without this data, we would not be aware of the discrepancies in performance based on race or income mentioned above.

Now, we tend to use the data in order to make excuses. “These disparities exist because of economic inequality, we really need to fix that.” And, true enough. But economic inequality does not change what a teacher’s job is. Our job is to teach students as they are. We need to get results with the students we have in the schools we’re at. If you use a student’s social situation to excuse their lack of learning, get out of education. Social situations provide context, not excuses.

The data shows where teachers and schools are failing to educate their students. The data shows where problems are. We should use this to help schools help children. We should use this data as a tool to help us identify successful teaching methods. If we get rid of standardized assessments, we also get rid of this data. To do so is to choose to make ourselves blind, not a wise choice.

The scope of the problem is huge. Are there valid alternatives to standardized testing? (coming soon)

America fails too many of her students, but it isn’t all doom and gloom, though there is a fair share of it. Just take a look at how her students perform (coming soon).

García, G. E., & Pearson, P. D. (1994). Chapter 8: Assessment and Diversity. Review of Research in Education, 20(1), 337–391.
Jacob, B. A., & Levitt, S. D. (2003). Rotten Apples: An Investigation of the Prevalence and Predictors of Teacher Cheating. Quarterly Journal of Economics, 118(3), 843–878.
Koretz, D. (2017). The Testing Charade: Pretending to Make Schools Better. University of Chicago Press.