Thursday, October 08, 2009

SAT and GRE testing

The workaround for this problem is to tailor the test to the examinee in real time, with computer-adaptive testing. So let's say you correctly answer an item with a difficulty estimate of 1; now the computer will hit you with one at 1.2, for example, and keep ramping up until you level off at getting about 50/50 right, which is where it decides you belong. Once it has you figured out, it either just throws easy ones at you so you feel good about yourself, or starts serving up items still undergoing pilot testing. Either way, what you do after that point will not affect your score.
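
To make the "ramp up until you hit 50/50" idea concrete, here is a minimal sketch. It assumes a one-parameter logistic (Rasch) model, where the chance of a correct answer is 1/(1 + exp(-(theta - b))) for ability theta and item difficulty b, so the 50/50 point is exactly where b equals theta. The fixed step size, the up/down staircase rule, and the function names are illustrative assumptions, not how ETS actually picks items.

```python
import math
import random


def p_correct(theta: float, b: float) -> float:
    """Rasch (1PL) probability that an examinee with ability theta
    answers an item of difficulty b correctly. Equals 0.5 when b == theta."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def adaptive_staircase(theta_true: float, n_items: int = 30,
                       start: float = 0.0, step: float = 0.2,
                       seed: int = 0) -> list[float]:
    """Hypothetical adaptive rule: serve a harder item (+step) after a
    correct answer and an easier one (-step) after a miss.
    Returns the list of difficulties that were served."""
    rng = random.Random(seed)
    b = start
    served = []
    for _ in range(n_items):
        served.append(b)
        correct = rng.random() < p_correct(theta_true, b)
        b = b + step if correct else b - step
    return served


# Usage: an examinee whose true ability is 1.5 gets ramped up from 0.
served = adaptive_staircase(theta_true=1.5, n_items=400)
tail = served[-200:]
estimate = sum(tail) / len(tail)  # difficulty level where the walk settled
print(f"estimated ability ~ {estimate:.2f}")
```

Because the steps up and down are equal, the difficulty drifts toward the level where hits and misses are equally likely, which under this model is the examinee's ability.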

This sounds great, and it would be great, if it worked reliably. The problem is that the thing has to kick in somewhere at the beginning of the test, define a broad range that you belong in, then a narrower range, then a narrower one still, and so on. What this basically does is unfairly "weight" the first few items of the test, because they are the ones that determine which large band of scores you will be eligible for. Once the machine has pegged you in the lower half, say, there is no way for you to break out of it, because it's never going to give you the harder questions. If that's not where you belong, you won't be able to demonstrate that, and you'll just get the top score of that band. So if you start the test nervous and make a dumb mistake, that mistake can really cost you, much more than it would later in the test. All these models are probabilistic, so guessing and dumb mistakes are accounted for. But the moment you go adaptive, that beauty of the model is trashed at the beginning and doesn't come into effect until later.
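
Here is a deliberately crude sketch of the band-narrowing worst case described above. It assumes each item sits at the midpoint of the current score band, a right answer keeps the upper half and a wrong one keeps the lower half, and the reported score is the top of the final band. The 0 to 100 scale, the `slips` parameter (positions where the examinee makes a careless mistake), and the function name are all hypothetical. Real adaptive tests usually re-estimate ability from every response so far rather than throwing away half the range outright; this just makes the post's argument concrete.

```python
def band_narrowing_score(ability: float, n_items: int = 6,
                         lo: float = 0.0, hi: float = 100.0,
                         slips: frozenset[int] = frozenset()) -> float:
    """Hypothetical band-halving test. Each item's difficulty is the
    midpoint of the current band [lo, hi]. The examinee answers correctly
    if ability >= difficulty, unless the item's 0-based position is in
    `slips` (a careless mistake). Returns the top of the final band."""
    for i in range(n_items):
        mid = (lo + hi) / 2
        correct = ability >= mid and i not in slips
        if correct:
            lo = mid  # keep the upper half of the band
        else:
            hi = mid  # keep the lower half; scores above mid are now unreachable
    return hi


print(band_narrowing_score(80))                       # 81.25, no mistakes
print(band_narrowing_score(80, slips=frozenset({0}))) # 50.0, slip on the first item
print(band_narrowing_score(80, slips=frozenset({5}))) # 79.6875, slip on the last item
```

A single slip on the first item caps the score at 50 no matter how strong the examinee is, while the same slip on the last item costs under two points.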

Many of the tests that moved to computer-adaptive methods have gone back to just serving a fixed range of items, but one, the GRE, is still adaptive, even though ETS (the company that makes it, the SAT, and the TOEFL) knows it doesn't work reliably (people taking the test over and over can get very different scores). Supposedly there are financial/political reasons they can't get rid of it (rumor).

