“Teaching to the Test” Family of Fallacies

Richard P. Phelps


This article explains the various meanings and ambiguities of the phrase “teaching to the test” (TttT), describes its history and use as a pejorative, and outlines the policy implications of the popular but fallacious belief that “high-stakes” testing induces TttT, which in turn produces “test score inflation,” or artificial test score gains. The history starts with the infamous “Lake Wobegon Effect” test score scandal in the US in the 1980s. John J. Cannell, a medical doctor, discovered that all US states administering national norm-referenced tests claimed that their students’ average scores exceeded the national average, a mathematical impossibility. Cannell blamed educator cheating and lax security for the test score inflation, but education insiders managed to convince many that high stakes were the cause, despite the fact that the tests Cannell examined carried no stakes. Elevating to dogma the fallacy that high stakes cause TttT, which in turn causes test score inflation, has diverted attention from the endemic lax security of “internally administered” tests, a problem that should have encouraged policy makers to require more external controls in test administrations. The fallacy is also partly responsible for promoting the ruinous practice of test preparation that drills on test format and administers practice tests as a substitute for genuine subject matter preparation. Finally, promoters of the fallacy have encouraged the practice of “auditing” allegedly untrustworthy high-stakes test score trends against score trends from allegedly trustworthy low-stakes tests, despite abundant evidence that low-stakes test scores are far less reliable, largely because students have little interest in them.

Keywords: Test security, Educator cheating, Test score inflation, High stakes, Standardized tests, Education, CRESST, Daniel Koretz, John J. Cannell, Lake Wobegon Effect.






DOI: http://dx.doi.org/10.15366/riee2017.10.1.002


Revista Iberoamericana de Evaluación Educativa

ISSN: 1989-0397

doi: 10.15366/riee