The Admission Exam to the National Autonomous University of Mexico: Validity Evidence of a Large Scale High-Stakes Test
Keywords:
Higher education admission, Summative assessment, Multiple choice test, Student selection, Validity
Abstract
Introduction. Higher education institutions' admission exams are summative high-stakes tests with important consequences for applicants, so they require validity evidence to ensure that appropriate inferences are made from their results. The National Autonomous University of Mexico (UNAM) is the most sought-after higher education institution in the country; each year fewer than 10% of the applicants who take the test are admitted. Methods. The sources of the test's validity evidence were analyzed using the conceptual frameworks of Messick and Kane, as well as the AERA-APA-NCME Standards, with the information generated by the February 2019 admission test taken by 148,407 applicants. Results. Validity evidence was identified for test content, response processes, internal structure, relationships with other variables, and consequences of testing. The results suggest that the test has sufficient validity evidence to state that the instrument is robust as a technical tool for knowledge assessment and as a source of information for high-stakes decisions. Discussion. It is crucial that institutions using these tools document their validity evidence, given their great social relevance. Periodic longitudinal studies of the test's use and its implications are needed, since the social and educational conditions of the applicant population are dynamic.
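The internal-structure evidence mentioned in the abstract typically rests on item-level psychometric analyses, and the reference list points to differential item functioning (DIF) via the Mantel-Haenszel procedure (Dorans & Holland, 1992; Zieky, 1993). The following Python sketch only illustrates that statistic under simplified assumptions (one dichotomously scored item, the raw total score as the matching variable, a single reference/focal split); it is not the analysis pipeline used for the UNAM exam, and the group labels and data are simulated for the example.

```python
# Illustrative sketch of a Mantel-Haenszel DIF statistic for one item.
# Assumptions (not from the source article): dichotomous scoring, raw total
# score as the matching variable, and hypothetical reference/focal groups.
import math
import random
from collections import defaultdict


def mantel_haenszel_dif(total_scores, item_responses, groups, focal_label):
    """Return the common odds ratio (alpha_MH) and the ETS delta value (MH D-DIF)."""
    # One 2x2 table per matched total-score level k:
    # [A_k, B_k, C_k, D_k] = [ref correct, ref incorrect, focal correct, focal incorrect]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for score, correct, group in zip(total_scores, item_responses, groups):
        offset = 2 if group == focal_label else 0
        tables[score][offset + (0 if correct else 1)] += 1

    num = den = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        if (a + b) == 0 or (c + d) == 0:
            continue  # score levels observed in only one group carry no information
        num += a * d / n
        den += b * c / n
    if den == 0:
        raise ValueError("No comparable score levels; DIF cannot be estimated.")

    alpha_mh = num / den
    # ETS delta metric: values beyond roughly +/-1.5 are commonly flagged as large DIF.
    delta_mh = -2.35 * math.log(alpha_mh)
    return alpha_mh, delta_mh


if __name__ == "__main__":
    # Simulated data only: 1,000 applicants, item difficulty tied to total score.
    random.seed(0)
    groups = [random.choice(["ref", "focal"]) for _ in range(1000)]
    totals = [random.randint(40, 110) for _ in range(1000)]
    item = [1 if random.random() < s / 120 else 0 for s in totals]
    print(mantel_haenszel_dif(totals, item, groups, focal_label="focal"))
```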
References
Alavi, S. y Bordbar, S. (2017). Differential item functioning analysis of high-stakes test in terms of gender: A Rasch model approach. Malaysian Online Journal of Educational Sciences, 5(1), 10-24.
American Educational Research Association, American Psychological Association, National Council on Measurement in Education and Joint Committee on Standards for Educational and Psychological Testing. (2014). Standards for educational and psychological testing. AERA.
Andrich, D. y Marais, I. (2019). A Course in Rasch Measurement Theory. En D. Andrich y I. Marais (Coords.), Measuring in the Educational, Social and Health Sciences (pp. 41-53). Springer. https://doi.org/10.1007/978-981-13-7496-8
Asociación Nacional de Universidades e Instituciones de Educación Superior (ANUIES). (2019). Anuario estadístico de la población escolar en la educación superior. Técnico Superior y Licenciatura, ciclo 2017-2018. Recuperado de http://www.anuies.mx/informacion-y-servicios/informacion-estadistica-de-educacion-superior/anuario-estadistico-de-educacion-superior
Backhoff, E., Tirado, F. y Larrazolo, N. (2001). Ponderación diferencial de reactivos para mejorar la validez de una prueba de ingreso a la universidad. Revista Electrónica de Investigación Educativa, 3(1), 1-10.
Bennett, R. E. (2005). What does it mean to be a nonprofit educational measurement organization in the 21st century? En R. Bennett y M. von Davier (Eds.), Advancing human assessment. The methodological, psychological and policy contributions of ETS (pp. 1-15). Springer. https://doi.org/10.1007/978-3-319-58689-2_1
Boone, W. y Noltemeyer, A. (2017). Rasch analysis: A primer for school psychology researchers and practitioners. Cogent Education, 4(14), 1-13. https://doi.org/10.1080/2331186X.2017.1416898
Buendía, M. A. y Rivera, R. (2010). Modelo de selección para el ingreso a la Educación Superior: El caso de la UACH. Revista de la Educación Superior, 39(156), 55-72.
Buntins, M., Buntins, K. y Eggert, F. (2017). Clarifying the concept of validity: From measurement to everyday language. Theory & Psychology, 27(5), 703-710. https://doi.org/10.1177/0959354317702256
Centro Nacional de Evaluación para la Educación Superior (CENEVAL). (2020). EXANI-II Admisión. CENEVAL.
Cizek, G. J. (2001). More unintended consequences of high-stakes testing. Educational Measurement: Issues and Practice, 20(4), 19-27. https://doi.org/10.1111/j.1745-3992.2001.tb00072.x
Cook, D. A., Bordage, G. y Schmidt, H. G. (2008). Description, justification and clarification: A framework for classifying the purposes of research in medical education. Medical Education, 42(2), 128-133. https://doi.org/10.1111/j.1365-2923.2007.02974.x
Dirección General de Administración Escolar (DGAE) UNAM. (2019). Demanda e ingreso a la licenciatura. Recuperado de http://www.estadistica.unam.mx/series_inst/index.php
Dirección General de Administración Escolar (DGAE) UNAM. (2020). Acerca de nosotros, quiénes somos y qué hacemos. DGAE, UNAM.CdMx. Recuperado de https://www.dgae.unam.mx/acerca_nosotros.html.
Dirección General de Planeación (DGPL) UNAM. (2020). Agenda Estadística 2020 UNAM. Recuperado de: http://www.estadistica.unam.mx/agenda.php.
Dorans, N. J. y Holland, P. W. (1992). DIF detection and description: Mantel-Haenszel and standardization. ETS Research Report Series, 1992, 1-40. https://doi.org/10.1002/j.2333-8504.1992.tb01440.x
Frey, M. C. y Detterman, D. K. (2004). Scholastic assessment or g? The relationship between the Scholastic Assessment Test and general cognitive ability. Psychological Science, 15(6), 373-378. https://doi.org/10.1111/j.0956-7976.2004.00687.x
Gago, A. (2000). El CENEVAL y la evaluación externa de la educación en México. Revista Electrónica de Investigación Educativa, 2(2).
García-Medina, A. M., Martínez-Rizo, F. y Cordero Arroyo, G. (2016). Análisis del funcionamiento diferencial de los ítems del Excale de matemáticas para tercero de secundaria. Revista Mexicana de Investigación Educativa, 21(71), 1191-1220.
Graue, E. (2018). Acuerdo que reorganiza las funciones y estructura de la Secretaría General de la Universidad Nacional Autónoma de México. Gaceta UNAM.
Cizek, G. J. (2016). Validating test score meaning and defending test score use: different aims, different methods. Assessment in Education: Principles, Policy & Practice, 23(2), 212-225. https://doi.org/10.1080/0969594X.2015.1063479
Guzmán, C., y Serrano, O. (2011). Las puertas del ingreso a la educación superior: el caso del concurso de selección a la licenciatura de la UNAM. Revista de la Educación Superior, 40(157), 31-53.
Haladyna, T. M., Downing, S. M. y Rodriguez, M. C. (2002). A review of multiple-choice item-writing guidelines for classroom assessment. Applied Measurement in Education, 15(3), 309-334. https://doi.org/10.1207/S15324818AME1503_5
Holland, P. W. y Wainer, H. (Eds.). (1993). Differential item functioning. Lawrence Erlbaum Associates.
Juarros, M. (2006). ¿Educación superior como derecho o como privilegio?: Las políticas de admisión a la universidad en el contexto de los países de la región. Andamios, 3(5), 69-90. https://doi.org/10.29092/uacm.v3i5.342
Kane, M. T. (2016). Explicating validity. Assessment in Education: Principles, Policy & Practice, 23(2), 198-211. https://doi.org/10.1080/0969594X.2015.1060192
Kane, M. y Bridgeman, B. (2017). Research on validity theory and practice at ETS. En R. Bennett y M. von Davier (Eds.), Advancing Human Assessment. Methodology of Educational Measurement and Assessment (pp. 489-551). Springer. https://doi.org/10.1007/978-3-319-58689-2_16
Lane, S., Raymond, M. R., Haladyna, T. M. y Downing, S. M. (2016). Test development process. En S. Lane, M. R. Raymond y T. M. Haladyna (Eds.), Handbook of test development (pp. 3-18). Routledge.
Linacre, J. M. y Wright, B. D. (1989). Mantel-Haenszel DIF and PROX are equivalent! Rasch Measurement Transactions, 3(2), 52-53.
Manzi, J., Bosch, A., Bravo, D., del Pino, G., Donoso, G. y Pizarro, R. (2010). Validez diferencial y sesgo en la predictividad de las pruebas de admisión a las universidades chilenas. Revista Iberoamericana de Evaluación Educativa, 3(2), 30-48.
Martínez-González, A., Sánchez-Mendiola, M., Manzano-Patiño, A., García-Minjares, M., Herrera-Penilla, C. y Buzo-Casanova, E. (2018). Grado de conocimientos de los estudiantes al ingreso a la licenciatura y su asociación con el desempeño escolar y la eficiencia terminal. Modelo multivariado. Revista de la Educación Superior, 47(188), 57-85. https://doi.org/10.36857/resu.2018.188.508
Martínez-Rizo, F. (2001). Evaluación educativa y pruebas estandarizadas. Elementos para enriquecer el debate. Revista de la Educación Superior, 30(120), 71-85.
Martínez-Rizo, F. (2016). Impacto de las pruebas en gran escala en contextos de débil tradición técnica: Experiencia de México y el Grupo Iberoamericano de PISA. RELIEVE, 22(1), M0. http://dx.doi.org/10.7203/relieve.22.1.8244
Mendoza, A. (2015). La validez en los exámenes de alto impacto: Un enfoque desde la lógica argumentativa. Perfiles Educativos, 37(149), 169-186. https://doi.org/10.22201/iisue.24486167e.2015.149.53132
Mislevy, R. J. (2016). How developments in psychology and technology challenge validity argumentation. Journal of Educational Measurement, 53(3), 265-292. https://doi.org/10.1111/jedm.12117
OCDE. (2018). How do admission systems affect enrolment in public tertiary education? Education Indicators in Focus. Recuperado de https://www.oecd-ilibrary.org/deliver/41bf120b-en.pdf?itemId=%2Fcontent%2Fpaper%2F41bf120b-en&mimeType=pdf https://doi.org/10.1787/41bf120b-en
Ordorika, I., Rodríguez, R. A. y Montes de Oca, M. M. (2013). Estudio Comparativo de Universidades Mexicanas. Fichas Institucionales 2007-2011. En DGEI-UNAM (Eds.), Cuadernos de Trabajo de la Dirección General de Evaluación Institucional (pp. 227-230). DGEI-UNAM.
Patterson, F., Roberts, C., Hanson, M. D., Hampe, W., Eva, K., Ponnamperuma, G., et al. (2018). 2018 Ottawa consensus statement: Selection and recruitment to the healthcare professions. Medical Teacher, 40(11), 1091-1101. https://doi.org/10.1080/0142159X.2018.1498589
Raykov, T. y Marcoulides, G. A. (2016). On the relationship between classical test theory and item response theory: From one to the other and back. Educational and Psychological Measurement, 76(2), 325–338. https://doi.org/10.1177/0013164415576958
Ringsted, C., Hodges, B. y Scherpbier, A. (2011). The research compass: An introduction to research in medical education: AMEE Guide No. 56. Medical Teacher, 33(9), 695-709. https://doi.org/10.3109/0142159X.2011.595436
Sánchez Mendiola, M., Delgado Maldonado, L., Flores Hernández, F., Leenen, I. y Martínez González, A. (2015). Evaluación del aprendizaje. En M. Sánchez Mendiola, A. Lifshitz Guinzberg, P. Vilar Puig, A. Martínez González, M. Varela Ruiz y E. Graue Wiechers (Eds.), Educación Médica: Teoría y Práctica (pp. 89-95). Elsevier.
Sánchez-Mendiola, M. y Delgado-Maldonado, L. (2017). Exámenes de alto impacto: Implicaciones educativas. Investigación en Educación Médica, 6(21), 52-62. https://doi.org/10.1016/j.riem.2016.12.001
Schuwirth, L., Colliver, J., Gruppen, L., Kreiter, C., Mennin, S., Onishi, H., et al. (2011). Research in assessment: Consensus statement and recommendations from the Ottawa 2010 Conference. Medical Teacher, 33(3), 224-233. https://doi.org/10.3109/0142159X.2011.551558
Shepard, L. (2016). Evaluating test validity: Reprise and progress. Assessment in Education: Principles, Policy & Practice, 23(2), 268-280. https://doi.org/10.1080/0969594X.2016.1141168
Sigal, V. y Dávila, M. (2004). La cuestión de la admisión a los estudios universitarios en Argentina. En O. Barsky, V. Sigal y M. Dávila (Eds.), Los desafíos de la universidad argentina (pp. 205-222). Siglo XXI Editores.
Sireci, S. G. (2016). On the validity of useless tests. Assessment in Education: Principles, Policy & Practice, 23(2), 226-235. https://doi.org/10.1080/0969594X.2015.1072084
Trost, G. (1993). Principios y prácticas en la selección para la admisión a la educación superior. Revista de la Educación Superior, 22(85), 1-10.
UNAM. (1997). Reglamento General de Inscripciones. Universidad Nacional Autónoma de México. Recuperado de https://www.dgae-siae.unam.mx/acerca/normatividad.html#leg-3.
Walker, C. (2011). What's the DIF? Why differential item functioning analyses are an important part of instrument development and validation. Journal of Psychoeducational Assessment, 29(4), 364-376. https://doi.org/10.1177/0734282911406666
Yavuz, S., Dogan, N., Hambleton, R. K. y Yurtcu, M. (2018). The comparison of differential item functioning predicted through experts and statistical techniques. Cypriot Journal of Educational Sciences, 13(2), 375-384. https://doi.org/10.18844/cjes.v13i2.2427
Young, M., St-Onge, C., Xiao, J., Vachon Lachiver, E. y Torabi, N. (2018). Characterizing the literature on validity and assessment in medical education: a bibliometric study. Perspectives on Medical Education, 7(3), 182-191. https://doi.org/10.1007/s40037-018-0433-x
Zieky, M. (1993). DIF statistics in test development. En P. W. Holland y H. Wainer (Eds.), Differential item functioning (pp. 337-347). Erlbaum.
Zwick, R. (2006). Higher education admissions testing. En R. Brennan (Ed.), Educational Measurement (pp. 647-679). National Council on Measurement in Education/Greenwood Press.