Educational inequalities in Enem: a perspective based on socioeconomic variables and machine learning
DOI:
https://doi.org/10.5902/2318133893251
Keywords:
Enem microdata, Random forest, Machine learning
Abstract
The National High School Exam (Enem) serves as an important gateway to higher education in Brazil. This study examines the relationship between socioeconomic variables and student performance on the exam, employing machine learning techniques to identify significant patterns. The research has three main objectives: to develop predictive models based on random forest to classify student performance; to identify the most relevant socioeconomic variables; and to analyze their impact on results, with the aim of informing more equitable educational policies. The study used Enem 2023 microdata, which underwent preprocessing that included one-hot encoding for certain variables and SMOTE for class balancing. Ten random forest models were built, with hyperparameters tuned via random search. Performance was evaluated using accuracy, precision, recall, and F1-score, along with a variable importance analysis. The models performed satisfactorily, with accuracy around 94% and precision up to 99%. Parental education level, parental occupation, and family income emerged as key predictors. Students with more educated parents in strategic professions were three times more likely to achieve high performance, while those from low-income families showed a greater tendency toward unsatisfactory results. The findings highlight the influence of socioeconomic factors on educational performance and underscore the need for appropriate public policies. The models' effectiveness supports their utility for educational diagnostics.
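As an illustration of the evaluation metrics named in the abstract, the following minimal sketch computes accuracy, precision, recall, and F1-score from confusion-matrix counts. It assumes a binary task (1 = high performance, 0 = otherwise); the function name `binary_metrics` and the label vectors are hypothetical and are not taken from the study, whose exact class definitions are not given here.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Return accuracy, precision, recall and F1 for a binary task (1 = positive)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")

    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts.
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels for ten students (illustrative only, not Enem data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Because accuracy counts correct predictions in both classes while precision looks only at the predicted positives, the two can diverge on imbalanced data, which is one reason to report them together.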
References
BERGSTRA, James; BENGIO, Yoshua. Random search for hyper-parameter optimization. Journal of Machine Learning Research, Brookline, v. 13, n. 2, 2012, p. 281-305.
BREIMAN, Leo. Random forests. Machine Learning, Berlin, v. 45, 2001, p. 5-32. DOI: https://doi.org/10.1023/A:1010933404324
CHAWLA, Nitesh V; BOWYER, Kevin W; HALL, Lawrence O; KEGELMEYER, W. Philip. SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, El Segundo, v. 16, 2002, p. 321-357. DOI: https://doi.org/10.1613/jair.953
MEC. Enem: Exame Nacional do Ensino Médio 2023. Available at: https://www.gov.br/inep/pt-br/areas-de-atuacao/avaliacao-e-exames-educacionais/enem. Accessed on: 18 Sep. 2023.
SEGER, Christian. An investigation of categorical variable encoding techniques in machine learning: binary versus one-hot and feature hashing. KTH Royal Institute of Technology, School of Electrical Engineering and Computer Science: Stockholm, Sweden, 2018.
HE, Haibo; GARCIA, Edwardo. Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering, Los Alamitos, v. 21, n. 9, 2009, p. 1263-1284. DOI: https://doi.org/10.1109/TKDE.2008.239
KRAWCZYK, Bartosz. Learning from imbalanced data: open challenges and future directions. Progress in Artificial Intelligence, Heidelberg, v. 5, n. 4, 2016, p. 221-232. DOI: https://doi.org/10.1007/s13748-016-0094-0
FERNÁNDEZ, Alberto; GARCIA, Salvador; HERRERA, Francisco; CHAWLA, Nitesh. SMOTE for learning from imbalanced data: Progress and challenges, marking the 15-year anniversary. Journal of Artificial Intelligence Research, El Segundo, v. 61, 2018, p. 863-905. DOI: https://doi.org/10.1613/jair.1.11192
License
Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under the Creative Commons Attribution 4.0 International, non-commercial license with no derivative work, which allows others to share the work with acknowledgment of authorship and of its initial publication in this journal.
Authors are authorized to enter into separate, additional contracts for the non-exclusive distribution of the version of the work published in this journal (for example, publishing it in an institutional repository or as a book chapter), with acknowledgment of authorship and of its initial publication in this journal.
Authors are allowed and encouraged to publish and distribute their work online (for example, in institutional repositories or on their personal pages) at any point before or during the editorial process, because this can lead to productive changes and can increase the impact and citation of the published work.

