A performance evaluation in multivariate outliers identification methods

Authors

DOI:

https://doi.org/10.5902/2179460X41662

Keywords:

Multivariate outliers, Simulation, Cluster analysis, Accuracy, Computational time.

Abstract

Methods for identifying multivariate outliers are essential in statistical analysis. Outliers may reveal relevant information about the variables under investigation, and in many contexts they are points of great practical interest. Statistical analyses carried out without first identifying possible extreme values may yield questionable results and lead to mistaken decisions. This paper therefore compares methodologies for detecting multivariate outliers, using a fair and adequate simulation procedure. The comparison covers detection techniques based on the Mahalanobis distance and a methodology based on cluster analysis. Sensitivity, specificity, and accuracy are used to measure detection quality, and the computational time required by each procedure is also evaluated. The technique based on cluster analysis was noticeably superior to the others in both detection quality and execution time.
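As a rough illustration of the kind of evaluation the abstract describes, the sketch below flags outliers with the classical (non-robust) squared Mahalanobis distance and scores the result with sensitivity, specificity, and accuracy. It is not the authors' procedure: the bivariate setting, the chi-square cutoff at the 0.975 quantile, the synthetic data, and the function names `mahalanobis_sq_2d` and `detection_metrics` are all assumptions made for this example.

```python
import math
import random


def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each 2-D point to the sample mean."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Entries of the sample covariance matrix (divisor n - 1).
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    # Closed-form inverse of the 2x2 covariance matrix, applied per point.
    return [
        (syy * (x - mx) ** 2
         - 2 * sxy * (x - mx) * (y - my)
         + sxx * (y - my) ** 2) / det
        for x, y in points
    ]


def detection_metrics(truth, flagged):
    """Sensitivity, specificity, and accuracy; assumes both classes occur in truth."""
    tp = sum(1 for t, f in zip(truth, flagged) if t and f)
    tn = sum(1 for t, f in zip(truth, flagged) if not t and not f)
    fp = sum(1 for t, f in zip(truth, flagged) if not t and f)
    fn = sum(1 for t, f in zip(truth, flagged) if t and not f)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(truth),
    }


# Synthetic data (an assumption for illustration, not the paper's design):
# 95 standard-normal points plus 5 points shifted far from the bulk.
random.seed(0)
clean = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(95)]
shifted = [(random.gauss(6, 0.5), random.gauss(6, 0.5)) for _ in range(5)]
data = clean + shifted
truth = [False] * len(clean) + [True] * len(shifted)

# With 2 degrees of freedom the chi-square quantile has a closed form:
# the (1 - alpha) quantile equals -2 * ln(alpha).
alpha = 0.025
cutoff = -2 * math.log(alpha)

flagged = [d > cutoff for d in mahalanobis_sq_2d(data)]
print(detection_metrics(truth, flagged))
```

Because the mean and covariance here are estimated from the contaminated sample, clustered outliers can inflate the covariance and mask one another, which is one reason comparisons of this kind also consider robust or cluster-based alternatives.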

Author Biographies

Josino José Barbosa, Universidade Federal de Viçosa (PPESTBIO); Universidade Federal de Ouro Preto

Josino José Barbosa holds a bachelor's degree in Statistics from the Universidade Federal de Ouro Preto (2014) and a master's degree in Applied Statistics and Biometrics from the Universidade Federal de Viçosa (2017), and is currently a doctoral student in the Graduate Program in Applied Statistics and Biometrics at the Universidade Federal de Viçosa.

Anderson Ribeiro Duarte, Universidade Federal de Ouro Preto

Anderson Ribeiro Duarte holds a teaching degree (licenciatura) in Mathematics from the Universidade Federal de Minas Gerais (2000), a master's degree in Statistics from the Universidade Federal de Minas Gerais (2005), and a doctorate in Statistics from the Universidade Federal de Minas Gerais (2009). Currently an associate professor at the Universidade Federal de Ouro Preto, with experience in Probability and Statistics and in Spatial Statistics, Duarte investigates problems with an emphasis on General Statistics, Queueing Theory, Stochastic Processes, and the detection of spatial clusters.

Helgem Souza Ribeiro Martins, Universidade Federal de Viçosa (PPESTBIO); Universidade Federal de Ouro Preto

Helgem Souza Ribeiro Martins holds a bachelor's degree in Statistics from the Universidade Federal de Ouro Preto (2014) and a master's degree in Statistics from the Universidade Federal de Minas Gerais (2016), and is currently pursuing a doctorate in Applied Statistics and Biometrics at the Universidade Federal de Viçosa.

Published

2020-05-15

How to Cite

Barbosa, J. J., Duarte, A. R., & Martins, H. S. R. (2020). A performance evaluation in multivariate outliers identification methods. Ciência e Natura, 42, e16. https://doi.org/10.5902/2179460X41662
