A performance evaluation in multivariate outliers identification methods

Josino José Barbosa, Anderson Ribeiro Duarte, Helgem Souza Ribeiro Martins

Abstract


Methodologies for identifying multivariate outliers are extremely important in statistical analysis. Outliers may reveal relevant information about the variables under investigation. Statistical analyses performed without first identifying possible extreme values may yield misleading results and lead to mistaken decisions. In many contexts, outliers are themselves points of great practical interest. With this in mind, this paper compares methodologies for detecting multivariate outliers, using a simulation procedure designed to make the comparison fair and adequate. The comparison covers detection techniques based on the Mahalanobis distance as well as a methodology based on cluster analysis. Sensitivity, specificity, and accuracy are used to measure the quality of each method, and the computational time required by each procedure is also evaluated. The cluster-analysis-based technique showed a noticeable advantage over the others in both detection quality and execution time.
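The abstract does not state the exact formulas, so the following is only a minimal sketch of the general idea behind a Mahalanobis-distance rule and the three quality metrics. It makes the following assumptions:

* The data are bivariate.
* The classical (non-robust) sample mean and covariance are used.
* A point is flagged when its squared distance exceeds the 0.975 quantile of a chi-square distribution with 2 degrees of freedom. For 2 degrees of freedom the quantile has the closed form -2 ln(1 - prob).
* Outliers are treated as the positive class.

This is not the authors' procedure, and their cluster-analysis method is not reproduced here. All function names, the cutoff and the simulated scenario are hypothetical choices for illustration.

```python
import math
import random

# Hypothetical sketch: bivariate data (p = 2), classical (non-robust) mean and covariance.
# Outliers are treated as the "positive" class when computing the metrics.

def mean_vector(points):
    """Sample mean of a list of (x, y) pairs."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def covariance_2x2(points, mu):
    """Unbiased sample covariance, returned as (s_xx, s_xy, s_yy)."""
    n = len(points)
    sxx = sum((p[0] - mu[0]) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - mu[1]) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mu[0]) * (p[1] - mu[1]) for p in points) / (n - 1)
    return (sxx, sxy, syy)

def mahalanobis_sq(p, mu, cov):
    """Squared Mahalanobis distance (p - mu)^T S^{-1} (p - mu) for a 2x2 S."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = p[0] - mu[0], p[1] - mu[1]
    # S^{-1} = (1/det) * [[s_yy, -s_xy], [-s_xy, s_xx]]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det

def chi2_2df_quantile(prob):
    """Chi-square quantile with 2 degrees of freedom: CDF is 1 - exp(-x/2)."""
    return -2.0 * math.log(1.0 - prob)

def flag_outliers(points, prob=0.975):
    """Flag points whose squared distance exceeds the assumed chi-square(2) cutoff."""
    mu = mean_vector(points)
    cov = covariance_2x2(points, mu)
    cutoff = chi2_2df_quantile(prob)
    return [mahalanobis_sq(p, mu, cov) > cutoff for p in points]

def confusion_metrics(truth, predicted):
    """Sensitivity, specificity and accuracy, with outliers as the positive class."""
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)
    tn = sum(1 for t, p in zip(truth, predicted) if not t and not p)
    fp = sum(1 for t, p in zip(truth, predicted) if not t and p)
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    accuracy = (tp + tn) / len(truth)
    return sensitivity, specificity, accuracy

def simulate(n_inliers=95, n_outliers=5, rho=0.8, shift=(6.0, -6.0), seed=1):
    """Hypothetical scenario: correlated normal inliers plus shifted outliers."""
    rng = random.Random(seed)
    points, truth = [], []
    for i in range(n_inliers + n_outliers):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Standard bivariate normal with correlation rho.
        x, y = z1, rho * z1 + math.sqrt(1.0 - rho * rho) * z2
        is_outlier = i >= n_inliers
        if is_outlier:
            x, y = x + shift[0], y + shift[1]
        points.append((x, y))
        truth.append(is_outlier)
    return points, truth

if __name__ == "__main__":
    pts, truth = simulate()
    flags = flag_outliers(pts)
    print("sensitivity=%.3f specificity=%.3f accuracy=%.3f" % confusion_metrics(truth, flags))
```

The outliers themselves pull the classical mean and covariance, so this rule can suffer from masking. Robust estimates such as the minimum covariance determinant are a common remedy. Repeating the simulation many times and averaging the three metrics, while timing each method, mirrors the kind of comparison the abstract describes.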

Keywords


Multivariate outliers, Simulation, Cluster analysis, Accuracy, Computational time.

DOI: https://doi.org/10.5902/2179460X41662

Copyright (c) 2020 Ciência e Natura

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.