A performance evaluation in multivariate outliers identification methods
DOI:
https://doi.org/10.5902/2179460X41662
Keywords:
Multivariate outliers, Simulation, Cluster analysis, Accuracy, Computational time.
Abstract
Methods for identifying multivariate outliers are essential in statistical analysis. Outliers may reveal relevant information about the variables under investigation, and in many contexts they are points of great practical interest. Statistical analyses carried out without first identifying possible extreme values may yield controversial results and lead to mistaken decisions. This paper therefore discusses methods for detecting multivariate outliers and compares them through a fair and adequate simulation procedure. The comparison covers detection techniques based on the Mahalanobis distance and a method based on cluster analysis. Sensitivity, specificity, and accuracy are used to measure detection quality, and the computational time required by each procedure is also evaluated. The technique based on cluster analysis was noticeably superior to the others in both detection quality and execution time.
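To make the evaluation described in the abstract concrete, the following is a minimal sketch of one Mahalanobis-distance detector scored with sensitivity, specificity, and accuracy on simulated data. It is not the authors' procedure: the function names (`mahalanobis_sq`, `flag_outliers_2d`, `detection_metrics`), the bivariate setting, the classical (non-robust) mean and covariance estimates, the 97.5% chi-square cutoff, and the contamination scheme are all assumptions made for illustration.

```python
import numpy as np

def mahalanobis_sq(X, center, cov):
    """Squared Mahalanobis distance of each row of X from center."""
    diff = X - center
    inv_cov = np.linalg.inv(cov)
    return np.einsum("ij,jk,ik->i", diff, inv_cov, diff)

def flag_outliers_2d(X, level=0.975):
    """Flag rows whose squared distance exceeds a chi-square(2) quantile.

    Assumption: classical sample mean and covariance, two variables only.
    """
    center = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    # With 2 degrees of freedom the chi-square quantile has a closed form.
    cutoff = -2.0 * np.log(1.0 - level)
    return mahalanobis_sq(X, center, cov) > cutoff

def detection_metrics(is_outlier, flagged):
    """Sensitivity, specificity, and accuracy from boolean label arrays."""
    tp = np.sum(is_outlier & flagged)
    tn = np.sum(~is_outlier & ~flagged)
    fp = np.sum(~is_outlier & flagged)
    fn = np.sum(is_outlier & ~flagged)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }

# Hypothetical simulation: 190 regular points plus 10 shifted points.
rng = np.random.default_rng(0)
clean = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], size=190)
shifted = rng.multivariate_normal([6, -6], [[1, 0], [0, 1]], size=10)
X = np.vstack([clean, shifted])
truth = np.r_[np.zeros(190, dtype=bool), np.ones(10, dtype=bool)]

metrics = detection_metrics(truth, flag_outliers_2d(X))
print(metrics)
```

Because the classical estimates are themselves pulled toward the contaminated points, a sketch like this can suffer from masking at higher contamination rates; swapping in a robust location and scatter estimate would be the usual remedy, at extra computational cost.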
Copyright (c) 2020 Ciência e Natura
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.