A proposal for identifying multivariate outliers
DOI:
https://doi.org/10.5902/2179460X29535
Keywords:
Power inverse Lindley distribution, Methods of estimation, Likelihood, Monte Carlo simulation
Abstract
The identification of outliers plays an important role in statistical analysis, since such observations may contain important information regarding the hypotheses of the study. If classical statistical models are blindly applied to data containing atypical values, the results may be misleading and mistaken decisions may be made. Moreover, in practical situations, the outliers themselves are often the points of special interest, and their identification may be the main objective of the investigation. This work therefore proposes a technique for detecting multivariate outliers based on cluster analysis and compares it with outlier identification via the Mahalanobis distance. Data were generated by Monte Carlo simulation using a mixture of multivariate normal distributions. The simulation results showed that the proposed method was superior to the Mahalanobis method in both sensitivity and specificity; that is, it showed a greater ability to correctly classify both outliers and non-outliers. In addition, the proposed methodology was illustrated with an application to real data from the health field.
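The Mahalanobis baseline and the two evaluation measures named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes bivariate data, a contaminating normal component with a shifted mean, classical (non-robust) estimates of the mean and covariance, and a chi-square cutoff on the squared distance, which is a common convention that the abstract does not confirm. The function names and the parameters `n`, `contamination`, `shift`, and `alpha` are hypothetical. The proposed cluster-based method is not reproduced because the abstract does not describe it.

```python
import math
import random


def simulate(n=200, contamination=0.05, shift=6.0, seed=1):
    """Draw n bivariate points from a two-component normal mixture.

    Assumption: the first round(n * contamination) points are outliers,
    drawn from N((shift, shift), I); the rest come from N((0, 0), I).
    """
    rng = random.Random(seed)
    n_out = round(n * contamination)
    points, truth = [], []
    for i in range(n):
        is_outlier = i < n_out
        center = shift if is_outlier else 0.0
        points.append((rng.gauss(center, 1.0), rng.gauss(center, 1.0)))
        truth.append(is_outlier)
    return points, truth


def mean_and_cov(points):
    """Sample mean and 2x2 sample covariance, returned as (sxx, sxy, syy)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_sq(p, mean, cov):
    """Squared Mahalanobis distance of p, using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = p[0] - mean[0], p[1] - mean[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def flag_outliers(points, alpha=0.025):
    """Flag points whose squared distance exceeds the chi-square cutoff.

    With 2 degrees of freedom the (1 - alpha) chi-square quantile has the
    closed form -2 * ln(alpha), so no statistics library is needed.
    """
    mean, cov = mean_and_cov(points)
    cutoff = -2.0 * math.log(alpha)
    return [mahalanobis_sq(p, mean, cov) > cutoff for p in points]


def sensitivity_specificity(flags, truth):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for f, t in zip(flags, truth) if f and t)
    fn = sum(1 for f, t in zip(flags, truth) if not f and t)
    tn = sum(1 for f, t in zip(flags, truth) if not f and not t)
    fp = sum(1 for f, t in zip(flags, truth) if f and not t)
    return tp / (tp + fn), tn / (tn + fp)


points, truth = simulate()
sens, spec = sensitivity_specificity(flag_outliers(points), truth)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Because the mean and covariance here are estimated from the contaminated sample itself, outliers can inflate the covariance and hide one another (masking). Robust estimators of location and scatter are a standard way to reduce that effect.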
Ethical Guidelines for Journal Publication
The Ciência e Natura journal is committed to ensuring ethics in publication and quality of articles.
Conformance to standards of ethical behavior is therefore expected of all parties involved: Authors, Editors, Reviewers, and the Publisher.
In particular,
Authors: Authors should present an objective discussion of the significance of the research work, as well as sufficient detail and references to permit others to replicate the experiments. Fraudulent or knowingly inaccurate statements constitute unethical behavior and are unacceptable. Review Articles should also be objective, comprehensive, and accurate accounts of the state of the art. Authors should ensure that their work is entirely original, and that any use of the work and/or words of others has been appropriately acknowledged. Plagiarism in all its forms constitutes unethical publishing behavior and is unacceptable. Submitting the same manuscript to more than one journal concurrently constitutes unethical publishing behavior and is unacceptable. Authors should not submit articles describing essentially the same research to more than one journal. The corresponding Author should ensure that there is a full consensus of all Co-authors in approving the final version of the paper and its submission for publication.
Editors: Editors should evaluate manuscripts exclusively on the basis of their academic merit. An Editor must not use unpublished information in the editor's own research without the express written consent of the Author. Editors should take reasonable responsive measures when ethical complaints have been presented concerning a submitted manuscript or published paper.
Reviewers: Any manuscripts received for review must be treated as confidential documents. Privileged information or ideas obtained through peer review must be kept confidential and not used for personal advantage. Reviews should be conducted objectively, and observations should be formulated clearly with supporting arguments, so that Authors can use them to improve the paper. Any selected Reviewer who feels unqualified to review the research reported in a manuscript, or who knows that a prompt review will be impossible, should notify the Editor and withdraw from the review process. Reviewers should not consider manuscripts in which they have conflicts of interest resulting from competitive, collaborative, or other relationships or connections with any of the authors, companies, or institutions connected to the papers.