The usefulness of robust multivariate methods: A case study with the menu items of a fast food restaurant chain
DOI:
https://doi.org/10.5902/2179460X39892
Keywords:
Multivariate statistics, Data science, Robust principal component analysis, Robust cluster analysis, Data visualization, Multivariate outlier detection
Abstract
Multivariate statistical methods have long played an important role in statistics and data analysis. With the growing amounts of data collected every day in many disciplines, and with the rise of data science, machine learning and applied statistics, that role is now even more important. Two of the most widely used multivariate statistical methods are cluster analysis and principal component analysis. Like many other models and algorithms, they are adequate when the data satisfy certain assumptions. However, when the distribution of the data is not normal, or when it shows heavy tails and outlying observations, the classical models and algorithms may lead to erroneous conclusions. Robust statistical methods, such as algorithms for robust cluster analysis and robust principal component analysis, are very useful when analyzing contaminated data with outlying observations. In this paper we consider a data set containing the products available in a fast food restaurant chain, together with their nutritional information, and discuss the usefulness of robust statistical methods for classification, clustering and data visualization.
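The sensitivity of classical principal component analysis to outlying observations can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic (not the restaurant data set), the function names `leading_axis`, `spherical_axis` and `axis_gap_deg` are hypothetical, and the robust variant is a simple spherical PCA (points centered at the coordinate-wise median and projected onto the unit circle before the principal axis is computed), which is one of several robust approaches and not necessarily the one used in the paper.

```python
import math
import random
import statistics


def leading_axis(points, center):
    """Angle in [0, pi) of the first principal axis of 2-D points about center."""
    sxx = syy = sxy = 0.0
    for x, y in points:
        dx, dy = x - center[0], y - center[1]
        sxx += dx * dx
        syy += dy * dy
        sxy += dx * dy
    # Closed form for the leading eigenvector of a symmetric 2x2 matrix.
    return (0.5 * math.atan2(2.0 * sxy, sxx - syy)) % math.pi


def spherical_axis(points):
    """First principal axis after projecting centered points onto the unit circle."""
    # Assumption: the coordinate-wise median is used as a simple robust center.
    cx = statistics.median([x for x, _ in points])
    cy = statistics.median([y for _, y in points])
    unit = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        norm = math.hypot(dx, dy)
        if norm > 0.0:  # skip a point that coincides with the center
            unit.append((dx / norm, dy / norm))
    return leading_axis(unit, (0.0, 0.0))


def axis_gap_deg(a, b):
    """Smallest angle in degrees between two axes (directions modulo 180 degrees)."""
    d = abs(a - b) % math.pi
    return math.degrees(min(d, math.pi - d))


# Synthetic data: the clean points vary mostly along the horizontal axis,
# and a small cluster of outliers sits far away in the vertical direction.
rng = random.Random(0)
clean = [(rng.gauss(0.0, 3.0), rng.gauss(0.0, 0.5)) for _ in range(200)]
outliers = [(rng.gauss(0.0, 1.0), rng.gauss(40.0, 1.0)) for _ in range(20)]
data = clean + outliers

mean = (statistics.mean([x for x, _ in data]),
        statistics.mean([y for _, y in data]))
classical_angle = leading_axis(data, mean)
robust_angle = spherical_axis(data)

print("classical axis, degrees from horizontal:", axis_gap_deg(classical_angle, 0.0))
print("spherical axis, degrees from horizontal:", axis_gap_deg(robust_angle, 0.0))
```

In this sketch the classical first axis is pulled toward the outlying cluster, while the spherical version stays close to the direction of the clean points, because projecting onto the unit circle bounds the influence that any single observation can have.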
Ethical Guidelines for Journal Publication
The Ciência e Natura journal is committed to ensuring ethics in publication and quality of articles.
Conformance to standards of ethical behavior is therefore expected of all parties involved: Authors, Editors, Reviewers, and the Publisher.
In particular,
Authors: Authors should present an objective discussion of the significance of the research work, as well as sufficient detail and references to permit others to replicate the experiments. Fraudulent or knowingly inaccurate statements constitute unethical behavior and are unacceptable. Review Articles should also be objective, comprehensive, and accurate accounts of the state of the art. The Authors should ensure that their work is entirely original and that, if the work and/or words of others have been used, this has been appropriately acknowledged. Plagiarism in all its forms constitutes unethical publishing behavior and is unacceptable. Submitting the same manuscript to more than one journal concurrently constitutes unethical publishing behavior and is unacceptable. Authors should not submit articles describing essentially the same research to more than one journal. The corresponding Author should ensure that there is a full consensus of all Co-authors in approving the final version of the paper and its submission for publication.
Editors: Editors should evaluate manuscripts exclusively on the basis of their academic merit. An Editor must not use unpublished information in the editor's own research without the express written consent of the Author. Editors should take reasonable responsive measures when ethical complaints have been presented concerning a submitted manuscript or published paper.
Reviewers: Any manuscripts received for review must be treated as confidential documents. Privileged information or ideas obtained through peer review must be kept confidential and not used for personal advantage. Reviews should be conducted objectively, and observations should be formulated clearly with supporting arguments, so that Authors can use them to improve the paper. Any selected Reviewer who feels unqualified to review the research reported in a manuscript, or who knows that a prompt review will be impossible, should notify the Editor and withdraw from the review process. Reviewers should not consider manuscripts in which they have conflicts of interest resulting from competitive, collaborative, or other relationships or connections with any of the authors, companies, or institutions connected to the papers.