The usefulness of robust multivariate methods: A case study with the menu items of a fast food restaurant chain

Authors

Paulo Jorge Canas Rodrigues, Rafael Almeida, Kézia Mustafa

DOI:

https://doi.org/10.5902/2179460X39892

Keywords:

Multivariate statistics, Data science, Robust principal component analysis, Robust cluster analysis, Data visualization, Multivariate outlier detection

Abstract

Multivariate statistical methods have long played an important role in statistics and data analysis. With the growing volume of data collected every day across many disciplines, and with the rise of data science, machine learning and applied statistics, that role has become even more important. Two of the most widely used multivariate methods are cluster analysis and principal component analysis. Like many other models and algorithms, they are adequate when the data satisfy certain assumptions. However, when the data are not normally distributed, show heavy tails, or contain outlying observations, the classical models and algorithms may lead to erroneous conclusions. Robust statistical methods, such as algorithms for robust cluster analysis and robust principal component analysis, are especially useful for analyzing data contaminated with outlying observations. In this paper we consider a data set containing the products available in a fast food restaurant chain, together with their nutritional information, and discuss the usefulness of robust statistical methods for classification, clustering and data visualization.
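
To make the idea of robust clustering concrete, the following is a minimal sketch of trimmed k-means, one well-known robust alternative to classical k-means. It is not the authors' implementation and does not use the menu data: the function name `trimmed_kmeans`, its parameters (`alpha`, `n_starts`, `n_iter`, `seed`) and the synthetic data are assumptions made purely for illustration. The sketch discards the fraction `alpha` of observations farthest from their nearest center at every iteration, so that a few extreme points cannot pull the cluster centers away from the bulk of the data.

```python
import numpy as np


def _assign(X, centers, n_keep):
    """Assign points to the nearest center and trim the farthest ones (label -1)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    min_d2 = d2.min(axis=1)
    keep = np.argsort(min_d2)[:n_keep]      # indices of the n_keep closest points
    labels = np.full(X.shape[0], -1)        # -1 marks a trimmed observation
    labels[keep] = nearest[keep]
    return labels, min_d2[keep].sum()


def trimmed_kmeans(X, k, alpha=0.1, n_starts=20, n_iter=100, seed=0):
    """Illustrative trimmed k-means: best of several random starts."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n_keep = n - int(np.floor(alpha * n))   # observations that are not trimmed
    best = None
    for _ in range(n_starts):
        centers = X[rng.choice(n, size=k, replace=False)].astype(float)
        for _ in range(n_iter):
            labels, _obj = _assign(X, centers, n_keep)
            new_centers = centers.copy()
            for j in range(k):
                members = X[labels == j]
                if len(members) > 0:        # keep the old center if a cluster is empty
                    new_centers[j] = members.mean(axis=0)
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        labels, objective = _assign(X, centers, n_keep)
        if best is None or objective < best[2]:
            best = (centers, labels, objective)
    return best[0], best[1]


# Synthetic example (hypothetical data): two tight groups plus five scattered outliers.
rng = np.random.default_rng(42)
group_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
group_b = rng.normal(loc=[10.0, 10.0], scale=0.5, size=(50, 2))
outliers = np.array([[100.0, -100.0], [-80.0, 90.0], [60.0, 200.0],
                     [-150.0, -50.0], [200.0, 10.0]])
X = np.vstack([group_a, group_b, outliers])

centers, labels = trimmed_kmeans(X, k=2, alpha=0.1)
print("Estimated centers:\n", centers)
print("Trimmed observations:", np.flatnonzero(labels == -1))
```

In this synthetic setting the five scattered points end up among the trimmed observations, and the two estimated centers stay close to the two groups. The trimming fraction `alpha` has to be chosen by the analyst; dedicated, well-tested implementations of robust clustering and robust principal component analysis exist in established statistical software and would normally be preferred over a sketch like this one.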

Author Biographies

Paulo Jorge Canas Rodrigues, Universidade Federal da Bahia, Salvador, BA

Professor of Statistics at the Universidade Federal da Bahia

Rafael Almeida, Universidade Federal da Bahia, Salvador, BA

Undergraduate student in Statistics at the Universidade Federal da Bahia

Kézia Mustafa, Universidade Federal da Bahia, Salvador, BA

Undergraduate student in Statistics at the Universidade Federal da Bahia

Published

2020-09-03

How to Cite

Rodrigues, P. J. C., Almeida, R., & Mustafa, K. (2020). The usefulness of robust multivariate methods: A case study with the menu items of a fast food restaurant chain. Ciência e Natura, 42, e17. https://doi.org/10.5902/2179460X39892

Section

40 YEARS - Anniversary Special Edition