Volume 1842, Issue 1
12 May 2017
THE 3RD ISM INTERNATIONAL STATISTICAL CONFERENCE 2016 (ISMIII): Bringing Professionalism and Prestige in Statistics
9–11 August 2016
Kuala Lumpur, Malaysia
Research Article May 12 2017
Loong Chuen Lee
1 Forensic Science Program, Faculty of Health Sciences, Universiti Kebangsaan Malaysia, 50300 UKM Kuala Lumpur, Malaysia
2 School of Mathematical Sciences, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor DE, Malaysia
Choong-Yeun Liong a)
2 School of Mathematical Sciences, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor DE, Malaysia
Abdul Aziz Jemain
2 School of Mathematical Sciences, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor DE, Malaysia
Author & Article Information
a) Corresponding author: lg@ukm.edu.my
AIP Conf. Proc. 1842, 030024 (2017)

Citation
Loong Chuen Lee, Choong-Yeun Liong, Abdul Aziz Jemain; Q-mode versus R-mode principal component analysis for linear discriminant analysis (LDA). AIP Conf. Proc. 12 May 2017; 1842 (1): 030024. https://doi.org/10.1063/1.4982862
Many studies apply Principal Component Analysis (PCA) as a preliminary visualization method, a variable construction method, or both. The focus of PCA can be on the samples (R-mode PCA) or on the variables (Q-mode PCA). Traditionally, R-mode PCA has been the usual approach for reducing high-dimensional data before Linear Discriminant Analysis (LDA) is applied to solve classification problems. The output of PCA is composed of two new matrices, known as the loadings and scores matrices. Each matrix can be used to produce a plot: the loadings plot aids identification of important variables, whereas the scores plot presents the spatial distribution of samples on new axes known as Principal Components (PCs). Fundamentally, the scores matrix always supplies the input variables for building a classification model. A recent paper used Q-mode PCA, but the focus of its analysis was on the samples rather than on the variables. As a result, its authors exchanged the roles of the loadings and scores plots: clustering of samples was studied using the loadings plot, whereas the scores plot was used to identify important manifest variables. The aim of this study is therefore to statistically validate the proposed practice. Evaluation is based on the external error of LDA models as a function of the number of PCs. In addition, bootstrapping was conducted to evaluate the external error of each LDA model. Results show that LDA models built on PCs from R-mode PCA give logical performance and that the matched external errors are unbiased, whereas those built on PCs from Q-mode PCA show the opposite. We therefore conclude that PCs produced by Q-mode PCA are not statistically stable and should be applied to problems concerning variables rather than to the classification of samples. We hope this paper will provide some insight into this disputed issue.
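The two decompositions discussed in the abstract can be illustrated with a minimal sketch. This is not the authors' code. It assumes that R-mode PCA is an SVD of the column-centred samples-by-variables matrix and that Q-mode PCA is the same decomposition applied to the transposed matrix; the helper `pca_svd` and the random data `X` are hypothetical and serve only to show which matrix carries one row per sample.

```python
import numpy as np

def pca_svd(X):
    """PCA by SVD of the column-centred matrix X (rows = objects, columns = features).

    Hypothetical helper for illustration only.
    """
    Xc = X - X.mean(axis=0)                      # centre each column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U * s                               # one row per row of X
    loadings = Vt.T                              # one row per column of X
    return scores, loadings

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))                     # assumed data: 20 samples x 5 variables

# R-mode (in the abstract's usage): samples are the rows of the input matrix.
r_scores, r_loadings = pca_svd(X)

# Q-mode (assumed here): the same decomposition applied to the transpose.
q_scores, q_loadings = pca_svd(X.T)

print(r_scores.shape, r_loadings.shape)          # (20, 5) (5, 5)
print(q_scores.shape, q_loadings.shape)          # (5, 5) (20, 5)
```

Under these assumptions, the R-mode scores have one row per sample and are the matrix that would be passed to LDA. In the Q-mode decomposition the per-sample rows appear in the loadings matrix instead, which corresponds to the exchange of loadings and scores plots that the abstract describes.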
© 2017 Author(s).
