Prominent Statistical Software R Language for Factor Analysis

Dr Shruti Traymbak

Associate Professor, JIMS Kalkaji

R offers one of the most diverse collections of statistical packages and libraries and provides a user-friendly environment for statistical computing and graphics. It is one of the most prominent software tools used for data analysis. R is primarily used through a command-line interface and runs on all major platforms, such as Windows, Linux and macOS. Because it is open-source software, it is a widely used tool for machine learning, statistics and data analysis. In addition, data objects, functions and packages can be created easily. It is not only a statistics package but also allows users to integrate code written in other languages such as C and C++.

Exploratory factor analysis (EFA, often simply called factor analysis, FA), principal component analysis (PCA) and cluster analysis are considered powerful dimension-reduction techniques that are applied in many areas of statistical analysis, such as the social sciences, marketing and psychology. R has several libraries and packages for running factor analysis. The psych package, developed by Revelle (2020), is particularly useful for personality and psychometric research.

Several of the functions in psych address the problem of data reduction. The fa function incorporates five alternative algorithms: minimum residual (minres) factor analysis, principal axis factor analysis, weighted least squares factor analysis, generalized least squares factor analysis and maximum likelihood factor analysis. In principal component analysis (PCA), the objective is to reduce a set of correlated variables to a smaller number of components, which helps in dealing with multicollinearity. Cluster analysis is used to group variables into homogeneous sets of items, which reduces the complexity of a data set.
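As a rough sketch of how these five algorithms can be compared, the code below fits the same model once per extraction method through the fm argument of fa. It assumes the psych package is installed and that the bfi example data ship with it (in some versions they may live in the companion psychTools package instead). The choice of five factors, the unrotated solution and the name fm_codes are illustrative assumptions, not recommendations.

```r
library(psych)

# First 25 columns of bfi are the personality items; drop incomplete rows
items <- na.omit(bfi[, 1:25])

# fm codes for the five extraction algorithms named in the text
fm_codes <- c(minres = "minres", principal_axis = "pa",
              weighted_ls = "wls", generalized_ls = "gls",
              max_likelihood = "ml")

# Fit one five-factor model per algorithm (unrotated, so no extra
# rotation package is needed)
fits <- lapply(fm_codes, function(m) {
  fa(items, nfactors = 5, fm = m, rotate = "none")
})

# Communality of each item = sum of its squared loadings;
# one column per algorithm makes the methods easy to compare
comm <- sapply(fits, function(f) rowSums(unclass(f$loadings)^2))
print(round(head(comm), 2))
```

On well-behaved data the five columns are usually close to each other, which is one informal way to check that the solution does not depend heavily on the extraction method.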

Factor analysis is a widely accepted method in psychology, the social sciences and the educational sciences. Its main objective is to represent the data with fewer latent variables than the number of items, converting correlated variables into factors and thereby addressing multicollinearity. There are two types of factor analysis: exploratory and confirmatory. Exploratory factor analysis (EFA) looks for the factor structure underlying a set of variables, whereas confirmatory factor analysis (CFA) tests a structure specified in advance from theory or earlier empirical work.

There are various methods for extracting factors in EFA, such as maximum likelihood (ML), principal components, unweighted least squares, weighted least squares (WLS) and principal axis factoring (PAF).

In addition, methods such as the scree plot, minimum average partial (MAP) analysis and parallel analysis (PA) are used to decide the number of factors. Most researchers have found that PA and MAP analysis give more accurate results.
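A minimal sketch of both criteria in R is given below, assuming the psych package and its bfi example data are available. fa.parallel runs the parallel analysis and vss reports Velicer's MAP values; the seed, the maximum of eight factors and the variable names are illustrative assumptions.

```r
library(psych)

items <- bfi[, 1:25]   # the 25 personality items

# Parallel analysis compares observed eigenvalues with those from
# simulated/resampled data, so fix the seed for reproducibility
set.seed(123)
pa <- fa.parallel(items, fm = "minres", fa = "fa", plot = FALSE)
cat("Parallel analysis suggests", pa$nfact, "factors\n")

# MAP: the number of factors with the smallest average squared
# partial correlation is the suggested solution
map_fit <- vss(items, n = 8, plot = FALSE)
cat("MAP suggests", which.min(map_fit$map), "factors\n")
```

The two criteria do not always agree, so it is common to inspect both before settling on a number of factors.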

Factor analysis involves several steps before variables are converted into factors. The first is Bartlett's test of sphericity (cortest.bartlett in R), whose significance level indicates whether a data set is suitable for factor analysis: if the p value is less than 0.05, factor analysis is possible. The second is the Kaiser-Meyer-Olkin (KMO) test of sampling adequacy, which measures whether the sample is adequate for factor analysis. KMO values between 0.8 and 1 indicate that the sample is adequate, while values below 0.6 indicate that it is not and that corrective action should be taken. In R, the KMO test is applied with the KMO function. Next, a scree plot helps determine the number of factors to retain in exploratory factor analysis (EFA) or principal component analysis (PCA), and the fa function in the psych library fits the factor model. Whether, say, four factors are sufficient to explain personality traits, or how many factors are required, is judged from the eigenvalues: the larger a factor's eigenvalue, the more variance it explains.
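The checks described above can be sketched as follows, assuming the psych package and its bfi example data are available. Dropping incomplete rows with na.omit and the variable names are illustrative choices rather than part of any prescribed workflow.

```r
library(psych)

# Complete cases of the 25 personality items and their correlations
items <- na.omit(bfi[, 1:25])
R <- cor(items)

# Step 1: Bartlett's test of sphericity; p < 0.05 suggests the
# correlations are strong enough for factor analysis
bart <- cortest.bartlett(R, n = nrow(items))
cat("Bartlett p value:", bart$p.value, "\n")

# Step 2: KMO measure of sampling adequacy (overall MSA)
kmo <- KMO(R)
cat("Overall KMO:", round(kmo$MSA, 2), "\n")

# Step 3: eigenvalues of the correlation matrix, the values that a
# scree plot displays
ev <- eigen(R)$values
print(round(ev[1:6], 2))
# scree(R)   # uncomment to draw the scree plot itself
```

The eigenvalues of a correlation matrix always sum to the number of variables, so each one can be read as the share of total variance carried by that component.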

Various libraries and packages in R support factor analysis; for example, the psych package helps run factor analysis in R. The bfi data set, which is available in the psych library, contains 25 personality self-report items taken from the International Personality Item Pool (ipip.ori.org) that were included as part of the Synthetic Aperture Personality Assessment (SAPA) web-based personality assessment project. The bfi data frame consists of 2800 observations on 28 variables. Apart from bfi, R has many other built-in data sets. To find which variables load on each factor, use print(fa4.out$Structure, cutoff = 0, digits = 3), where fa4.out is the object returned by fa.
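One way the object fa4.out might be produced is sketched below. This is an assumption about the workflow, not the original analysis: it fits a four-factor minres model to the 25 bfi items with a varimax rotation, chosen here only because it needs no additional rotation package. It also assumes bfi is available from psych.

```r
library(psych)

# bfi: 25 personality items plus gender, education and age
print(dim(bfi))

# Fit a four-factor model to the 25 items (columns 1 to 25);
# missing responses are handled by pairwise correlations
fa4.out <- fa(bfi[, 1:25], nfactors = 4, fm = "minres",
              rotate = "varimax")

# Show every loading (cutoff = 0) to three decimals; with an
# orthogonal rotation the structure matrix equals the loadings
print(fa4.out$Structure, cutoff = 0, digits = 3)
```

Raising cutoff, for example to 0.3, hides small loadings and makes it easier to see which items belong to which factor.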

Thus, factor analysis is an important technique for checking multicollinearity while preserving the essence of the data. Jagannath International Management School (JIMS) Kalkaji, New Delhi, now offers workshops on R for PGDM Business Analytics specialization students.

#jims #jimsdelhi #managementcollegeindelhi #pgdmcollegesindelhi #mbacollegesindelhi #toppgdmCollegesindelhi #topbschoolsindelhi

For more information visit: https://www.jagannath.org/
