## Sunday, April 6, 2008

### Survey data analysis of Creativity research project

December 22, 2006

Executive Summary

From the given data, we use factor analysis to reduce the number of variables representing three characteristics of creativity: Dimensions, Antecedents and Consequences. We then use multiple linear regression on the factor variables to examine the relationship chain Antecedents - Dimensions - Consequences. We also perform cluster analysis to see whether we can classify the population into groups. Finally, we use discriminant analysis to understand how we can segment the population and whether such segmentation is significant. All statistical tests use a 95% confidence level, i.e. α = 0.05.

1. Determining factors

From Table 01, the first 2 factors have eigenvalues greater than 1. The scree plot of eigenvalues in Figure 01 also suggests that 2 factors are suitable to represent Dimensions. Therefore, we choose the first 2 factors. By re-running the principal component analysis with these 2 factors and rotating the result, we obtain a clearer interpretation of them. Factor 1 (shown as Comp1 in Table 02) has the strongest factor loadings on variables C_D_1, C_D_2 and C_D_3, which relate to the new situations in which creativity is carried out. Therefore, we name this factor ‘New Situation’. In the same way, factor 2 has the strongest loadings on the 4 variables C_D_4, C_D_5, C_D_6 and C_D_7, which describe creativity carried out for the purpose of experiencing things that the creator has never experienced before. Thus, we name factor 2 ‘New Experience’. In our later analysis, we use the New Situation and New Experience factors in place of the 7 original variables, with the factor scores given in Table 03.
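The eigenvalue-greater-than-1 rule applied above can be sketched in a few lines of Python. This is only an illustration: the eigenvalues below are hypothetical placeholders, not the values reported in Table 01, and the function names `kaiser_retain` and `explained_share` are our own.

```python
def kaiser_retain(eigenvalues):
    """Return the eigenvalues kept by the eigenvalue-greater-than-1 rule."""
    return [ev for ev in eigenvalues if ev > 1.0]


def explained_share(eigenvalues, n_keep):
    """Share of total variance carried by the n_keep largest components."""
    ordered = sorted(eigenvalues, reverse=True)
    return sum(ordered[:n_keep]) / sum(ordered)


# Hypothetical eigenvalues for 7 standardized variables (they sum to 7).
# Illustrative only: these are NOT the values from Table 01.
example_eigenvalues = [3.1, 1.4, 0.8, 0.6, 0.5, 0.4, 0.2]

kept = kaiser_retain(example_eigenvalues)
share = explained_share(example_eigenvalues, len(kept))
print(len(kept), round(share, 3))
```

With real data, the eigenvalues would come from the correlation matrix of the 7 Dimensions variables, and the rule should still be cross-checked against the scree plot as done above.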

By analyzing the 17 variables of Antecedents in the same way, we conclude that 2 factors represent Antecedents, and we name them ‘Benefit’ and ‘Recognition-Collaboration’ (Table 04). In analyzing the 10 variables of Consequences, we retain 3 factors and name them ‘Cost Saving’, ‘Fun’ and ‘Satisfaction’ (Table 05).

2. Examining the relationship Antecedents – Dimensions – Consequences

Table 06 shows the multiple linear regression result with New Situation as the dependent factor variable and Recognition-Collaboration and Benefit as the independent factor variables. The regression function is New Situation = 0.232 Benefit + 0.093 Recognition-Collaboration (hereafter referred to as model 1). We ignore the constant because it is very small and not significant.

We use a t-test to test the significance of each independent factor variable. The null hypothesis Ho: there is no linear relationship between the dependent factor variable and the independent factor variable in question. The computed t-value of Benefit is 1.99, which is greater than $t_{84,\,0.025} = 1.96$. Therefore we reject Ho and accept Ha. In other words, there is a linear relationship between New Situation and Benefit. However, the t-value of Recognition-Collaboration is 1.2, which is smaller than 1.96, so there is no significant linear relationship between New Situation and Recognition-Collaboration.
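The decision rule behind these two t-tests can be written out as a short sketch. The t-values and the critical value are the ones quoted in the text; the helper name `rejects_null` is an assumption made for illustration.

```python
def rejects_null(t_value, t_critical):
    """Two-sided test: reject Ho when |t| exceeds the critical value."""
    return abs(t_value) > t_critical


T_CRITICAL = 1.96  # critical value quoted in the text

# Computed t-values quoted in the text for model 1.
t_values = {"Benefit": 1.99, "Recognition-Collaboration": 1.2}

decisions = {name: rejects_null(t, T_CRITICAL) for name, t in t_values.items()}
print(decisions)
```

Note how close the Benefit result is to the threshold: the conclusion is sensitive to the exact critical value used.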

To determine the correlation between the independent and dependent factor variables, we test the coefficient of determination $R^2$. Ho: $R^2$ of the population is equal to zero. Ha: $R^2$ is not equal to zero. Since the computed F-value of 4.5 is greater than the critical F-value, which is around 3.00, we reject Ho and accept Ha. So there is a significant correlation between the independent and dependent factor variables. Of the 2 independent factors in this model, Benefit has the stronger impact on New Situation because it has the larger standardized beta coefficient. In other words, Benefit can generate New Situation, but Recognition-Collaboration does not generate New Situation to a significant degree.
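The overall F-test and $R^2$ are linked by the standard identity $F = \dfrac{R^2 / k}{(1 - R^2)/(n - k - 1)}$. The sketch below inverts it to show what $R^2$ an F-value of 4.5 would imply. It assumes n = 85 observations (the sample size mentioned in the clustering section), k = 2 predictors and a model with a constant; the result may differ from Table 06 because 4.5 is a rounded figure.

```python
def f_from_r2(r2, n_obs, n_predictors):
    """Overall F statistic of a regression with a constant term."""
    df_resid = n_obs - n_predictors - 1
    return (r2 / n_predictors) / ((1.0 - r2) / df_resid)


def r2_from_f(f_value, n_obs, n_predictors):
    """Invert the identity above to recover R^2 from the F statistic."""
    df_resid = n_obs - n_predictors - 1
    return n_predictors * f_value / (n_predictors * f_value + df_resid)


# Assumptions: n = 85 respondents, k = 2 factors, F = 4.5 as quoted.
implied_r2 = r2_from_f(4.5, n_obs=85, n_predictors=2)
print(round(implied_r2, 3))
```

Under these assumptions the implied $R^2$ is about 0.1, a reminder that a significant F-test does not by itself mean the model explains much of the variance.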

By examining Table 07 in the same way, we find that there is no significant linear relationship between New Experience and either Benefit or Recognition-Collaboration. This is because the computed t-values of the independent factor variables are smaller than the t-table value and the computed F-value does not exceed the critical F-value. This means that neither Benefit nor Recognition-Collaboration generates New Experience.

Interpreting Table 08 in the same way, we find that there is a significant linear relationship between Satisfaction and the pair New Situation and New Experience. The linear equation is Satisfaction = 0.135 New Situation + 0.318 New Experience (model 2). Of these two independent factor variables, New Experience has the stronger impact on Satisfaction because it has the larger standardized beta coefficient, whereas New Situation does not have a significant impact. Thus, the meaning of this equation is that New Experience can generate Satisfaction but New Situation does not lead to significant Satisfaction.

From Table 09, we find that there is a significant relationship between Cost Saving and New Situation and New Experience through the linear equation Cost Saving = 0.298 New Situation + 0.299 New Experience (model 3). The meaning of this equation is that New Situation and New Experience can generate Cost Saving.
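The three fitted equations can be applied to factor scores as in the sketch below. The coefficients are those quoted in the text, with constants omitted as in the original. The respondent's factor scores are hypothetical, and the names `MODELS` and `predict` are our own. Keep in mind that the Recognition-Collaboration term in model 1 and the New Situation term in model 2 were not significant.

```python
# Coefficients as quoted in the text (models 1, 2 and 3); no constants.
MODELS = {
    "New Situation": {"Benefit": 0.232, "Recognition-Collaboration": 0.093},
    "Satisfaction": {"New Situation": 0.135, "New Experience": 0.318},
    "Cost Saving": {"New Situation": 0.298, "New Experience": 0.299},
}


def predict(target, factor_scores):
    """Linear prediction of a target factor from standardized factor scores."""
    coefs = MODELS[target]
    return sum(coefs[name] * factor_scores[name] for name in coefs)


# Hypothetical respondent: one standard deviation above the mean on
# New Situation and half a standard deviation above on New Experience.
respondent = {"New Situation": 1.0, "New Experience": 0.5}

cost_saving = predict("Cost Saving", respondent)
satisfaction = predict("Satisfaction", respondent)
print(round(cost_saving, 4), round(satisfaction, 4))
```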

From Table 10, we find that there is no significant linear relationship between Fun and either New Situation or New Experience. This is because the computed t-values of the independent factor variables are smaller than the t-table value and the computed F-value does not exceed the critical F-value. This means that neither New Situation nor New Experience leads to significant Fun.

We can examine the relationship between Antecedents, Dimensions and Consequences by examining the relationships among the factors representing them. Model 1 suggests that Benefit generates New Situation. New Situation in turn generates Cost Saving (model 3) and possibly Satisfaction (model 2). It seems intuitive that people seek new situations for benefits, and that whenever they achieve some benefits they can save costs. There are significant relationships between New Experience and both Satisfaction and Cost Saving (models 2 and 3), but there is no significant link between New Experience and Benefit or Recognition-Collaboration. In reality, benefits, recognition and collaboration could be the purpose of seeking new experience.

One reason why the statistical analysis could not capture the relationship between New Experience and Benefit or Recognition-Collaboration is that the Benefit and Recognition-Collaboration factors do not completely and adequately represent Antecedents. Another reason is that several sources of error exist in determining the number of factors, interpreting the factors and rotating the factors to select the best ones. As a result, running multiple regression on factor variables instead of the original variables might produce inaccurate statistical results. Despite the statistical result in Table 07, we still maintain that Benefit or Recognition-Collaboration can lead to New Experience.

In conclusion, Benefit causes New Situation and New Experience, and the latter two cause Cost Saving and Satisfaction. In other words, the relationship Antecedents - Dimensions - Consequences is a causal relationship in which Antecedents lead to Dimensions and Dimensions lead to Consequences. Model 1 is suitable for predicting Dimensions from Antecedents. Models 2 and 3 could be used to predict Consequences based on Dimensions. However, we still need one more model to represent the relationship between New Experience and Benefit or Recognition-Collaboration, as the regression result does not suggest such a relationship, owing to some shortcomings of the factor analysis method.

3. Classification

We group the respondents by using Ward’s linkage hierarchical clustering method on the New Situation and New Experience factors of Dimensions. The stopping rule used to determine the number of clusters is the Calinski & Harabasz rule. We pick 2 clusters, which have the longest branches before splitting into many shorter branches in the dendrogram (Figure 02). Moreover, we notice from Table 11 that the 2-cluster solution has a Calinski/Harabasz pseudo-F index (77.36) greater than the indexes of the 3-, 4-, 5-, 6- and 7-cluster solutions. This means that 2 clusters are more distinct than 3, 4, 5, 6 or 7 clusters. Even though the 8-cluster solution has a greater index than the 2-cluster solution, we think that 8 or more clusters might not be meaningfully distinct in practice, because such a large number of clusters is unlikely to be necessary for dividing only 85 observations into groups. Therefore, we choose 2 clusters for further analysis.
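For reference, the Calinski/Harabasz pseudo-F index compares between-cluster dispersion B with within-cluster dispersion W as $[B/(k-1)] / [W/(n-k)]$. The following is a minimal pure-Python sketch of that formula on a toy data set; the points and labels are invented for illustration and have nothing to do with the survey data or Table 11.

```python
def calinski_harabasz(points, labels):
    """Pseudo-F = [B / (k - 1)] / [W / (n - k)] for labelled points."""
    n = len(points)
    dims = len(points[0])
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    k = len(clusters)

    def centroid(rows):
        return [sum(r[d] for r in rows) / len(rows) for d in range(dims)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    overall = centroid(points)
    between = 0.0  # B: size-weighted squared distances of centroids to the grand mean
    within = 0.0   # W: squared distances of points to their own centroid
    for rows in clusters.values():
        c = centroid(rows)
        between += len(rows) * sq_dist(c, overall)
        within += sum(sq_dist(r, c) for r in rows)
    return (between / (k - 1)) / (within / (n - k))


# Toy data: two well-separated pairs of points (illustrative only).
toy_points = [(0.0, 0.0), (0.0, 2.0), (4.0, 0.0), (4.0, 2.0)]
toy_labels = [0, 0, 1, 1]
toy_index = calinski_harabasz(toy_points, toy_labels)
print(toy_index)
```

Here B = 16 and W = 4, so the index is 8. A larger index means the clusters are more compact relative to their separation, which is the sense in which the 2-cluster solution is called more distinct above.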

We profile these two clusters in terms of the factor variables. The clusters have been constructed from the 2 factors New Situation and New Experience. As argued in the analysis above, both of these factors are driven by Benefit. Thus, we expect Benefit to be an important characteristic of the clusters.

To identify whether Benefit is a good criterion for differentiating the two clusters, we compare the means of the factor score variables across both clusters. By examining the mean factor scores in Table 12, we find that the centroids of the two clusters on both New Situation and New Experience are not significantly different. Therefore, Benefit is not sufficient to differentiate the two clusters, and we need to profile the clusters in terms of variables that were not used for clustering. We will use discriminant analysis to find out which moderator and demographic variables can significantly differentiate between the clusters.

4. Discriminant Analysis

The purpose of this analysis is to test whether significant differences exist between these 2 clusters in terms of moderator and demographic variables. We have 2 groups and 16 predictor variables, so the number of discriminant functions is 1. The unstandardized and standardized coefficients of the discriminant function (hereafter referred to as Z) are presented in Table 13.
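The count of discriminant functions follows the usual rule min(number of groups - 1, number of predictors). A one-line sketch, with a function name of our own choosing:

```python
def n_discriminant_functions(n_groups, n_predictors):
    """Maximum number of discriminant functions: min(g - 1, p)."""
    return min(n_groups - 1, n_predictors)


# 2 clusters and 16 predictor variables, as stated in the text.
n_functions = n_discriminant_functions(n_groups=2, n_predictors=16)
print(n_functions)
```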

We determine the significance of Z by a Chi-square test. Ho: the group means of Z are equal ($\mu_a = \mu_b$) in the population. The computed Chi-square of Z is 26.369, which is larger than the table value $\chi^2_{16,\,0.05} = 26.30$. Therefore, we reject Ho and accept Ha, namely that the group means of Z are unequal. Although the computed value only narrowly exceeds the critical value, so that the p-value is close to 5%, we can still conclude that the discriminant function is significant.

We examine the discriminant power of each variable in Z based on the standardized canonical discriminant function coefficients. We pick 3 moderator variables (C_M_1, C_M_3 and C_M_6) together with Sex and Age because they have the largest standardized coefficients. Coming back to the survey questions, we see that these 3 moderator variables refer to the intrinsic motivation for creativity. Among these 5 variables, C_M_6 has the strongest impact on Z, with a standardized coefficient of 0.949. Hence, we conclude that ‘Being Creative’ has the largest discriminant power, followed by Sex and Age. In other words, ‘Being Creative’ is the most important criterion for the clusters, and Sex and Age should also be criteria for clustering.

We also analyze the importance of the predictor variables by examining the canonical discriminant structure matrix (discriminant loadings, Table 13), in which the simple correlation between each predictor and Z represents the variance that the predictor shares with Z. Variable C_M_6 has the greatest share of variance with Z, followed by C_M_1 and Sex. Age has a very small share of variance with Z. Therefore, we conclude that ‘Being Creative’ and Sex are the most correlated with Z and should be the critical criteria for classifying the population.

We use the classification matrix (Table 14) and the hit ratio to evaluate how good the classification is in this discriminant analysis. The numbers on the diagonal of the matrix are correct classifications and the other numbers are incorrect classifications. The hit ratio is the percentage of correct classifications over the total number of classifications: (34+28)/85 = 72.94%. The maximum chance criterion for 2 groups of size 47 and 38 is 47/85 = 55.29%. The proportional chance criterion is $(47/85)^2 + (38/85)^2 = 50.56\%$. The hit ratio is larger than both the maximum chance criterion and the proportional chance criterion. Therefore, we can conclude that the group classification is good. In other words, it is worthwhile to pursue the discriminant analysis in this case.
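The three percentages above can be reproduced with a few lines of Python. The counts are taken from the text; since the text does not say which diagonal entry belongs to which group, the sketch works only with totals, and the function name `classification_summary` is our own.

```python
def classification_summary(correct_counts, group_sizes):
    """Hit ratio, maximum chance and proportional chance criteria."""
    total = sum(group_sizes)
    hit_ratio = sum(correct_counts) / total
    max_chance = max(group_sizes) / total
    prop_chance = sum((size / total) ** 2 for size in group_sizes)
    return hit_ratio, max_chance, prop_chance


# Diagonal counts (34 and 28) and group sizes (47 and 38) quoted in the text.
hit, max_chance, prop_chance = classification_summary([34, 28], [47, 38])
print(f"{hit:.2%} {max_chance:.2%} {prop_chance:.2%}")
```

The classification is judged worthwhile because the hit ratio exceeds both chance benchmarks.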

Conclusion

The relationship in the Antecedents - Dimensions - Consequences chain is a causal relationship in which Antecedents lead to Dimensions and Dimensions lead to Consequences. More concretely, Benefit causes New Situation and New Experience; New Situation and New Experience in turn lead to Cost Saving and Satisfaction. In other words, Benefit is the starting and driving factor in this chain and is therefore the root cause of creativity.

The population can be divided into two groups based on 2 criteria: ‘Being Creative’ and Sex. The discriminant function Z used for classification can be simplified to Z = 0.949 Being Creative + 0.409 Sex. Benefit is an important characteristic of each group, but it is not a critical criterion for dividing the population into distinct groups.

The entire population tends to create new things for benefits. The function Z above can be used to predict which group a member of this population belongs to.