## Sunday, April 6, 2008

### Problem set in marketing research course

Executive Summary

From the given data, we use statistical techniques to analyze the research data and draw conclusions. The techniques used in this report are frequency distribution, descriptive statistics, cross-tabulation, sample mean comparison tests, ANOVA, correlation, simple and multiple linear regression, discriminant analysis, factor analysis and cluster analysis. All statistical tests use a 95% confidence level, i.e. α = 0.05.

1. Frequency distributions

The frequency distribution reports the number of responses that each question received, as a count and as a percentage. There are 45 responses for user group but only 44 responses for each of awareness, attitude, preference, intention and loyalty. Each of these five variables therefore has one missing value, which gives 5 missing values in the data set.

From Table 01-01 and Table 01-02, we learn that light users are the largest user group of Docomo (42.22%) and that there are more female users than male users (53.33% vs. 46.67%). Each of Tables 01-03 to 01-07 shows a most frequent response, but we cannot draw conclusions from frequency distributions alone, so the results are analyzed further in the later parts of this report.
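The percentages in a frequency table are counts divided by the number of valid (non-missing) responses. A minimal Python sketch, assuming the sex counts are 24 females and 21 males (inferred here from the reported 53.33% and 46.67% of 45 respondents; the raw table is not reproduced in this post):

```python
from collections import Counter

# Assumed raw responses: 24 "female" and 21 "male", inferred from the
# reported percentages rather than taken from the original data file.
responses = ["female"] * 24 + ["male"] * 21


def frequency_table(values):
    """Return {category: (count, percent)} over the non-missing values."""
    valid = [v for v in values if v is not None]
    counts = Counter(valid)
    total = len(valid)
    return {cat: (n, round(100 * n / total, 2)) for cat, n in counts.items()}


table = frequency_table(responses)
for category, (count, percent) in table.items():
    print(f"{category}: {count} ({percent}%)")
```

Missing answers would be coded as `None` in this sketch and dropped before the percentages are computed, which is why a variable with 44 valid answers is divided by 44 and not by 45.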

2. Descriptive Statistics

The descriptive report in Table 02 summarizes each variable in the sample with measures of central tendency (mean, median p50), dispersion (range, standard deviation, coefficient of variation) and shape (skewness, kurtosis). The mean values of user group and sex lead to the same conclusion as above. The mean values of the other variables are close to their medians, which suggests that awareness, attitude, preference, intention and loyalty are generally favorable. Among these five variables, attitude is the most dispersed, with the highest standard deviation (1.909) and the highest coefficient of variation (0.469), and it is also the most skewed. Awareness and intention are the second and third most dispersed and skewed variables. Sex is the least dispersed variable because it has only two possible responses.
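The measures named above can be sketched in a few lines of Python. The ratings below are hypothetical, not the survey data, and the skewness shown is the simple moment-based version, which may differ slightly from the estimator used by a given statistics package:

```python
import statistics


def describe(values):
    """Central tendency, dispersion and shape of a list of numbers."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    # Central moments used for the moment-based skewness.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return {
        "mean": mean,
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "sd": sd,
        "cv": sd / mean,  # coefficient of variation
        "skewness": m3 / m2 ** 1.5,
    }


# Hypothetical ratings, for illustration only.
ratings = [1, 2, 3, 4, 4, 5, 5, 6, 6, 7]
summary = describe(ratings)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Because the coefficient of variation is the standard deviation divided by the mean, the reported attitude figures (standard deviation 1.909, coefficient of variation 0.469) imply a mean of about 1.909 / 0.469 ≈ 4.07.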

3. Cross-tabulation of user group and sex

Table 03 cross-tabulates two nominal variables, user group and sex, to see whether there is an association between them. We use the Chi-square test of independence. The null hypothesis (Ho) is that there is no association between sex and user group; the alternative hypothesis (Ha) is that there is an association. With three user groups and two sexes, the test has (3 - 1)(2 - 1) = 2 degrees of freedom, so the critical Chi-square value at α = 0.05 is 5.99. The computed Chi-square value is 6.3413, which is greater than 5.99. Therefore, we reject Ho and accept Ha. In other words, user group and sex are not independent, although the computed value exceeds the critical value only narrowly.
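A minimal Python sketch of how the Pearson chi-square statistic and its degrees of freedom are obtained from a cross-tabulation. The cell counts are hypothetical, since Table 03 is not reproduced here; only the margins (groups of 19, 10 and 16; 24 females and 21 males) follow figures reported in this post:

```python
def chi_square_independence(observed):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (obs - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df


# Hypothetical cells: rows are the three user groups, columns are female, male.
observed = [[14, 5], [3, 7], [7, 9]]
statistic, df = chi_square_independence(observed)
print(f"chi-square = {statistic:.4f}, df = {df}")
```

The degrees of freedom depend only on the shape of the table, (rows - 1) × (columns - 1), not on the number of respondents.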

4. One sample mean comparison test

Table 04 shows the right-tail hypothesis test for the mean of awareness. Ho is that mean awareness is less than or equal to 3. Ha is that mean awareness exceeds 3. The computed t-value is 4.1621, which is greater than the critical t-value of about 1.68 at 43 degrees of freedom and α = 0.05. Therefore, we reject Ho and accept Ha. This means that mean awareness is greater than 3.
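The statistic behind this test is t = (sample mean - hypothesized mean) / (standard deviation / √n), with n - 1 degrees of freedom. A small Python sketch on hypothetical ratings (not the survey data):

```python
import math
import statistics


def one_sample_t(values, mu0):
    """t statistic and degrees of freedom for H0: population mean <= mu0."""
    n = len(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    t_stat = (statistics.mean(values) - mu0) / standard_error
    return t_stat, n - 1


# Hypothetical awareness ratings, for illustration only.
sample = [4, 5, 3, 6, 4, 5, 2, 5, 4, 6]
t_stat, df = one_sample_t(sample, mu0=3)
print(f"t = {t_stat:.4f}, df = {df}")
```

In a right-tail test, Ho is rejected only when the t statistic is larger than the positive critical value; a large negative t would not count as evidence for Ha.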

5. Group mean comparison test

Table 05-01 presents the two-tail hypothesis test for the difference between the mean awareness of females and of males. We are interested in knowing whether males and females differ in their awareness of Docomo. Ho is that there is no difference between males' awareness and females' awareness. Ha is that there is a difference. The computed t-value is -2.3944, which is smaller than the critical t-value of about -2.02 at 42 degrees of freedom and α/2 = 0.025. Therefore, we reject Ho and accept Ha. In other words, males and females differ in their awareness of Docomo.

Table 05-02 presents the two-tail hypothesis test for the difference between the mean attitude of females and of males. We are interested in knowing whether males and females differ in their attitude toward Docomo. Ho is that there is no difference between males' attitude and females' attitude. Ha is that there is a difference. The computed t-value is -1.9002, which is greater than the critical t-value of about -2.02 at 42 degrees of freedom and α/2 = 0.025. Thus, we fail to reject Ho: males and females do not differ significantly in their attitude toward Docomo.

Table 05-03 presents the two-tail hypothesis test for the difference between the mean loyalty of females and of males. We are interested in knowing whether males and females differ in their loyalty to Docomo. Ho is that there is no difference between males' loyalty and females' loyalty. Ha is that there is a difference. The computed t-value is 0.9025, which is smaller than the critical t-value of about 2.02 at 42 degrees of freedom and α/2 = 0.025. Therefore, we fail to reject Ho. In other words, males and females do not differ significantly in their loyalty to Docomo.
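The three comparisons above share one calculation. A Python sketch of the pooled-variance two-sample t statistic, assuming equal variances in the two groups (consistent with the n1 + n2 - 2 = 42 degrees of freedom quoted above); the ratings are hypothetical:

```python
import math
import statistics


def pooled_t(group_a, group_b):
    """Two-sample t statistic with pooled variance, and its degrees of freedom."""
    n1, n2 = len(group_a), len(group_b)
    var1 = statistics.variance(group_a)
    var2 = statistics.variance(group_b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t_stat = (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error
    return t_stat, df


# Hypothetical ratings for two groups, for illustration only.
females = [3, 4, 5, 4, 4]
males = [5, 6, 7, 6, 6]
t_stat, df = pooled_t(females, males)
print(f"t = {t_stat:.4f}, df = {df}")
```

In a two-tail test the sign of t only reflects which group is subtracted from which; Ho is rejected when the absolute value of t exceeds the critical value.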

Table 05-04 presents the right-tail hypothesis test for the difference between awareness and loyalty. We are interested in knowing whether awareness level is higher than loyalty level. Ho is that awareness level is less than or equal to loyalty level. Ha is that awareness level is higher than loyalty level. The computed t-value is 0.6025, which is smaller than the critical t-value of about 1.68 at 42 degrees of freedom and α = 0.05. Therefore, we fail to reject Ho. In other words, there is no evidence that awareness level is higher than loyalty level.

6. One-way ANOVA analysis

Table 06-01 is an ANOVA table presenting the hypothesis test about the difference between several means. We are interested in knowing whether awareness varies across user groups. The response variable is awareness and the factor variable is user group. Ho is that awareness does not vary across user groups (user group has no effect on awareness). Ha is that awareness varies across user groups. If Ho is true, the F ratio should be close to 1. However, the probability of getting an F-statistic of 49.23 or larger is reported as zero, so there is substantial evidence that Ho is not true. Equivalently, the computed F-value of 49.23 is much greater than the critical value F(2, 41) at α = 0.05, which is about 3.23, so we reject Ho and accept Ha. This means that user groups differ in terms of awareness.

Table 06-02 is an ANOVA table to test whether attitude varies across user groups. The response variable is attitude and the factor variable is user group. Ho is that attitude does not vary across user groups (user group has no effect on attitude). Ha is that attitude varies across user groups. Because the computed F-value of 37.23 is much greater than the critical value F(2, 41) at α = 0.05 (about 3.23), we reject Ho and accept Ha. In other words, user groups differ in terms of attitude.

Table 06-03 is an ANOVA table to test whether preference varies across user groups. The response variable is preference and the factor variable is user group. Ho is that preference does not vary across user groups. Ha is that preference varies across user groups. The computed F-value of 19.20 is much greater than the critical value F(2, 41) at α = 0.05 (about 3.23). Hence, we reject Ho and accept Ha. This means that user groups differ in terms of preference.

Table 06-04 is an ANOVA table to test whether purchase intention varies across user groups. The response variable is intention and the factor variable is user group. Ho is that user groups do not differ in terms of intention. Ha is that user groups differ in terms of intention. The computed F-value of 0.07 is much smaller than the critical value F(2, 41) at α = 0.05 (about 3.23). Thus, we fail to reject Ho. This means that there is no evidence that user groups differ in terms of intention.

Table 06-05 is an ANOVA table to test whether loyalty varies across user groups. The response variable is loyalty and the factor variable is user group. Ho is that user groups do not differ in terms of loyalty. Ha is that user groups differ in terms of loyalty. Because the computed F-value of 0.03 is much smaller than the critical value F(2, 41) at α = 0.05 (about 3.23), and the probability of observing an F-value of 0.03 or greater is very high at 0.9736, we fail to reject Ho. This means that there is no evidence that user groups differ in terms of loyalty.
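Each ANOVA table above reports the same ratio: the between-group mean square divided by the within-group mean square. A Python sketch on three small hypothetical groups (not the survey data):

```python
def one_way_anova(groups):
    """F ratio with its between-group and within-group degrees of freedom."""
    all_values = [x for group in groups for x in group]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    means = [sum(group) / len(group) for group in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical ratings for three user groups, for illustration only.
groups = [[2, 3, 4], [4, 5, 6], [6, 7, 8]]
f_ratio, df_between, df_within = one_way_anova(groups)
print(f"F({df_between}, {df_within}) = {f_ratio:.2f}")
```

With 3 user groups and 44 valid responses, the same formulas give 3 - 1 = 2 and 44 - 3 = 41 degrees of freedom, which is where the F(2, 41) critical value comes from.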

7. Simple correlations – Pairwise correlations

Table 07 is a pairwise correlation table presenting the sample correlation coefficients (r) between each pair of variables. A coefficient marked with a star is statistically significant. For each pair we test the population correlation coefficient (ρ) with Ho: ρ = 0 against Ha: ρ ≠ 0. We can see from the table that there are significant relationships between attitude and awareness; preference and awareness; preference and attitude; and loyalty and intention. However, the relationships between intention and awareness; intention and attitude; intention and preference; loyalty and awareness; loyalty and attitude; and loyalty and preference are not significant. In other words, these sample correlations are small enough that they could have arisen by chance.
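The significance test of a correlation coefficient uses t = r √((n - 2) / (1 - r²)) with n - 2 degrees of freedom. A Python sketch on hypothetical paired ratings (not the survey data):

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_t(r, n):
    """t statistic and degrees of freedom for H0: rho = 0."""
    return r * math.sqrt((n - 2) / (1 - r ** 2)), n - 2


# Hypothetical paired ratings, for illustration only.
intention = [1, 2, 3, 4, 5]
loyalty = [2, 1, 4, 3, 5]
r = pearson_r(intention, loyalty)
t_stat, df = correlation_t(r, len(intention))
print(f"r = {r:.2f}, t = {t_stat:.4f}, df = {df}")
```

The same r can be significant in a large sample and not significant in a small one, because the t statistic grows with n.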

8. Linear Regression

Table 08 is the simple linear regression result with loyalty as the dependent variable and intention as the independent variable. The regression function is Loyalty = 0.88 + 0.76 × intention (hereafter referred to as model 1).

We use a t-test to test the significance of the independent variable. Ho: there is no linear relationship between loyalty and intention. Ha: there is a linear relationship between loyalty and intention. The computed t-value is 7.47, which is greater than the critical value t(41, 0.025) ≈ 2.02. Therefore, we reject Ho and accept Ha. In other words, there is a linear relationship between loyalty and intention.

To determine the strength of association between the independent and dependent variables, we test the coefficient of determination R². Ho: R² of the population is equal to zero. Ha: R² is not equal to zero. Since the computed F-value of 55.79 is greater than the critical value F(1, 41) at α = 0.05, which is about 4.08, we reject Ho and accept Ha. So there is a significant correlation between loyalty and intention.
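With a single predictor, the overall F statistic equals the square of the slope's t statistic, so the two tests in this section are equivalent. A short Python check using only figures reported above; the input value 4 for intention is an arbitrary illustration:

```python
def predict_loyalty(intention, intercept=0.88, slope=0.76):
    """Fitted loyalty from model 1, using the coefficients reported in Table 08."""
    return intercept + slope * intention


t_value = 7.47   # reported t for the intention coefficient
f_value = 55.79  # reported overall F for model 1

print(f"fitted loyalty at intention = 4: {predict_loyalty(4):.2f}")
print(f"t squared = {t_value ** 2:.2f}, reported F = {f_value}")
```

The small gap between t² and the reported F comes from rounding t to two decimals.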

9. Multiple Regression

Table 09 is the multiple regression result with loyalty as the dependent variable and awareness, attitude, preference and intention as independent variables. The linear regression equation is loyalty = 0.53 + 0.03 awareness - 0.03 attitude + 0.06 preference + 0.78 intention (hereafter referred to as model 2).

We also use t-tests to determine the significance of each independent variable. Ho: there is no linear relationship between loyalty and awareness. Ha: there is a linear relationship between loyalty and awareness. As the computed t-value of 0.18 is smaller than the critical t-value of about 2.02 (α/2 = 0.025), we fail to reject Ho. In other words, there is no significant linear relationship between loyalty and awareness. In the same way, we conclude that there are no significant linear relationships between loyalty and attitude or between loyalty and preference. However, there is a significant linear relationship between loyalty and intention.

To determine the strength of association between the independent and dependent variables, we test the coefficient of determination R². Ho: R² of the population is equal to zero. Ha: R² is not equal to zero. Since the computed F-value of 12.48 is greater than the critical value F(4, 39) at α = 0.05, which is about 2.61, we reject Ho and accept Ha. So there is a significant correlation between loyalty and the independent variables taken together. Among the four independent variables in this model, intention has the strongest impact on loyalty because its standardized beta coefficient is the largest.

In this multiple regression we have added three independent variables (awareness, attitude, preference) to the simple regression model (model 1). We are interested in knowing whether the added variables help to explain more of the variation in the dependent variable loyalty. We test the increment in the proportion of variance accounted for by the additional variables. Ho: the additional independent variables do not increase the proportion of variance in loyalty that is explained. Ha: the additional independent variables increase it. The computed F-value (formula in the text book, chapter 19) of 0.16 is smaller than the critical value F(3, 39) at α = 0.05, which is about 2.85 (3 numerator degrees of freedom for the three added variables). Thus, we fail to reject Ho. This means that the additional variables do not significantly improve the explanation of loyalty. The small standardized beta coefficients of awareness, attitude and preference in the regression equation point to the same conclusion.
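A common form of the incremental F test compares the R² of the full and reduced models; it is assumed here to match the textbook formula cited above. In this Python sketch the R² values are hypothetical (Tables 08 and 09 are not reproduced in this post), and n = 44 is assumed from the number of valid responses:

```python
def incremental_f(r2_full, r2_reduced, n, k_full, k_reduced):
    """F statistic for the gain in R-squared from adding predictors.

    k_full and k_reduced are the numbers of predictors in each model
    (intercept not counted); n is the number of observations.
    """
    df_num = k_full - k_reduced   # number of added predictors
    df_den = n - k_full - 1       # residual degrees of freedom, full model
    f_stat = ((r2_full - r2_reduced) / df_num) / ((1 - r2_full) / df_den)
    return f_stat, df_num, df_den


# Hypothetical R-squared values, for illustration only.
f_stat, df_num, df_den = incremental_f(
    r2_full=0.58, r2_reduced=0.57, n=44, k_full=4, k_reduced=1
)
print(f"F({df_num}, {df_den}) = {f_stat:.4f}")
```

A small F means that the extra predictors add little explained variance relative to the unexplained variance that remains.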

The coefficient of intention in model 2 is slightly greater than that in model 1 (0.78 vs. 0.76). A change in a coefficient when more independent variables are added to the model is a symptom of multicollinearity, that is, of correlation between intention and the other three independent variables in model 2. Hence, we would need to test the correlations among the independent variables to assess this effect on the dependent variable. A stepwise regression could be used in this case. However, testing for multicollinearity is outside the scope of this report.

10. Discriminant Analysis

Table 10 presents the result of the discriminant analysis of the available data. The purpose of this analysis is to see whether the three user groups differ in terms of awareness, attitude, preference, intention and loyalty. We have 3 groups and 5 predictor variables, so the number of discriminant functions is min(3 - 1, 5) = 2. Function 1: Z1 = -5.14 + 0.54 awareness + 0.55 attitude + 0.41 preference - 0.09 intention - 0.19 loyalty. Function 2: Z2 = -0.31 - 0.64 awareness + 0.12 attitude + 0.76 preference - 0.39 intention + 0.23 loyalty.
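A discriminant score is the constant plus the weighted sum of a respondent's answers. A Python sketch using the unstandardized weights reported above; the respondent's answers (all equal to 4) are hypothetical:

```python
# Unstandardized weights of the two discriminant functions, as reported above.
Z1 = {"const": -5.14, "awareness": 0.54, "attitude": 0.55,
      "preference": 0.41, "intention": -0.09, "loyalty": -0.19}
Z2 = {"const": -0.31, "awareness": -0.64, "attitude": 0.12,
      "preference": 0.76, "intention": -0.39, "loyalty": 0.23}


def number_of_functions(n_groups, n_predictors):
    """Maximum number of discriminant functions: min(groups - 1, predictors)."""
    return min(n_groups - 1, n_predictors)


def discriminant_score(weights, respondent):
    """Constant plus the weighted sum of the respondent's answers."""
    return weights["const"] + sum(
        weights[name] * value for name, value in respondent.items()
    )


# Hypothetical respondent who answers 4 on every question.
respondent = {"awareness": 4, "attitude": 4, "preference": 4,
              "intention": 4, "loyalty": 4}
print(number_of_functions(3, 5))
print(round(discriminant_score(Z1, respondent), 2))
print(round(discriminant_score(Z2, respondent), 2))
```

Each respondent gets one score per function, and the group means of these scores are what the significance tests of Z1 and Z2 compare.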

We determine the significance of Z1 and Z2 with a Chi-square test. Ho is that the group means of Z1 are equal (μa = μb) in the population. The computed Chi-square of Z1 is 70.15, which is larger than the table value of 18.31 (Chi-square with 10 degrees of freedom at α = 0.05). Therefore, we reject Ho and accept Ha, which is that the group means of Z1 are unequal. We use the same Ho for Z2. The computed Chi-square of Z2 is 5.858, which is smaller than the table value of 9.49 (Chi-square with 4 degrees of freedom at α = 0.05). Therefore, we fail to reject Ho, which is that the group means of Z2 are equal. In other words, Z1 is significant but Z2 is not. Hence, we examine only the significant function Z1 in the rest of this section.

The function Z1 shows the unstandardized discriminant weights of the predictor variables. The importance of each predictor in the discriminant function is shown by the standardized canonical discriminant function coefficients. Attitude has a greater impact on Z1 than the other variables because it has the largest standardized coefficient (0.5883). Awareness and preference have less impact on Z1. Among the predictors, loyalty has the least impact. Based on this analysis we conclude that attitude is the most important discriminating variable of user groups.

We can also analyze the importance of the predictors by examining the canonical discriminant structure matrix (discriminant loadings), in which the simple correlation between each predictor and the discriminant function represents the variance that the predictor shares with the function. Awareness shares the most variance with the function, followed by attitude, preference, intention and loyalty. However, it is hard to draw any conclusion from this analysis because the sample size (45) is not large enough in comparison to the number of predictors (5).

We use the classification matrix (the Tabulate usergr_daclass table) and the hit ratio to evaluate how well the groups are classified. The numbers on the diagonal of the matrix are correct classifications and the other numbers are incorrect classifications. The hit ratio is the percentage of correct classifications over the total number of classifications: (14 + 7 + 16)/45 = 82.22%. The maximum chance criterion for 3 groups of size 19, 10 and 16 is 19/45 = 42.22%. The proportional chance criterion is (19/45)² + (10/45)² + (16/45)² = 35.4%. The hit ratio is larger than both the maximum chance criterion and the proportional chance criterion. Hence, we conclude that the group classification is good. In other words, it is worthwhile to pursue the discriminant analysis in this case.
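The three percentages above can be reproduced directly from the group sizes and the diagonal of the classification matrix. A short Python check using only figures reported in this section:

```python
group_sizes = [19, 10, 16]        # actual sizes of the three user groups
correct_on_diagonal = [14, 7, 16]  # correct classifications (matrix diagonal)

n = sum(group_sizes)

# Share of respondents assigned to their true group.
hit_ratio = sum(correct_on_diagonal) / n

# Accuracy of always guessing the largest group.
maximum_chance = max(group_sizes) / n

# Expected accuracy of guessing in proportion to the group sizes.
proportional_chance = sum((size / n) ** 2 for size in group_sizes)

print(f"hit ratio           = {hit_ratio:.2%}")
print(f"maximum chance      = {maximum_chance:.2%}")
print(f"proportional chance = {proportional_chance:.2%}")
```

The classification is judged against both baselines because a model is useful only if it beats what could be achieved by guessing without the predictors.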