MCA Belgium and New Zealand Correct

Multiple Correspondence Analysis Lotte Zijlstra 25-10-2020

1. Introduction

The role of the government differs among countries. In most countries the government holds the highest position of management and control, and it is therefore its duty to serve society and respond to its demands. Understanding the core beliefs of a society is key to determining whether the government is performing sufficiently. It is important to identify the needs of society in order to allocate budget effectively towards the right support systems. Some countries may place more focus on educational support, whereas other societies would prefer to receive housing support. These demands can be evaluated per country. Differences in demand can also be studied within a country to determine which groups of individuals hold similar beliefs.

For this research two countries are analyzed and compared on their societies' beliefs about government responsibility. The first country is the European country Belgium and the second country is New Zealand. It is interesting to investigate how these two countries, which are geographically extremely far apart, differ or are similar in their opinions on government responsibility. The following research question is therefore investigated in this paper: What are the main differences between the beliefs of Belgian society and the beliefs of New Zealand society regarding the government's responsibility?

2. Data

The dataset is obtained from the International Social Survey Programme (ISSP), Role of Government V, published in 2018. This dataset contains data from multiple countries all over the world. This research focuses solely on Belgium and New Zealand. The government responsibility dataset for Belgium consists of 1569 observations and 8 variables. The New Zealand dataset consists of 1078 observations and 8 variables. The respondents were asked to express their opinion on different aspects on a scale from 1 to 5, where 1 denotes "definitely should be", 2 denotes "probably should be", 3 denotes "probably should not be" and 4 denotes "definitely should not be". The original answer "strongly against" was recoded into 4, as this answer was rarely chosen. The variables related to government responsibility are provide jobs, control prices, help industries, help unemployed, reduce gap, support students, provide housing, reduce industry pollution and promote gender equality.

Some of the categories in this data represent only a small percentage of the observations. To ensure that each category has a moderate marginal frequency, the data is recoded to a fill rate of at least 5%: when a category contained less than 5% of the total observations, it was merged with another category. As a result, all categories are sufficiently filled. The resulting category frequencies are listed below; NA indicates that category 4 was merged.

Category frequencies, Belgium (n = 1569)

Variable                      1     2     3     4
Provide_jobs                537   612   297   123
Control_prices              628   686   255    NA
Help_industries             351   765   371    82
Help_unemployed             257   690   480   142
Reduce_gap                  759   505   220    85
Support_students            630   745   194    NA
Provide_housing             486   781   302    NA
Reduce_industrypollution    890   543   136    NA
Promote_genderequality      874   501   194    NA

Category frequencies, New Zealand (n = 1078)

Variable                      1     2     3     4
Provide_jobs                140   312   401   225
Control_prices              250   520   247    61
Help_industries             266   654   158    NA
Help_unemployed             140   537   310    91
Reduce_gap                  355   328   262   133
Support_students            323   554   201    NA
Provide_housing             274   569   235    NA
Reduce_industrypollution    568   418    92    NA
Promote_genderequality      547   357   174    NA
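The 5% recoding rule described above could be sketched as follows. This is a minimal illustration, not the author's code: the helper name `merge_rare_top` and the toy vector are hypothetical, and it assumes that a rare highest category is merged into the adjacent lower one (the text does not state which category absorbs the merged one).

```r
# Hypothetical helper: while the highest category holds less than
# `min_share` of the non-missing answers, merge it into the category below.
# Assumption: categories are consecutive integer codes (1, 2, 3, ...).
merge_rare_top <- function(x, min_share = 0.05) {
  x <- as.integer(x)
  repeat {
    top <- max(x, na.rm = TRUE)
    share <- mean(x == top, na.rm = TRUE)
    if (top <= 1L || share >= min_share) break
    x[!is.na(x) & x == top] <- top - 1L
  }
  factor(x)
}

# Invented toy answers: category 4 holds only 3% of the observations
toy <- c(rep(1, 50), rep(2, 30), rep(3, 17), rep(4, 3))
merged <- merge_rare_top(toy)
table(merged)
```

In the toy example category 4 is absorbed by category 3, which mirrors the variables above whose fourth category is reported as NA.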

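3. Methods

Multiple Correspondence Analysis

Multiple correspondence analysis (MCA) is used when more than two nominal/categorical variables are analyzed. The main purpose is to understand similarities between individuals in terms of all the variables. Individuals are compared by category, taking into account the rarity of that category. The loss function of multiple correspondence analysis is given below.

$$\sigma(X, Y_1, Y_2, \ldots, Y_p) = \sum_{j=1}^{p} \mathrm{SS}(X - G_j Y_j)$$

Here $X$ contains the object scores, $G_j$ is the indicator matrix of variable $j$, $Y_j$ contains the category quantifications of variable $j$, $p$ is the number of variables and $\mathrm{SS}(\cdot)$ denotes the sum of squares of all elements.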
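To make the loss function concrete, the sketch below evaluates it in base R for a given set of object scores. It is only an illustration under stated assumptions: the toy data and the helper names `indicator` and `homogeneity_loss` are invented here, and for fixed object scores the category quantifications are taken to be the category centroids of those scores.

```r
# Invented toy data: 3 categorical variables, 6 respondents
dat <- data.frame(
  a = factor(c(1, 1, 2, 2, 3, 3)),
  b = factor(c(1, 2, 1, 2, 1, 2)),
  c = factor(c(1, 1, 1, 2, 2, 2))
)

# Indicator matrix G_j: one row per respondent, one column per category
indicator <- function(f) diag(nlevels(f))[as.integer(f), , drop = FALSE]

# Loss for given object scores X; Y_j are the centroids of X per category
homogeneity_loss <- function(X, dat) {
  total <- 0
  for (v in dat) {
    G <- indicator(v)
    Y <- solve(crossprod(G), crossprod(G, X))  # (G'G)^{-1} G'X
    total <- total + sum((X - G %*% Y)^2)      # SS(X - G_j Y_j)
  }
  total
}

# One-dimensional object scores that variable `a` reproduces perfectly
X <- matrix(c(-1, -1, 0, 0, 1, 1), ncol = 1)
loss <- homogeneity_loss(X, dat)
loss
```

Variable `a` contributes nothing to the loss here because its categories reproduce the scores exactly, while `b` and `c` contribute the within-category variation of the scores.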

When performing multiple correspondence analysis, the importance of a dimension is determined by its eigenvalue. The discrimination measures give the fit of a variable on a dimension. The biplot shows the objects together with the categories of the variables, whereas the joint plot shows only the categories. The biplot presents the global pattern within the data. Each color corresponds to a given column (variable). The distance between any two row points or column points gives a measure of their similarity (or dissimilarity). This plot helps to determine which variables are associated with each dimension. The distance between a category and the origin measures the quality of that variable category in the biplot. The main interpretation is as follows: categories are the centroids (averages) of the objects that choose them, and categories far from the origin describe homogeneous groups of objects.

Permutation test

To identify how many dimensions should be included in the analysis, a permutation test can be executed. In such a permutation test the order of the values within each column is shuffled, that is, the values are drawn without replacement. The permutation test compares the observed eigenvalue of each dimension to the distribution of eigenvalues obtained from many data sets with randomly permuted columns. In this way the significance of the contribution of the separate variables to the MCA solution is determined. In this example the permutation is repeated 100 times and evaluated with a 95% confidence interval. If the observed value falls outside the interval, the dimension is significantly different from the random data. The plot containing the results of the permutation test can be seen below. The blue lines in this plot indicate the confidence interval, whereas the red lines indicate the observed values. For Belgium the permutation test shows that three dimensions are significant, as the red line crosses the blue line after the third dimension. For New Zealand this is slightly different: the blue line is crossed after two dimensions, indicating that two dimensions are significant. These significant dimensions can be used to analyse the data.

[Figure: permutation test results. Panels: "Permutation test MCA Belgium" and "Permutation test MCA New Zealand". x-axis: Dimension; y-axis: Eigenvalues.]
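The permutation procedure described above can be sketched in base R as follows. This is a hedged illustration rather than the code used for the reported results: the data are simulated, the helper `mca_eigen` is hypothetical, and the eigenvalues are obtained from a singular value decomposition of the scaled indicator matrix instead of the homals package loaded in the code section.

```r
set.seed(123)
n <- 300
z <- rnorm(n)  # invented latent attitude that drives all toy items
items <- lapply(1:5, function(j)
  cut(z + rnorm(n, sd = 0.5), c(-Inf, -0.5, 0.5, Inf), labels = 1:3))
names(items) <- paste0("v", 1:5)
dat <- as.data.frame(items)

# MCA eigenvalues: squared singular values of the scaled indicator matrix,
# after dropping the trivial first value (which equals 1)
mca_eigen <- function(dat, ndim) {
  G <- do.call(cbind, unname(lapply(dat, function(f)
    diag(nlevels(f))[as.integer(f), , drop = FALSE])))
  Z <- sweep(G, 2, sqrt(colSums(G)), "/") / sqrt(ncol(dat))
  (svd(Z, nu = 0, nv = 0)$d^2)[-1][seq_len(ndim)]
}

ndim <- 4
observed <- mca_eigen(dat, ndim)

# Shuffle every column independently (sampling without replacement), 100 times
perm_eig <- replicate(100, mca_eigen(as.data.frame(lapply(dat, sample)), ndim))

# 95% interval of the permuted eigenvalues per dimension (rows: 2.5%, 97.5%)
bounds <- apply(perm_eig, 1, quantile, probs = c(0.025, 0.975))

# A dimension is flagged when the observed eigenvalue falls outside the interval
significant <- observed > bounds[2, ] | observed < bounds[1, ]
significant
```

Because each column is permuted separately, the marginal frequencies of every variable stay intact and only the association between variables is destroyed, which is what the random reference distribution is meant to represent.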

Discrimination measure

The importance of a variable is given by its discrimination measure. The discrimination measure of a certain variable in a certain dimension is the squared correlation between the object scores x and the quantified variable. The discrimination measure always takes a value between zero and one. It can be interpreted as the squared component loading, which relates to the length of the variable's vector measured from the origin. The average of the discrimination measures per dimension equals the eigenvalue. The discrimination measures show which variables (and thus their categories) discriminate well. Categories of different variables that are close to each other and far from the origin indicate that these categories often appear together.
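The relation between discrimination measures and eigenvalues can be checked numerically. The sketch below is an assumption-laden illustration in base R: the data are simulated, the object scores are taken from a singular value decomposition of the scaled indicator matrix and normalized so that X'X = nI, and none of the names come from the original analysis.

```r
set.seed(42)
n <- 200
z <- rnorm(n)  # invented latent attitude
make_item <- function() {
  cut(z + rnorm(n, sd = 0.7), c(-Inf, -0.5, 0.5, Inf), labels = 1:3)
}
dat <- data.frame(v1 = make_item(), v2 = make_item(),
                  v3 = make_item(), v4 = make_item())

indicator <- function(f) diag(nlevels(f))[as.integer(f), , drop = FALSE]

p <- ncol(dat)
ndim <- 2
G_list <- lapply(dat, indicator)
G <- do.call(cbind, unname(G_list))
Z <- sweep(G, 2, sqrt(colSums(G)), "/") / sqrt(p)
dec <- svd(Z)

# Skip the trivial first dimension (constant vector, singular value 1)
eig <- dec$d[2:(ndim + 1)]^2
X <- sqrt(n) * dec$u[, 2:(ndim + 1), drop = FALSE]  # object scores

# Discrimination measure of variable j on dimension s: variance of the
# quantified variable G_j Y_j, with Y_j the category centroids of X
disc <- t(sapply(G_list, function(Gj) {
  quantified <- Gj %*% solve(crossprod(Gj), crossprod(Gj, X))
  colSums(quantified^2) / n
}))
colnames(disc) <- paste0("dim", seq_len(ndim))

disc            # one row per variable, one column per dimension
colMeans(disc)  # should match the eigenvalues
eig
```

Since the object scores are centered with unit variance, each entry of `disc` is also the squared correlation between the object scores and the quantified variable.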

Bootstrap

The bootstrap can be used in MCA for testing assumptions. With the bootstrap, conclusions regarding the mean, standard deviation and confidence interval can be tested. The bootstrap repeatedly draws samples of observations with replacement. In this research the bootstrap is repeated 1000 times. A large number of repetitions results in a more accurate estimate of the tails of the bootstrap distribution. The data are tested against this distribution. There are multiple methods that can be used for determining the number of dimensions. For this research an alpha of 5% is used, which results in the 2.5 and 97.5 percentiles. In this case H0 is that the fit per variable is explained by the data, and Ha is that the fit per variable is not explained by the data.
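A percentile bootstrap of this kind could look as follows in base R. The sketch is illustrative only: the data are simulated, the helper `first_eigen` is hypothetical, and the bootstrapped statistic is assumed to be the mean discrimination measure of dimension 1, which equals the first non-trivial eigenvalue. The boot package loaded in the code section is not used here.

```r
set.seed(2020)
n <- 200
z <- rnorm(n)  # invented latent attitude
items <- lapply(1:4, function(j)
  cut(z + rnorm(n, sd = 0.7), c(-Inf, -0.5, 0.5, Inf), labels = 1:3))
names(items) <- paste0("v", 1:4)
dat <- as.data.frame(items)

# Mean discrimination measure of dimension 1 (first non-trivial eigenvalue)
first_eigen <- function(dat) {
  G <- do.call(cbind, unname(lapply(dat, function(f) {
    f <- droplevels(f)  # a resample may miss a category entirely
    diag(nlevels(f))[as.integer(f), , drop = FALSE]
  })))
  Z <- sweep(G, 2, sqrt(colSums(G)), "/") / sqrt(ncol(dat))
  svd(Z, nu = 0, nv = 0)$d[2]^2
}

observed <- first_eigen(dat)

# Resample respondents (rows) with replacement, 1000 times
boot_vals <- replicate(
  1000,
  first_eigen(dat[sample(n, replace = TRUE), , drop = FALSE])
)

# Percentile interval for alpha = 5%
ci <- quantile(boot_vals, c(0.025, 0.975))
observed
ci
```

A histogram of `boot_vals` with the two percentiles marked would give a plot of the same kind as the bootstrap figures in the results section.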

4. Results

Firstly, two biplots are presented to visualize the data. The colored text relates to the individual categories and the grey dots indicate the objects.

[Figure: biplots. Panels: "Biplot New Zealand" and "Biplot Belgium". x-axis: Dimension 1; y-axis: Dimension 2. Category labels have the form variable.category, for example Reduce_gap.4.]

The joint plots for both Belgium and New Zealand can be found below. Variables are plotted per category for dimension 1 and dimension 2. Firstly, the plot of Belgian government responsibility has a horseshoe shape. The horseshoe shape indicates that the first dimension is dominant and that the second dimension is a quadratic function of the first dimension. The variation is therefore mainly accounted for by the first dimension, while the second dimension discriminates the extremes from the middle positions. Secondly, the plot of New Zealand government responsibility also has a horseshoe shape. This again indicates a strong first dimension that accounts for most of the variation.

[Figure: joint plots of category centroids. Panels: "Centroids Government Responsibility New Zealand rating per category, dim 1 and 2" and "Centroids Government Responsibility Belgium rating per category, dim 1 and 2". x-axis: Dimension 1; y-axis: Dimension 2.]

The loading plots in the appendix give a visual interpretation of the fit per variable. Here it can also be seen that most variables are associated with the first dimension in both cases. Tables 3 and 4 in the appendix present the discrimination measures of Belgium and New Zealand. For both countries the first dimension explains the largest part of the variance. For Belgium all variables are better explained by dimension one. For example, control prices is explained for 26.9% by dimension one and only for 15% by dimension two. Similarly, for New Zealand the discrimination measures of all variables are mainly explained by dimension one. Take the variable reduce gap as an example: 50.3% of its variance is described by dimension one, whereas the second dimension describes only 31.9%.

The bootstrap results for Belgium are shown in the histogram in the figure below. With an alpha of 5%, the confidence interval shows that the first four dimensions explain between 0.43 and 0.59 of the variance of the data. The observed mean discrimination measure is 0.51 and falls within the calculated confidence interval. These results indicate that the null hypothesis cannot be rejected. There is thus no evidence against the assumption that the fit of the variables on the first four dimensions is explained by the data. The bootstrap on the New Zealand data is also evaluated with a 95% confidence interval. The first four dimensions explain between 0.50 and 0.61 of the variance of the data, with a mean of 0.50, which lies at the lower bound of the calculated confidence interval. These results indicate that the null hypothesis cannot be rejected here either, so there is again no evidence against the assumption that the fit of the variables on the first four dimensions is explained by the data.

[Figure: bootstrap histograms. Panels: "Bootstrap Confidence Interval Belgium" and "Bootstrap Confidence Interval New Zealand". x-axis: Discrimination Measure for Dimension 1; y-axis: Frequency.]

5. Conclusion

The research question of this paper was: "What are the main differences between the beliefs of Belgian society and the beliefs of New Zealand society regarding the government's responsibility?" The first dimension indicates the degree of government interference. For Belgium the left side indicates more government interference, whereas the right side indicates less government interference. The clusters indicate that individuals who are less fond of government interference in one category tend to hold this view for all categories. In Belgium, individuals who find that the government should not take responsibility for reducing the poverty gap also find that the government should not interfere in helping the unemployed.

The first dimension in the second plot also indicates the degree of government interference. For New Zealand the left side of the plot shows less government interference and the right side more government interference. According to the clusters, individuals who find that the government should take less responsibility for helping the unemployed also find that the government should take less responsibility for reducing the poverty gap.

When comparing both countries, the opinions of individuals who are not fond of government interference appear to be the same: the homogeneous clusters are similar in both countries. This might be explained by the similarity of the countries in terms of development and prosperity. The conclusion is therefore that individuals who find that the government should not interfere in one category often hold the same view for all categories. This indicates that, when allocating resources, the government in both countries should focus less on reducing the poverty gap and helping the unemployed and more on the other variables. For future research, it would be valuable to include more background information on the survey respondents. In this way the government can adjust its responsibility towards certain groups in society.

6. Code

library(homals)
library(boot)
library(ggplot2)
library(RColorBrewer)
library(plot3D)
library(plotrix)
library(ISLR)
library(FactoMineR)
library(factoextra)
library(tidyverse)
library(knitr)
load("C:/Users/lotte/Downloads/Belgium_GovResponsibility.Rdata")
load("C:/Users/lotte/Downloads/NewZealand_GovResponsibility.Rdata")
source("my_plot.homals.R")
source("my_rescale.homals.R")
#Manipulate data
BE_GovResponsibility