Statistic exam 2013/2014
Statistic Synopsis – International Business & Politics 2013
Peter Dalgaard and Carmine Gioia
Esben Linnet Burkandt
Mads Saabye Jørgensen
Valdemar Gaarn Rasmussen
Sidsel Green Pedersen
Sheryne Hafez
xxxxxx-xxxx
xxxxxx-xxxx
xxxxxx-xxxx
xxxxxx-xxxx
xxxxxx-xxxx
Question 1.
Give a brief description of the variables in the data set. Pay particular attention to the logSize
variable. Notice that this variable, originally given in square feet, has been transformed to logarithmic
scale (base 10 logarithm). Report the minimum and maximum store size in square feet and square
meter.
A variable is any characteristic that is recorded for a study. The data set was obtained from Enterprise
Surveys (http://enterprisesurveys.org), The World Bank. Its variables are store type, city, size of store
in square feet (on a log-10 scale), level of competition on a scale ranging from 1 to 4, perception,
efficiency 3 years ago (log-10 scale), efficiency today (log-10 scale), total sales 3 years ago
(log-10 scale), total sales today (log-10 scale) and whether the store has a computer or not.
The variables store type, city, perception and computer place each observation in a category. Thus
they are categorical variables. The variables store size, competition, efficiency 3 years ago,
efficiency today, total sales 3 years ago and total sales today are numerical, and thus we can
describe them as quantitative variables.
Efficiency today and efficiency 3 years ago are dependent (paired) samples, as each observation in one
sample has a matched observation in the other. The same goes for total sales 3 years ago and total
sales today.
The two research questions are “Competition and labor productivity in India’s retail stores” and “Are
labor regulations driving computer usage in India’s retail stores?”. In the first research question the
main explanatory variable is competition, and the main response variable is labor productivity. In the
second question labor regulations is the main explanatory variable, and computer usage is the main
response variable.
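Given LogSize (size of store in square feet on a log-10 scale) we report the minimum and maximum
values in square feet by computing 10^LogSize, and convert them to square meters by multiplying by
0,0929 (1 square foot is approximately 0,0929 square meters). In the figures below a comma is the
decimal separator and a dot is the thousands separator.
Minimum square feet: 12
Maximum square feet: 14.997
Minimum square meters: 12 * 0,0929 = 1,1
Maximum square meters: 14.997 * 0,0929 = 1.393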
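The conversion can be sketched in a few lines of Python. The function names are our own, and the factor 0.0929 is the rounded conversion used in the text; the square-foot values are the minimum and maximum reported above.

```python
# Rounded conversion factor used in the text: 1 square foot ~ 0.0929 square meters.
SQFT_TO_SQM = 0.0929

def logsize_to_sqft(log_size: float) -> float:
    """Undo the base-10 log transform: size in square feet = 10 ** logSize."""
    return 10 ** log_size

def sqft_to_sqm(sqft: float) -> float:
    """Convert square feet to square meters with the rounded factor."""
    return sqft * SQFT_TO_SQM

# Minimum and maximum store size in square feet, as reported in the text.
min_sqft, max_sqft = 12, 14997

min_sqm = sqft_to_sqm(min_sqft)
max_sqm = sqft_to_sqm(max_sqft)
print(round(min_sqm, 1), round(max_sqm))
```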
Descriptive statistics for Computer
Computer is a categorical, binary variable and is illustrated in a bar chart. We treat it as a dummy
variable, where 0 and 1 respectively describe the absence or presence of a computer in the retail
store. Analyzing the bar chart we see that 86 % of the stores in the sample report having no
computer, whereas 14 % report having one.
Computer usage    Count    Probability
No (0)            340      0,85642
Yes (1)           57       0,14358
Total             397      1,0000
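The probabilities in the table are simply the counts divided by the total. A minimal sketch, using the counts reported above (the dictionary layout is our own choice):

```python
# Counts of stores without (0) and with (1) a computer, as reported in the table.
counts = {"No (0)": 340, "Yes (1)": 57}

total = sum(counts.values())
proportions = {label: count / total for label, count in counts.items()}

for label, share in proportions.items():
    print(f"{label}: {share:.5f}")
```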
Descriptive statistics for LogSize (store size)
As a quantitative variable, store size can be described with a histogram. As seen in the histogram,
the distribution of LogSize appears approximately normal, with a mean of 2,1982 on the log-10 scale
(157,8 square feet) and a median of 2,176 on the log-10 scale (150 square feet). The difference
between the two is only 157,8 - 150 = 7,8 square feet, so mean and median match closely, which
indicates an approximately symmetric distribution. This suggests that the distribution of store size
on the log scale is approximately normal.
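The square-foot figures come from back-transforming the log-10 values. A short sketch with the reported mean and median; note that back-transforming the mean of the logs gives the geometric mean of store size, not the arithmetic mean.

```python
# Mean and median of LogSize on the log-10 scale, as reported in the text.
mean_log, median_log = 2.1982, 2.176

# Back-transform to square feet. 10 ** mean(log10(x)) is the geometric mean of x.
mean_sqft = 10 ** mean_log
median_sqft = 10 ** median_log

print(round(mean_sqft, 1), round(median_sqft, 1))
```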
Question 2. Compare Efficiency between the store types. First make a two-group comparison between
the two largest groups (Traditional and Consumer Durable) and then a comparison between all three
groups.
Efficiency is the quantitative response variable and store type is the categorical, explanatory variable.
First we will make a significance test for whether efficiency is dependent on store type and then we
will produce a 95 % confidence interval for the difference between the two efficiency means.
Two-sided significance test
We will now perform a two-sided significance test for whether efficiency depends on the store
type.
1) Assumptions:
- Quantitative response variable for two groups
- Independent, random samples
- We assume approximately normal distribution for each group, given the central limit theorem, as we
have a large sample size
2) Hypothesis:
The null hypothesis: Efficiency is not dependent on store type, thus there is no difference
between the population mean efficiencies of the two groups
$H_0: \mu_1 - \mu_2 = 0$
The alternative hypothesis: Efficiency does depend on store type, thus there is a difference
between the two groups' means
$H_a: \mu_1 - \mu_2 \neq 0$
3) Test statistic
$t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{se}$, where $se = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
4) P-value = 0,001
This is the probability, assuming the null hypothesis is true, that the test statistic equals the
observed value or a value even more extreme. To get the p-value we use the t distribution with
degrees of freedom = 103.
5) Since the p-value is below the significance level of 0,05 we can reject the null hypothesis,
supporting the claim that efficiency differs between the two store types.
We can thus further investigate the precise difference in efficiency through a 95 % confidence interval
regarding the difference between the means.
Assumptions for a 95% confidence interval
- Independent random samples
- Quantitative response variable for two groups
- Approximately normal distribution for both groups (given by the central limit theorem)
Confidence interval
The formula for the 95 % confidence interval for the difference between two population means
(traditional Fmcg and consumer durable stores) is
$(\bar{x}_1 - \bar{x}_2) \pm t_{.025}(se)$
The value of $t_{.025}$ with DF = 103 is approximately 1,98 in a t table (1,960 is the limiting value
for very large DF).
The computed confidence interval runs from -0,51811 to -0,27553 and does not contain zero, so we can
be 95% confident that the efficiency of Traditional Fmcg stores is between 0,27553 and 0,51811 lower
(on the log-10 scale) than that of Consumer durable stores. (for further summary statistics see
appendix, question 2)
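As a minimal sketch of how such an interval is built, the function below computes a confidence interval for a difference of means from summary statistics. The function name and the toy inputs are our own and are not the exam data, since the group means and standard deviations are only given in the appendix. The last lines recover the point estimate and margin of error that the reported interval implies.

```python
import math

def two_sample_ci(mean1, sd1, n1, mean2, sd2, n2, t_crit):
    """CI for mean1 - mean2 using se = sqrt(sd1^2/n1 + sd2^2/n2)."""
    diff = mean1 - mean2
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return diff - t_crit * se, diff + t_crit * se

# Toy inputs for illustration only (NOT the exam data).
toy_lo, toy_hi = two_sample_ci(10.0, 2.0, 4, 8.0, 2.0, 4, t_crit=2.0)

# The interval reported in the text implies its own point estimate and margin.
reported_lo, reported_hi = -0.51811, -0.27553
point_estimate = (reported_lo + reported_hi) / 2
margin = (reported_hi - reported_lo) / 2

print(round(point_estimate, 5), round(margin, 5))
```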
We extend our analysis to compare all three groups of store types. This is done by one way ANOVA
test since we are measuring only one factor that can impact efficiency, which is store type.
ANOVA - comparing several means
1. Assumptions
- The population distributions of the response variable for the groups are normal, with the same
standard deviations for each group.
- Randomization: in a survey sample, independent random samples are selected from each of the g
populations.
- For an experiment, subjects are randomly assigned separately to the g groups.
2. Hypothesis
The null hypothesis states that the population means of the three store types (traditional Fmcg,
consumer durable stores and modern format stores) are equal.
$H_0: \mu_1 = \mu_2 = \mu_3$
Our alternative hypothesis states that at least two of the population means are unequal. In this case
the means for traditional Fmcg, consumer durable stores or modern format stores.
$H_a$: At least two of our population means are different.
3. Test statistic
$F = \frac{\text{between-groups mean square}}{\text{within-groups mean square}}$ = 29,23
The F sampling distribution has DF1 = g-1 = 3-1 = 2 and DF2 = N-g = 389-3 = 386.
4. P-value < 0,0001
5. Conclusion
By the above computation done in SAS JMP, the F ratio is the ratio of the two mean squares:
F = 5,93626/0,20309 = 29,23 (JMP reports F = 29,2303). For this F ratio JMP reports a P-value
< 0,0001. Since the P-value is smaller than our significance level of 0,05, there is strong evidence
against the null hypothesis.
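The F ratio and its degrees of freedom can be checked by hand. A small sketch with the mean squares, the number of groups and the sample size reported in the text (variable names are our own):

```python
# Mean squares reported by JMP, number of groups and number of observations.
ms_between = 5.93626
ms_within = 0.20309
g, n = 3, 389

f_ratio = ms_between / ms_within
df1 = g - 1  # numerator degrees of freedom
df2 = n - g  # denominator degrees of freedom

print(round(f_ratio, 2), df1, df2)
```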
We found a small p-value in our F test, but the test does not specify which means are different or how
different they are. We therefore estimate confidence intervals comparing pairs of means for all three
store types.
For two groups, e.g. traditional Fmcg and modern format stores, with sample means $\bar{y}_1$ and
$\bar{y}_2$ and sample sizes $n_1$ and $n_2$, the 95% confidence interval computed by SAS JMP is
calculated by the following formula, where s is the residual standard deviation (the square root of
the within-groups mean square):
$\bar{y}_1 - \bar{y}_2 \pm t_{.025} \cdot s \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
We infer that the efficiency for Consumer Durable Stores is between 0,2769 and 0,5166 higher than
the efficiency in Traditional Fmcg stores. Since the confidence interval contains only positive
numbers, this suggests that $\mu_{\text{Consumer Durable}} - \mu_{\text{Traditional}} > 0$.
Further the comparison between Modern Format Stores and Traditional Fmcg shows that the
efficiency for Modern Format Stores is between 0,2251 and 0,5155 higher than the efficiency in
Traditional Fmcg Stores.
In the last comparison between Consumer Durable Stores and Modern Format Stores we have a
confidence interval between -0,1461 and 0,1991. Because the confidence interval contains 0, there is
not enough evidence to conclude that a difference exists. In line with our alternative hypothesis,
which states that at least two of our population means are different, we can conclude that the mean
for Traditional Fmcg stores is significantly different from the means of the other two store types.
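The reading of the three pairwise intervals can be summarised mechanically: a difference is significant at the 5 % level exactly when its 95 % interval excludes zero. A sketch using the intervals reported above (the data structure and function name are our own):

```python
# Pairwise 95% confidence intervals for the difference in mean efficiency,
# as reported in the text (first group minus second group).
intervals = {
    ("Consumer Durable", "Traditional Fmcg"): (0.2769, 0.5166),
    ("Modern Format", "Traditional Fmcg"): (0.2251, 0.5155),
    ("Consumer Durable", "Modern Format"): (-0.1461, 0.1991),
}

def contains_zero(lower: float, upper: float) -> bool:
    """True if the interval includes 0, i.e. no significant difference."""
    return lower <= 0 <= upper

significant = {pair: not contains_zero(*ci) for pair, ci in intervals.items()}

for pair, is_significant in significant.items():
    print(pair, "differs" if is_significant else "no clear difference")
```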
Question 3. Similarly, compare the probability of computer use as recorded in the Computer variable.
Again, both do a two-group comparison and the full three-group comparison. (store types)
Two group comparison - Traditional and Consumer Durable:
When comparing the proportion of computer usage for different store types, the explanatory variable is
store type, and the response variable is computer usage. In contrast to question 2, the response
variable is now categorical. First we will do a two-sided significance test to find out whether there
is a difference in computer usage across the two store types, then we will calculate a confidence
interval to estimate, with 95% confidence, how big the difference is. For the full three-group
comparison we will make a Chi-square test.
TWO-SIDED SIGNIFICANCE TEST
A two-sided significance test is done through five steps.
Assumption:
- A categorical response variable for two groups
- Independent random samples
- n1 and n2 are large enough that there are at least five successes and five failures in each group
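Hypothesis:
$H_0: p_1 = p_2$, meaning that $p_1 - p_2 = 0$
$H_a: p_1 \neq p_2$
The null hypothesis suggests that computer usage across store types is alike.
The alternative hypothesis suggests that computer usage differs according to store type.
Test Statistic:
$z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{se_0}$, with $se_0 = \sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$
$\hat{p}$ is the pooled estimate, which is the total number of stores using computers in relation to
all stores in the two groups.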
P-Value:
P=0.0001
Conclusion:
Our P-value obtained is far below the significance level of 0,05 giving strong evidence against the null
hypothesis. On the other hand this supports the alternative hypothesis, so we can conclude that there is
an association between computer usage and store type.
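A minimal sketch of the two-proportion z test with a pooled standard error follows. The counts are an assumption: 19 of 71 Consumer Durable stores and 11 of 283 Traditional stores are back-calculated from the percentages with a computer reported for question 3 (26,76 % and 3,89 %) and are not taken from the JMP output. The function name is our own.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z test for p1 = p2 using the pooled proportion under H0."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se0 = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# ASSUMED counts, reconstructed from the reported percentages:
# Consumer Durable 19/71, Traditional 11/283.
z, p_value = two_proportion_z_test(19, 71, 11, 283)
print(round(z, 2), p_value < 0.0001)
```

Under these assumed counts the p-value falls far below 0,0001, in line with the conclusion above.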
CONFIDENCE INTERVAL
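A 95 % confidence interval for the difference in computer usage between two population proportions
(traditional and consumer durable stores) is calculated as follows:
$(\hat{p}_1 - \hat{p}_2) \pm z \cdot se$, with $se = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$
where z = 1.96.
We use $\hat{p}$ instead of $p$, as we only have sample proportions, which serve as estimates of the
population proportions.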
Assumptions:
- A categorical response variable for two groups
- Independent random samples for the two groups, either from random sampling or a randomized
experiment.
- Large enough samples that there are at least 10 successes and 10 failures in each group.
Calculating the 95 % confidence interval:
The upper and lower limits of the 95 % CI have been computed through software, due to the higher
precision of those calculations.
Conclusion
Looking at the lower and upper confidence limits, it can be inferred that computer usage is
between approximately 12,69 and 33,68 percentage points higher for consumer durable stores than for
traditional stores.
Chi-squared test statistic:
The chi-squared test statistic compares the observed cell counts in the contingency table with the
counts we would expect to see if the null hypothesis of independence were true. The chi-squared test
gives the three-group comparison between all three store types: Traditional, Consumer Durable and
Modern format stores. It is also carried out in five steps:
Expected cell count = (row total * column total) / total sample size
1) Assumptions:
- Categorical response variable for three groups
- Independent random samples
- Large enough sample sizes, so that the expected count is at least 5 in each cell
2) Hypothesis:
$H_0$: Store type and computer are independent
$H_a$: Store type and computer are dependent
3) Chi-squared test statistics:
$\chi^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$
DF = (r-1)(c-1)
DF = (3-1)(2-1) = 2
Test                Chi Square    Prob>ChiSq
Likelihood Ratio    94,394        <0,001
Pearson             116,147       <0,001
4) P-value
With DF = 2 we get a chi-square equal to 116,147. Thus we get a p-value of <0,001
5) Conclusion: with a P-value <0,001 we can reject the null hypothesis. Due to the small p-value, we
infer that the variables store type and computer are associated.
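The Pearson statistic can be reproduced from a 3x2 contingency table. The cell counts below are an assumption: they are back-calculated from the reported percentages with a computer per store type (3,89 %, 26,76 %, 62,79 %) and the totals of 57 computer users among 397 stores, not read from the JMP output. With these counts the sketch gives a statistic matching the reported 116,147.

```python
import math

# ASSUMED cell counts (with computer, without computer), reconstructed from
# the reported percentages and totals; not taken from the JMP output.
observed = {
    "Traditional": (11, 272),
    "Consumer Durable": (19, 52),
    "Modern Format": (27, 16),
}

row_totals = {store: sum(cells) for store, cells in observed.items()}
col_totals = [sum(cells[j] for cells in observed.values()) for j in range(2)]
n = sum(row_totals.values())

chi2 = 0.0
for store, cells in observed.items():
    for j, obs in enumerate(cells):
        # Expected count under independence: row total * column total / n.
        expected = row_totals[store] * col_totals[j] / n
        chi2 += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(col_totals) - 1)
# For df = 2 only, the chi-squared upper tail probability is exp(-x / 2).
p_value = math.exp(-chi2 / 2)

print(round(chi2, 3), df, p_value < 0.001)
```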
Strength of association:
Analyzing the strength of the association lets us determine whether the association between store
type and computer is significant and important, or significant but weak and thus of little use in our
analysis.
Percentage proportion with computer:
Consumer durable stores: 26,76
Modern format stores: 62,79
Traditional stores: 3,89
Conclusion:
As none of the proportions are numerically close to each other, every pairwise difference of
proportions is far from 0.
Thus we can conclude there is a strong, significant association between store type and computer.
Question 4. Compare the variables Efficiency and Efficiency3yr. Has there been a significant increase
in efficiency? Give a confidence interval for the average increase.
The variables efficiency and efficiency3yr are dependent samples, meaning each observation in one
sample has a matched observation in the other sample. For dependent samples, the mean of the
differences equals the difference of the means, so the difference $(\bar{x}_1 - \bar{x}_2)$ between
the means of the two samples equals the mean $\bar{x}_d$ of the difference scores. We thus construct
a new variable, d, which holds the difference between the two efficiency values for each store:
$\bar{x}_d = \bar{x}_1 - \bar{x}_2$ = 0,7230655
$se = \frac{s_d}{\sqrt{n}} = \frac{1,8101355}{\sqrt{397}}$ = 0,0908481
To compare means with dependent samples, we construct a confidence interval and do a two-sided
significance test using the single sample of difference scores.
The 95% confidence interval $\bar{x}_d \pm t_{.025}(se)$ and the test statistic
$t = \frac{\bar{x}_d - 0}{se}$ are the same as for a single sample.
Two-sided significance test:
1) Assumptions: 1) Quantitative response variable (the difference scores) 2) The difference scores
come from a random sample of stores 3) Approximately normal distribution of the difference scores
(mostly relevant for small sample sizes)
2) Hypothesis:
Null hypothesis: There is no difference between efficiency today and efficiency 3 years ago
$H_0: \mu_1 - \mu_2 = 0$
Alternative hypothesis: The stores are more efficient today than three years ago
$H_a: \mu_1 - \mu_2 > 0$
3) Test statistics:
$t = \frac{\bar{x}_d - 0}{s_d / \sqrt{n}}$ = 7,9591
4) P-value = 0,0001 as reported from our test statistics in JMP
5) Conclusion:
The P-value tells us that, if the null hypothesis were true, there would be a very small probability
of observing a t-value of 7,9591 or more extreme. Thus we can reject the null hypothesis and conclude
that there has been a significant increase in efficiency.
Computation of 95% confidence interval
The above 95% confidence interval is reported by SAS JMP. It can be approximated by hand with the
following formula:
Sample mean difference $\pm\ t_{.025}(se)$ = 0,7230655 ± 1,960 * 0,0908481 = (0,545003; 0,901128)
The critical value 1,960 is the large-sample value from the t table. With DF = N-1 = 397-1 = 396 the
exact critical value is slightly larger (about 1,966), which accounts for the slightly wider interval
reported by JMP. The se is reported in SAS JMP and calculated by the following formula:
$se = \frac{s_d}{\sqrt{n}} = \frac{1,8101355}{\sqrt{397}}$ = 0,0908481
We use the values reported by SAS JMP, since they are calculated more precisely.
In conclusion, we can be 95% confident that the efficiency has increased by between 0,544461 and
0,90167 units (on the log-10 scale) within the last 3 years.
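The paired-sample quantities can be recomputed from the summary statistics given in the text. This sketch uses the large-sample critical value 1.96 for the hand calculation; variable names are our own.

```python
import math

# Summary statistics of the difference scores, as reported in the text.
mean_d = 0.7230655  # mean difference (Efficiency - Efficiency3yr)
sd_d = 1.8101355    # standard deviation of the differences
n = 397

se = sd_d / math.sqrt(n)  # standard error of the mean difference
t_stat = (mean_d - 0) / se

z_crit = 1.96  # large-sample approximation to t.025
ci_lower = mean_d - z_crit * se
ci_upper = mean_d + z_crit * se

print(round(se, 7), round(t_stat, 4), round(ci_lower, 6), round(ci_upper, 6))
```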
Question 5. Fit a simple linear regression in which Efficiency is described by logSize. Compute a 95%
confidence interval for the slope of the regression and interpret the result. Discuss possible violations
of the model assumptions. See appendix 5
We fit a simple linear regression where Efficiency is the response variable and LogSize the
explanatory variable. $r^2$ = 0,11 (11%). This value shows that the regression model has 11% less
prediction error than using the sample mean ȳ alone to predict efficiency.
1. Assumptions:
1) The population means of y at different values of x have a straight-line relationship with x, that
is $\mu_y = \alpha + \beta x$, estimated by $\hat{y} = a + bx$
2) The data are gathered using randomization, such as random sampling or a randomized experiment.
3) The population values of y at each value of x follow a normal distribution, with the same standard
deviation at each x value.
The assumption of randomization is described in the report from the World Bank, and the normal
distributions for LogSize and Efficiency respectively are shown in Question 1 and 4.
The first assumption regarding linearity can be questioned, as $r = \sqrt{r^2} = \sqrt{0,114} \approx 0,34$,
which does not indicate a strong correlation. The simple linear regression model is thus not
particularly good in this case; we will however use it as part of our analysis. Later on we will
strengthen the model by including more variables in a multiple regression analysis, as seen in
question 6.
2. Hypotheses:
𝐻0 : 𝑏 = 0
𝐻𝑎 : 𝑏 ≠ 0
3. Test statistic:
From the values reported by SAS JMP we can calculate our t score using the following formula, with
our b coefficient of 0,3390092 and se of 0,048042:
$t = \frac{b - 0}{se} = \frac{0,3390092}{0,048042} \approx 7,06$
4. P-value= 0,0001.
5. Conclusion:
From our test statistic, JMP reports a p-value of 0,0001. Thus we can reject our H0, and state that there
is a relationship between logSize and efficiency.
Computation of 95% confidence interval.
For DF = 387 and a confidence level of 95% the critical value is approximately 1,966 (the t table
value 1,960 applies to very large DF). We can construct a 95% confidence interval by the following
formula:
$b \pm t_{.025}(se)$ = 0,3390092 ± 1,966 * 0,048042 ≈ (0,2446; 0,4335); SAS JMP reports
(0,2445541; 0,4334643).
Thus by the above computations we can be 95% confident that the slope falls within this interval. On
average the efficiency increases by between 0,2445541 and 0,4334643 for each additional 1 unit
increase in LogSize (that is, for each tenfold increase in store size in square feet).
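The slope test and interval can be checked from the JMP estimates quoted in the text. The sketch also backs out the critical value implied by the interval JMP reports, which shows how it relates to the table value 1.96; variable names are our own.

```python
# Slope estimate and standard error reported by JMP.
b = 0.3390092
se = 0.048042

t_stat = (b - 0) / se  # test statistic for H0: slope = 0

# Hand calculation with the large-sample critical value.
approx_ci = (b - 1.96 * se, b + 1.96 * se)

# Interval reported by JMP, and the critical value it implies.
jmp_lower, jmp_upper = 0.2445541, 0.4334643
implied_crit = (jmp_upper - jmp_lower) / 2 / se

print(round(t_stat, 2), round(implied_crit, 3))
```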
Question 6. Extend the linear regression to a multiple linear regression by further including
StoreType, City, and Competition.
There are several explanatory variables, such as store type, city, competition and store size, that
may have an impact on the store’s efficiency. We will combine these variables in a multiple
regression model, where the idea is that more than one explanatory variable predicts the response
variable.
Our $R^2$ increases from 0,114 to 0,260 when we add the extra variables. This implies a greater
reduction in error, when predicting y from the explanatory variables instead of from ȳ alone, than
when we only looked at store size in the above question.
(a) Fit an additive model to data and explain the most important parts of the output.
By looking at the effect test reported by SAS JMP, we see that the overall p-value for the city
variable is 0,5197, which is above our significance level of 0,05. We will exclude the city variable
from our analysis, since it is not statistically significant. By excluding city, our $R^2$ decreases
from 0,26 to 0,25. We consider this reduction negligible. (see appendix, question 6 for summary
statistics)
The multiple regression equation is set up as
𝜇𝑦 = 𝛼 + 𝛽1 𝑥1 + 𝛽2 𝑥2 + 𝛽3 𝑥3 , 𝑒𝑡𝑐.
SASJMP reports the following values:
The y-intercept, α = 3,22611011872101. By including all variables, the multiple Regression
equation=
X1 = competition, X2 = logSize, X3 = consumer durable stores, X4 = Modern format and X5 =
Traditional Fmcg.
Each β value describes the change in mean efficiency for a one-unit increase in the corresponding x,
with the other variables held fixed.
From the output we can look at the fit of the model, basing our analysis on the squared
correlation, $R^2$, which is reported as 0,25. This means that the multiple regression equation has
25% less error than ȳ.
(b) Check the model assumptions. Pay attention to possibly nonlinear effects and interactions.
Extend the model if required.
Assumptions of the model:
1) Each explanatory variable has a straight-line relation with 𝜇, with the same slope for all
combinations of values of other predictors in the model
2) Data gathered with randomization
3) Normal distribution for y with the same standard deviation at each combination of values of other
predictors in the model
1) Linearity:
We check for nonlinear effects by plotting the residuals against the different explanatory variables.
When looking at Competition and Store Type we see that neither of these, when squared, shows any
sign of a ‘banana’ shape, nor is there any obvious change in variation as the x-value increases.
LogSize does display some change in variation, as it appears the variance is larger for small and
large values of x.
This does not invalidate the use of multiple regression. We must however be critical towards
inferences about efficiency based on LogSize.
2) Data gathered with randomization:
The data are a cross section of 1948 stores spread over 16 states and 41 cities of India. It is collected
by the World Bank, and we assume they have considered randomization in order for a statistical
analysis to be valid.
3) Normal distribution:
As can be seen on the graph of Bivariate fit of studentized residuals, residuals are normally distributed
and have a constant standard deviation.
The residuals fall within 1 standard deviation.
Check for interaction (see appendix, question 6):
In order for our additive multiple regression model to be valid we want to check that there is no
interaction between the explanatory variables. No interaction means that the effect of either factor
on the response variable is the same at each category of the other factor.
The graph (check appendix) displays an interaction between LogSize and StoreType. The effect of
Competition on Efficiency however seems to be independent of both StoreType and LogSize. The
interaction is confirmed by our P-value for StoreType*logSize, which is <0,001 (low) and thus gives
strong evidence of interaction between the two variables.
Lurking variables for efficiency could be Perception or Computer. However, when looking at the
effect test in JMP they both report a P-value above our significance level of 0,05. This implies we
cannot infer an effect of either Computer or Perception on efficiency, and we will thus not
extend our model any further.
(c) Discuss the statistical significance of the predictors, and state a 95% confidence
interval for the effect of Competition.
2. Hypotheses: the null hypothesis is $H_0: \beta_1 = 0$. Since there is no prior prediction about
whether the effect of competition is positive or negative (for fixed values of x2 and x3), we use the
two-sided significance test, with the alternative hypothesis $H_a: \beta_1 \neq 0$.
3. Test statistics: Our parameter estimates calculated in SAS JMP report a slope estimate of 0.5604006
for competition and a standard error of se = 0.095327. They also report the t test statistic
$t = \frac{b_1 - 0}{se} = \frac{0.5604006}{0.095327} = 5.88$
4. P-value: Our parameter estimates calculated in SAS JMP report a P-value = 0.0001. This is the
two-tailed probability of a t statistic above 5.88 or below -5.88, if $H_0$ were true.
5. Conclusion: The P-value of 0.0001 makes it possible for us to reject our null hypothesis that
$\beta_1 = 0$. At the common significance level of 0.05, we can reject $H_0$. Competition does have a
significant effect on efficiency.
The above significance test does not tell us how large the effect is. We therefore construct a
confidence interval to show more precisely the effect competition has on efficiency.
We consider the multiple regression analysis of y = efficiency and predictors x1 = competition, x2 =
logsize, x3 = Store type. We now find and interpret a 95% confidence interval for b1, the effect of
competition while controlling for logsize and storetype.
From the data calculated in SAS JMP we look at the parameter estimates. b1 = 0.5604006, with se =
0.095327. The confidence interval equals:
$b_1 \pm t_{.025}(se)$ = 0.5604006 ± 1.96 * 0.095327 = (0.37; 0.75)
At fixed values of x2 and x3, we infer with 95 % confidence that efficiency increases by
between 0,37 and 0,75 for each 1 unit increase in competition.
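The figures for the effect of competition can be verified with a small helper. The function name is hypothetical, and 1.96 is used as the large-sample critical value.

```python
def wald_summary(estimate: float, se: float, crit: float = 1.96):
    """Return the t statistic and the CI estimate +/- crit * se."""
    t_stat = estimate / se
    return t_stat, (estimate - crit * se, estimate + crit * se)

# Parameter estimate and standard error for Competition, as reported by JMP.
t_stat, (ci_lower, ci_upper) = wald_summary(0.5604006, 0.095327)

print(round(t_stat, 2), round(ci_lower, 2), round(ci_upper, 2))
```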
Question 7. Fit a logistic regression model predicting Computer from logSize. Compute the odds ratio
corresponding to a 10-fold increase in size, and give a 95% confidence intervals for the odds ratio.
JMP computes the following logistic regression model for predicting computer from logSize:
From the logistic regression model it is shown that as the logSize of a store increases so does the
chance of computer usage. To predict more precisely what the odds ratio of computer usage is for a
10-fold increase in size we need a starting point. Therefore we must first calculate the probability
of having a computer for an average sized store. Further, we need to know α and β. JMP provides
these numbers.
Mean logSize: x = 2.198
α = -9.83
β = 3.35
Computer is a binary response variable, allowing us to use the following equation to predict the
probability of computer usage for an average sized store.
$\ln\left(\frac{p}{1-p}\right) = \alpha + \beta x \Rightarrow p = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}$
$P_1 = \frac{e^{-9,83 + 3,35 \cdot 2.198}}{1 + e^{-9,83 + 3,35 \cdot 2.198}}$ = 0,078 = 7,8% chance of computer usage for an average sized store
To calculate the chance of computer usage for a 10-fold increase in size we follow the same
procedure. Because logSize is a base 10 logarithm, a 10-fold increase in size corresponds to an
increase of 1 in logSize: 2.198+1=3.198.
$P_2 = \frac{e^{-9,83 + 3,35 \cdot 3.198}}{1 + e^{-9,83 + 3,35 \cdot 3.198}}$ = 0,707 = 70,7% chance of computer usage for a store 10 times the average size
To infer further on these results we can calculate the odds ratio. This will enable us to compare the
odds of having a computer in a store with logSize 3.198 with the odds for a store with a logSize
of 2.198.
$\text{odds ratio} = \frac{P_2 / (1 - P_2)}{P_1 / (1 - P_1)} = e^{\beta}$
From this it can be concluded that the odds of using a computer are 28.52 times higher for a store
with logSize 3.198 than for a store with logSize 2.198. This indicates that there is an association
between the size of a store and whether or not it uses computers.
The 95% confidence interval for the odds ratio ranges from 12.64 to 70.99. Hence, the odds of having
a computer are between 12.64 and 70.99 times higher in a store with logSize 3.198. This confirms with
95% confidence that there is a positive association between logSize and computer usage, because the
whole interval lies above 1, the odds ratio value that corresponds to no association.
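The two probabilities and the odds ratio can be reproduced from the rounded coefficients quoted in the text. With α = -9.83 and β = 3.35 the odds ratio comes out at about 28.5, close to the 28.52 that JMP reports from its unrounded estimate. Function names are our own.

```python
import math

# Rounded logistic regression coefficients and mean logSize from the text.
alpha, beta = -9.83, 3.35
mean_log_size = 2.198

def probability(log_size: float) -> float:
    """Predicted probability of computer usage at a given logSize."""
    linear = alpha + beta * log_size
    return math.exp(linear) / (1 + math.exp(linear))

def odds(p: float) -> float:
    return p / (1 - p)

p1 = probability(mean_log_size)      # average sized store
p2 = probability(mean_log_size + 1)  # store ten times as large
odds_ratio = odds(p2) / odds(p1)     # equals exp(beta) for a +1 change in logSize

print(round(p1, 3), round(p2, 3), round(odds_ratio, 2))
```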
(b) Extend the model to a multiple logistic regression using the predictors logSize, City,
StoreType, and Perception.
According to the parameter estimates reported by SAS JMP we can exclude the city variable, due to its
high p-value, which makes it statistically insignificant. Our model is now reduced and consists only
of the remaining variables with a p-value below 0.05. These are LogSize, StoreType and Perception,
which are statistically significant in predicting computer use. The computation by SAS JMP after the
exclusion is reported below.
Knowing the significant explanatory variables we can now report the prediction equation for the
response variable computer usage with x1 being logSize, x2 being perception, x3 being Consumer
Durable Stores, x4 being Modern Format Stores and x5 being Traditional Fmcg Stores. The equation
will state the probability of computer usage according to the different values of x chosen for the
explanatory variables.
(c) Describe the effect of Perception, both in terms of statistical significance and
in real-world terms (i.e., if there is an effect, what does it mean?).
We already described in question b that Perception is statistically significant, as we kept it as an
explanatory variable due to its p-value < 0.05. Specifically, the Perception p-value = 0,0115 tells
us that, if Perception had no effect on computer usage, a test statistic at least as extreme as the
one observed would occur with a probability of only about 1,15%, which is evidence of an association.
This can be examined further. Looking at the data below there is an obvious tendency that as
Perception grows the proportion of Computer Usage decreases.
This association should be understood, in real-world terms, as follows. Stores perceiving labor
regulations as a problem tend to substitute computers for labor in order to avoid the issues related
to labor. In the data we are given the proportion of stores in each city that regard labor
regulations as a problem. Naturally this means that in a city where a large share of stores view
labor regulations as a problem the proportion of computer usage will be higher than in the opposite
case.