Ibtissam El Fajri
Math 1040
Elementary Statistics Final Project
In this course I was required to carry out a statistical analysis project in which I proposed a hypothesis and tested it to find out whether there is a correlation between smoking and forced expiratory volume (FEV, in liters). The study assessed children's pulmonary function with and without cigarette smoking, as well as exposure to passive smoke from at least one parent. The main steps were:

- Randomly collected the data
- Charted the data in various graphs to describe it visually
- Concluded with a statistical analysis stating whether the data supported my hypothesis or not
Parts 1, 2, 3:
Categorical and quantitative variables for the entire population (n = 654):

Variable                        Mean          Standard deviation
Forced Expiratory Volume (L)    2.637284404   0.866589461
Smoke (0 = N, 1 = Y)            0.194189602   0.395878306

Accumulative data (categorical):
Do they smoke?   Frequency   Percentage
No               527         0.805810398
Yes              127         0.194189602
Total            654         1
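Because the smoking variable is coded 0 = No and 1 = Yes, its mean is simply the proportion of smokers, and its standard deviation follows from the two counts. A minimal Python sketch, assuming the standard deviations in the tables are sample standard deviations (dividing by n - 1), which reproduces the reported values; `binary_summary` is a hypothetical helper name:

```python
import statistics


def binary_summary(no_count: int, yes_count: int) -> tuple[float, float]:
    """Return (mean, sample standard deviation) of a 0/1 variable.

    The variable is coded 0 = No and 1 = Yes, so the mean equals the
    proportion of "Yes" answers.
    """
    values = [0] * no_count + [1] * yes_count
    # statistics.stdev divides by n - 1 (sample standard deviation)
    return statistics.mean(values), statistics.stdev(values)


mean, sd = binary_summary(527, 127)  # counts from the population table
print(f"mean = {mean:.9f}, standard deviation = {sd:.9f}")
```

The same helper applied to the sample tables (for example 25 non-smokers and 7 smokers) gives the sample mean and standard deviation.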
Figure: Pie chart for the categorical variable, smokers vs. non-smokers ("Do they smoke?"): No 81%, Yes 19%.
As a group, we decided to use two sampling methods: simple random sampling and systematic sampling.
1. Categorical Simple Random Sample

Smoke (0 = N, 1 = Y): mean = 0.219, standard deviation = 0.420

Accumulative data (categorical):
Do they smoke?   Frequency   Percentage
No               25          0.78125
Yes              7           0.21875
Total            32          1
Figures: Categorical Simple Random Sampling pie chart (n = 32): Non-Smoker 78%, Smoker 22%; bar chart of frequency for Non-Smoker and Smoker.
Comparing the simple random sample with the population, we noticed that the sample mean (0.219) and standard deviation (0.420) were slightly greater than the population mean (0.194) and standard deviation (0.396). The percentages of smokers and non-smokers also differ slightly from those of the entire population (22% vs. 19% smokers).
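The document does not say how the random rows were picked (for example, with a calculator or spreadsheet random-number function). As a hedged sketch, assuming the 654 rows are numbered 0 to 653, a simple random sample of 32 rows could be drawn with Python's `random.sample`; the helper name and the seed are arbitrary choices for illustration:

```python
import random

POPULATION_SIZE = 654  # participants in the full data set
SAMPLE_SIZE = 32       # size of the categorical simple random sample


def simple_random_sample(population_size, sample_size, seed=None):
    """Return sorted 0-based row indices chosen without replacement.

    Every subset of `sample_size` rows is equally likely to be drawn.
    """
    rng = random.Random(seed)  # seeded generator so a draw can be reproduced
    return sorted(rng.sample(range(population_size), sample_size))


rows = simple_random_sample(POPULATION_SIZE, SAMPLE_SIZE, seed=1040)
print(rows)
```

The selected indices would then be used to look up each participant's smoking status and FEV.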
2. Categorical Systematic Random Sample

Smoke (0 = N, 1 = Y): mean = 0.129, standard deviation = 0.341

Accumulative data (categorical):
Do they smoke?   Frequency   Percentage
No               27          0.870967742
Yes              4           0.129032258
Total            31          1
Figures: Categorical Systematic Random Sampling pie chart (n = 31): Non-Smoker 87%, Smoker 13%; bar chart of frequency for Non-Smoker and Smoker.
In the systematic sample, the results are broadly similar to those of the simple random sample and the entire population, although the share of smokers is lower (13%, compared with 22% and 19%).
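The sampling interval used for the systematic sample is not stated. A minimal sketch, assuming every k-th row is taken after a random start, with k = 654 // 31 = 21; the helper name and seed are hypothetical:

```python
import random


def systematic_sample(population_size, sample_size, seed=None):
    """Return 0-based row indices for a systematic sample.

    Every k-th row is taken, starting from a random row inside the first
    interval, where k = population_size // sample_size.
    """
    k = population_size // sample_size        # sampling interval
    start = random.Random(seed).randrange(k)  # random start in [0, k)
    return [start + i * k for i in range(sample_size)]


rows = systematic_sample(654, 31, seed=1040)
print(rows)
```

Because the start is below k, the last index is at most n * k - 1, so it always stays inside the population.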
Quantitative Data

Forced Expiratory Volume (L)   Mean          Standard deviation
Entire population              2.637284404   0.866589461
Simple random sample           2.4164        0.840221178
1. Simple Random Sampling
Figure 1. Quantitative Simple Random Sampling Histogram (n=30): frequency of Forced Expiratory Volume (L) in bins One, Two, Three, Four.
Box plot, simple random sample: Min = 1.253
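A box plot is drawn from the five-number summary (min, Q1, median, Q3, max). The raw FEV values are not reproduced here, so this sketch uses hypothetical values only to show the computation. It assumes Python's `statistics.quantiles` with `method="inclusive"`; calculators and spreadsheets may use a different quartile rule and give slightly different Q1 and Q3:

```python
import statistics


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) for a box plot."""
    # "inclusive" assumes the data contain the extreme values, so min and max
    # act as the 0th and 100th percentiles; other tools may differ.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)


# Hypothetical FEV values in liters, for illustration only (not the project data)
example_fev = [1.2, 1.9, 2.1, 2.4, 2.6, 2.9, 3.4]
print(five_number_summary(example_fev))
```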
2. Systematic Random Sampling

Forced Expiratory Volume (L), systematic random sample: mean = 2.440096774, standard deviation = 0.750955762
Figure 2. Quantitative Systematic Random Sampling Histogram (n=31): frequency of Forced Expiratory Volume (L) in bins One, Two, Three, Four.
Box plot, systematic random sample: Min = 1.338, Median = 2.52, Max = 4.299
These distributions are not symmetric: the tail in the positive direction extends further than the tail in the negative direction, so both the systematic and the simple random samples have positively (right) skewed frequency distributions. In the simple random sample, most of the selected people had an FEV between 1 and 2 L, whereas in the systematic sample the highest frequencies are at 1, 2 and 3 L.
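Skewness can also be checked numerically. One common measure, assumed here since the project judged skew from the histograms, is the moment coefficient of skewness g1 = m3 / m2^1.5, which is positive when the right tail is longer. The raw FEV values are not included in the document, so the sketch uses hypothetical values:

```python
def moment_skewness(data):
    """Moment coefficient of skewness g1 = m3 / m2**1.5.

    Positive values indicate a longer right (positive) tail.
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical values with a long right tail (not the project data)
print(moment_skewness([1.1, 1.4, 1.6, 1.8, 2.0, 2.3, 3.9]))
```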
Part 4:
Categorical Variable Confidence Interval (simple random sample):
1- We first verified that the necessary requirements are satisfied. The study produces samples that can be considered simple random samples.
2- The conditions for a binomial experiment are satisfied: there is a fixed number of trials (the 32 sampled participants), and each trial is independent of the others (the sample is less than 5% of the 654 participants).
3- There are two categories of outcome, smoker or non-smoker, and the probability of selecting a smoker stays the same from trial to trial.
4- In our sample, 22% are smokers (7 of 32) and 78% are non-smokers (25 of 32), so np̂ = 7 > 5 and nq̂ = 25 > 5.
We calculated the margin of error E = 0.12 and then constructed the 90% confidence interval p̂ - E < p < p̂ + E:
0.099 < p < 0.339, or about 9.9% < p < 33.9%. We are 90% confident that the true percentage of smokers in the population lies in this interval; since the whole interval is below 50%, we conclude that the smoker percentage is less than 50%.
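Systematic Sample Confidence Interval:
Following the same steps for the systematic sample (4 smokers out of 31, p̂ ≈ 0.129) gives E ≈ 0.099 and 0.030 < p < 0.228, so we are 90% confident that the true percentage of smokers is between about 3% and 23%. However, np̂ = 4 is less than 5, so the normal-approximation requirement is not met and this interval should be interpreted with caution.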
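The margin of error for a proportion is E = z * sqrt(p̂q̂ / n), with z ≈ 1.645 for 90% confidence. A minimal sketch using only the standard library, assuming the usual normal-approximation (Wald) interval, which matches the reported E = 0.12 for the simple random sample; it also reports whether np̂ >= 5 and nq̂ >= 5 hold. The helper name is hypothetical:

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.90):
    """Normal-approximation confidence interval for a population proportion.

    Returns (lower, upper, margin_of_error, requirement_met), where the
    requirement is n*p_hat >= 5 and n*q_hat >= 5.
    """
    p_hat = successes / n
    q_hat = 1 - p_hat
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.645 for 90%
    e = z * sqrt(p_hat * q_hat / n)
    # n * p_hat is just the number of successes, n * q_hat the number of failures
    requirement_met = successes >= 5 and (n - successes) >= 5
    return p_hat - e, p_hat + e, e, requirement_met


for label, smokers, n in [("simple random", 7, 32), ("systematic", 4, 31)]:
    low, high, e, ok = proportion_ci(smokers, n)
    print(f"{label}: E = {e:.3f}, {low:.3f} < p < {high:.3f}, requirement met: {ok}")
```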
Confidence interval for estimating a population mean with standard deviation known:
1- We first checked the requirements: the sample is a simple random sample.
2- The value of the population standard deviation is known.
3- Either or both of these conditions is satisfied: the population is normally distributed or n > 30.
We calculated E = 0.248 and used
sample mean - E < population mean < sample mean + E
2.1681 < population mean < 2.6646
We are 90% confident that the interval from 2.1681 to 2.6646 contains the true value of the population mean. This means that if we were to draw many samples of the same size and construct the corresponding confidence interval for each, in the long run 90% of those intervals would contain the population mean.
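With σ known, the margin of error is E = z * σ / sqrt(n). The sketch below computes that interval and then illustrates the "in the long run 90% of intervals" interpretation by simulation. All inputs are hypothetical placeholders (not the project's numbers), and the simulation assumes normally distributed data; the helper names are made up for illustration:

```python
import random
from math import sqrt
from statistics import NormalDist


def mean_ci_sigma_known(x_bar, sigma, n, confidence=0.90):
    """z-interval for a population mean when sigma is known.

    Returns (lower, upper, margin_of_error) with E = z * sigma / sqrt(n).
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.645 for 90%
    e = z * sigma / sqrt(n)
    return x_bar - e, x_bar + e, e


def coverage_rate(mu, sigma, n, trials=2000, confidence=0.90, seed=0):
    """Fraction of simulated intervals that contain the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        low, high, _ = mean_ci_sigma_known(sum(sample) / n, sigma, n, confidence)
        hits += low < mu < high  # True counts as 1
    return hits / trials


low, high, e = mean_ci_sigma_known(x_bar=10.0, sigma=2.0, n=64)
print(f"E = {e:.4f}, {low:.4f} < mu < {high:.4f}")
print("simulated coverage:", coverage_rate(mu=2.6, sigma=0.87, n=30))
```

The simulated coverage should come out close to 0.90, which is what the 90% confidence level promises over many repetitions.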
Null hypothesis H0: P = 0.954128 (the value we chose)
Alternative hypothesis H1: P < 0.954128
By carrying out the test for the population mean, we found that P < 0.954128, which supports the alternative hypothesis H1.
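The document does not report the sample counts or the test statistic. If P is read as a population proportion, a left-tailed one-proportion z-test of H0: p = p0 against H1: p < p0 could be sketched as below; the counts in the usage line are hypothetical and the helper name is made up:

```python
from math import sqrt
from statistics import NormalDist


def left_tailed_proportion_test(successes, n, p0):
    """One-proportion z-test of H0: p = p0 against H1: p < p0.

    Returns (z, p_value); the P-value is the left-tail area under N(0, 1).
    """
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)  # standard error computed under H0
    z = (p_hat - p0) / se
    return z, NormalDist().cdf(z)


# Hypothetical counts for illustration only
z, p_value = left_tailed_proportion_test(successes=40, n=100, p0=0.5)
print(f"z = {z:.2f}, P-value = {p_value:.4f}")
```

H0 would be rejected in favor of H1 when the P-value is below the chosen significance level (for example 0.10, matching a 90% confidence level).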