Ibtissam El Fajri
Math 1040 Elementary Statistics
Final Project

In this course I was required to carry out a statistical analysis project in which I proposed a hypothesis and tested it to find out whether there is a correlation between smoking and forced expiratory volume (FEV), measured in liters. The study aimed to assess children's pulmonary function in the presence or absence of cigarette smoking, as well as exposure to passive smoke from at least one parent. The main steps were:
1. Randomly collected the data.
2. Charted the data in various graphs to describe it visually.
3. Concluded with a statistical analysis of the data, stating whether or not it supported my hypothesis.

Part 1, 2, 3: Categorical and Quantitative Variables for the Entire Population

Quantitative variable: Forced Expiratory Volume (L)
Mean: 2.637284404
Standard Deviation: 0.866589461

Categorical variable: Smoke (0 = No, 1 = Yes)
Mean: 0.194189602
Standard Deviation: 0.395878306

Frequency table (categorical):
Do they smoke?   Frequency   Percentage
No               527         0.805810398
Yes              127         0.194189602
Total            654         1

[Pie chart: Do They Smoke? Non-smokers 81%, smokers 19%]

As a group, we decided to use two sampling methods: simple random sampling and systematic sampling.

1. Categorical Simple Random Sample

Smoke (0 = No, 1 = Yes)
Mean: 0.219
Standard Deviation: 0.420

Do they smoke?   Frequency   Percentage
No               25          0.78125
Yes              7           0.21875
Total            32          1

[Pie chart and bar chart: Categorical Simple Random Sampling (n = 32), non-smokers 78%, smokers 22%]

Comparing the simple random sample with the population, we noticed that the mean and standard deviation of the simple random sample (0.219 and 0.420) were slightly greater than those of the whole population (0.194 and 0.396).
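Because Smoke is coded 0 = No and 1 = Yes, its mean is simply the proportion of smokers, and its standard deviation can be computed from the frequency counts alone. A minimal Python sketch, assuming the sample standard deviation (divisor n - 1), which is what the reported values appear to use; `indicator_summary` is a hypothetical helper, not part of the original project:

```python
import math


def indicator_summary(yes: int, no: int) -> dict:
    """Summarize a 0/1 variable (1 = smoker) from its frequency counts.

    Returns the relative frequencies, the mean (which equals the
    proportion of 1s) and the sample standard deviation (n - 1 divisor).
    """
    n = yes + no
    p = yes / n                      # mean of a 0/1 variable = proportion of 1s
    # Sum of squared deviations: yes*(1-p)^2 + no*(0-p)^2 simplifies to n*p*(1-p)
    ss = n * p * (1 - p)
    sd = math.sqrt(ss / (n - 1))     # sample standard deviation
    return {"n": n, "pct_yes": p, "pct_no": 1 - p, "mean": p, "sd": sd}


population = indicator_summary(yes=127, no=527)
srs = indicator_summary(yes=7, no=25)

for label, s in [("Population", population), ("Simple random", srs)]:
    print(f"{label}: n={s['n']}, mean={s['mean']:.3f}, sd={s['sd']:.3f}")
```

Running it prints mean 0.194 and SD 0.396 for the population and mean 0.219 and SD 0.420 for the simple random sample, in line with the values reported above.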
The percentages of smokers and non-smokers in the sample (21.9% and 78.1%) also differ slightly from those of the entire population (19.4% and 80.6%).

2. Categorical Systematic Random Sample

Smoke (0 = No, 1 = Yes)
Mean: 0.129
Standard Deviation: 0.341

Do they smoke?   Frequency   Percentage
No               27          0.870967742
Yes              4           0.129032258
Total            31          1

[Pie chart and bar chart: Categorical Systematic Random Sampling (n = 31), non-smokers 87%, smokers 13%]

In the systematic sample, the results were similar to those of the simple random sample and the entire population, with non-smokers forming a large majority, although the share of smokers (12.9%) was lower than in the simple random sample (21.9%) and the population (19.4%).

Quantitative Data

Forced Expiratory Volume (L)   Mean          Standard Deviation
Entire population              2.637284404   0.866589461
Simple random sample           2.4164        0.840221178

1. Simple Random Sampling
[Figure 1. Quantitative Simple Random Sampling Histogram (n = 30): frequency of forced expiratory volume (L)]
[Box Plot: Simple Random (Min = 1.253)]

2. Systematic Random Sampling

Forced Expiratory Volume (L)   Mean          Standard Deviation
Systematic random sample       2.440096774   0.750955762

[Figure 2. Quantitative Systematic Random Sampling Histogram (n = 31): frequency of forced expiratory volume (L)]
[Box Plot: Systematic Random (Min = 1.338, Median = 2.52, Max = 4.299)]

Neither distribution is symmetric: for both the systematic and the simple random sample, the tail in the positive direction extends further than the tail in the negative direction, so both distributions are positively skewed.
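The two sampling methods differ only in how records are selected. The Python sketch below draws a simple random sample with `random.sample` and a systematic sample by taking every k-th record from a random start. The participant IDs 1 to 654, the interval k = 21 (roughly 654 / 31), the start value and the seed are assumptions for illustration only, and both helper functions are hypothetical:

```python
import random


def simple_random_sample(records, n, seed=None):
    """Draw n records without replacement; every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(records, n)


def systematic_sample(records, k, start=None, seed=None):
    """Select every k-th record, beginning at a random start in [0, k)."""
    if start is None:
        start = random.Random(seed).randrange(k)
    return records[start::k]


ids = list(range(1, 655))                       # hypothetical participant IDs 1..654
srs = simple_random_sample(ids, 32, seed=1)     # 32 distinct IDs
sys_sample = systematic_sample(ids, k=21, start=3)
print(len(srs), len(sys_sample))
```

With k = 21, a start of 0, 1 or 2 yields 32 records and any later start yields 31, so the systematic sample size depends slightly on where the count begins.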
In the simple random sample, FEV values between 1 and 2 L occurred most often, whereas in the systematic sample the highest frequencies occurred around 1, 2 and 3 L.

Part 4: Categorical Variable Confidence Interval

1- We first verified that the necessary requirements are satisfied: the study produces samples that can be considered simple random samples.
2- The conditions for a binomial experiment are satisfied: there is a fixed number of trials (the 32 sampled participants, drawn from a population of 654), and each trial is independent of the others.
3- There are two categories of outcome, smoking and non-smoking, and the probability of each outcome stays the same from trial to trial.
4- Our simple random sample contains 22% smokers and 78% non-smokers. We verified that np ≥ 5 and nq ≥ 5 (np = 7, nq = 25).

We then calculated the margin of error, E = 0.12, and constructed the 90% confidence interval p̂ - E < p < p̂ + E: 0.099 < p < 0.339, or 9.9% < p < 33.9% expressed as percentages. We are 90% confident that this interval contains the true percentage of smokers in the population; because the entire interval lies below 50%, we conclude that the percentage of smokers is less than 50%.

Systematic Sample Confidence Interval: Following the same steps, we found E = 0.099 and 0.030 < p < 0.228. We are 90% confident that the interval from 3.0% to 22.8% contains the true percentage of smokers in the population. This sample has only np = 4 smokers, so the np ≥ 5 requirement is not strictly met and this interval should be interpreted with caution.

Confidence Interval for Estimating a Population Mean with the Standard Deviation Known

1- We first checked the requirements: the sample is a simple random sample.
2- The value of the population standard deviation σ is known.
3- Either or both of these conditions is satisfied: the population is normally distributed or n > 30.

We calculated E = 0.248 and used
sample mean - E < population mean < sample mean + E
2.1681 < population mean < 2.6646
We are 90% confident that the interval from 2.1681 to 2.6646 actually contains the true value of the population mean.
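The margin of error for a proportion is E = z * sqrt(p̂q̂/n). A minimal Python sketch of this normal-approximation interval, assuming z = 1.645 for 90% confidence and the sample counts from the frequency tables (7 smokers out of 32 in the simple random sample, 4 out of 31 in the systematic sample); `proportion_ci` is a hypothetical helper name:

```python
import math

Z_90 = 1.645  # critical value z_(alpha/2) for 90% confidence


def proportion_ci(successes, n, z=Z_90):
    """Return (lower, upper, E, requirement_met) for p_hat +/- z*sqrt(p_hat*q_hat/n)."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    e = z * math.sqrt(p_hat * q_hat / n)               # margin of error
    requirement_met = n * p_hat >= 5 and n * q_hat >= 5  # np >= 5 and nq >= 5
    return p_hat - e, p_hat + e, e, requirement_met


for label, x, n in [("Simple random", 7, 32), ("Systematic", 4, 31)]:
    lo, hi, e, ok = proportion_ci(x, n)
    print(f"{label}: E = {e:.3f}, {lo:.3f} < p < {hi:.3f}, np and nq >= 5: {ok}")
```

For the two samples this prints E = 0.120 with 0.099 < p < 0.339, and E = 0.099 with 0.030 < p < 0.228; the second line reports False for the requirement because np = 4.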
This means that if we were to select many different samples of the same size and construct the corresponding confidence intervals, in the long run 90% of them would actually contain the value of the population mean.

Null hypothesis H0: p = 0.954128 (the value we chose)
Alternative hypothesis H1: p < 0.954128
By calculating the test for the population mean, we found p < 0.954128, which supports the alternative hypothesis H1.
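The long-run interpretation can be checked by simulation. This Python sketch assumes FEV is normally distributed with the population mean and standard deviation from Part 1 (the real data are positively skewed, so this is a simplification), draws many samples of size 30, builds x̄ ± 1.645·σ/√n for each, and counts how often the interval captures the true mean. The number of trials and the seed are arbitrary choices:

```python
import math
import random

MU, SIGMA = 2.637284404, 0.866589461   # population FEV mean and SD from Part 1
N, Z_90 = 30, 1.645                     # sample size and 90% critical value
TRIALS = 10_000                         # arbitrary number of simulated samples


def covers(sample, mu=MU, sigma=SIGMA, z=Z_90):
    """True if x_bar +/- z*sigma/sqrt(n) contains the true mean mu."""
    x_bar = sum(sample) / len(sample)
    e = z * sigma / math.sqrt(len(sample))
    return x_bar - e < mu < x_bar + e


rng = random.Random(2024)  # assumed normal population, fixed seed for repeatability
hits = sum(
    covers([rng.gauss(MU, SIGMA) for _ in range(N)]) for _ in range(TRIALS)
)
coverage = hits / TRIALS
print(f"{hits} of {TRIALS} intervals ({coverage:.1%}) contained the true mean")
```

The printed proportion should land close to 90%; it is not exactly 90% because only a finite number of samples is simulated.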