Why should you study Statistics?
- to enhance your ability to comprehend statistical jargon presented by the media and retailers
- to enable you to apply basic statistical techniques
- to help you realize that statistics is an important tool that is needed in business and industry. Examples:
  - estimate the percentage of defective light bulbs in a large warehouse based on testing only a small portion of the bulbs (Chapter 7)
  - "...lax quality control were basic causes" for the deaths of motorists in Ford Explorers with Firestone tires (Chapter 12)
  - predict the amount of sales for a company based on how much the company spent on advertising (Chapters 10, 11)

Two broad applications of Statistics

Example: Collect a set of data on the GPA of 25 sophomores sampled from those enrolled in Stat 227.

Descriptive Statistics
Compute sample statistics (e.g., mean, variance), construct histograms, etc., and report these just for the 25 sophomores. Here numerical and graphical summaries are used just to describe the sample at hand.

Inferential Statistics
Compute sample statistics, construct histograms, etc., and use statistical methods to make a general statement about all sophomores enrolled in Stat 227 (perhaps even in other years). Here we use a sample to make statements (or draw conclusions) about the whole population.

A population is the set of all units of interest in a study.
Examples of populations:
- all employed workers in the U.S.
- all registered voters in CA
- everyone who purchased a domestic minivan last year
- all year 2000 transactions at the Memorial Union ATM

A variable is a characteristic of the units of a population. A variable is either quantitative or qualitative.

A measurement is the process of assigning numbers to the variables of the individual units.

A census is obtained when every unit in a population is measured. For example, if you had the resources to obtain the weights of all ISU students, you would have a census.
However, in most studies, obtaining a census is too expensive, time consuming, or just not reasonable. For example, it is not reasonable to measure the diameter of every ball bearing in a lot of 500,000 made in a day by a machine.

Examples of variables:
quantitative
- age or income of workers
- lifetime of bulbs
- rating given to the minivan
- value of each sale
qualitative
- gender
- political party
- state of residence

Example: Suppose that a researcher is interested in the percent of all ISU students who are overweight.
a). Population: All ISU students (the units are the students).
b). Variable: Weight.
c). Measurement: Record weight in pounds using a scale.

A possible solution is to take a sample of the population.

A sample is a subset of a population.

How should you select a sample? Some ways of selecting a sample are better than others for making valid statistical statements about the population. Your goal must be to obtain a sample that is representative of the population.

Example: Weight of ISU students (continued)
Methods for obtaining a sample:
i). Weigh the offensive linemen of the football team.
What's bad about the sample here? The sample is not representative of ISU students. The average sample weight will over-estimate the average population weight.
ii). Weigh the first 50 students who enter the library at 7:30 in the morning.
Is this sample better than the first? Yes, the sample is more representative of the student body, plus you measure more units in this subset. However, you restrict your sample to students who go to the library early in the morning.
iii). Randomly select 50 students for weighing from the official enrollment list.
Is this sample better than the others? Yes, every student has an equal chance of being selected. Hence, this sample is more representative of the population.

The better sampling schemes have some form of randomness involved in the selection process.
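Method iii) above can be sketched in a few lines of Python. This is only an illustration: the enrollment list here is a made-up range of student ID numbers (the size 25,000 and the names `enrollment_list` and `sample_ids` are assumptions, not part of the course material), and the seed is fixed only so the draw can be reproduced.

```python
import random

# Hypothetical enrollment list: student ID numbers 1..25000.
# The real list and its size are not given in the notes.
enrollment_list = list(range(1, 25001))

# A seeded generator, so the same sample is drawn on every run.
rng = random.Random(227)

# Simple random sample without replacement: every student on the
# list has the same chance of ending up among the 50 selected.
sample_ids = rng.sample(enrollment_list, k=50)

print(len(sample_ids), "students selected")
print(sorted(sample_ids)[:5], "...")
```

Because `random.sample` draws without replacement, no student can be weighed twice, which matches the idea of choosing 50 distinct students from the list.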
After the sample is obtained, measurements are made, and statistics and/or graphs are computed, one can make a statement about the population.

Drawing conclusions and making decisions from data

A statistical inference is an estimate, prediction, or other generalization about a population based on information contained in a sample.

Example (continued): After computing the average weight of students in the random sample, one can use statistics to make a statement about the average weight of all ISU students.

When one obtains a census, there is no inference involved and all averages are exact. However, when we obtain a sample and make inferences about a population, the conclusions are not exact. Hence, we must attach a measure of reliability to any inference we make.

A measure of reliability is a quantified statement about the degree of uncertainty associated with a statistical inference.

The average weight of the random sample of 50 students will not exactly equal the unknown average of the ISU student population. For example, we may be able to state the degree of certainty involved as: "the population average weight is within (150 ± 10) lbs with a 1% chance of error."

Describing Data

Graphical Methods
- Bar Graphs (Section 2.1)
- Histograms (Section 2.2)
- Scatterplots (Section 2.9)
- Time Series plot (Section 2.10)

Numerical Summaries
- Measures of Central Tendency (Section 2.4)
  - Mean
  - Median
- Measures of Variability (Section 2.5)
  - Range
  - Variance
  - Standard deviation
- Measures of Relative Standing (Section 2.7)
  - z-score
  - Percentile

A measure of relative standing compares an individual object to the rest of the population. For example, if you scored at the 75th percentile on the SAT standardized test, this means that you did better than 75% of the people who took the test.

Sample Statistics vs. Parameters

A parameter is a descriptive statistic of a population.
A sample statistic is a descriptive statistic of a sample.
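The difference between a parameter and a sample statistic can be seen in a small simulation. This is a sketch under stated assumptions: the population of weights below is simulated (a normal distribution with mean 150 lb and standard deviation 25 lb, and a population size of 25,000, are all invented for illustration), so the numbers it prints say nothing about real ISU students.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is reproducible

# Hypothetical population: simulated weights in pounds (not real data).
population = [rng.gauss(150, 25) for _ in range(25000)]

# Parameter: computed from every unit in the population (a census).
mu = statistics.fmean(population)

# Sample statistic: computed from a random sample of 50 units.
sample = rng.sample(population, k=50)
x_bar = statistics.fmean(sample)

print(f"population mean (parameter):  {mu:.1f}")
print(f"sample mean (statistic):      {x_bar:.1f}")
print(f"difference (x_bar - mu):      {x_bar - mu:+.1f}")
```

Changing the seed gives a different sample and a different sample mean, while the population mean stays fixed. That variation from sample to sample is why an inference needs a measure of reliability.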
Take a census of a population, recording the values of a variable as $x_1, x_2, \ldots, x_N$, where $N$ is the population size. Then

Population mean: $\mu = \frac{\sum x}{N}$
Population variance: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
Population standard deviation: $\sigma = \sqrt{\sigma^2}$

Since censuses are not taken for every population we wish to study, parameters cannot be calculated exactly and are thus usually unknown.

Take a sample of a population, recording the values of a variable as $x_1, x_2, \ldots, x_n$, where $n$ is the sample size. Then

Sample mean: $\bar{x} = \frac{\sum x}{n}$
Sample variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Sample standard deviation: $s = \sqrt{s^2}$

$\mu$, $\sigma^2$, and $\sigma$ are parameters.
$\bar{x}$, $s^2$, and $s$ are sample statistics.

An Example

Calculate the sample mean, variance, and standard deviation for the data from Example 2.8.

i      1   2   3   4   5    Sum
x      2   3   3   4   3    15
x^2    4   9   9   16  9    47

Sample mean: $\bar{x} = \frac{\sum x}{n} = \frac{15}{5} = 3$

Compute the sample variance $s^2$ and the sample standard deviation $s$:

$s^2 = \frac{\sum x^2 - (\sum x)^2 / n}{n - 1} = \frac{47 - (15)^2 / 5}{5 - 1} = \frac{2}{4} = 0.5$

$s = \sqrt{s^2} = \sqrt{0.5} = 0.71$

JMP Analysis of Example from Chapter 2

Percentage of Revenues Spent on R & D

[Histogram: relative frequency (0.10 to 0.40) versus R & D percentage (5.5 to 13.2)]

Quantiles
100.0%   maximum    13.500
 99.5%              13.500
 97.5%              13.500
 90.0%              11.280
 75.0%   quartile    9.625
 50.0%   median      8.050
 25.0%   quartile    7.050
 10.0%               6.500
  2.5%               5.310
  0.5%               5.200
  0.0%   minimum     5.200

Moments
Mean              8.492000
Std Dev           1.980604
Std Err Mean      0.280100
upper 95% Mean    9.054880
lower 95% Mean    7.929120
N                50.000000

The distribution of the R & D percentages is slightly skewed to the right. Note that the mean lies to the right of the median. The standard deviation $s$ is 1.98. In the text it is shown that approximately 94% of the data values are within 2 standard deviations of the mean.
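The hand calculation for Example 2.8 can be checked with a short Python script. It computes the variance two ways: with the shortcut formula used in the notes and with Python's standard `statistics` module, which uses the $n - 1$ divisor for `variance` and `stdev`. The variable names are illustrative only.

```python
import math
import statistics

x = [2, 3, 3, 4, 3]  # data from Example 2.8
n = len(x)

# Shortcut formula: s^2 = (sum(x^2) - (sum(x))^2 / n) / (n - 1)
sum_x = sum(x)
sum_x2 = sum(v * v for v in x)
s2_shortcut = (sum_x2 - sum_x ** 2 / n) / (n - 1)
s_shortcut = math.sqrt(s2_shortcut)

# Same quantities from the standard library (sample versions, n - 1 divisor)
x_bar = statistics.mean(x)
s2 = statistics.variance(x)
s = statistics.stdev(x)

print("sum x =", sum_x, " sum x^2 =", sum_x2)
print("mean =", x_bar)
print("variance =", s2_shortcut, "(shortcut),", s2, "(statistics module)")
print("std dev =", round(s_shortcut, 2), "(shortcut),", round(s, 2), "(statistics module)")
```

Both routes give a variance of 0.5 and a standard deviation that rounds to 0.71, in agreement with the hand calculation. For population parameters, the corresponding functions are `statistics.pvariance` and `statistics.pstdev`, which divide by $N$ instead of $n - 1$.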