Review of Basic Statistics

Descriptive Statistics Review

Measures of Location
• The Mean
• The Median
• The Mode

Measures of Dispersion
• The variance
• The standard deviation

Mean

The mean (or average) is the basic measure of location or "central tendency" of the data.
• The sample mean $\bar{x}$ is a sample statistic.
• The population mean $\mu$ is a population statistic.

Sample Mean

$$\bar{x} = \frac{\sum x_i}{n}$$

where the numerator is the sum of the values of the n observations, or:

$$\sum x_i = x_1 + x_2 + \dots + x_n$$

The Greek letter Σ is the summation sign.

Example: College Class Size

We have the following sample of data for 5 college classes:

46   54   42   46   32

We use the notation x1, x2, x3, x4, and x5 to represent the number of students in each of the 5 classes:

x1 = 46   x2 = 54   x3 = 42   x4 = 46   x5 = 32

Thus we have:

$$\bar{x} = \frac{\sum x_i}{n} = \frac{x_1 + x_2 + x_3 + x_4 + x_5}{5} = \frac{46 + 54 + 42 + 46 + 32}{5} = 44$$

The average class size is 44 students.

Population Mean (μ)

$$\mu = \frac{\sum x_i}{N}$$

The number of observations in the population is denoted by the upper case N.
The sample mean $\bar{x}$ is a point estimator of the population mean $\mu$.

Median

The median is the value in the middle when the data are arranged in ascending order (from smallest value to largest value).
a. For an odd number of observations, the median is the middle value.
b. For an even number of observations, the median is the average of the two middle values.

The College Class Size example

First, arrange the data in ascending order:

32   42   46   46   54

Notice that n = 5, an odd number. Thus the median is given by the middle value, the third of the five observations.
The median class size is 46.

Median Starting Salary for a Sample of 12 Business School Graduates

A college placement office has obtained the following data for 12 recent graduates:

Graduate   Starting Salary      Graduate   Starting Salary
1          2850                 7          2890
2          2950                 8          3130
3          3050                 9          2940
4          2880                 10         3325
5          2755                 11         2920
6          2710                 12         2880

First we arrange the data in ascending order:

2710   2755   2850   2880   2880   2890   2920   2940   2950   3050   3130   3325

Notice that n = 12, an even number. Thus we take the average of the middle two observations, which are the sixth and seventh values, 2890 and 2920:

$$\text{Median} = \frac{2890 + 2920}{2} = 2905$$

Mode

The mode is the value that occurs with the greatest frequency.

Soft Drink Example

Soft Drink      Frequency
Coke Classic    19
Diet Coke       8
Dr. Pepper      5
Pepsi Cola      13
Sprite          5
Total           50

The mode is Coke Classic. A mean or median is meaningless for qualitative data such as these.

Using Excel to Compute the Mean, Median, and Mode

Enter the data into cells A1:B13 for the starting salary example.
• To compute the mean, activate an empty cell, enter =AVERAGE(B2:B13) in the formula bar, and click the green checkmark.
• To compute the median, activate an empty cell, enter =MEDIAN(B2:B13) in the formula bar, and click the green checkmark.
• To compute the mode, activate an empty cell, enter =MODE(B2:B13) in the formula bar, and click the green checkmark.

The Starting Salary Example

Mean    Median   Mode
2940    2905     2880

Variance

• The variance is a measure of variability that uses all the data.
• The variance is based on the difference between each observation ($x_i$) and the mean ($\bar{x}$ for the sample and $\mu$ for the population).

For a population, the variance is the average of the squared differences between the observations and the mean. For a sample, the sum of squared differences is divided by n − 1 rather than n.

For the population:

$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$$

For the sample:

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

Standard Deviation

• The standard deviation of a data set is the square root of the variance.
• The standard deviation is measured in the same units as the data, making it easy to interpret.

Computing a standard deviation

For the population:

$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$$

For the sample:

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$

Measures of Association Between Two Variables
• Covariance
• Correlation coefficient

Covariance

• Covariance is a measure of linear association between variables.
• Positive values indicate a positive correlation between variables.
• Negative values indicate a negative correlation between variables.
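Before moving on to the covariance formulas, the location and dispersion measures reviewed so far can be cross-checked outside Excel. The sketch below is only an illustration: it assumes Python's standard `statistics` module (the slides themselves use Excel), and the variable names are made up for this example. It uses the starting-salary data for the 12 graduates.

```python
import statistics

# Starting salaries for the 12 graduates, in the order given in the table.
salaries = [2850, 2950, 3050, 2880, 2755, 2710,
            2890, 3130, 2940, 3325, 2920, 2880]

# Measures of location.
mean = statistics.mean(salaries)
median = statistics.median(salaries)  # averages the two middle values when n is even
mode = statistics.mode(salaries)      # most frequent value

# Measures of dispersion. variance/stdev divide by n - 1 (sample formulas);
# pvariance divides by N (population formula).
sample_variance = statistics.variance(salaries)
sample_sd = statistics.stdev(salaries)
population_variance = statistics.pvariance(salaries)

print("mean:", mean, "median:", median, "mode:", mode)
print("sample variance:", round(sample_variance, 2))
print("sample standard deviation:", round(sample_sd, 2))
print("population variance:", round(population_variance, 2))
```

The first line of output should agree with the Mean, Median, and Mode values reported for the starting salary example. The sample variance comes out larger than the population variance on the same data because it divides by n − 1 instead of N.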
To compute a covariance for variables x and y

For populations:

$$\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$$

For samples:

$$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$

Mortgage Interest Rates and Monthly Home Sales, 1980-2004

[Figure: scatter plot of Mortgage Interest Rate (Percent) against Monthly Home Sales (thousands), with quadrants I to IV marked; n = 299, $\bar{x}$ = 60.3, $\bar{y}$ = 9.02.]

If the majority of the sample points are located in quadrants II and IV, you have a negative correlation between the variables, as we do in this case. Thus the covariance will have a negative sign.

The (Pearson) Correlation Coefficient

A covariance will tell you whether 2 variables are positively or negatively correlated, but it will not tell you the degree of correlation. Moreover, the covariance is sensitive to the unit of measurement. The correlation coefficient does not suffer from these defects.

For populations:

$$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$

For samples:

$$r_{xy} = \frac{s_{xy}}{s_x s_y}$$

Note that:

$$-1 \le \rho_{xy} \le 1 \quad \text{and} \quad -1 \le r_{xy} \le 1$$

[Figure: Distance Traveled in 5 Hours (Miles) against Average Speed (MPH). Correlation Coefficient = 1.]

[Figure: "I have 7 hours per week for exercise." Time Spent Swimming (Hours) against Time Spent Jogging (Hours). Correlation Coefficient = -1.]

Normal Probability Distribution

The normal distribution is by far the most important distribution for continuous random variables. It is widely used for making statistical inferences in both the natural and social sciences.

It has been used in a wide variety of applications:
• Heights of people
• Scientific measurements
• Test scores
• Amounts of rainfall

The Normal Distribution

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-(x - \mu)^2 / 2\sigma^2}$$

Where:
μ is the mean
σ is the standard deviation
π = 3.14159
e = 2.71828

Normal Probability Distribution Characteristics

The distribution is symmetric and bell-shaped.
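The density formula above can be evaluated directly. The following is a minimal sketch in Python using only the standard `math` module. The function name `normal_pdf` and the values chosen for the mean and standard deviation are assumptions made for illustration and do not come from the slides. It evaluates f(x) at the mean and at two points equally far on either side of it, to illustrate the symmetry just described.

```python
import math

def normal_pdf(x, mu, sigma):
    """Evaluate the normal density f(x) for mean mu and standard deviation sigma."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)

# Hypothetical parameters, chosen only for illustration.
mu, sigma = 0.0, 1.0

at_mean = normal_pdf(mu, mu, sigma)
left = normal_pdf(mu - 1.5, mu, sigma)   # 1.5 units below the mean
right = normal_pdf(mu + 1.5, mu, sigma)  # 1.5 units above the mean

print("f(mean):", at_mean)
print("f(mean - 1.5):", left)
print("f(mean + 1.5):", right)
```

Because x enters the formula only through the squared distance (x − μ)², the two off-center values are equal, and both are smaller than the value at the mean.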
Normal Probability Distribution Characteristics

• The entire family of normal probability distributions is defined by its mean μ and its standard deviation σ.
• The highest point on the normal curve is at the mean, which is also the median and mode.
• The mean can be any numerical value: negative, zero, or positive (the slide shows curves centered at -10, 0, and 20).
• The standard deviation determines the width of the curve: larger values result in wider, flatter curves (the slide compares σ = 15 and σ = 25).
• Probabilities for the normal random variable are given by areas under the curve. The total area under the curve is 1 (.5 to the left of the mean and .5 to the right).

The Standard Normal Distribution

The Standard Normal Distribution is a normal distribution with the special properties that its mean is zero and its standard deviation is one:

$$\mu = 0, \qquad \sigma = 1$$

The letter z is used to designate the standard normal random variable.

Cumulative Probability

The probability that z ≤ 1 is the area under the curve to the left of 1, written $P(z \le 1)$.

What is P(z ≤ 1)?

To find out, use the Cumulative Probabilities Table for the Standard Normal Distribution:

z      .00     .01     .02
...
 .9    .8159   .8186   .8212
1.0    .8413   .8438   .8461
1.1    .8643   .8665   .8686
1.2    .8849   .8869   .8888
...

Reading the row for 1.0 and the column for .00 gives $P(z \le 1) = .8413$.

Area under the curve

• 68.27 percent of the total area under the curve is within (±) 1 standard deviation of the mean.
• 95.45 percent of the area under the curve is within (±) 2 standard deviations of the mean.

Exercise 1

a) What is P(z ≤ 2.46)?
b) What is P(z ≥ 2.46)?

Answer:
a) .9931
b) 1 − .9931 = .0069

Exercise 2

a) What is P(z ≤ −1.29)?
b) What is P(z ≥ −1.29)?

Answer:
a) 1 − .9015 = .0985
b) .9015

Note that:

$$P(z \le -1.29) = 1 - P(z \le 1.29)$$

Because of the symmetry, the area to the left of −1.29 is the same as the area to the right of 1.29 (the two shaded areas on the slide are equal).

Exercise 3

What is P(.00 ≤ z ≤ 1.00)?

$$P(.00 \le z \le 1) = P(z \le 1) - P(z \le 0) = .8413 - .5000 = .3413$$
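The table lookups in Exercises 1 to 3 can be checked numerically. This is a sketch rather than part of the original slides: it assumes Python and uses the standard identity Φ(z) = ½[1 + erf(z/√2)] for the standard normal cumulative probability, with `math.erf` from the standard library. The function name `standard_normal_cdf` is made up for this example.

```python
import math

def standard_normal_cdf(z):
    """Cumulative probability P(Z <= z) for the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Exercise 1: area to the left and to the right of 2.46.
ex1_a = standard_normal_cdf(2.46)
ex1_b = 1.0 - ex1_a

# Exercise 2: area to the left and to the right of -1.29.
ex2_a = standard_normal_cdf(-1.29)
ex2_b = 1.0 - ex2_a

# Exercise 3: area between 0 and 1.
ex3 = standard_normal_cdf(1.0) - standard_normal_cdf(0.0)

# Areas within 1 and 2 standard deviations of the mean.
within_1 = standard_normal_cdf(1.0) - standard_normal_cdf(-1.0)
within_2 = standard_normal_cdf(2.0) - standard_normal_cdf(-2.0)

for label, value in [("Exercise 1a", ex1_a), ("Exercise 1b", ex1_b),
                     ("Exercise 2a", ex2_a), ("Exercise 2b", ex2_b),
                     ("Exercise 3", ex3),
                     ("within 1 sd", within_1), ("within 2 sd", within_2)]:
    print(f"{label}: {value:.4f}")
```

The printed values should agree with the table-based answers to four decimal places. Any difference in the last digit would come from rounding in the printed table, not from the method.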