Chapter 6 Random Sampling and Data Description

Learning Objectives
• Compute and interpret the sample mean, sample variance, sample standard deviation, sample median, and sample range
• Explain the concepts of sample mean, sample variance, population mean, and population variance
• Construct and interpret visual data displays
• Explain the concept of random sampling
• Construct and interpret normal probability plots

Data Summary and Display
• Essential to good statistical thinking
• Focus on important features of the data
• Provide insight about the type of model that should be used
• Computer has become an important tool in the presentation and analysis of data
• User enters the data and then selects the types of analysis
• Packages are available for both mainframe and personal computers

Sample Mean
• Useful to describe data features numerically
• Can characterize the location or central tendency
• Refer to this arithmetic mean as the sample mean
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$
• Where the n observations in the sample are denoted by x1, x2, …, xn
• Sample mean is a reasonable estimate of the population mean, μ

Sample Standard Deviation
• Sample mean does not provide all of the information
• Variability in the data may be described by the sample variance or the sample standard deviation
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
• Sample standard deviation, s, is the positive square root of the sample variance

Sample Range
• Difference between the largest and smallest observations, or the sample range, is a useful measure of variability
• Sample range r = max(xi) − min(xi)
• As the variability in sample data increases, the sample range tends to increase

Sample Median and Sample Mode
• Two more measures of central tendency
• Median divides the data into two equal parts, half below the median and half above
– If the number of data points is even, the median is halfway between the two central values
– If the number of data points is odd, the median is the central value
• Mode is the most frequently occurring data point(s)

Example
• The data below are the joint temperatures of the O-rings (degrees F) for each test firing or actual launch of the space shuttle rocket motor (from Presidential Commission on the Space Shuttle Challenger Accident):
84 49 61 40 83 67 45 66 70 69 80 58
68 60 67 72 73 70 57 63 70 78 52 67
53 67 75 61 70 81 76 79 75 76 58 31
• Compute the sample mean, sample median, sample range, and sample standard deviation

Solution
• Sample mean is 65.86 (n = 36 observations), or
$$\bar{x} = \frac{84 + 49 + \cdots + 31}{36} = \frac{2371}{36} \approx 65.86$$
• Sample median is 67.5, halfway between the two central values (67 and 68) of the ordered data: 31 40 45 49 … [67.5] …
• Sample range is 53, or r = 84 − 31
• Sample standard deviation is 12.16, the positive square root of the sample variance
$$s^2 = \frac{(84 - 65.86)^2 + (49 - 65.86)^2 + \cdots + (31 - 65.86)^2}{36 - 1} \approx 147.84$$

Random Sampling
• We usually work with a sample of observations selected from a population
• Need to understand the relationship between the population and the sample
• Often impossible or impractical to observe the entire population
• Use a probability distribution as a model for a population
• Sample from the population to make decisions about the population

Understand Random Sampling
• Wish to reach a conclusion about the proportion of people who earn at least $35,000 in a specific year
• Let p represent the unknown value of this proportion
• Impractical to question every individual
• Make inference regarding the true proportion p
• Select a random sample of size n
• Use the observed proportion p̂ of people in the sample who earn at least $35,000
• p̂ is computed by dividing the number of individuals in the sample who earn at least $35,000 by the sample size n
• Many random samples are possible
• Value of p̂ will vary from sample to sample. That is, p̂ is a random variable

Statistic
• A quantity computed from a random sample is called a statistic
• Statistic is any function of the observations in a random sample
• Sample mean X̄, the sample variance S², and the sample standard deviation S are statistics

Data Display
• Graphical displays of sample data are very powerful
• Many techniques

Stem-and-Leaf Diagrams
• Stem-and-leaf diagram is a good way to represent the data
• Steps
1. Divide each data point into two parts: a stem and a leaf
2. List the stem values in a vertical column
3. Record the leaf for each observation beside its stem
4. Write the units for stems and leaves on the display

Example
• Consider 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
• Select as stem values the numbers 2, 3, and 4
• Record the leaf for each observation beside its stem
• Last column in the diagram is a frequency count of the number of leaves associated with each stem

Stem  Leaf    Frequency
2     144677  6
3     028     3
4     1       1

Frequency Distributions
• More compact summary of data than a stem-and-leaf diagram
• Must divide the range of the data into intervals
• Called class intervals, cells, or bins
• Number of bins depends on the number of observations
• A common rule of thumb is to choose it approximately equal to the square root of the number of observations

Histograms
• Visual display of the frequency distribution
• Gives insight about possible choices of probability distribution
• Stages for constructing
– 1) Label the bin boundaries on a horizontal scale
– 2) Mark and label the vertical scale with the frequencies
– 3) Above each bin, draw a rectangle whose height is equal to the frequency corresponding to that class

Cumulative Frequency Plot
• Variation of the histogram
• Useful in data interpretation
• Height of each bar is the total number of observations that are less than or equal to the upper limit of the class
• Illustrated in the graph on the right

Example
• Consider the following data on the motor fuel octane ratings of several blends of gasoline.
• Construct a frequency distribution and histogram
– Use 8 classes

Solution
• Illustrated in the histogram on the right
[Figure: Frequency Histogram. Horizontal axis: Octane Data (82 to 102); vertical axis: frequency (0 to 0.4)]

Probability Plots
• Graphical method for determining whether sample data points conform to a hypothesized distribution
• Very simple and can be constructed quickly
• Uses special graph paper, known as probability paper
• Focus primarily on the normal probability plot

Constructing a Probability Plot
• Sample data points are first ranked from smallest to largest
• The sample x1, x2, …, xn is arranged as x(1), x(2), …, x(n)
• Ordered observations x(j) are plotted against their observed cumulative frequency (j − 0.5)/n on the probability paper
• If the hypothesized distribution adequately describes the data, the plotted points fall approximately along a straight line
• Can also be constructed on ordinary graph paper by plotting the standardized normal scores zj against x(j)
• Standardized normal scores satisfy
$$\frac{j - 0.5}{n} = P(Z \le z_j) = \Phi(z_j)$$

Example
• A soft-drink bottler is studying the internal pressure of 1-liter glass bottles.
• A random sample of 16 bottles is tested, and the pressure strengths (psi) are obtained. The data are shown below.
226.16 202.20 219.54 193.73 208.15 195.45 193.71 200.81
211.14 203.62 188.12 224.39 221.31 204.55 202.21 201.63
• Does it seem reasonable to conclude that pressure strength is normally distributed?

Solution
• Use the steps to construct a probability plot
• Assumption of normality appears reasonable
• Data points fall approximately along a straight line
[Figure: Normal Probability Plot. Horizontal axis: Pressure Strength (180 to 230); vertical axis: cumul. percent (0.1 to 99.9)]

Next Agenda
• Discusses point estimation of parameters
• Introduces some of the important properties of estimators, the method of maximum likelihood, sampling distributions, and the central limit theorem
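
Worked check in code
The summary statistics of the O-ring temperature example and the normal scores used for a probability plot can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not part of the original slides. The function names describe and normal_scores are hypothetical helpers introduced here, and the temperature list is copied from the example.

```python
import statistics
from statistics import NormalDist

# O-ring joint temperatures (degrees F), copied from the example in this chapter
temps = [84, 49, 61, 40, 83, 67, 45, 66, 70, 69, 80, 58,
         68, 60, 67, 72, 73, 70, 57, 63, 70, 78, 52, 67,
         53, 67, 75, 61, 70, 81, 76, 79, 75, 76, 58, 31]


def describe(sample):
    """Return the summary statistics defined in this chapter (hypothetical helper)."""
    return {
        "n": len(sample),
        "mean": statistics.mean(sample),
        "median": statistics.median(sample),
        "range": max(sample) - min(sample),
        "variance": statistics.variance(sample),  # sample variance, divides by n - 1
        "stdev": statistics.stdev(sample),        # positive square root of the variance
    }


def normal_scores(sample):
    """Pair each ordered observation x_(j) with z_j, where Phi(z_j) = (j - 0.5) / n."""
    n = len(sample)
    ordered = sorted(sample)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return [(x, standard_normal.inv_cdf((j - 0.5) / n))
            for j, x in enumerate(ordered, start=1)]


summary = describe(temps)
for name, value in summary.items():
    print(f"{name}: {round(value, 2)}")

# First few (x_(j), z_j) pairs; plotting z_j against x_(j) gives a normal probability plot
for x, z in normal_scores(temps)[:3]:
    print(x, round(z, 3))
```

For the 36 temperatures this should report a mean of about 65.86, a median of 67.5, a range of 53, a sample variance of about 147.84, and a sample standard deviation of about 12.16. The same normal_scores helper could be applied to the pressure-strength data to judge how closely the points follow a straight line.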