Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Basic Statistics 01 Describing Data JDS Special Program: Pre-training 1 Describing Data 1. Numerical Measures Measures of Location Measures of Dispersion Correlation Analysis 2. Frequency Distributions (Relative) Frequency Distribution Histogram JDS Special Program: Pre-training 2 Measures of Location Arithmetic mean Weighted mean Geometric mean Median Xw = ( w1 X 1 + w2 X 2 + ... + wn X n ) ( w1 + w2 + ...wn ) GM = n ( X 1)( X 2)( X 3)...( Xn) The midpoint of the values after they have been ordered from the smallest to the largest Mode The value of the observation that appears most frequently JDS Special Program: Pre-training 3 Population Mean The population mean is the sum of all the population values divided by the total number of values: X ∑ µ= i N where µ is the population mean. N is the total number of observations. Xi is a particular value. Σ indicates the operation of adding. JDS Special Program: Pre-training 4 Sample Mean The sample mean is the sum of all the sample values divided by the number of sample values: ΣX i X= n Where n is the total number of values in the sample. JDS Special Program: Pre-training 5 Properties of the Arithmetic Mean Every data set has a mean. All the values are included in computing the mean. A set of data has a unique mean. The mean is affected by unusually large or small data values. The arithmetic mean is the only measure of central tendency where the sum of the deviations of each value from the mean is zero. JDS Special Program: Pre-training 6 Measures of Dispersion Range The difference between the largest and the smallest values Mean deviation The arithmetic mean of the absolute values of the deviations from the arithmetic mean. The formula is: MD = Σ Xi − X n Variance Standard deviation (s.d.) JDS Special Program: Pre-training 7 Variance The population variance is the arithmetic mean of the squared deviations from the population mean. The formula is: 2 Σ − ( µ ) X i σ2 = N The formula for the sample variance is: 2 Σ ( X − X ) i s2 = n −1 JDS Special Program: Pre-training 8 Properties of Variance & S.D. The major characteristics of variance are: All values are used in the calculation. Not influenced by extreme values. The units are the square of the original units. The population (sample) standard deviation σ (s) is the square root of the population (sample) variance. The units are the same as the original ones. JDS Special Program: Pre-training 9 Interpretation and Uses of S.D. Empirical Rule: For any symmetrical, bellshaped distribution: About 68% of the observations will lie within 1σ the mean. About 95% of the observations will lie within 2σ of the mean. Virtually all (99.73%) the observations will be within 3σ of the mean. JDS Special Program: Pre-training 10 Bell - Shaped Curve showing the relationship between σ and µ. µ−3σ µ−2σ µ−1σ µ µ+1σ JDS Special Program: Pre-training µ+2σ µ+ 3σ 11 Relative Dispersion The coefficient of variation is the ratio of the standard deviation to the arithmetic mean, expressed as a percentage: s CV = (100%) X JDS Special Program: Pre-training 12 The Coefficient of Correlation The (sample) Coefficient of Correlation (r) is a measure of the strength of the linear relationship between two variables. r= Σ( X − X )(Y − Y ) Σ( X − X ) 2 Σ(Y − Y ) 2 It can range from -1 to 1. Values of -1 or 1 indicate perfect & strong correlation. Values close to 0 indicate weak correlation. Negative values indicate an inverse relationship and positive values indicate a direct relationship. JDS Special Program: Pre-training 13 Frequency Distribution A frequency distribution is a grouping of data into mutually exclusive categories showing the number of observations in each class. A relative frequency distribution shows the percent of observations in each class. JDS Special Program: Pre-training 14 EXAMPLE 1 Dr. K is a professor of ABC University. He wishes to prepare a report showing the number of hours per week students spend studying. About his 30 students, he determines the number of hours each student studied last week. 15.0, 23.7, 19.7, 15.4, 18.3, 23.0, 14.2, 20.8, 13.5, 20.7, 17.4, 18.6, 12.9, 20.3, 13.7, 21.4, 18.3, 29.8, 17.1, 18.9, 10.3, 26.1, 15.7, 14.0, 17.8, 33.8, 23.2, 12.9, 27.1, 16.6. JDS Special Program: Pre-training 15 EXAMPLE 1 continued Hours Frequency 7.5 up to 12.5 f 1 1/30=.0333 12.5 up to 17.5 12 12/30=.400 17.5 up to 22.5 22.5 up to 27.5 10 5 10/30=.333 5/30=.1667 27.5 up to 32.5 32.5 up to 37.5 1 1 1/30=.0333 1/30=.0333 TOTAL 30 30/30=1 Relative Frequency JDS Special Program: Pre-training 16 Frequency Histogram for Studying-Hours 14 12 10 8 6 4 2 0 10 15 20 25 30 35 Hours spent studying JDS Special Program: Pre-training 17