(3.1, 3.2, 3.4) Measures of Location

Measures of Location:

Mean: average value
Sample mean: denoted by $\bar{x}$, where $\bar{x} = \frac{\sum x_i}{n}$
Population mean: denoted by $\mu$, where $\mu = \frac{\sum x_i}{N}$

Median: the value in the middle when the data are arranged in ascending order (smallest to largest). With an odd number of observations, the median is the middle value. With an even number of observations there is no single middle value, so the median is the average of the two middle values.

Practice: Find the median for the set of numbers below
5, 6, 8, 10, 13, 15, 15
Median = _____
Now find the median for this set of numbers
1, 3, 4, 5, 7, 8, 9, 12
Median = _______

Mode: the value that occurs with the greatest frequency (there may be more than one mode in a set of data).

Percentile: the pth percentile is the value such that at least p percent of the observations are less than or equal to this value and at least (100-p) percent of the observations are greater than or equal to this value.
For example: if you scored in the 90th percentile on the verbal part of your SAT, this would mean that your score was at or above roughly 90% of all verbal scores for the SAT taken at that time.

To calculate the pth percentile:
1. Arrange the data in ascending order.
2. Compute an index i = (p/100)*n.
3. If i is not an integer, round up: the next integer greater than i denotes the position of the pth percentile.
4. If i is an integer, the pth percentile is the average of the values in positions i and i+1.

Practice Finding Percentile:
48 49 51 52 54 57 57 58 59 63
64 65 66 66 67 70 70 71 72 73
74 75 75 76 76 77 78 79 82 83
84 84 85 87 87 88 89 91 91 92
93 94 95 97 98 99 99 100 101 106
There are 50 elements in this data set.
The median is:
To find the 85th percentile:

Quartiles: used to divide the data into 4 parts.
Q1: first quartile, _____ percentile
Q2: second quartile, _____ percentile, median
Q3: third quartile, _____ percentile

5 number summary:
1. Smallest value
2. First quartile (Q1)
3. Median (Q2)
4. Third quartile (Q3)
5. Largest value
So the 5 number summary for this data set is:

Measures of Variability

Range: largest value - smallest value
In this example:
Inter-quartile range (IQR): Q3 - Q1, the range of the middle 50% of the data.
IQR:
Outliers: unusually extreme values in the data, which may be unusually large or small. Using the 1.5*IQR rule, a value is flagged as an outlier if it lies more than 1.5*IQR below Q1 or more than 1.5*IQR above Q3.
Box Plots: a graphical summary of data that is based on a five-number summary.
[Box plot of the data set above, axis from 40 to 110]

IQ scores example
91, 101, 106, 107, 110, 112, 114, 115, 132, 147
5 Number Summary:
IQR:
Outliers:
[Box plot of the IQ scores, axis from 90 to 150]

Shape of Distribution

Skewness
[Figure: three binomial distribution plots, P(X=x) against number of successes in 10 trials]
Symmetric: the mean and the median are the same.
Skewed to the right: because data is PULLED farther to the right.
Skewed to the left: because data is PULLED farther to the left.

Which is greater, the mean or the median, in our IQ example?
The median is not affected by skewness or outliers, whereas the mean IS. The mean is pulled in the direction of the outlier or "tail" (skew).
[Figure: two binomial distribution plots, P(X=x) against number of successes in 10 trials; the first is marked Median then Mean from left to right, the second Mean then Median]
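
The percentile steps, the 5 number summary, and the 1.5*IQR outlier rule can be checked with a short program. The following is a minimal Python sketch, not part of the original handout: the function names (percentile, five_number_summary, outliers) are hypothetical, and it assumes p is strictly between 0 and 100. It uses the index rule i = (p/100)*n described above, so its results may differ slightly from library routines that interpolate between values.

```python
import math
from statistics import mean


def percentile(data, p):
    """Return the p-th percentile using the index rule i = (p/100)*n."""
    if not data:
        raise ValueError("data must not be empty")
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    values = sorted(data)          # step 1: ascending order
    n = len(values)
    i = p * n / 100                # step 2: index (multiply first to limit rounding error)
    if i != math.floor(i):
        # step 3: i is not an integer, so round up to get the position (1-based)
        return values[math.ceil(i) - 1]
    # step 4: i is an integer, so average the values in positions i and i+1
    i = int(i)
    return (values[i - 1] + values[i]) / 2


def five_number_summary(data):
    """Return (smallest, Q1, median, Q3, largest)."""
    return (
        min(data),
        percentile(data, 25),
        percentile(data, 50),
        percentile(data, 75),
        max(data),
    )


def outliers(data):
    """Return values more than 1.5*IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in sorted(data) if x < low or x > high]


# IQ scores example from the handout
iq_scores = [91, 101, 106, 107, 110, 112, 114, 115, 132, 147]

print("5 number summary:", five_number_summary(iq_scores))
print("Outliers:", outliers(iq_scores))
print("Mean:", mean(iq_scores), "Median:", percentile(iq_scores, 50))
```

Running it on the IQ scores is one way to check the answers worked out by hand, and comparing the printed mean with the median shows which direction the tail pulls the mean.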