STATISTICS
CHAPTER 4 (Calculating Dispersion (Spread))
4-1 Range
4-2 Quartiles, Deciles, Percentiles
4-3 Standard Deviation
4-4 Variance

SULIDAR FITRI, M.Sc
March 18, 2014
STMIK AMIKOM Yogyakarta

MEASURES OF DISPERSION
A method of analysis for measuring how far the observed data deviate, or spread, from their central value.
Dispersion = spread (variation).
If all data values are the same, every value equals the mean and the data have no variation (spread).
Variation exists when some data values differ from the mean.
We will discuss the following measures of spread: range, quartiles, variance, and standard deviation.

RANGE
One way to measure spread is to find the smallest (minimum) and largest (maximum) values in the dataset.
Range = max - min
The range is strongly affected by outliers.

Quartiles
Three numbers which divide the ordered data into four equal-sized groups.
Q1 has 25% of the data below it.
Q2 (the median) has 50% of the data below it.
Q3 has 75% of the data below it.
Chapter 2, BPS - 5th Ed.

Quartiles: Uniform Distribution
[Figure: a uniform distribution divided by Q1, Q2, and Q3 into the 1st, 2nd, 3rd, and 4th quarters.]

Obtaining the Quartiles
Order the data.
For Q2, just find the median.
For Q1, look at the lower half of the data values, those to the left of the median location; find the median of this lower half.
For Q3, look at the upper half of the data values, those to the right of the median location; find the median of this upper half.

Finding the quartiles: how to find the quartile location?
Quartile location = (median location + 1) / 2
Note: if the median location is a fractional value (as when N is even), drop the fraction before computing the quartile location.

Example dataset (N is odd): 1 3 5 5 6 7 8 9 13
Median location: (N + 1) / 2 = (9 + 1) / 2 = 5, so the median is 6.
Quartile location = (5 + 1) / 2 = 3
Count up 3 from the bottom and 3 down from the top for the quartiles: Q1 = 5, Q3 = 8.
IQR = 8 - 5 = 3

Example dataset (N is even): 1 3 5 5 6 7 8 9
Median location: (N + 1) / 2 = (8 + 1) / 2 = 4.5, so the median is the average of 5 and 6 = 5.5.
Quartile location = (4 + 1) / 2 = 2.5
Count up 2.5 from the bottom and 2.5 down from the top for the quartiles: Q1 = (3 + 5) / 2 = 4, Q3 = (7 + 8) / 2 = 7.5.
IQR = 7.5 - 4 = 3.5

ANOTHER FORMULA:

Weight Data: Sorted
L(M) = (53 + 1) / 2 = 27
L(Q1) = (26 + 1) / 2 = 13.5 (26 values lie below the median)

100 101 106 106 110 110 119 120 120 123
124 125 127 128 130 130 133 135 139 140
148 150 150 152 155 157 165 165 165 170
170 170 172 175 175 180 180 180 180 185
185 185 186 187 192 194 195 203 210 212
215 220 260

Weight Data: Quartiles
Q1 = 127.5
Q2 = 165 (median)
Q3 = 185

Weight Data: Quartiles (stemplot)
10 | 0166
11 | 009
12 | 0034578      first quartile
13 | 00359
14 | 08
15 | 00257
16 | 555          median or second quartile
17 | 000255
18 | 000055567    third quartile
19 | 245
20 | 3
21 | 025
22 | 0
23 |
24 |
25 |
26 | 0

Five-Number Summary
minimum = 100
Q1 = 127.5
M = 165
Q3 = 185
maximum = 260
Interquartile Range (IQR) = Q3 - Q1 = 185 - 127.5 = 57.5
IQR gives the spread of the middle 50% of the data.

Ex. 4
Given the sorted weights of 30 female athletes, find the three quartiles:

 94 101 105 107 108 109 110 112 113 115
119 123 124 124 124 127 130 130 135 136
136 141 148 149 150 156 160 160 162 163

(1) There are 30 values in the sample: n = 30.
(2) Q1 is the value at, or right above, the value in position (0.25)(30). Position of Q1: $0.25 \times 30 = 7.5$. Since there is no position 7.5, we round 7.5 up to the next whole number, 8.
Then Q1 is the value in the 8th position: $Q_1 = 112$.
(3) Q2 is the value in the middle of the 30 values. Position of Q2: $0.5 \times 30 = 15$. Since 15 is a whole number, Q2 (the median) is halfway between the 15th and 16th values of the data set: $Q_2 = \frac{124 + 127}{2} = 125.5$.
(4) Q3 is the value in position $0.75 \times 30 = 22.5$, rounded up to the 23rd position: $Q_3 = 148$.

Percentiles
Just as there are three quartiles separating data into four parts, there are 99 percentiles, denoted P1, P2, ..., P99, which partition the data into 100 groups.
If the position of the given percentile is a whole number, the data value for this percentile is halfway between the value in this position and the next value.
If the position of the given percentile is a decimal number, round it up to the next whole number; the data value for this percentile is the value in that position.

Finding the value of a percentile:
Ex. 5
Find the 90th percentile of the given sorted weights of the 30 female athletes:

 94 101 105 107 108 109 110 112 113 115
119 123 124 124 124 127 130 130 135 136
136 141 148 149 150 156 160 160 162 163

P90 is the value located at or right after position $0.9 \times 30 = 27$.
Since 27 is a whole number, P90 is halfway between the 27th and the 28th values of the set: $P_{90} = \frac{160 + 160}{2} = 160$.
160 lb is the 90th percentile of the 30 female athlete weights, meaning about 90% of the sampled athletes weigh no more than 160 lb.

Finding the percentile of a value:
Ex. 6
What percentile is 135 in this set of values?

127 130 130 135 136 136 141 148 149 150 156

135 is higher than 3 values of the sorted set. The total number of values in this data set is 11. The proportion of all values in the set that are lower than 135 is then: $\frac{3}{11} \times 100\% \approx 27\%$.
135 lb is the 27th percentile of this data set, meaning that 27% of the values in the data set are less than 135 lb.
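The position rules used in this chapter can be sketched in Python as a way to check the worked examples. This is a minimal sketch, not a library implementation: the function names `quartiles_by_location`, `percentile_value`, and `percentile_rank` are hypothetical, `p` is assumed to be a whole number from 1 to 99, and the data are assumed to be sorted already. `quartiles_by_location` follows only the (median location + 1) / 2 rule, so it does not reproduce the alternative L(Q1) formula used for the weight data.

```python
def quartiles_by_location(sorted_data):
    """Q1 and Q3 by the (median location + 1) / 2 rule."""
    n = len(sorted_data)
    median_location = (n + 1) // 2        # any fraction is dropped, as the note says
    # Quartile location is (median_location + 1) / 2; split it into whole part k
    # and a flag telling whether the location ends in .5.
    k, has_half = divmod(median_location + 1, 2)
    if has_half:
        # Location k + 0.5: average the k-th and (k+1)-th values from each end.
        q1 = (sorted_data[k - 1] + sorted_data[k]) / 2
        q3 = (sorted_data[n - k] + sorted_data[n - k - 1]) / 2
    else:
        q1 = sorted_data[k - 1]           # k-th value from the bottom
        q3 = sorted_data[n - k]           # k-th value from the top
    return q1, q3


def percentile_value(sorted_data, p):
    """Value of the p-th percentile (whole number p, 0 < p < 100) by the position rule."""
    n = len(sorted_data)
    if n == 0 or not 0 < p < 100:
        raise ValueError("need non-empty data and 0 < p < 100")
    # Position = (p / 100) * n, kept as whole part and remainder to avoid float rounding.
    whole, remainder = divmod(p * n, 100)
    if remainder == 0:
        # Whole-number position: halfway between this value and the next one.
        return (sorted_data[whole - 1] + sorted_data[whole]) / 2
    # Fractional position: round up to the next whole position.
    return sorted_data[whole]


def percentile_rank(data, value):
    """Percentage of values in data that are strictly lower than value."""
    below = sum(1 for x in data if x < value)
    return 100 * below / len(data)


weights = [94, 101, 105, 107, 108, 109, 110, 112, 113, 115,
           119, 123, 124, 124, 124, 127, 130, 130, 135, 136,
           136, 141, 148, 149, 150, 156, 160, 160, 162, 163]
subset = [127, 130, 130, 135, 136, 136, 141, 148, 149, 150, 156]

print(quartiles_by_location([1, 3, 5, 5, 6, 7, 8, 9, 13]))   # (5, 8)
print(quartiles_by_location([1, 3, 5, 5, 6, 7, 8, 9]))       # (4.0, 7.5)
print([percentile_value(weights, p) for p in (25, 50, 75)])  # [112, 125.5, 148]
print(percentile_value(weights, 90))                         # 160.0
print(round(percentile_rank(subset, 135)))                   # 27
```

The position is computed with integer arithmetic (`divmod`) rather than `0.25 * n`, so the whole-number test in the rule is not affected by floating-point rounding.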
DECILES
Numbers that divide the data into 10 equal parts, so the 10 parts are separated by 9 deciles.

MEASURES OF DEVIATION
The deviation d of each value X from the mean (here the mean is 2):

X    X - mean    d
1    1 - 2      -1
0    0 - 2      -2
6    6 - 2      +4
1    1 - 2      -1

MEAN DEVIATION
The mean deviation is computed from the absolute values of the deviations: $MD = \frac{\sum |X - \bar{X}|}{n}$
Mean deviation for grouped data

STANDARD DEVIATION
The standard measure of how far the data deviate from their mean.
Standard Deviation in a Population: $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X - \mu)^2}{N}}$
Standard Deviation in a Sample: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Shortcut formula for the SD in a Sample (eliminates the need to know the mean): $s = \sqrt{\frac{n \sum x^2 - (\sum x)^2}{n(n - 1)}}$

Ex. 2
Given the following data on the amount of pocket money, in dollars, of 4 sampled individuals, find the sample SD:

x     x - mean    (x - mean)^2
1     -4          16
4     -1          1
5      0          0
10    +5          25

mean $\bar{x} = 5$, $\sum (x - \bar{x})^2 = 42$

(1) Add a deviation column.
(2) Add a column for the squares of the deviations.
(3) Divide the sum of the squares of the deviations by the number of values decreased by 1, then take the square root: $s = \sqrt{\frac{42}{4 - 1}} \approx 3.7$
Pocket money amounts, in dollars, in the sample are spread $3.7, on average, away from the mean amount of $5.

Again:
Standard Deviation in a Population: $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
Standard Deviation in a Sample: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Division by (n - 1) brings the sample SD closer to the value of the population SD; the reasons are studied in later statistics classes.

Standard deviation for grouped data

Some more definitions:
Variance is the square of the standard deviation:
in a sample: $var = s^2$
in a population: $VAR = \sigma^2$
Range is the difference between the maximum value and the minimum value in the set: range = max - min

Definitional Formulas
Population Variance: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
Sample Variance: $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$

Population:
$(X - \mu)$ = deviation
$(X - \mu)^2$ = squared deviation
$\sum (X - \mu)^2$ = sum of squared deviations = SS
N = population size
$\frac{\sum (X - \mu)^2}{N}$ = mean squared deviation = variance = $\sigma^2$

Sample:
$(X - \bar{X})$ = deviation
$(X - \bar{X})^2$ = squared deviation
$\sum (X - \bar{X})^2$ = sum of squared deviations = SS
n - 1 = (sample size - 1) = "degrees of freedom"
$\frac{\sum (X - \bar{X})^2}{n - 1}$ = mean squared deviation = variance = $s^2$

Any questions?
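As a check on Ex. 2 and on the variance formulas, the sketch below computes the sample SD in two ways, from the definitional formula and from the shortcut formula, and confirms that they agree. It is an illustration under stated assumptions rather than a reference implementation: the function names are hypothetical, and the data are the four pocket-money amounts from Ex. 2.

```python
import math


def sample_sd(values):
    """Sample SD from the definitional formula: sqrt(sum((x - mean)^2) / (n - 1))."""
    n = len(values)
    mean = sum(values) / n
    sum_sq_dev = sum((x - mean) ** 2 for x in values)  # SS, sum of squared deviations
    return math.sqrt(sum_sq_dev / (n - 1))


def sample_sd_shortcut(values):
    """Sample SD from the shortcut formula, which never computes the mean."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))


def population_sd(values):
    """Population SD: divide the sum of squared deviations by N instead of n - 1."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)


pocket_money = [1, 4, 5, 10]

s = sample_sd(pocket_money)
print(round(s, 1))                                  # 3.7
print(round(sample_sd_shortcut(pocket_money), 1))   # 3.7
print(round(s ** 2, 6))                             # 14.0, the sample variance
print(max(pocket_money) - min(pocket_money))        # 9, the range
```

For the same data the population formula gives a smaller value than the sample formula, because it divides by N = 4 rather than n - 1 = 3.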