ECON 4630 / ECON 5630
TOPIC #2: DESCRIPTIVE STATISTICS

I. Graphical Methods

   A. Definitions
      1. Frequency distribution
      2. Relative frequency distribution
      3. Cumulative relative frequency
      4. Example: Family Size in Uganda

         # of children   frequency (f)   relative frequency (f/n)
          0               1
          1               3
          2               5
          3               1
          4               3
          5               4
          6               6
          7              10
          8               7
          9               3
         10               3
         11               2
         12               1
         13               1
                         n = 50

   B. Grouping data into classes
      1. General
      2. Example: Men’s Height (sample of 200)

         188.9 176.3 175.9 175.0 185.9 187.3 172.1 190.1 189.4 164.1
         187.6 181.3 187.4 188.8 181.2 182.4 193.1 164.9 172.6 184.8
         180.9 181.1 173.6 185.9 190.1 180.0 181.4 177.4 179.3 196.4
         174.3 178.6 174.5 186.7 180.7 177.1 179.9 166.1 184.1 186.9
         179.6 183.0 187.1 185.3 170.6 180.5 183.1 185.8 182.1 186.1
         181.5 182.5 183.3 178.9 175.2 175.4 182.9 186.4 190.3 157.2
         164.0 188.9 203.1 173.1 189.9 184.3 158.6 197.4 170.2 178.1
         173.3 195.2 176.2 188.2 187.3 190.4 174.0 148.5 194.8 174.1
         182.5 182.1 175.1 189.5 168.6 185.1 197.2 179.7 174.0 171.8
         178.6 151.3 190.2 190.3 172.9 166.1 184.0 185.1 180.0 179.1
         177.5 191.1 183.2 172.4 175.1 193.3 159.0 195.4 160.1 191.0
         184.1 181.1 181.5 188.0 189.1 172.8 177.2 183.1 186.2 165.9
         170.6 170.1 177.7 181.0 174.8 185.3 193.5 187.3 173.5 167.3
         190.4 175.1 177.8 200.0 172.1 186.5 171.2 182.4 197.2 180.2
         180.1 177.1 178.3 189.3 173.3 179.2 163.6 183.4 171.0 185.4
         171.1 196.1 189.3 175.6 176.1 178.1 184.3 186.8 178.4 192.3
         183.1 170.3 180.8 190.0 182.7 173.5 182.1 184.2 189.2 183.3
         173.7 171.9 183.0 191.9 179.9 171.0 194.0 182.0 180.1 187.9
         177.5 171.8 180.5 163.7 177.7 177.0 186.3 168.9 188.1 176.3
         188.2 167.5 187.0 189.1 178.8 179.4 184.5 188.0 179.0 170.1

      3. Grouping Data Into Classes
         a) Select the Number of Classes, k

            k    $2^k$
            1
            2
            3
            4
            5
            6

         b) Select the Class Interval, i
         c) Set the Class Limits
         d) Tally the Data Into Classes

            Class Boundaries       Class      f    f/n   Cumulative   Cumulative
            (Limit_L - Limit_U)    Midpoint              frequency    relative frequency
            148.5 -
            155.5 -
            162.5 -
            169.5 -
            176.5 -
            183.5 -
            190.5 -
            197.5 -

   C. Histograms
      Definition: A histogram describes a frequency distribution using a series of adjacent rectangles, where the height of each rectangle is proportional to the frequency of the class it represents.

      [Figure: blank histogram grid. Vertical axis: frequency (2 to 10). Horizontal axis: class midpoints 152, 159, 166, 173, 180, 187, 194, 201.]
      [Figure: blank histogram grid. Vertical axis: relative frequency (0.05 to 0.25). Horizontal axis: class midpoints 152 to 201.]

   D. Frequency Polygons
      Definition: A frequency polygon describes a frequency distribution by using a series of line segments connecting the points formed by the class midpoints and the class frequencies.

      [Figure: blank frequency polygon grid. Vertical axis: frequency (2 to 10). Horizontal axis: class midpoints 152 to 201.]
      [Figure: blank frequency polygon grid. Vertical axis: relative frequency (0.05 to 0.25). Horizontal axis: class midpoints 152 to 201.]

   E. Cumulative frequency and relative frequency polygons

      [Figure: blank cumulative polygon grid. Left axis: cumulative frequency (40 to 200). Right axis: cumulative relative frequency (0.20 to 1.00). Horizontal axis: 152 to 201.]

   F. Excel Commands
      (1) Place a table such as that on page 23 into Excel.
      (2) To get frequency, use the COUNTIF command minus the previous cumulative value (see below).
      (3) To get cumulative frequency, use the COUNTIF command (see below).

                A        B        C
           1    188.9
           2    176.3    range
           3    175.9    148.5    155.4999
           4    175      155.5    162.4999
           5    185.9    162.5    169.4999
           6    187.3    169.5    176.4999
           7    172.1    176.5    183.4999
           8    190.1    183.5    190.4999
           9    189.4    190.5    197.4999
          10    164.1    197.5    204.4999
          11    187.6
          12    etc.

                D (frequency)                        E (cumulative frequency)
           3    =COUNTIF($A$1:$A$200,"<"&C3)         =COUNTIF($A$1:$A$200,"<"&C3)
           4    =COUNTIF($A$1:$A$200,"<"&C4)-E3      =COUNTIF($A$1:$A$200,"<"&C4)
           5    =COUNTIF($A$1:$A$200,"<"&C5)-E4      =COUNTIF($A$1:$A$200,"<"&C5)
                etc.                                 etc.

      (4) To create a histogram, click on the “Chart” icon on the Insert tab. Alternatively, go to Data Analysis (under the Data tab), and select Histogram. NOTE: if you don’t see Data Analysis, you will need to go to Add-ins.

II. Location: Numerical Methods

   A. Mode: the most frequently occurring value
      1. Advantages
      2. Disadvantages

   B. Percentiles
      1. Definition: the Pth percentile is a point in the data below which P percent of the observations lie.
      2. Calculating Percentiles Using “Raw” Data
         a) Rank data in ascending order

            148.5 151.3 157.2 158.6 159.0 160.1 163.6 163.7 164.0 164.1
            164.9 165.9 166.1 166.1 167.3 167.5 168.6 168.9 170.1 170.1
            170.2 170.3 170.6 170.6 171.0 171.0 171.1 171.2 171.8 171.8
            171.9 172.1 172.1 172.4 172.6 172.8 172.9 173.1 173.3 173.3
            173.5 173.5 173.6 173.7 174.0 174.0 174.1 174.3 174.5 174.8
            175.0 175.1 175.1 175.1 175.2 175.4 175.6 175.9 176.1 176.2
            176.3 176.3 177.0 177.1 177.1 177.2 177.4 177.5 177.5 177.7
            177.7 177.8 178.1 178.1 178.3 178.4 178.6 178.6 178.8 178.9
            179.0 179.1 179.2 179.3 179.4 179.6 179.7 179.9 179.9 180.0
            180.0 180.1 180.1 180.2 180.5 180.5 180.7 180.8 180.9 181.0
            181.1 181.1 181.2 181.3 181.4 181.5 181.5 182.0 182.1 182.1
            182.1 182.4 182.4 182.5 182.5 182.7 182.9 183.0 183.0 183.1
            183.1 183.1 183.2 183.3 183.3 183.4 184.0 184.1 184.1 184.2
            184.3 184.3 184.5 184.8 185.1 185.1 185.3 185.3 185.4 185.8
            185.9 185.9 186.1 186.2 186.3 186.4 186.5 186.7 186.8 186.9
            187.0 187.1 187.3 187.3 187.3 187.4 187.6 187.9 188.0 188.0
            188.1 188.2 188.2 188.8 188.9 188.9 189.1 189.1 189.2 189.3
            189.3 189.4 189.5 189.9 190.0 190.1 190.1 190.2 190.3 190.3
            190.4 190.4 191.0 191.1 191.9 192.3 193.1 193.3 193.5 194.0
            194.8 195.2 195.4 196.1 196.4 197.2 197.2 197.4 200.0 203.1

         b) Calculate position of the Pth percentile

            $L_P$ =

            where $L_P$ is the position of the Pth percentile
         c) To find the Pth percentile, using the data arranged in ascending order, select the $L_P$th observation from the top. If $L_P$ is not an integer, find a weighted average of the two adjacent observations.
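Steps a) to c) can be sketched in Python as follows. This is only an illustration: it assumes the position rule $L_P = (n + 1)P/100$, a common textbook convention that this outline does not itself state, and the function names `percentile_position` and `percentile` are hypothetical. The data are the first 20 heights from the men's height sample.

```python
def percentile_position(n, p):
    """Position of the p-th percentile among n ordered observations.

    ASSUMPTION: uses L_P = (n + 1) * P / 100, a common textbook rule.
    Other conventions exist (Excel's PERCENTILE uses a different one).
    """
    return (n + 1) * p / 100


def percentile(data, p):
    """p-th percentile of raw data, interpolating between neighbours."""
    ordered = sorted(data)                # step a): rank in ascending order
    n = len(ordered)
    position = percentile_position(n, p)  # step b): locate L_P (1-based)
    whole = int(position)                 # observation just below L_P
    fraction = position - whole           # how far L_P lies past it
    if whole < 1:                         # L_P falls before the first value
        return ordered[0]
    if whole >= n:                        # L_P falls at or after the last value
        return ordered[-1]
    below, above = ordered[whole - 1], ordered[whole]
    # step c): weighted average of the two adjacent observations
    return below + fraction * (above - below)


# First 20 observations of the men's height sample
heights = [188.9, 176.3, 175.9, 175.0, 185.9, 187.3, 172.1, 190.1, 189.4, 164.1,
           187.6, 181.3, 187.4, 188.8, 181.2, 182.4, 193.1, 164.9, 172.6, 184.8]

median = percentile(heights, 50)
lower_quartile = percentile(heights, 25)
print(f"median = {median:.3f}, lower quartile = {lower_quartile:.3f}")
```

With 20 observations the median position is 10.5, so the result is halfway between the 10th and 11th ordered values. A different position rule would change only `percentile_position`.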
         d) Examples:

      3. Calculating the qth percentile using grouped data

         $$ q\text{th percentile} = L + \frac{qn - CF}{f}\,(i) $$

         where:
         L  = the lower limit of the class containing the qth percentile
         q  = the percentile written as a proportion (e.g., 0.50 for the median)
         n  = total number of frequencies
         f  = frequency in the class containing the qth percentile
         CF = cumulative number of frequencies in the classes preceding the class containing the qth percentile
         i  = class interval

         Class Boundaries       Class      f     f/n    Cumulative   Cumulative
         (Limit_L - Limit_U)    Midpoint                frequency    relative frequency
         148.5 - 155.4999       152          2   0.01      2         0.01
         155.5 - 162.4999       159          4   0.02      6         0.03
         162.5 - 169.4999       166         12   0.06     18         0.09
         169.5 - 176.4999       173         44   0.22     62         0.31
         176.5 - 183.4999       180         64   0.32    126         0.63
         183.5 - 190.4999       187         56   0.28    182         0.91
         190.5 - 197.4999       194         16   0.08    198         0.99
         197.5 - 204.4999       201          2   0.01    200         1.00
         Total                             200   1.00

         a) Median
         b) Lower quartile
         c) Upper quartile
         d) 33rd percentile

   C. The mean
      1. Digression: summation notation

         $$ \sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + \dots + x_n $$

         Factoids:
         (a) $\sum_{i=1}^{n} k = kn$
         (b) $\sum_{i=1}^{n} k x_i = k \sum_{i=1}^{n} x_i$
         (c) $\sum_{i=1}^{n} (x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i$

         Examples:

      2. The population mean
      3. The sample mean in general
      4. Calculating the sample mean for “raw” data
      5. Calculating the sample mean for grouped data
      6. Geometric mean: useful for finding the average of percentages, ratios, indexes, or growth rates. Let $x_i$ be the percentage change in period i, expressed as a decimal:

         $$ GM = \sqrt[n]{(1 + x_1)(1 + x_2)(1 + x_3) \cdots (1 + x_n)} - 1 $$

         Example: suppose Jack got a 3% raise in year 1, a 4% raise in year 2, and a 10% raise in year 3. What’s his average raise?

         Similarly, to find the average percent increase over time, use

         $$ GM = \sqrt[n]{\frac{X_{end}}{X_{start}}} - 1 $$

         where $X_{end}$ is the value at the end of the period, $X_{start}$ is the value at the start, and n is the number of periods.

   D. Relative Positions of the Mean, Mode, and Median
      1. Symmetric Distributions
      2. Positive-skewed Distributions
      3. Negative-skewed Distributions

   E. Excel Commands
      If your data are in cells A2 to A51,
      1. =MODE(A2:A51)
      2. =MEDIAN(A2:A51)
      3. =AVERAGE(A2:A51)
      4. =PERCENTILE(A2:A51,.25) for the 25th percentile
         (NOTE: Excel uses a peculiar interpolation method to calculate percentiles, so your calculations may differ from those calculated by Excel, especially for smaller data sets.)

III. Measures of Spread

   A. Why Is Spread Important?

   B. Ways to Measure Spread
      Example: UNT Women’s Basketball: points scored, last 12 games, 2007-08

         Game   Points Scored (X)   $X - \bar{X}$   $(X - \bar{X})^2$
          1      87
          2      70
          3      67
          4      70
          5      55
          6      75
          7      73
          8      70
          9      63
         10      52
         11      75
         12      53
                 Σ = 810

      1. The range
      2. The inter-quartile range (IQR)
      3. Mean (absolute) deviation
      4. Variance and standard deviation
         a) Definitions:
            Variance: the arithmetic mean of the squared deviations from the mean.
            Standard deviation: the square root of the variance; will be in the same units as the data.
         b) Population Formulae
         c) Sample Formulae
            i. Formulae for “Raw” Data
            ii. Formulae for Grouped Data
         d) The “Empirical Rule”

   C. Excel Commands
      1. Many descriptive statistics are possible using Tools > Data Analysis > Descriptive Statistics
      2. Alternatively, use Excel functions (assuming data are in cells from A2 to A51):
         a) Minimum: =MIN(A2:A51)
         b) Maximum: =MAX(A2:A51)
         c) Sample standard deviation: =STDEV(A2:A51)
         d) Population standard deviation: =STDEVP(A2:A51)
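The spread measures in section III.B can be worked through in Python using the basketball scores from the example table. This is a sketch under stated assumptions, since the outline leaves the formulae blank: the mean absolute deviation and population variance divide by n, and the sample variance divides by n - 1 (the convention behind Excel's STDEV, with STDEVP dividing by n). Variable names are illustrative only.

```python
import math
import statistics

# Points scored in the last 12 games (from the example table)
points = [87, 70, 67, 70, 55, 75, 73, 70, 63, 52, 75, 53]

n = len(points)
mean = sum(points) / n                      # 810 / 12

# The two table columns: X - mean and (X - mean)^2
deviations = [x - mean for x in points]
squared_deviations = [d ** 2 for d in deviations]

data_range = max(points) - min(points)      # 1. the range
mean_abs_deviation = sum(abs(d) for d in deviations) / n   # 3. mean deviation

# 4. variance and standard deviation
# ASSUMPTION: population formulae divide by n, sample formulae by n - 1
population_variance = sum(squared_deviations) / n
sample_variance = sum(squared_deviations) / (n - 1)
population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

# Cross-check against the standard library (stdev = sample, pstdev = population)
library_sample_sd = statistics.stdev(points)
library_population_sd = statistics.pstdev(points)

print(f"mean = {mean}, range = {data_range}")
print(f"mean absolute deviation = {mean_abs_deviation:.3f}")
print(f"sample sd = {sample_sd:.3f}, population sd = {population_sd:.3f}")
```

The sample standard deviation is always the larger of the two because it divides the same sum of squared deviations by n - 1 rather than n; the gap shrinks as n grows.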