Numerical Descriptions of Data

Central Tendency
Variation
Shape
Empirical Rule
Relationship
Excel

1. Central Tendency – Center of data
Purpose: To give symbols for the population and sample mean found in the second Building Block of the course ((sample mean – population mean) ≠ 0).
   Median
   Mode
   Mean or Average

1.1 Quantitative Data
1.1.1 Sample Mean – Average of the sample
      Symbol: X̄
1.1.2 Population Mean – Average of the population
      Symbol: μ

1.2 Qualitative Data
Consider a case where you have three successes and two failures. The proportion of successes is then 3 out of 5, or 0.60. Let successes be represented by the value 1 and failures by the value 0. The data then consist of the five numbers 1, 1, 1, 0, 0. When you average those five numbers you also get 3/5, or 0.6. The proportion is a special case of the average: it is what you get when you average 0's and 1's.
1.2.1 Sample proportion – Proportion of successes in the sample
      Symbol: p̂
1.2.2 Population proportion – Proportion of successes in the population
      Symbol: p

2. Variation – Spread of data
   Range
   Variance
   Standard Deviation

2.1 Quantitative Data
2.1.1 Population Variance
      Average squared distance of the values from the center of the data
      Symbol: σ²
2.1.2 Sample Variance
      Estimate of the population variance
      Symbol: S²

Standard Deviation
Purpose: This is the measure of variation we will use in the third Building Block. (The standard error depends on two values: a measure of variation and a measure of knowledge.)
2.1.3 Population standard deviation
      Square root of the population variance
      Symbol: σ
2.1.4 Sample standard deviation
      Square root of the sample variance
      Symbol: S
2.1.5 Example of use: http://www.businessofbaseball.com/yankeespayroll.htm
2.1.6 Calculation:
   a. Calculate the average of the values.
   b. Subtract the average from each value to see how far each value is from the average.
   c. Square each difference.
   d. Sum all the squared differences.
   e. i. For the population, divide the sum by the number of values.
      ii. For the sample, divide the sum by the number of values minus one.
   f. To find the standard deviation, take the square root of the result in step e.
Both the population and the sample calculations use steps a-d and f. The difference between them occurs at step e.
2.1.7 Example: Calculate the population and sample standard deviations for a set of five numbers.
Double click on the embedded Excel file below. Press the F9 function key to get new examples. (When finished, click anywhere outside the worksheet.)

Values: 2, 5, 2, 5, 3
Step a. Mean = 3.4

   Value   Step b. Distance to average   Step c. Squared distance
   2       (2 - 3.4) = -1.4              1.96
   5       (5 - 3.4) =  1.6              2.56
   2       (2 - 3.4) = -1.4              1.96
   5       (5 - 3.4) =  1.6              2.56
   3       (3 - 3.4) = -0.4              0.16

Step d. Sum of squared distances = 9.2
Step e. Population variance: σ² = 9.2/5 = 1.84
        Sample variance:     S² = 9.2/4 = 2.3
Step f. Population standard deviation: σ = 1.356465997
        Sample standard deviation:     S = 1.516575089

If the above embedded Excel file does not work, then go to this link: Variance and Standard Deviation Calculations Examples

2.2 Qualitative Data
The symbols for the standard deviation and variance of the sample and population are the same as for quantitative data.
2.2.1 Population Variance: You may use the same rule as for quantitative data, or you can use a shortcut formula. The variance in a population of 0's and 1's can be shown to be p(1 - p).
2.2.2 Sample Variance: Unlike the sample variance for quantitative data, where you change the divisor from n to n - 1, the traditional approach for proportions is to multiply the sample proportion of successes by the sample proportion of failures:
      S² = p̂(1 - p̂)

3. Shape – Distribution of values
   Right skewed
   Left skewed
   Symmetric
   Relationship to mean and median

4. Empirical rule – particular distribution
Purpose: We introduce how to determine probabilities by rule instead of by counting. Probability is used in the second note of the third Building Block. (To evaluate an error we will use probability.)
The empirical rule is an approximation to the bell-shaped curve. It is graphed as a histogram with ranges based on the mean and standard deviation.
If the distribution of the data is a bell curve, then the following approximate percentages will result; the actual percentages differ slightly.

   Range                      Approximate percent
   μ - 3σ up to μ - 2σ         2.50%
   μ - 2σ up to μ - σ         13.50%
   μ - σ  up to μ             34.00%
   μ      up to μ + σ         34.00%
   μ + σ  up to μ + 2σ        13.50%
   μ + 2σ up to μ + 3σ         2.50%

Example: Suppose the mpg of cars were collected and the data follow the bell curve (a normal distribution). The cars had an average of 29 mpg and a standard deviation of 5 mpg. The ranges and percentages become:

   Range      Percent   Cumulative Pct
   14 to 19    2.5        2.5
   19 to 24   13.5       16
   24 to 29   34         50
   29 to 34   34         84
   34 to 39   13.5       97.5
   39 to 44    2.5      100

[Figure: Empirical Rule Example. Histogram of the probability of being in each range; vertical axis: Percent (0 to 40); horizontal axis: intervals based on the mean and standard deviation, from 14 to 19 through 39 to 44.]

Questions:
1. What is the probability of finding cars with an mpg greater than 29?
2. What is the probability of finding cars with an mpg less than 19?
3. What is the probability of finding cars with an mpg between 29 and 39?
Answers:
1. 0.5
2. 0.025
3. 0.475

After double clicking the embedded Excel file once, press the F9 key for more examples. If the embedded Excel file does not work, then go to the following link: https://wweb.uta.edu/faculty/eakin/busa3321/empiricalrule_example.xls

Exercise: Using Internet Explorer, answer the questions on the following web page. If you use Internet Explorer you may attempt this more than once by backing up. (The page keeps track of the number of attempts.) Print the page when successful and bring it to class next time.
http://wweb.uta.edu/faculty/eakin/asps/examples/EmpiricalRuleques.asp

5. Relationship
Purpose: To introduce applications of means and variances when relating two variables.
5.1 Definitions
   Straight line – mathematical versus tendency
   Slope – the average amount of change in Y with a one unit increase in X
   Intercept – the average value of Y when X is zero
   Correlation – the strength of the linear association between Y and X
   Coefficient of Determination – the percent of sample variation in Y associated with variation in X
Use of the standard deviation, the correlation, the empirical rule and the slope (called beta) with stock price risk: http://www.investorguide.com/igu-article-823-stock-basics-measuring-a-stocks-risk.html
Definition of the coefficient of determination in finance terms: http://www.finance-lib.com/financial-term-efficient-diversification.html

5.2 Symbols

   Term                           Population   Sample
   Intercept                      β0           b0
   Slope                          β1           b1
   Correlation                                 r
   Coefficient of Determination                R²

5.3 Examples
For interpreting the sample slope and intercept, double click the embedded Excel file below. To create new examples, click on any green shaded cell and change the names and values.

   Y = Sales
   X = Size
   X unit = thousand square feet
   b0 = 1,670,000
   b1 = 964,000

   Intercept: The estimated average value of Sales is 1,670,000 when Size is zero. (Many times the intercept does not have a realistic meaning.)
   Slope: For an increase of one thousand square feet in Size, the estimated average Sales increases by 964,000.

Or go to https://wweb.uta.edu/faculty/eakin/busa3321/EstimateInterpretation.xls

For interpreting R², double click the embedded Excel file below. To create new examples, click on any green shaded cell and change the names and values.

   Y = price
   X = size
   X unit = thousand square feet
   R² = 0.904

   Coefficient of Determination, R²: 90.4% of the sample variation of price is associated with variation in size.

Or go to http://wweb.uta.edu/eakin/busa3321/VariationExplanation.xls

6. Excel
6.1 Center and Variation Cell Formulas
In the examples below you have data in cells A1 to A5.

   Statistic            Population        Sample
   Mean                 =AVERAGE(A1:A5)   =AVERAGE(A1:A5)
   Median               =MEDIAN(A1:A5)    =MEDIAN(A1:A5)
   Variance             =VARP(A1:A5)      =VAR(A1:A5)
   Standard Deviation   =STDEVP(A1:A5)    =STDEV(A1:A5)

6.2 Pivot Tables
Pivot tables can also be used for quantitative variables to calculate means, variances and standard deviations (but not the mode or median):
   Click on the quantitative variable.
   Insert a Pivot table.
   Click on and drag the quantitative variable down to the "Values" section.
   Right click over the "Count of" section and select Value Field Settings.
   Choose the description you want (average, variance, or standard deviation).
Notes:
   (1) If you wish more descriptions, then repeatedly drag the quantitative variable to "Values" but choose a different description each time.
   (2) If you wish to find the descriptions within the levels of a qualitative variable, then after dragging the quantitative variable to the "Values" section, drag the qualitative variable to the row (or column) section.

6.3 Relationship
   Click the Insert tab.
   Click the XY Scatter button.
   On the Series tab choose X Range and Y Range.
   Right click any point.
   Click Add Trendline.
   At the bottom of the resulting window, check Display Equation and Display R-squared.
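
For readers who want to check the hand calculations outside Excel, the variance and standard deviation steps a through f from section 2.1.6, the 0/1 shortcut from section 2.2, and the empirical rule ranges from section 4 can be sketched in Python. This is only an illustrative sketch and not part of the course materials; the function names (variance_steps, proportion_variance, empirical_rule_ranges) are hypothetical and chosen here for readability.

```python
import math


def variance_steps(values, sample=False):
    """Follow steps a-f of section 2.1.6; returns (variance, standard deviation)."""
    n = len(values)
    mean = sum(values) / n                      # step a: average of the values
    distances = [v - mean for v in values]      # step b: distance to the average
    squared = [d ** 2 for d in distances]       # step c: square each distance
    total = sum(squared)                        # step d: sum the squared distances
    divisor = n - 1 if sample else n            # step e: n for population, n - 1 for sample
    variance = total / divisor
    return variance, math.sqrt(variance)        # step f: square root of the variance


def proportion_variance(p):
    """Shortcut from section 2.2: variance of 0/1 data with proportion p of 1's."""
    return p * (1 - p)


def empirical_rule_ranges(mean, sd):
    """Return (lower, upper, approximate percent) for the six ranges in section 4."""
    percents = [2.5, 13.5, 34.0, 34.0, 13.5, 2.5]
    edges = [mean + k * sd for k in range(-3, 4)]  # mean - 3 sd up to mean + 3 sd
    return [(edges[i], edges[i + 1], percents[i]) for i in range(6)]


if __name__ == "__main__":
    data = [2, 5, 2, 5, 3]                      # values from the example in 2.1.7
    print(variance_steps(data))                 # population variance and standard deviation
    print(variance_steps(data, sample=True))    # sample variance and standard deviation
    print(proportion_variance(0.6))             # three successes out of five
    for lower, upper, pct in empirical_rule_ranges(29, 5):  # mpg example
        print(f"{lower} to {upper}: {pct}%")
```

With the values 2, 5, 2, 5, 3 the two calls should agree with the worked example up to rounding (variances 1.84 and 2.3), and the mpg call should reproduce the ranges 14 to 19 through 39 to 44. Note that proportion_variance uses the population shortcut, so for 1, 1, 1, 0, 0 it matches variance_steps with sample=False, not the n - 1 version.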