Numerical Descriptions of Data






Central Tendency
Variation
Shape
Empirical Rule
Relationship
Excel
1. Central Tendency – Center of data
Purpose: To give symbols for the population mean and the sample mean used in the second Building Block of the course, where (sample mean - population mean) ≠ 0.


- Median
- Mode
- Mean or Average –
1.1 Quantitative Data
1.1.1 Sample Mean –
- Average of sample
- Symbol, X̄
1.1.2 Population Mean –
- Average of population
- Symbol, μ
1.2 Qualitative Data
Consider a case where you have three successes and two failures. The proportion of successes is then 3 out of 5, or 0.60. Let successes be represented by the value 1 and failures by the value 0. The data then consist of the five numbers 1, 1, 1, 0, 0. Averaging those five numbers also gives 3/5, or 0.6. The proportion is a special case of the average: it is the average of 0’s and 1’s.
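A minimal Python sketch of the point above, assuming successes are coded as 1 and failures as 0 (the variable names are illustrative only): averaging the coded values gives the same number as counting successes and dividing by the total.

```python
# Sketch: a proportion is the average of 0/1 data (1 = success, 0 = failure).
outcomes = [1, 1, 1, 0, 0]  # three successes, two failures (the example above)

# Average of the 0's and 1's
proportion = sum(outcomes) / len(outcomes)
# Counting successes directly and dividing by the number of values
successes_over_total = outcomes.count(1) / len(outcomes)

print(proportion)            # 0.6
print(successes_over_total)  # 0.6
```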
1.2.1 Sample proportion –
- Proportion of successes in the sample
- Symbol, p̂
1.2.2 Population proportion –
- Proportion of successes in the population
- Symbol, P
2. Variation – Spread of data



- Range
- Variance
2.1 Quantitative Data
2.1.1 Population Variance
- Average squared distance of the values from the center (mean) of the data
- Symbol, σ²
2.1.2 Sample Variance
- Estimate of the population variance
- Symbol, S²
Standard Deviation – Purpose: This is the measure of variation we will use in the third Building Block. (The standard error depends on two values: a measure of variation and a measure of knowledge.)

2.1.3 Population standard deviation
- Square root of the population variance
- Symbol, σ
2.1.4 Sample standard deviation
- Square root of the sample variance
- Symbol, S
2.1.5 Example of use: http://www.businessofbaseball.com/yankeespayroll.htm
2.1.6 Calculation:
a. Calculate the average of the values.
b. Subtract the average from each value to see how far each value is from the average.
c. Square each difference.
d. Sum all the squared values.
e. i. For the population, divide the sum by the number of values.
ii. For the sample, divide the sum by the number of values minus one.
f. To find the standard deviation, take the square root of the result of step e.
Both the population and the sample calculations use steps a-d and f. The difference between them occurs at step e.
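The steps above can be sketched in Python as follows. This is a minimal illustration, assuming the data are a plain list of numbers; the function name `variance_and_sd` and its `sample` flag are hypothetical, not part of the course materials. Only step e changes between the population and sample versions.

```python
import math

def variance_and_sd(values, sample=False):
    """Hypothetical helper following steps a-f; returns (variance, standard deviation)."""
    n = len(values)
    mean = sum(values) / n                    # step a: average of the values
    distances = [x - mean for x in values]    # step b: distance of each value from the average
    squares = [d * d for d in distances]      # step c: square each distance
    total = sum(squares)                      # step d: sum the squared distances
    divisor = n - 1 if sample else n          # step e: n for a population, n - 1 for a sample
    variance = total / divisor
    return variance, math.sqrt(variance)      # step f: square root of the step-e result

values = [2, 5, 2, 5, 3]  # the five numbers used in the worked example
pop_var, pop_sd = variance_and_sd(values)
samp_var, samp_sd = variance_and_sd(values, sample=True)
print(round(pop_var, 2), round(pop_sd, 4))    # population: 1.84 1.3565
print(round(samp_var, 2), round(samp_sd, 4))  # sample: 2.3 1.5166
```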
2.1.7 Example: Calculate the population and sample standard deviations for a set of five
numbers.
Double-click the embedded Excel file below and press the F9 function key to get new examples. (When finished, click anywhere outside the worksheet.)
Values   Step b. Distance to Average   Step c. Square the Distances
2        (2-3.4) = -1.4                1.96
5        (5-3.4) = 1.6                 2.56
2        (2-3.4) = -1.4                1.96
5        (5-3.4) = 1.6                 2.56
3        (3-3.4) = -0.4                0.16
Step a. mean = 3.4
Step d. Sum = 9.2
Step e. Population: σ² = 9.2/5 = 1.84     Sample: s² = 9.2/4 = 2.3
Step f. Population: σ = 1.356465997       Sample: s = 1.516575089
If the above embedded Excel file does not work, then go to this link: Variance and Standard
Deviation Calculations Examples
2.2 Qualitative Data: The symbols for the standard deviation and variance of the sample and the population are the same as for quantitative data.
2.2.1 Population Variance: You may use the same rule as for quantitative data, or you can use a shortcut formula. The variance in a population of 0’s and 1’s can be shown to be
σ² = P(1 - P)
2.2.2 Sample Variance: Unlike the sample variance for quantitative data, where the divisor changes from n to n - 1, the traditional approach for a proportion is to multiply the sample proportion of successes by the sample proportion of failures:
S² = p̂(1 - p̂)
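A small Python check of the shortcut, assuming the 0/1 data from section 1.2 (three successes, two failures). Averaging the squared distances (dividing by n) gives the same value as p(1 - p); dividing by n - 1 instead gives a different number, which is why the traditional sample formula for a proportion is stated separately. The variable names are illustrative only.

```python
# Sketch: variance of 0/1 data versus the shortcut p(1 - p).
data = [1, 1, 1, 0, 0]   # 3 successes, 2 failures
n = len(data)
p = sum(data) / n        # proportion of successes = 0.6

# Long way from section 2.1.6, dividing by n
long_way = sum((x - p) ** 2 for x in data) / n
# Shortcut: proportion of successes times proportion of failures
shortcut = p * (1 - p)
# Dividing by n - 1 as for quantitative samples (not the traditional proportion formula)
n_minus_1 = sum((x - p) ** 2 for x in data) / (n - 1)

print(round(long_way, 4), round(shortcut, 4))  # 0.24 0.24
print(round(n_minus_1, 4))                     # 0.3
```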
3. Shape – Distribution of values
- Right skewed
- Left skewed
- Symmetric
- Relationship to mean and median
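One common rule of thumb for the relationship between shape and the mean and median: in right-skewed data the mean tends to exceed the median, in left-skewed data it tends to fall below it, and in symmetric data the two are about equal. The rule is a guideline, not a guarantee. The Python sketch below applies it to small hypothetical data sets; `shape_hint` is an illustrative name.

```python
from statistics import mean, median

def shape_hint(values):
    """Rule of thumb only: compare the mean with the median."""
    m, md = mean(values), median(values)
    if m > md:
        return "right skewed (mean > median)"
    if m < md:
        return "left skewed (mean < median)"
    return "symmetric (mean = median)"

print(shape_hint([1, 2, 2, 3, 3, 4, 20]))   # long right tail: mean 5, median 3
print(shape_hint([-20, 1, 2, 2, 3, 3, 4]))  # long left tail
print(shape_hint([1, 2, 3, 4, 5]))          # mean 3, median 3
```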
4. Empirical rule – particular distribution
Purpose: We introduce how to determine probabilities by rule instead of by counting. Probability is used in the second note of the third Building Block. (To evaluate an error we will use probability.)
The empirical rule is an approximation to the bell-shaped curve. It is graphed as a histogram with ranges based on the mean and standard deviation. If the distribution of the data is a bell curve, then the following approximate percentages will result; the actual percentages differ slightly.
Range                     Approximate Percent
μ - 3σ up to μ - 2σ       2.50%
μ - 2σ up to μ - σ        13.50%
μ - σ up to μ             34.00%
μ up to μ + σ             34.00%
μ + σ up to μ + 2σ        13.50%
μ + 2σ up to μ + 3σ       2.50%
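To see how the rule's percentages compare with an exact bell curve, the Python sketch below uses `statistics.NormalDist` (Python 3.8 and later) for the standard normal curve. It is only a comparison, assuming a perfectly normal distribution; the `rule` dictionary simply restates the table above as proportions.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal curve: mean 0, standard deviation 1
rule = {1: 0.34, 2: 0.135, 3: 0.025}  # empirical-rule share of each band above the mean

for k in (1, 2, 3):
    # Area between mu + (k-1)*sigma and mu + k*sigma on a normal curve
    exact = z.cdf(k) - z.cdf(k - 1)
    print(f"band {k}: rule {rule[k]:.3f}, normal curve {exact:.4f}")

# Everything above mu + 2*sigma, including the tail beyond mu + 3*sigma
print(f"above mu + 2 sigma: {1 - z.cdf(2):.4f}")
```

On a normal curve the bands come out near 0.3413, 0.1359 and 0.0214, and the whole area above μ + 2σ is about 0.0228, so the rule's 34%, 13.5% and 2.5% are rounded values chosen to add up to 100%.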
Example: Suppose the mpg of cars was collected and the data follow the bell curve (a normal distribution). The cars had an average of 29 mpg and a standard deviation of 5 mpg. The ranges and percentages become:
Range      Percent   Cumulative Pct
14 to 19   2.5       2.5
19 to 24   13.5      16
24 to 29   34        50
29 to 34   34        84
34 to 39   13.5      97.5
39 to 44   2.5       100
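The table above follows mechanically from the mean and standard deviation: each boundary is the mean plus a whole number of standard deviations, and the cumulative column is a running total. A Python sketch of that construction, with `empirical_rule_table` as an illustrative (hypothetical) name:

```python
def empirical_rule_table(mean, sd):
    """Return (low, high, percent, cumulative percent) for the six empirical-rule bands."""
    percents = [2.5, 13.5, 34.0, 34.0, 13.5, 2.5]
    rows, cumulative = [], 0.0
    for k, pct in zip(range(-3, 3), percents):
        low, high = mean + k * sd, mean + (k + 1) * sd  # band from mu + k*sigma to mu + (k+1)*sigma
        cumulative += pct                              # running total of the percentages
        rows.append((low, high, pct, cumulative))
    return rows

for low, high, pct, cum in empirical_rule_table(29, 5):
    print(f"{low} to {high}: {pct}% (cumulative {cum}%)")
```

Changing the arguments (for example `empirical_rule_table(100, 15)`) produces the corresponding table for another bell-shaped data set.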
[Figure: Empirical Rule Example. Bar chart of the probability of being in each range (y-axis: Percent; x-axis: intervals based on the mean and standard deviation, 14 to 19 through 39 to 44).]
Questions:
1. What is the probability of finding a car with an mpg value greater than 29?
2. What is the probability of finding a car with an mpg value less than 19?
3. What is the probability of finding a car with an mpg value between 29 and 39?
Answers:
1. 0.5
2. 0.025
3. 0.475
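The answers come from adding the band probabilities that fall inside each range. The Python sketch below does that addition, assuming the six bands cover μ ± 3σ (14 to 44 mpg) so that "greater than 29" is read as 29 to 44, and assuming the requested range starts and ends on band boundaries. The `bands` table and the `prob_between` helper are hypothetical names for illustration.

```python
# Empirical-rule bands for the mpg example (mean 29, standard deviation 5), as probabilities.
bands = {(14, 19): 0.025, (19, 24): 0.135, (24, 29): 0.34,
         (29, 34): 0.34, (34, 39): 0.135, (39, 44): 0.025}

def prob_between(low, high):
    """Add the probabilities of the bands lying entirely inside [low, high]."""
    return sum(p for (a, b), p in bands.items() if low <= a and b <= high)

print(round(prob_between(29, 44), 3))  # question 1, greater than 29: 0.5
print(round(prob_between(14, 19), 3))  # question 2, less than 19: 0.025
print(round(prob_between(29, 39), 3))  # question 3, between 29 and 39: 0.475
```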
Double-click the embedded Excel file once, then press the F9 key for more examples. If the embedded Excel file does not work, go to the following link:
https://wweb.uta.edu/faculty/eakin/busa3321/empiricalrule_example.xls
Exercise: Using Internet Explorer, answer the questions on the following web page. In Internet Explorer you may attempt this more than once by backing up with the Back button. (The page keeps track of the number of attempts.) Print the page when successful and bring it to class next time.
http://wweb.uta.edu/faculty/eakin/asps/examples/EmpiricalRuleques.asp
5. Relationship – Purpose: To introduce applications of means and variances when relating two variables.
5.1 Definitions
- Straight line – mathematical (exact) versus tendency (approximate)
- Slope – the average amount of change in Y for a one-unit increase in X
- Intercept – the average value of Y when X is zero
- Correlation – the strength (and direction) of the linear association between Y and X
- Coefficient of Determination – the percent of sample variation in Y associated with variation in X
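A Python sketch of how the sample intercept, slope, correlation and coefficient of determination can be computed by least squares for a single X variable. The data are hypothetical and `simple_regression` is an illustrative name; with one X, R² equals the square of r.

```python
import math

def simple_regression(x, y):
    """Least-squares intercept b0, slope b1, correlation r and R^2 for one X variable."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    b1 = sxy / sxx                  # slope: average change in Y per one-unit increase in X
    b0 = my - b1 * mx               # intercept: estimated average Y when X is zero
    r = sxy / math.sqrt(sxx * syy)  # correlation: strength and sign of the linear association
    return b0, b1, r, r * r         # R^2: share of sample variation in Y associated with X

x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
b0, b1, r, r2 = simple_regression(x, y)
print(round(b0, 3), round(b1, 3), round(r, 3), round(r2, 3))  # 2.2 0.6 0.775 0.6
```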
Use of the standard deviation, the correlation, the empirical rule and the slope (called beta)
with stock price risk:
http://www.investorguide.com/igu-article-823-stock-basics-measuring-a-stocks-risk.html
Definition of coefficient of determination in finance terms:
http://www.finance-lib.com/financial-term-efficient-diversification.html
5.2 Symbols:
Term         Intercept   Slope   Correlation   Coefficient of Determination
Population   β0          β1      ρ             ρ²
Sample       b0          b1      r             R²
5.3 Examples
For interpreting the sample slope and intercept, double-click the embedded Excel file below. To create new examples, click on any green shaded cell and change the names and values.
Y = Sales
X = Size
X unit = thousand square feet
b0 = 1,670,000
b1 = 964,000
Intercept: The estimated average value of Sales is 1,670,000 when Size is zero. (Often the intercept does not have a realistic meaning.)
Slope: For an increase of one thousand square feet in Size, the estimated average Sales increase by 964,000.
Or go to https://wweb.uta.edu/faculty/eakin/busa3321/EstimateInterpretation.xls
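The two interpretations can be checked with the fitted line Sales = b0 + b1 × Size, using the b0 and b1 values from this example. In the Python sketch below, `estimated_sales` is a hypothetical helper: at Size = 0 it returns the intercept, and raising Size by one unit (one thousand square feet) raises the estimate by exactly the slope.

```python
b0 = 1_670_000  # intercept from the Sales/Size example
b1 = 964_000    # slope: change in estimated average Sales per one thousand square feet

def estimated_sales(size_thousand_sqft):
    """Estimated average Sales for a Size given in thousands of square feet."""
    return b0 + b1 * size_thousand_sqft

print(estimated_sales(0))                       # 1670000, the intercept
print(estimated_sales(3) - estimated_sales(2))  # 964000, the slope
```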
For interpreting R², double-click the embedded Excel file below. To create new examples, click on any green shaded cell and change the names and values.
Y = price
X = size
X unit = thousand square feet
R² = 0.904
Coefficient of Determination, R²: 90.4% of the sample variation in price is associated with variation in size.
Or go to http://wweb.uta.edu/eakin/busa3321/VariationExplanation.xls
6. Excel
6.1 Center and Variation Cell Formulas
In the examples below, the data are in cells A1 to A5.
Statistic            Population          Sample
Mean                 =AVERAGE(A1:A5)     =AVERAGE(A1:A5)
Median               =MEDIAN(A1:A5)      =MEDIAN(A1:A5)
Variance             =VARP(A1:A5)        =VAR(A1:A5)
Standard Deviation   =STDEVP(A1:A5)      =STDEV(A1:A5)
(Excel 2010 and later also provide VAR.P, VAR.S, STDEV.P and STDEV.S for the same calculations.)
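Outside Excel, Python's standard `statistics` module has functions that correspond to the formulas above: `mean` and `median`, `pvariance` and `pstdev` for a population (dividing by n), and `variance` and `stdev` for a sample (dividing by n - 1). The sketch assumes the list `data` stands in for cells A1:A5 and reuses the five values from the worked example in 2.1.7.

```python
from statistics import mean, median, pvariance, variance, pstdev, stdev

data = [2, 5, 2, 5, 3]  # stands in for cells A1:A5

print(mean(data), median(data))                 # like AVERAGE and MEDIAN
print(pvariance(data), round(pstdev(data), 6))  # like VARP and STDEVP (population)
print(variance(data), round(stdev(data), 6))    # like VAR and STDEV (sample)
```

The printed values should match the hand calculation in 2.1.7 (mean 3.4, σ² = 1.84, s² = 2.3).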
6.2 Pivot tables can also be used for quantitative variables to calculate means, variances and standard deviations (but not modes or medians):
- Click on the quantitative variable.
- Insert a Pivot table.
- Click on and drag the quantitative variable down to the “Values” section.
- Right-click over the “Count of” section and select Value Field Settings.
- Choose the description you want (average, variance, or standard deviation).

Notes: (1) If you wish more descriptions, repeatedly drag the quantitative variable to “Values”, choosing a different description each time.
(2) If you wish to find the descriptions within the levels of a qualitative variable, then after dragging the quantitative variable to the “Values” section, drag the qualitative variable to the Rows (or Columns) section.
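Note (2) amounts to grouping the quantitative values by the levels of the qualitative variable and describing each group. A plain-Python sketch of the same idea, using hypothetical records; `stdev` here is the sample standard deviation (Excel's pivot list labels it StdDev, with StdDevp for the population version).

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical records: (qualitative level, quantitative value)
records = [("North", 12), ("South", 7), ("North", 15), ("South", 9), ("North", 9)]

groups = defaultdict(list)
for level, value in records:  # like dragging the qualitative variable to Rows
    groups[level].append(value)

for level, values in sorted(groups.items()):
    # like choosing Average and StdDev for the quantitative variable in Values
    print(level, mean(values), round(stdev(values), 3))
```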
6.3 Relationship
- Click the Insert tab.
- Click the XY Scatter button.
- On the Series tab, choose the X Range and Y Range.
- Right-click any point.
- Click Add Trendline.
- At the bottom of the resulting window, check Display Equation and Display R-squared.