Math 116 - Seattle Central College

Math 116
Chapter 12
Topics:
Graphical display: histogram.
Using numbers: measures of center and spread.
Sampling and the Law of Large Numbers.
Definitions
An observation is a single number. It could be a measurement, a monetary amount, etc. It can also be regarded as one particular outcome of a random trial.
Raw data is a collection of observations.
A bin is one of the intervals into which the range of the data is divided.
Frequency is the number of observations in each bin.
Graphical Display of Data
There are two types of data:
1. Quantitative, e.g. closing prices, ratios
2. Categorical, e.g. gender, political affiliation
There are different ways to display data visually. Histograms are commonly used to display quantitative data. Pie charts are one way to display categorical data.
[Figure: Histogram of the percentages of weekly ratios of Disney stock. Vertical axis: relative frequency, 0 to 0.5. Horizontal axis: weekly ratio, 0.73 to 1.16.]
More definitions
Relative frequency is the fraction (or percentage) of all observations that fall in each bin; it is shown as the height of each bar.
A frequency distribution is a chart or table that lists the bins and their frequencies.
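The definitions above can be illustrated with a short Python sketch. The scores and bin edges are made-up values chosen only for illustration, and `frequency_distribution` is a hypothetical helper name, not something from the course materials.

```python
def frequency_distribution(data, edges):
    """Count how many observations fall in each bin.

    Bin i covers edges[i] <= x < edges[i + 1]; the last bin also
    includes its right edge so the largest value is not dropped.
    """
    counts = [0] * (len(edges) - 1)
    last = len(counts) - 1
    for x in data:
        for i in range(len(counts)):
            in_bin = edges[i] <= x < edges[i + 1]
            on_right_edge = i == last and x == edges[i + 1]
            if in_bin or on_right_edge:
                counts[i] += 1
                break
    return counts


# Hypothetical raw data (illustration only) and bin edges.
scores = [62, 71, 75, 78, 84, 88, 93, 97]
edges = [60, 70, 80, 90, 100]

frequencies = frequency_distribution(scores, edges)
# Relative frequency = frequency divided by the total number of observations.
relative = [f / len(scores) for f in frequencies]

for lo, hi, f, rf in zip(edges, edges[1:], frequencies, relative):
    print(f"{lo}-{hi}: frequency {f}, relative frequency {rf:.3f}")
```

The relative frequencies always add up to 1, which is a quick check that no observation was left out of the bins.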
What to look for in a histogram:
the overall pattern or shape of the distribution
major peaks
rough symmetry or clear skewness (skewed to the right or left)
an estimate of the center and spread of the data
any striking deviations from the pattern
Another way of describing our
data is the use of numbers:
Measures of center or central tendency.
- mean, median, mode
Measures of spread or variation.
- range, variance, standard deviation
How to find the mean or average:
The “average” or mean is the sum of all observations divided by the number of observations:
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
Another measure of center or central tendency:
The median is the middle value of the data set.
How to find the median (M): arrange the data in ascending or descending order, then:
if the number of observations n is odd, M is the ((n+1)/2)th value;
if n is even, M is the average of the two middle values, the (n/2)th and the (n/2+1)th.
Examples:
e.g. 40, 75, 80, 80, 96, 100: mean = 78.5, median = 80
e.g. 40, 75, 80, 96, 100: mean = 78.2, median = 80
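As a check on the two examples above, here is a minimal Python sketch of the mean and the median position rule. The function name `median_by_rule` is hypothetical; the comparison against the standard library's `statistics.median` is only a sanity check.

```python
import statistics


def median_by_rule(data):
    """Median via the position rule: sort, then pick the middle value(s)."""
    ordered = sorted(data)
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)th value, converted to a 0-based index.
        return ordered[(n + 1) // 2 - 1]
    # Even n: average of the (n / 2)th and (n / 2 + 1)th values.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


six_scores = [40, 75, 80, 80, 96, 100]
five_scores = [40, 75, 80, 96, 100]

for scores in (six_scores, five_scores):
    mean = sum(scores) / len(scores)
    print(scores, "mean =", mean, "median =", median_by_rule(scores))
    # The standard library should agree with the hand-written rule.
    assert median_by_rule(scores) == statistics.median(scores)
```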
Mean versus Median:
The mean is the more common way to measure center, but it is more sensitive to extreme values than the median.
E.g. which of the two measures better reflects the average price of a home?
If the distribution is symmetric, the mean and median are the same. If the distribution is skewed, the mean lies farther out in the long tail than the median.
Example: In 1993, the mean and
median salaries paid to major
league baseball players were
$490,000 and $1,160,000.
Which one is the mean?
Median? Explain.
Example: Measures of center are not enough
Final exam scores in math class, section 1: 80, 80, 80, 80, 80
Final exam scores in math class, section 2: 30, 80, 90, 100, 100
Note: the mean is 80 in both sections, but the datasets are different. (Measuring center is not enough to describe the data; we need measures of spread.)
Measures of Spread:
Range: the difference between the largest and the smallest observation.
E.g. let us look at the 3 datasets below:
A: 195, 200, 205, 215, 219, 225, 226, 235
B: 195, 210, 213, 214, 216, 218, 219, 235
C: 208, 209, 210, 210, 211, 211, 213, 248
The range of each dataset is 40, but the datasets are different from each other.
The range is strongly influenced by extreme values and takes into account only two observations in the whole dataset.
Standard Deviation s (the most
common and popular): measures
how far each observation is from
the mean; the square root of the
variance.
Formula for the standard deviation:
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2, \qquad s = \sqrt{s^2}$$
Let us find the variance and standard deviation by hand just this once. Use Excel the other times.
E.g.
Math test scores: 30, 80, 90, 100, 100
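The by-hand steps for these five scores can be sketched in Python, one line per step. This is only an illustration of the arithmetic (dividing by n-1, as in the sample formula); the variable names are my own.

```python
import math

scores = [30, 80, 90, 100, 100]
n = len(scores)

mean = sum(scores) / n                      # 400 / 5 = 80
deviations = [x - mean for x in scores]     # -50, 0, 10, 20, 20
squared = [d ** 2 for d in deviations]      # 2500, 0, 100, 400, 400
variance = sum(squared) / (n - 1)           # 3400 / 4 = 850
std_dev = math.sqrt(variance)               # about 29.15

print("mean:", mean)
print("deviations:", deviations)
print("squared deviations:", squared)
print("variance:", variance)
print("standard deviation:", round(std_dev, 2))
```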
Back to the sample datasets:
A: 195, 200, 205, 215, 219, 225, 226, 235
B: 195, 210, 213, 214, 216, 218, 219, 235
C: 208, 209, 210, 210, 211, 211, 213, 248
Range for all three sets: 40
Mean for all three sets: 215
SD for set A = 13.95
SD for set B = 11.06
SD for set C = 13.42
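The summary figures for sets A, B and C can be reproduced with Python's standard `statistics` module, as a rough cross-check of the Excel results. `statistics.stdev` divides by n-1, matching the sample formula; the printed standard deviations should agree with the values above up to rounding in the last digit.

```python
import statistics

datasets = {
    "A": [195, 200, 205, 215, 219, 225, 226, 235],
    "B": [195, 210, 213, 214, 216, 218, 219, 235],
    "C": [208, 209, 210, 210, 211, 211, 213, 248],
}

summary = {}
for name, values in datasets.items():
    summary[name] = {
        "range": max(values) - min(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample SD, divides by n - 1
    }
    s = summary[name]
    print(f"{name}: range = {s['range']}, mean = {s['mean']}, sd = {s['sd']:.2f}")
```

Set C has nearly the same standard deviation as set A even though most of its values are tightly clustered: the single extreme value 248 contributes most of the squared deviation.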
Interpretations:
Variance is the average of the squared deviations of the observations from the mean.
Standard deviation is the square root of the variance, so it has the same units as the observations. Hence, it is a single value that measures the dispersion of the data about the mean. A larger standard deviation indicates a more spread-out set of data points.
We divide by n-1 rather than n to get the average (to be more conservative with our estimate).
Open the Excel file data.xls.
In the second column, generate new data = old data + a constant.
In the third column, generate new data = old data multiplied by a constant.
Find the mean, variance and standard deviation of each column.
What do you notice?
Adding a number to each observation:
If a number b is added to (or subtracted from) each observation:
The mean increases (or decreases) by b.
The variance does not change.
The standard deviation does not change.
Multiplying each observation by a number:
If each observation is multiplied by a number a:
The mean is multiplied by a.
The variance is multiplied by a^2.
The standard deviation is multiplied by |a| (that is, by a when a is positive).
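These rules can be tried out in Python as well as in Excel. The sketch below reuses the math test scores from earlier; the constants b = 5 and a = 2 are arbitrary values picked for illustration, and `summarize` is a hypothetical helper.

```python
import statistics


def summarize(values):
    """Return (mean, sample variance, sample standard deviation)."""
    return (
        statistics.mean(values),
        statistics.variance(values),
        statistics.stdev(values),
    )


scores = [30, 80, 90, 100, 100]
b = 5  # arbitrary constant to add (assumption for illustration)
a = 2  # arbitrary constant to multiply by (assumption for illustration)

shifted = [x + b for x in scores]
scaled = [a * x for x in scores]

mean0, var0, sd0 = summarize(scores)
mean_shift, var_shift, sd_shift = summarize(shifted)
mean_scale, var_scale, sd_scale = summarize(scaled)

print("original:  ", mean0, var0, round(sd0, 2))
print("plus b:    ", mean_shift, var_shift, round(sd_shift, 2))
print("times a:   ", mean_scale, var_scale, round(sd_scale, 2))
```

Adding b moves every observation and the mean by the same amount, so the deviations from the mean (and hence the variance) are untouched; multiplying by a stretches every deviation by a, so the squared deviations grow by a^2.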
Sampling:
Some definitions:
Population: the entire group of individuals or objects that we want information about.
Sample: the part of the population that we actually analyze in order to gather information.
Parameter: a number that describes a population, e.g. population mean, population standard deviation, etc.
Statistic: a number that describes a sample, e.g. sample mean, sample standard deviation, etc.
Reasons for sampling:
It is often impossible to take measurements of the whole population.
Samples are quicker, easier and cheaper.
If done properly, sampling is enough to give us the information we need about the population.
Random Sampling:
Simple random sampling: everyone in the population has an equal chance of being selected for the sample.
Types of random sampling:
Simple random sampling by hand: draw names from a hat, balls from a basket, etc.
Simple random sampling by machine: use computer software to generate random numbers, or a table of random digits.
Stratified random sampling: divide the population into strata and sample randomly within each stratum. E.g. for the Seattle population, strata could be economic status, race, gender, marital status, etc.
Systematic random sampling: e.g. every 10th observation is chosen.
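The three sampling schemes can be sketched with Python's standard `random` module. Everything concrete here is an assumption made for illustration: the population is just the ID numbers 1 to 100, the two strata are an arbitrary split, and the seed is fixed only so the sketch is repeatable.

```python
import random

rng = random.Random(116)  # fixed seed so the sketch is repeatable

population = list(range(1, 101))  # hypothetical population: IDs 1..100

# Simple random sample: every member has the same chance of selection.
simple = rng.sample(population, 10)

# Systematic sample: pick a random starting point, then every 10th member.
start = rng.randrange(10)
systematic = population[start::10]

# Stratified sample: split into strata, then sample randomly within each.
strata = {
    "stratum_1": population[:60],  # arbitrary grouping (assumption)
    "stratum_2": population[60:],
}
stratified = {name: rng.sample(members, 5) for name, members in strata.items()}

print("simple:    ", sorted(simple))
print("systematic:", systematic)
print("stratified:", {k: sorted(v) for k, v in stratified.items()})
```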
Law of Large Numbers:
With random sampling and a large
sample, we can use the statistic of a
sample to estimate the parameter of a
population.
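A small simulation can make this concrete. The sketch below is not from the course materials: it assumes the "population" is the outcome of rolling a fair six-sided die, whose population mean is 3.5, and compares sample means for increasing sample sizes. The seed is arbitrary and fixed only for repeatability.

```python
import random

rng = random.Random(12)  # arbitrary fixed seed for repeatability

POPULATION_MEAN = 3.5  # mean of a fair die: (1 + 2 + ... + 6) / 6


def sample_mean(n):
    """Mean of n simulated rolls of a fair six-sided die."""
    rolls = [rng.randint(1, 6) for _ in range(n)]
    return sum(rolls) / n


results = {n: sample_mean(n) for n in (10, 1_000, 100_000)}

for n, mean in results.items():
    print(f"n = {n:>7}: sample mean = {mean:.4f}, "
          f"error = {abs(mean - POPULATION_MEAN):.4f}")
```

With a large random sample the statistic (the sample mean) should land close to the parameter (3.5); small samples can miss by a wide margin.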
Volatility
It is a measurement of how much the value of
a stock fluctuates.
A common way of measuring the volatility of a
stock is to find the annualized standard
deviation of the ratios of closing prices of a
stock. (weekly, in our project).
There are other types of volatility but the one
above is what we are going to use in our
project.
To annualize the standard
deviation:
For monthly ratios, multiply the standard
deviation by square root of 12.
For weekly ratios (which we use), multiply by
square root of 52 (52 weeks in a year).
For daily ratios, multiply by square root of 252
(252 business days in a year).
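The annualizing rule above can be written as a short Python function. This is a sketch under stated assumptions: `annualized_volatility` and `PERIODS_PER_YEAR` are hypothetical names, and the weekly ratios are made-up numbers, not project data.

```python
import math
import statistics

# Number of periods in a year for each ratio frequency, as listed above.
PERIODS_PER_YEAR = {"monthly": 12, "weekly": 52, "daily": 252}


def annualized_volatility(ratios, frequency="weekly"):
    """Sample standard deviation of the ratios times sqrt(periods per year)."""
    return statistics.stdev(ratios) * math.sqrt(PERIODS_PER_YEAR[frequency])


# Made-up weekly ratios of closing prices (illustration only).
weekly_ratios = [1.012, 0.987, 1.004, 0.995, 1.021, 0.979]

print("weekly SD:            ", round(statistics.stdev(weekly_ratios), 5))
print("annualized volatility:", round(annualized_volatility(weekly_ratios), 5))
```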
Focus on the Project:
Suppose our mean weekly ratio is 1.001894. Let’s call it Rm, for the “mean of the ratios”.
From chapter 11, our computed weekly risk-free ratio is approximately 1.0007695. Let’s call it Rrf, for risk-free rate.
Note that Rm is too large: it exceeds Rrf.
Focus on the Project:
This means that, on average, each of our weekly ratios is too large. Specifically, each ratio is in excess by (Rm-Rrf).
In the example above, 1.001894-1.0007695 = 0.0011245
To adjust our weekly ratios so that their mean equals the weekly risk-free rate:
We can do this by reducing each ratio by
(Rm-Rrf).
We call this normalizing each ratio.
Hence,
$$R_{norm} = R_i - (R_m - R_{rf})$$
where Rnorm is the normalized ratio, Ri is the weekly ratio, and (Rm-Rrf) is the ratio excess.
By normalizing our ratios:
Our new mean will match the weekly risk-free rate.
In our example above,
New Mean = Old mean – (Rm-Rrf)
= 1.001894 – 0.0011245
= 1.0007695
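The normalization step can be sketched in Python. The risk-free ratio 1.0007695 is the value quoted above; the weekly ratios are made-up numbers for illustration, and `normalize_ratios` is a hypothetical helper name.

```python
import statistics


def normalize_ratios(ratios, risk_free_ratio):
    """Subtract the excess (Rm - Rrf) from every ratio."""
    mean_ratio = statistics.fmean(ratios)   # Rm
    excess = mean_ratio - risk_free_ratio   # Rm - Rrf
    return [r - excess for r in ratios]     # Rnorm = Ri - (Rm - Rrf)


R_RF = 1.0007695  # weekly risk-free ratio from the example above

# Made-up weekly ratios (illustration only).
weekly_ratios = [1.012, 0.987, 1.004, 0.995, 1.021, 0.979]

normalized = normalize_ratios(weekly_ratios, R_RF)

print("old mean:", statistics.fmean(weekly_ratios))
print("new mean:", statistics.fmean(normalized))
print("old SD:  ", statistics.stdev(weekly_ratios))
print("new SD:  ", statistics.stdev(normalized))
```

Because normalizing only subtracts the same constant from every ratio, the mean shifts to the risk-free ratio while the standard deviation, and therefore the volatility, stays the same.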