Chapter 1
Where Do Data Come From?
Understanding Data:
The purpose of this class: to be able to read the newspaper and know what the heck they’re talking about!
To be able to go to the casino and know why the house always wins.
Statistics:
study of how to collect, organize, analyze, and interpret information
The advantage of statistics is that it gives a process for making decisions when faced with uncertainties without
prejudice
Statistics are used in many fields
Examples:
Medical: what are the chances a patient will go into remission with a certain cancer treatment
Education: does writing material down increase the ability to remember facts
Population:
the collection of individuals or items of interest
Examples:
• All residents of Kentucky
• All admits to hospitals in the U.S.
Maybe you want to know what Wayne State students like to do on a Friday night. Your population: Wayne State students.
The population is defined in terms of our desire for knowledge
Census:
measurements from the entire population are used
• Every 10 years the U.S. conducts a census
  o They attempt to reach every resident in the United States
  o Some people are difficult to reach
Often, it is not feasible to study the entire population
Instead of the entire population, we often take measurements from a subset.
Sample:
the subset of the population on which we make measurements
We call the measurements data
We don’t ask everyone in the population. We ask part of the population, or a sample.
Where do data come from?
Individuals:
The objects described by a set of data (Can be people, animals, or things)
Variable:
Any characteristic of an individual that can take different values for different individuals (we collect data on the
variables that we are interested in)
We conduct a study to collect and process data
Types of studies:
Observational study:
Observes individuals and measures variables of interest but does not attempt to influence responses
Sample survey:
Type of observational study in which a sample is selected and asked to respond to questions
Examples
• Public opinion polls
• Pre-election polls
• Teacher evaluations
Experiment:
Deliberately imposes some treatment on individuals in order to observe their responses (purpose is to study if
the treatment causes a change in the response)
Examples
• Medical study: patients are given drugs at various dosage levels to study effectiveness
• Change a container from one design to another and see if individuals notice the difference
There are two main parts in the science of statistics
1) Descriptive Statistics: methods of summarizing a set of data
2) Inferential Statistics: methods of making inference about a population based on the information in a sample
Chapter 2
Samples, Good and Bad
Bias – a prejudice in one direction
Going to the Democratic National Convention and asking who each person voted for would create bias.
Common ways of creating bias:
Convenience Sampling
Uses results or data that are conveniently and readily obtained (Runs risk of being severely biased!)
Asking your friends is an example of convenience sampling
Example: Voluntary response samples
• These often overrepresent people with strong opinions
Example: Restaurant comment cards

People that are willing to
volunteer an opinion usually
have a strong opinion and
it’s usually not good. Not
many people take time to
say “I’m Happy!”
This type of data is biased and should not be generalized to the overall population
For a sample to be useful, it needs to represent the population!
This is important since we usually want to extend the results to the population
The sample should be similar to the population in terms of demographics and other variables
One way to do this is with a random sample: a sample determined completely by chance
A simple random sample, or SRS, of n measurements from a population is one selected in such a manner that every sample of size n from the population has equal probability of being selected
With a random sample
• Our sample will typically be similar to our population with respect to demographic characteristics
• We can control the probability of making a mistake (or probability of error)
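As a rough sketch of how an SRS might be drawn in practice, the Python snippet below leans on the standard library's random.sample, which picks n distinct items so that every subset of size n is equally likely. The population (ID numbers 1 to 500), the function name, and the seed are all made up for illustration.

```python
import random

# Hypothetical population: ID numbers for 500 students (made up for illustration).
population = list(range(1, 501))

def simple_random_sample(items, n, seed=None):
    """Return an SRS of size n: every subset of size n is equally likely."""
    rng = random.Random(seed)      # a seed makes the draw repeatable
    return rng.sample(items, n)    # sampling without replacement

sample = simple_random_sample(population, 25, seed=1)
print(sorted(sample))
```

Changing the seed (or leaving it out) gives a different sample, which is exactly the sample-to-sample variation discussed in Chapter 3.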
Chapter 3
What Do Samples Tell Us?
Example: Of 100 people surveyed, 37 said they would rather take a train than drive. Our statistic is 37% of the people surveyed (or, in this case, our sample).
Statistic:
A numerical characteristic of a sample
This value is known when we take our sample, but it will vary from sample to sample
Parameter:
A numerical characteristic of a population
This value is a fixed number, but when doing inferential statistics we will not know its value (unless we take a census)
Since the population is often not available, we use statistics to
estimate parameters
Variability
Describes the spread of the values of the statistics
We can control this variability since a larger sample will force less variation
To reduce bias we should use random sampling
From the sampling variability we can calculate the margin of error
The margin of error gives us a way of estimating the parameter, given a statistic. If the margin of error is ±2% and the statistic is “58% of people believe…”, then 95% of samples would have a statistic within 2% of the parameter, so we can say with 95% confidence that “between 56% and 60% of people believe…”
We can say with 95% confidence that the amount by which a proportion obtained from a sample will differ from the population proportion will not exceed $1/\sqrt{n}$, where n is the number of people in the sample.
Example: A sample proportion of 50% and a sample of 1600 people:
The Margin of Error?
$\frac{1}{\sqrt{n}} = \frac{1}{\sqrt{1600}} = \frac{1}{40} = 0.025 = 2.5\%$
The Confidence Statement?
We are 95% confident that the population parameter is between 47.5% and 52.5%
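The example above can be checked with a few lines of Python. This is only a sketch of the quick 1/√n method from these notes; the function names are made up for illustration.

```python
import math

def quick_margin_of_error(n):
    """Quick 95% margin of error for a sample proportion: 1 / sqrt(n)."""
    return 1 / math.sqrt(n)

def confidence_statement(p_hat, n):
    """Return the (low, high) ends of the 95% confidence statement."""
    moe = quick_margin_of_error(n)
    return p_hat - moe, p_hat + moe

moe = quick_margin_of_error(1600)
low, high = confidence_statement(0.50, 1600)
print(f"margin of error = {moe:.3f}")
print(f"interval = {low:.3f} to {high:.3f}")
```

Trying other values of n shows the pattern behind "how large is large enough": to cut the margin of error in half, the sample has to be four times as large.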
How large a sample is large enough? What are the factors?
• How confident do we want to be in our conclusions?
• How much variability is in our data?
Chapter 5
Experiments, Good and Bad
“Back off, man. I'm a scientist”
To conduct a study properly, you must do the following:
• Get a representative sample
• Get a large enough sample
• Decide whether the study should be an observational study or an experiment
A response variable (dependent variable) is a variable that measures an outcome or result of a study
This just in: According to statisticians, there is a link
between having a hangover and doing poorly on an exam.
While the experiment is not conclusive, statisticians
recommend not trying to take an exam after a night of
drinking. Response variable: Exam scores, Explanatory
variable: Whether or not you have a hangover.
An explanatory variable (independent variable) is a variable that explains or causes changes in the response variable
In an experiment, we create differences in the explanatory variable and then examine the results
In an observational study, we observe differences in the explanatory variable and then notice whether these are
related to differences in the response variable
Experiment: We noticed that when patients were given a placebo, they rated their mood higher than if they did not take the placebo. (Not to be confused with the placebo effect, a lurking variable for most drug-related experiments.)
Observational study: We observe that televisions that have a larger screen size weigh more.
The individuals studied in an experiment are often called subjects
A Treatment is one or a combination of explanatory variables assigned by the experimenter
A treatment diagram is useful in determining if all combinations of the explanatory variables have been used.
Suppose we are conducting an experiment on weight loss and the explanatory variables we want to use are diet (fat-free and Atkins) and exercise (swimming and walking).
                 Exercise
Diet         Swimming   Walking
Fat-free         1          3
Atkins           2          4
When conducting an experiment it is important to randomly assign individuals to one of the treatment groups
(Random assignment is equivalent to flipping a coin to decide group membership)
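Here is one way random assignment to the four diet/exercise treatments could be sketched in Python. The 20 numbered subjects, the function name, and the seed are hypothetical; shuffling and then dealing subjects out in turn is just one reasonable scheme, not necessarily the one used in class.

```python
import itertools
import random

diets = ["Fat-free", "Atkins"]
exercises = ["Swimming", "Walking"]

# Every diet/exercise combination is one treatment (4 in total).
treatments = list(itertools.product(diets, exercises))

def randomly_assign(subjects, treatments, seed=None):
    """Shuffle the subjects, then deal them out to the treatments in turn."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

# Hypothetical subjects, labeled 1 to 20.
groups = randomly_assign(range(1, 21), treatments, seed=7)
for treatment, members in groups.items():
    print(treatment, sorted(members))
```

Dealing from a shuffled list keeps the groups the same size, which flipping a coin for each subject would not guarantee.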
A placebo is a dummy treatment, given to subjects, that looks similar to the treatment being given in the experiment
Control Groups are used to help control lurking variables (variables that have an effect on the response
variable but are not part of the study)
A Confounding variable is a variable whose effect on the response variable cannot be separated from the
effect of an explanatory variable
Confounding variables are examples of lurking variables.
An Interaction occurs when the effect of one explanatory variable on the response variable depends on what is
happening with another explanatory variable
An observed effect so large that it would rarely occur by chance is called statistically significant
Chapter 6
Experiments in the Real World
Nonadherers – subjects who participate but do not follow the experimental treatments
In a study that tries to determine if a cough medicine helps the patient with pain, a nonadherer may be the person who continues to make themselves hot toddies for their cold, while continuing to be a subject of the experiment. The experimenter will not be able to tell if the medicine or the hot toddy was relieving the pain.
Dropouts – subjects who begin the experiment but do not complete it
Blinding – single blind: only the administrator knows if the subjects receive the treatment or placebo; double blind: neither the subjects nor the administrator knows what is being given
Completely randomized experimental design – all the subjects are allocated at random among all treatment
groups
Did someone say SRS?
A block is a group of subjects that are known before the experiment to be similar in some way that is expected to affect the response to the treatments
Example: Some of the subjects are pregnant.
In a block design, the random assignment of subjects to treatments is carried out separately within each block
Example: Do an SRS for the pregnant women and an SRS for the rest of the subjects.
Chapter 8
Measuring
Measure- assign a number to represent a property
Instrument- something used to measure
Units- the type of values our measurements take
Validity – a measurement is valid if it is relevant or appropriate when representing a property
Story: Goober loved to measure the weight of different chocolate cakes, but Goober is a few
candles short of a birthday. Goober didn’t have a problem with his instrument, he used a
calibrated scale he bought at his favorite store—WEIGH STUFF MART. His units were not
bad either; Goober weighed his cakes in ounces. Goober’s problem was that he weighed the
cakes, by holding them and stepping on the scale. Goober thinks the average cake weighs
approximately 2112 ounces. You could say his measurements are not entirely valid.
Often a rate (percent) at which something occurs is a more valid measure than a frequency
NYC Resident (Population: 8.3 million): “We have 328 chocolate shops in my city!”
Ionia Resident (Population: 11.4 thousand): “WOW! I wish we had that many… we only have 38 chocolate shops!”
Rates may be a better option for comparing
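To see why the rate tells a different story than the raw count, here is a small Python sketch using the numbers from the two residents. The choice of "per 10,000 residents" as the unit is an assumption made for readability.

```python
def shops_per_10000(shops, population):
    """Rate of shops per 10,000 residents."""
    return shops / population * 10_000

nyc = shops_per_10000(328, 8_300_000)
ionia = shops_per_10000(38, 11_400)
print(f"NYC:   {nyc:.2f} shops per 10,000 residents")
print(f"Ionia: {ionia:.2f} shops per 10,000 residents")
```

NYC comes out at well under one shop per 10,000 residents while Ionia has more than thirty, so the Ionia resident is the one who should be bragging.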
Reliability – a measurement is reliable if it is the same time after time when taken on the same individual
Variability – Consistency across measures
It might be best, when trying to get Garfield’s weight, to weigh him more than once and see that the
measurement is consistent. If Garfield weighs 122.1, 122.1, 122.1, 122.2, 122.1, 122.1 after stepping on
the scale 6 times, we see that the variance is small and that 122.1 is a reliable measurement.
Variance- A value used to determine if random error is small (so that our measurement is reliable)
Types of data
Qualitative (Categorical) variable – places responses into categories with no logical ordering
Quantitative variable – numeric values that can be ordered and on which mathematical operations can be performed, such as finding an average
• Discrete variable – things that can be counted
• Continuous variable – things that are measured
Run a race in 2:03?
Time is continuous…
nobody counted
your seconds.
Chapter 10
Graphs, Good and Bad
The distribution of a variable tells what values it takes and how often it takes these values
Pie Chart – displays the division of a total quantity
• Used only for qualitative data
• Should not include too many categories
• The number of degrees for each wedge should be proportional to the percentage
• The total percentage must add to be 100%
Two categories: pie eaten
and pie not yet eaten.
Notice percentage of pie
eaten is approximately
33% and the associated
angle is .33(360) or
approximately 120
degrees
Bar Graph – displays frequency or percentage of items in each category
• Can be used for more than one categorical variable
• The bars can be vertical or horizontal
• The bars should be of uniform width and uniformly spaced
• The length of a bar represents the quantity we wish to compare
I found this bar graph online
and thought the title was
amusing. I imagined a study
where someone asked children
“What is your favorite juice?”
and most students replied
“Yellow!”
I’m stumped…what is yellow
juice?
Line Graph (Time Plot) – shows the relationship between a quantitative variable and time
• Time is the horizontal scale (x-axis)
• The quantitative variable being measured is the vertical scale (y-axis)
This next example seems contradictory to my beliefs…but
it was also kind of amusing
A time series is a record of a variable over time.
A steady change over time is called a trend.
A seasonal component in time series means that the variable tends to be higher at certain points in time and
lower at certain points in time.
All other variation can be explained by irregular cycles and random fluctuations.
Chapter 11
Displaying Distributions with Graphs
Extreme value or Outlier – observations that are separated from the rest of the data set by some margin
Imagine a study where you asked people at an M&M conference how many M&Ms they ate each day and the results were: 32, 33, 45, 67, 28, 32, 40, 0, 32, 41, 879, 33. We see that 0 and 879 are outliers. Who goes to an M&M conference without eating M&Ms? I’d be a little nervous about that person. I’d also be a little nervous about the person who consumes 879 M&Ms!
Shape – the pattern displayed when the graph is created
Stem and Leaf – separates data entries into “leading digits” or “stems” and “trailing digits” or “leaves”.
Features:
• A device that organizes and groups data but allows us to recover the original data if desired
• Good for spotting extreme values and identifying shape
14 male weights in pounds:
139, 153, 179, 201, 163, 168, 157, 170, 172, 165, 145, 155, 161, 151
stem – tens of pounds
leaf – ones of pounds
13 | 9
14 | 5
15 | 1357
16 | 1358
17 | 029
18 |
19 |
20 | 1
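A stem and leaf plot for the 14 weights can also be built by a short program. The sketch below (the function name is made up) splits each weight into its tens (stem) and ones digit (leaf), and keeps the empty stems so the gap before 201 stays visible.

```python
from collections import defaultdict

weights = [139, 153, 179, 201, 163, 168, 157, 170, 172, 165, 145, 155, 161, 151]

def stem_and_leaf(data):
    """Return {stem: sorted leaves}, with stem = tens and leaf = ones digit."""
    leaves = defaultdict(list)
    for value in sorted(data):
        leaves[value // 10].append(value % 10)
    # Include empty stems so gaps in the data stay visible.
    return {stem: leaves.get(stem, []) for stem in range(min(leaves), max(leaves) + 1)}

plot = stem_and_leaf(weights)
for stem, row in plot.items():
    print(f"{stem:>2} | {''.join(str(leaf) for leaf in row)}")
```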
A stem and leaf plot for
inches of snow per day for the
first week of May…?
Frequency distribution – a summary table in which the data are arranged into conveniently established class groupings.
• should have between 5 and 15 classes
• each class grouping should be of equal width
• overlapping the classes must be avoided
• useful when dealing with very large data sets
• through the grouping process the original data is lost
class midpoint – the point halfway between the boundaries of each class.
Weight                    Frequency
130 but less than 140         1
140 but less than 150         1
150 but less than 160         4
160 but less than 170         4
170 but less than 180         3
180 but less than 190         0
190 but less than 200         0
200 but less than 210         1
Total                        14
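The frequency table can be reproduced from the 14 male weights with a short Python sketch. The function and its parameter names are made up for illustration; the classes are the same "130 but less than 140" style classes used in the table.

```python
weights = [139, 153, 179, 201, 163, 168, 157, 170, 172, 165, 145, 155, 161, 151]

def frequency_distribution(data, start, width, classes):
    """Count observations in classes [start, start + width), and so on."""
    counts = [0] * classes
    for value in data:
        index = (value - start) // width
        if 0 <= index < classes:
            counts[index] += 1
    return counts

counts = frequency_distribution(weights, start=130, width=10, classes=8)
for i, count in enumerate(counts):
    low = 130 + 10 * i
    print(f"{low} but less than {low + 10}: {count}")
print("Total:", sum(counts))
```

Notice that once only the counts are kept, there is no way to get the original 14 weights back.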
Histogram – a picture of a frequency distribution
Shapes of histograms
• Symmetrical – both sides are the same when the graph is folded vertically
• Uniform – every class has equal frequency
• Skewed Left or Skewed Right – one tail is stretched longer than the other. The direction of the skewness is on the side of the longer tail.
• Bimodal – the two classes with largest frequencies are separated by at least one class
Chapter 12
Describing Distributions with Numbers
Measure of Central Tendency
Description of Average (Typical Value)
sample mean: (simple average)
$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
where n is the sample size and $x_1, x_2, \ldots, x_n$ are the observations.
The sum of the data values divided by the sample size.
Select 4 students and ask “how many brothers and sisters do you have?”
Answers: 2,3,1,3
$\bar{x} = \frac{2 + 3 + 1 + 3}{4} = 2.25$
Notice that if the fourth person had responded that they had 10 brothers and sisters instead of 3, the mean would be 4 instead. This shows that the mean is influenced by extreme values.
Here is something that is not influenced by extreme values:
sample median: (middle score)
• rank data from smallest to largest
• if n is odd, median is the middle score
• if n is even, median is the average of two middle scores
Observations (number of siblings): 2,3,1,3
Ranked: 1,2,3,3, so the median is (2 + 3)/2 = 2.5
Observations (with the fourth responder saying 10 instead of 3): 2,3,1,10
Ranked: 1,2,3,10, so the median is still (2 + 3)/2 = 2.5
sample mode: most frequent score
Observations: 2,3,1,3
Mode = 3
• does not always exist/can be more than one
• unstable (if we start rounding, the mode can change drastically)
• can be used with qualitative data
What if there is no mode?! If no number occurs more than once, we say there is no mode, but if two numbers tie for the most occurrences, then each of them gets the title of mode.
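The three measures of center can be compared side by side with Python's built-in statistics module. This is just a sketch using the sibling answers from above; the variable names are made up.

```python
import statistics

siblings = [2, 3, 1, 3]
with_extreme = [2, 3, 1, 10]   # fourth student answers 10 instead of 3

for data in (siblings, with_extreme):
    print(data,
          "mean =", statistics.mean(data),
          "median =", statistics.median(data),
          "modes =", statistics.multimode(data))
```

One caution: when no value repeats, multimode returns every value, which is the situation these notes describe as "no mode".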
Measures of Dispersion (Variability)
Distribution #1
1 |
2 | 55
3 | 55555
4 | 55
5 |
Distribution #2
1 | 5
2 | 55
3 | 555
4 | 55
5 | 5
The mean, median and mode are all 35 in both distributions above, but there is a big
difference between the two distributions!
How we measure the differences:
sample range:
(highest observation) – (lowest observation)
Years of experience of faculty
1, 30, 22, 10, 5
sample range = 30-1 = 29 years
This is easy to compute but highly sensitive to extreme scores.
Sample Variance: measures the average squared distance from the mean.
Sample Standard Deviation: the square root of the sample variance; it measures the typical distance from the mean.
Standard deviation is incredibly important to this class, and we will discuss the formulas and how to compute it in class.
Measures of Position
Quartiles - divide the data into four equally sized parts
• First Quartile, $Q_1$: 25% of the data lies below, 75% of the data lies above
• Second Quartile (median), $Q_2$: 50% of the data lies below, 50% of the data lies above
• Third Quartile, $Q_3$: 75% of the data lies below, 25% of the data lies above
Procedure to Compute Quartiles
1) Order the data from smallest to largest
2) Find the median. This is the 2nd Quartile
3) $Q_1$ is the median of the lower half of the data
4) $Q_3$ is the median of the upper half of the data
In the event that there is an odd number of observations,
you will take out the median before computing the first
and third quartiles. In the event that there is an even
number of observations, you will leave all the observations
in, when computing the first and third quartiles.
5 number summary: Min, $Q_1$, median, $Q_3$, Max
Interquartile range (IQR) = $Q_3 - Q_1$
• Range of middle 50% of the data
Students
0 | 0013555678
1 | 0
2 |
3 |
4 |
5 |
6 |
7 |
Min = 0
$Q_1$ = 1
Median = 5
$Q_3$ = 7
Max = 10
Faculty
0 |
1 | 055
2 | 04588
3 | 1
4 | 3
5 |
6 |
7 | 3
Min = 10
$Q_1$ = 15
Median = 25
$Q_3$ = 31
Max = 73
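The quartile procedure from these notes (drop the median when n is odd, keep everything when n is even) can be sketched in Python as follows. The two data lists are read off the Students and Faculty stem and leaf plots; the function name is made up. Other textbooks and software use slightly different quartile rules, so results from a calculator may differ a little.

```python
import statistics

def five_number_summary(data):
    """Min, Q1, median, Q3, Max, using the halves-of-the-data procedure."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]                                   # below the median position
    upper = values[half + 1:] if n % 2 else values[half:]   # drop the median if n is odd
    return (values[0], statistics.median(lower), statistics.median(values),
            statistics.median(upper), values[-1])

students = [0, 0, 1, 3, 5, 5, 5, 6, 7, 8, 10]
faculty = [10, 15, 15, 20, 24, 25, 28, 28, 31, 43, 73]

for name, data in (("Students", students), ("Faculty", faculty)):
    low, q1, med, q3, high = five_number_summary(data)
    print(name, (low, q1, med, q3, high), "IQR =", q3 - q1)
```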
Boxplots:
Procedure
1) Draw a scale to include the lowest and highest data value (USE EVEN INCREMENTS!)
2) Draw a box from $Q_1$ to $Q_3$
3) Draw a solid line through the box at the median
4) Draw solid lines, called whiskers, from $Q_1$ to the lowest value and from $Q_3$ to the highest value
Chapter 13
Normal Distributions
As the class widths for a histogram become smaller and smaller, the top of the histogram becomes more curve-like.
We set up these curves so that the area under the curve represents the proportion of observations
This is known as the density curve and is the most common way of representing a population
Another way to determine shape is by comparing the mean and median
The median of a density curve is the point that divides the area in half
The mean of a density curve is the balance point of the density function
Because of this, the mean and median of a symmetric distribution are equal
If the mean is greater than the median then the curve is skewed right
If the mean is less than the median then the curve is skewed left
If the curve follows a normal distribution (Gaussian distribution) then it will be a bell-shaped curve
Density curves are useful in determining what proportion or percentage of the population falls within an
interval
The area under the curve represents this proportion and the total area is 1
The normal distribution is characterized by $\mu$ (population mean) and $\sigma$ (population standard deviation)
A normal curve with $\mu = 0$ and $\sigma = 1$ is called the standard normal curve
A percentile represents the position of your measurement in comparison with everyone else’s and
gives the percentage of the population that falls below you.
To find a percentile we will use standardized scores (z-scores), denoted z
Example
If your height is 70 inches, and the heights of the class are normally distributed with $\mu = 65$ and $\sigma = 5$, then you have a $z = 1$
That is, your height is 1 standard deviation above the mean: $z = \frac{x - \mu}{\sigma} = \frac{70 - 65}{5} = 1$
The z-score for an observation is just the number of standard deviations the observation is above the mean
z-scores allow us to transform any normal curve into a standard normal curve
Empirical Rule (68-95-99.7)
Approximately 68% of the data fall within 1 standard deviation of the mean: $(\bar{x} - s, \bar{x} + s)$
Approximately 95% of the data fall within 2 standard deviations of the mean: $(\bar{x} - 2s, \bar{x} + 2s)$
Approximately 99.7% of the data fall within 3 standard deviations of the mean: $(\bar{x} - 3s, \bar{x} + 3s)$
For a normal distribution, these percentages hold very closely (68.27%, 95.45%, and 99.73%)
If an observation is not 1, 2, or 3 standard deviations
from the mean, we cannot use the 68-95-99.7 rule. To
determine percentages, we use a z-score table, like the
one on the next page.
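In place of a printed z-score table, Python's statistics.NormalDist can look up the area under the standard normal curve. The sketch below redoes the height example and then checks the 68-95-99.7 rule; the function name z_score is made up for illustration.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above the mean."""
    return (x - mu) / sigma

z = z_score(70, mu=65, sigma=5)
percentile = NormalDist().cdf(z)   # area to the left of z under the standard normal curve
print(f"z = {z}, percentile = {percentile:.1%}")

# Check the 68-95-99.7 rule against the standard normal curve.
for k in (1, 2, 3):
    area = NormalDist().cdf(k) - NormalDist().cdf(-k)
    print(f"within {k} standard deviation(s): {area:.2%}")
```

So a height of 70 inches sits at roughly the 84th percentile of the class.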
Chapter 14
Describing Relationships:
Scatterplots and Correlation
Scatterplot or Scatter diagram – displays the relationship between two quantitative variables
• x-axis: independent variable (explanatory variable)
• y-axis: dependent variable (response variable)
Example:
Age (x)      8   10   15   16   18   19
Height (y)  48   53   63   65   67   67
Correlation - a measure of association that tests whether a relationship exists between two variables
In general we will be looking for linear correlations i.e.
how closely the data follows a line when plotted.
The correlation coefficient (denoted r) is a value which measures correlation and indicates both the strength of
the association and its direction.
Positive r suggests that as the x values increase, so do the y values. It also happens that as the x values decrease,
so do the y values.
Negative r suggests the opposite; when the x values increase, the y values decrease and when the x values
decrease, the y values increase.
We will always have that $-1 \le r \le 1$
• when r is close to 1 (data is close to a straight line with positive slope)
• when r is close to -1 (data is close to a straight line with negative slope)
• $r \approx 0$ (no linear relationship)
The stronger the linear relationship, the closer r is to -1 or 1.
Generally, we will say there is a strong relationship if $|r| \ge 0.75$
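As an illustration, the correlation coefficient for the age and height example can be computed directly. The sketch below uses the usual Pearson formula, which these notes do not spell out, so treat the formula (and the function name) as an assumption about what r means here.

```python
import math

ages = [8, 10, 15, 16, 18, 19]
heights = [48, 53, 63, 65, 67, 67]

def correlation(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = correlation(ages, heights)
print(f"r = {r:.3f}")
```

The result is very close to 1, so by the rule of thumb above this counts as a strong positive linear relationship.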
Note that it is never possible to prove causality just based on the relationship between two variables
There is a strong statistical correlation over months of the year between ice cream consumption and the number
of assaults in the U.S.
Does this mean ice cream manufacturers are responsible for crime?
No! The correlation occurs statistically because the hot temperatures of summer increase both ice cream
consumption and assaults
Thus, correlation does NOT imply causation
Other factors besides cause and effect can create an observed correlation
To establish whether two variables are causally related you must establish:
1) Time order - The cause must have occurred before the effect
2) Co-variation (statistical association) – The correlation coefficient must show a strong relationship between the dependent and independent variable
3) Rationale - There must be a logical and compelling explanation for why these two variables are related
4) Non-spuriousness - It must be established that the independent variable X, and only X, was the cause of changes in the dependent variable Y; rival explanations must be ruled out
This type of research is very complex and the researcher can never be completely certain that there are not
other factors influencing the causal relationship
To help identify a relationship as cause and effect a study is often performed many times
The study should yield the same results every time it is conducted
(if this occurs it helps rule out rival explanations)
Chapter 15
Describing Relationships:
Regression, Prediction, and Causation
Linear Regression
Purpose of linear regression: To predict the value of a difficult to measure variable, y (response variable), based
on an easy to measure variable, x (explanatory variable)
Example
Predict the finishing time of the men’s 100 meter dash in 2032
We do this by using a line that fits the data, called a regression line.
Lines have equations of the form:
$y = mx + b$
where b is the y-intercept and m is the slope
In order to use linear regression, make sure the model is reasonable (the scatter plot and r should indicate a
strong correlation)
We need a line that is the “best” fit for our data
To accomplish this we will use the method of least-squares
To find the least squares regression line, we are essentially looking to minimize the total area of the squares created from a possible regression line and our observations.
While we do not compute it this way, the least squares regression line can be found by seeing that
$m = r\,\frac{s_y}{s_x}$ and $b = \bar{y} - m\bar{x}$
where $s_x$ and $s_y$ are the sample standard deviations of x and y, and $\bar{x}$ and $\bar{y}$ are their sample means.
We will do an example in class (be sure to include in your notes)
Interpolation – Predicting y values for x values that are within the range of the scatter plot (This is what
regression should be used for)
Extrapolation – Predicting y values for x values beyond the range of the observations (In general, this should not
be done as it can pose a problem)
It is possible to create a scatter plot where the explanatory variable is age and the response variable is
height. When comparing a child’s height to her age, it may seem as if the data has a strong linear
correlation. By using a regression line to extrapolate, we might find that at the age of 28 we would
expect a height of 7 feet. This problem is present because our growth over time is not typically
linear.
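To see the extrapolation problem with real numbers, the sketch below fits a least squares line to the age and height data from the Chapter 14 scatterplot example and then predicts the height at age 28. The slope and intercept are computed with the standard least squares formulas, and the function names are made up for illustration.

```python
ages = [8, 10, 15, 16, 18, 19]
heights = [48, 53, 63, 65, 67, 67]   # inches

def least_squares_line(xs, ys):
    """Return (slope m, intercept b) of the least squares line y = m*x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b

m, b = least_squares_line(ages, heights)

def predict(age):
    """Predicted height in inches from the fitted line."""
    return m * age + b

print(f"y = {m:.2f}x + {b:.2f}")
print(f"age 12 (interpolation): {predict(12):.1f} inches")
print(f"age 28 (extrapolation): {predict(28):.1f} inches")
```

The prediction at age 28 comes out to about 85 inches, a little over 7 feet, even though the line fits the ages 8 to 19 quite well.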
Chapter 17 & 18
Thinking about Chance & Probability
When thinking about chance, we consider outcomes or the possible things that can happen.
When rolling 2 dice an example of something that can
happen is rolling “snake eyes” (both land on one).
Rolling 2 ones is an example of an outcome when rolling
2 dice.
We call the collection of outcomes we care about an event.
When playing cards we might like to
consider the chance of getting a royal
flush. Our event would be getting a
royal flush, which has 4 outcomes
(one for each suit)
The probability of an event A, denoted P(A) , is the expected proportion of occurrences of A if the
experiment were performed a large number of times
In general we can compute probability if the chance of each outcome is equal. In this case:
$P(A) = \frac{\text{number of outcomes in } A}{\text{total number of possible outcomes}}$
The set of all possible outcomes is called the sample space
Examples:
Roll a d20 (20 sided die): sample space = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20}
Flip a coin: sample space = {H, T}
Usually it will be hard to list the entire sample space. For instance, listing out all of the possible card hands would take a very long time. We therefore resort to counting principles.
Counting principles
How many ways are there to arrange the letters in the word SUNDAY?
Here we have 6 letters which cannot have repeats. We have 6 choices for the first letter, this leaves us with 5
choices for the second letter, 4 for the third…and finally one choice for the sixth letter, giving us 6x5x4x3x2x1
= 720 ways to arrange the letters.
If a burglar system has a 3 digit code and each digit is a number between 0 and 9, then how many possible codes are there?
Here we have 10 different numbers and we are okay with repeats so there are 10 choices for each of the 3 digits giving us $10 \times 10 \times 10 = 1000$ different codes
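Both counts can be double-checked by brute force. The short Python sketch below computes each answer with the counting principle and then confirms it by actually listing every arrangement or code with itertools.

```python
import itertools
import math

# Arrangements of SUNDAY: 6 distinct letters, no repeats.
arrangements = math.factorial(6)
assert arrangements == len(set(itertools.permutations("SUNDAY")))

# 3-digit codes: 10 choices per digit, repeats allowed.
codes = 10 ** 3
assert codes == len(list(itertools.product(range(10), repeat=3)))

print(arrangements, codes)   # 720 1000
```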
If we are interested in looking at a series of operations then a device called a tree diagram is useful for
determining the sample space
Flip a Penny, Nickel, and a Dime:
In this tree we see that there are 8 possible outcomes.
This gives us the ability to compute probabilities:
If A is the event of getting all three tails, what is P(A) ?
If B is the event of getting exactly two tails, what is P(B) ?
If C is the event of getting exactly one tail, what is P(C ) ?
If D is the event of getting no tails, what is P(D) ?
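The tree diagram can also be generated by a program. The sketch below lists all 8 outcomes for the penny, nickel, and dime and counts the ones in each event, assuming fair coins so that every outcome is equally likely. The function name is made up.

```python
from fractions import Fraction
from itertools import product

# Each outcome is a (penny, nickel, dime) triple such as ('H', 'T', 'T').
sample_space = list(product("HT", repeat=3))

def probability_of_tails(k):
    """P(exactly k tails), assuming all 8 outcomes are equally likely."""
    favorable = [o for o in sample_space if o.count("T") == k]
    return Fraction(len(favorable), len(sample_space))

for event, k in (("A", 3), ("B", 2), ("C", 1), ("D", 0)):
    print(f"P({event}) = {probability_of_tails(k)}")
```

The four probabilities add up to 1, as they should, since every outcome has exactly 0, 1, 2, or 3 tails.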
Thinking about the above counting techniques we see that there should be some formulas at our disposal.
The complement of an event A, denoted $\bar{A}$, is all outcomes not in A
The complement rule
$P(\bar{A}) = 1 - P(A)$
The addition rule
$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$
If we are interested in the probability that event A occurred given that we know event B occurred, this is known as conditional probability, denoted $P(A \mid B)$ (A given B)
The conditional probability rule
$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}$
The multiplication rule
$P(A \text{ and } B) = P(A)\,P(B \mid A)$
We say the events A and B are mutually exclusive or disjoint if they cannot occur together.
$P(A \text{ and } B) = 0$
Two events are said to be independent if the occurrence (or nonoccurrence) of one does not affect the probability of occurrence of the other.
$P(A) = P(A \mid B)$
Events that are not independent are dependent.
$P(A) \ne P(A \mid B)$
Example
Draw two cards without replacement
A = {first card is an ace}
B = {second card is an ace}
A and B are dependent
$P(A \text{ and } B) = P(A)\,P(B \mid A) = \left(\frac{4}{52}\right)\left(\frac{3}{51}\right) \approx 0.004525$
Suppose we return the first card and thoroughly shuffle before we draw the second
A and B are independent
$P(A \text{ and } B) = P(A)\,P(B \mid A) = \left(\frac{4}{52}\right)\left(\frac{4}{52}\right) \approx 0.00592$
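The two card calculations can be redone with exact fractions, which avoids any rounding questions. This is only a sketch of the multiplication rule; the variable names are made up.

```python
from fractions import Fraction

# Without replacement: the second draw depends on the first.
p_dependent = Fraction(4, 52) * Fraction(3, 51)

# With replacement (and reshuffling): the draws are independent.
p_independent = Fraction(4, 52) * Fraction(4, 52)

print(p_dependent, float(p_dependent))      # 1/221, about 0.004525
print(p_independent, float(p_independent))  # 1/169, about 0.005917
```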
We can also use a density curve when our outcomes are not discrete, but continuous.
Example: determining the probability that a sample statistic with a sample size of 100 is within 10% of the parameter.
AT THIS TIME YOU MIGHT BE THINKING “MARGIN OF ERROR, MARGIN OF
ERROR, MARGIN OF ERROR”
It turns out that in this case there is a 95% chance since the margin of error with a sample size
of 100 is 10%.
This type of probability model is continuous. We compute probabilities using areas under the
density curve. We also require that the total area under the density curve is 1.
Chapter 20
The House Edge: Expected Values
Mean and Standard Deviation of a Discrete Probability Distribution
One of the most asked
questions for probability:
“If the probability that I win
$10 is ¼, $20 is ¼ and $0 is ½,
what will I win on average?”
The mean (denoted $\mu$) of a probability model is the outcome one would expect on average.
The equation for mean is not much different than the old equation for mean. If you win $10 ¼ of the time, $20 ¼ of the time and $0 ½ of the time, you would expect out of every 4 plays to get $10, $20, $0 and $0, so your mean would be
$\frac{10 + 20 + 0 + 0}{4} = 7.5$
but this is equal to
$10 \cdot \frac{1}{4} + 20 \cdot \frac{1}{4} + 0 \cdot \frac{1}{2} = 7.5$
The standard deviation (denoted $\sigma$) of a probability model is the “weighted average” distance each outcome is from the mean (where the weighting is given by the probabilities)
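For the $10 / $20 / $0 game, both numbers can be computed in a few lines of Python. The mean follows the weighted sum described above. For the standard deviation, the sketch assumes the usual definition (the square root of the probability-weighted average of squared distances from the mean), which is more specific than the informal description in these notes.

```python
import math

# Winnings and their probabilities from the example above.
model = {10: 0.25, 20: 0.25, 0: 0.5}

mean = sum(x * p for x, p in model.items())
variance = sum((x - mean) ** 2 * p for x, p in model.items())
std_dev = math.sqrt(variance)

print(f"mean = ${mean:.2f}")
print(f"standard deviation = ${std_dev:.2f}")
```

On average you win $7.50 per play, so a casino charging more than that to play keeps the difference as its house edge.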
Chapter 21
What is a Confidence Interval?
In Chapter 3 we talked about 95% confidence statements
Reminder:
A statistic from a sample of size n has a margin of error of approximately $1/\sqrt{n}$
This is because the statistics from samples of size n are approximately normally distributed with a mean equal to the parameter and a standard deviation of approximately $\frac{1}{2\sqrt{n}}$
While what we did in Chapter 3 was great, something worth noting is that the standard deviation of the distribution of statistics should also be based on the parameter…you wouldn’t expect to allow a confidence interval of something like:
I’m 95% confident that between 90 and
110 percent of people like brownies.
This comes about because we could have a
parameter of 100%, but our sample size is
small…I mean, who doesn’t like brownies?
Let the new standard deviation be defined to be:
\[ \sqrt{\frac{p(1-p)}{n}} \]
where p is the statistic from a sample of n individuals.
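A short sketch shows how this standard deviation behaves. It assumes a 95% interval is built as the statistic plus or minus 2 standard deviations (the 68-95-99.7 rule); the multiplier 2 and the function name `proportion_interval` are assumptions for illustration, not something stated in these notes.

```python
import math

def proportion_interval(p, n, multiplier=2.0):
    """Approximate 95% interval for a proportion: p +/- multiplier * sd."""
    sd = math.sqrt(p * (1 - p) / n)   # standard deviation based on the statistic
    return p - multiplier * sd, p + multiplier * sd

# A statistic of 50% from 100 people: same 10% margin as the quick method.
print(proportion_interval(0.5, 100))

# A statistic of 100% from 100 people: the interval no longer runs past 100%.
print(proportion_interval(1.0, 100))
```

With p = 0.5 the interval is roughly 40% to 60%, agreeing with the Chapter 3 margin of error. With p = 1 the standard deviation is 0, so the brownie interval never reaches 110%.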
Confidence intervals are
about to blow your mind!
Sampling Distribution of the Sample Mean
Suppose you don’t just want to know what percentage of the population
has brown eyes… or other characteristic variables like that. Suppose you
want to know the average IQ of the American population; you want to know
the mean of some quantitative variable.
So far we only have the tools to approximate the percentage of the population that has some characteristic. If
we wanted to answer a quantitative question, we could only say things like “54% of the population has more
than 2 cats”, when we would like to say things like “the average person has 1.7 cats”.
You kind of have to feel bad for the 7/10 of a cat running around
To approximate the mean we notice the following:
If we have a sample of size n, we can compute the mean value of the observations, \( \bar{x} \). If we consider the
collection of mean values from all samples of size n (just like we did when we looked at statistics), we see that
the values that \( \bar{x} \) takes are normally distributed with a mean, which we will call \( \mu \), and a standard deviation of
\( \sigma/\sqrt{n} \), where \( \sigma \) is the actual standard deviation of all observations from the population.
What can we do with this?
Suppose we have a sample of size n…
We can compute the mean, which we will call \( \bar{x} \)
We can compute the standard deviation, which we will call s
From this we can use the fact that the sample means are normally distributed and approximate the mean for the
population to be \( \bar{x} \), and approximate the standard deviation of this distribution to be \( s/\sqrt{n} \), which gives us that we
can make a 95% confidence interval:
I am 95% confident that the population mean is between
\( \bar{x} - 2s/\sqrt{n} \) and \( \bar{x} + 2s/\sqrt{n} \)
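The recipe above can be sketched in Python. The cat counts below are made-up data used only for illustration, the function name `mean_interval` is hypothetical, and the multiplier 2 again assumes the 68-95-99.7 rule.

```python
import math
import statistics

def mean_interval(observations, multiplier=2.0):
    """Approximate 95% interval for the population mean from one sample."""
    n = len(observations)
    x_bar = statistics.mean(observations)   # sample mean
    s = statistics.stdev(observations)      # sample standard deviation (divides by n - 1)
    se = s / math.sqrt(n)                   # estimated sd of the sample means
    return x_bar - multiplier * se, x_bar + multiplier * se

# Hypothetical sample: number of cats owned by 16 people.
cats = [0, 1, 1, 2, 2, 2, 3, 1, 0, 4, 2, 1, 3, 2, 1, 2]
low, high = mean_interval(cats)
print(f"I am 95% confident that the population mean is between {low:.2f} and {high:.2f}")
```

Larger samples shrink \( s/\sqrt{n} \), so the interval tightens as n grows.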
Chapter 22
What is a Test of Significance?
Statistical hypotheses – statements about population parameters
Suppose you think people can’t tell the difference between
sugar and artificial sweeteners.
Your hypothesis: 50% of people would say they like sugar
better in a blind taste test
Notice: Hypotheses
are not necessarily
correct!
In statistics, we test one hypothesis against another:
The hypothesis that we want to prove is called the alternative hypothesis, \( H_a \)
We might want to show that sugar is actually preferred…or that our parameter is greater than 50%
The hypothesis that is contradictory to \( H_a \) is called the null hypothesis, \( H_0 \)
To determine if \( H_0 \) is believable we conduct a study with a sample and either
Reject \( H_0 \) and believe \( H_a \)
Or
Fail to reject \( H_0 \) because there was not sufficient evidence to reject it
Example:
Suppose it is believed by others that there is no difference between sugar and artificial sweeteners, but you
believe that sugar is better liked.
Null hypothesis is that half of the population would like sugar better
Alternative hypothesis would be that more than half of the population would like sugar better
Now suppose we ask a sample of 100 people and see that 63% of them like sugar better. This certainly suggests
that the null hypothesis is wrong, but could it have been coincidental?
Notice that if \( H_0 \) is true, then we would expect that the sample statistics of samples of size 100 would be normally
distributed with a mean of .5 and we would expect a standard deviation of \( \sqrt{(.5)(.5)/100} = .05 \), so 68% of
samples should have statistics between 45% and 55%, and 95% of samples should have statistics between 40% and
60%. Because of this we can see how unlikely it is that we got one of the few samples with a statistic
as large as 63%.
In fact, we can determine the probability that such a thing would occur: notice the z-score of 63% is
\[ z = \frac{.63 - .5}{.05} = 2.6 \]
and the associated percentile is 99.53%, so the likelihood of a statistic as high as 63% is 100% - 99.53% = .47%
The likelihood of getting a statistic at least this extreme if the null hypothesis is true is referred to as the
P-Value
Because it is highly unlikely that we would get a statistic of 63% if the null hypothesis were true, we reject the
null hypothesis. Since the statistic satisfies the alternative hypothesis, we also use this as evidence that the
alternative hypothesis is in fact true.
If we want to be more sure that the null hypothesis is untrue, we can adjust the P-Value we are looking for.
The level that the P-Value must be under is referred to as the level of significance. For instance, if
we were testing to a level of significance of .1%, we would not reject \( H_0 \) in the above example, but
we would if the level of significance was 1%.
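The sugar example can be reproduced with a short script. This is a sketch under the same assumptions as the example: the statistics are treated as normally distributed, and the standard deviation is computed from the null-hypothesis value of 50%. The function name `one_sided_p_value` is hypothetical.

```python
import math
from statistics import NormalDist

def one_sided_p_value(statistic, null_value, n):
    """Return (z-score, P-value) for the alternative 'parameter > null_value'."""
    sd = math.sqrt(null_value * (1 - null_value) / n)   # sd if the null hypothesis is true
    z = (statistic - null_value) / sd
    # Area under the standard normal curve to the right of z.
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value

z, p_value = one_sided_p_value(0.63, 0.5, 100)
print(f"z = {z:.2f}, P-value = {p_value:.4f}")

# Compare the P-value with two levels of significance: 1% and .1%.
for level in (0.01, 0.001):
    decision = "reject H0" if p_value < level else "fail to reject H0"
    print(f"level of significance {level:.1%}: {decision}")
```

The z-score is 2.6 and the P-value is roughly .47%, so the null hypothesis is rejected at the 1% level but not at the .1% level.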