Download Review of Basic Statistical Concepts

Document related concepts

Degrees of freedom (statistics) wikipedia , lookup

Foundations of statistics wikipedia , lookup

Confidence interval wikipedia , lookup

History of statistics wikipedia , lookup

Bootstrapping (statistics) wikipedia , lookup

Taylor's law wikipedia , lookup

Statistical inference wikipedia , lookup

Resampling (statistics) wikipedia , lookup

Misuse of statistics wikipedia , lookup

Student's t-test wikipedia , lookup

Transcript
Review of Basic Statistical Concepts
Farideh Dehkordi-Vakil
Review of Basic Statistical Concepts

Descriptive Statistics

Methods that organize and summarize data.



Numerical summary
Graphical Methods
Inferential Statistics

Generalizing from a sample to the population
from which it was selected.


Estimation
Hypothesis testing
Review of Basic Statistical Concepts

Population


The entire collection of individuals or objects
about which information is desired.
Sample

A subset of the population selected in some
prescribed manner for study.
Review of Basic Statistical Concepts

Numerical summaries

Measure of central tendencies



Mean
Median
Measure of variability



Variance, Standard deviation
Range
Quartiles
Review of Basic Statistical Concepts

The Mean X

To find the mean of a set of observations, add
their values and divide by the number of
observations. If the n observations are
x1 , x2 ,, xn , their mean is
X

x1  x2   xn
n
In a more compact notation,
x
n
x
i 1
n
i
Example: Books Page Length
A sample of n = 8 books is selected from a
library’s collection, and page length of each
one is determined, resulting in the following
data set.
X1=247, X2=312, X3=198, X4=780,
X5=175, X6=286, X7=293, X8=258
x
x
i
n

2549
 318.6
8
Review of Basic Statistical Concepts

Median M

The Median M is the midpoint of a distribution, the
number such that half of the observations are smaller
and the other half are larger. To find the median of a
distribution:
1.
2.
3.
Arrange all observations in order of size, from smallest to
largest.
If the number of observations n is odd, the median M is the
center observation in the ordered list.
If the number of observations n is even, the median M is the
mean of the two center observations in the ordered list.
Review of Basic Statistical Concepts

Quartiles Q1 and Q3

To calculate the quartiles:
1.
2.
3.
Arrange the observations in increasing order and locate the
median M in the ordered list of observations.
The first quartile Q1 is the median of the observations whose
position in the ordered list is to the left of the location of the
overall median.
The third quartile Q3 is the median of the observations whose
position in the ordered list is to the right of the location of the
overall median.
Example: Books Page Length

Median:

Order the list
175, 198, 247, 258, 286, 293, 312, 780


There are two middle numbers 258, and 286.
The median is the average of these two
numbers
Review of Basic Statistical Concepts

The Five Number Summary and Box-Plot
 The five number summary of a distribution
consists of the smallest observation, the first
quartile, the median, the third quartile, and the
largest observation, written in order from
smallest to largest. In symbols, the five number
summary is:
Minimum
Q1
M
Q3
Maximum
Review of Basic Statistical Concepts

A box-plot is a graph of the five number
Summary.




A central box spans the quartiles.
A line in the box marks the median.
Lines extend from the box out to the smallest
and largest observations.
Box-plots are most useful for side-by-side
comparison of several distributions.
Review of Basic Statistical Concepts
Review of Basic Statistical Concepts

The Variance s2

The Variance s2 of a set of observations is the
average of the squares of the deviations of the
observations from their mean. In symbols, the
variance of n observations x , x ,, x is
1
( x1  x ) 2  ( x2  x ) 2  ( xn  x ) 2
s 
n 1
2

or, more compactly,
s2 
2
(
x

x
)
 i
n 1
2
n
Review of Basic Statistical Concepts

The Standard Deviation s

The standard deviation s is the square root of
the variance s2:
s

2
(
x

x
)
 i
n 1
Computational formula for variance
( xi ) 2
( xi  x ) 2  x 

2
n
S 

n 1
n 1
2
i
Example: Book page length
2
2
(
x
)
(
2549
)

i
2
x

1070791 

i
n
8
S2 

 36945.125
n 1
7
S  36945.125  192.21
Review of Basic Statistical Concepts

Choosing a Summary

The five number summary is usually better than
the mean and standard deviation for describing
a skewed distribution or a distribution with
extreme outliers. Use x , and s only for
reasonably symmetric distributions that are free
of outliers.
Review of Basic Statistical Concepts

Introduction to Inference



The purpose of inference is to draw conclusions from data.
Conclusions take into account the natural variability in the data,
therefore formal inference relies on probability to describe chance
variation.
We will go over the two most prominent types of formal statistical
inference



Confidence Intervals for estimating the value of a population
parameter.
Tests of significance which asses the evidence for a claim.
Both types of inference are based on the sampling distribution of
statistics.
Review of Basic Statistical Concepts

Parameters and Statistics

A parameter is a number that describes the population.


A parameter is a fixed number, but in practice we do not know
its value.
A statistic is a number that describes a sample.


The value of a statistic is known when we have taken a
sample, but it can change from sample to sample.
We often use statistic to estimate an unknown parameter.
Review of Basic Statistical Concepts



Since both methods of formal inference are based
on sampling distributions, they require probability
model for the data.
The model is most secure and inference is most
reliable when the data are produced by a properly
randomized design.
When we use statistical inference we assume that
the data come from a randomly selected sample
(SRS) or a randomized experiment.
Example:Consumer attitude towards
shopping

A recent survey asked a nationwide random
sample of 2500 adults if they agreed or disagreed
with the following statement
I like buying new cloths, but shopping is often
frustrating and time consuming.


Of the respondents, 1650 said they agreed.
The proportion of the sample who agreed that
cloths shopping is often frustrating is:
1650
Pˆ 
 .66  66%
2500
Example:Consumer attitude towards
shopping



The number P̂ = .66 is a statistic.
The corresponding parameter is the
proportion (call it P) of all adult U.S.
residents who would have said “Yes” if
asked the same question.
We don’t know the value of parameter P, so
P̂ its estimate.
we use as
Review of Basic Statistical Concepts




If the marketing firm took a second random
sample of 2500 adults, the new sample would
have different people in it.
It is almost certain that there would not be exactly
1650 positive responses.
That is, the value of P̂ will vary from sample to
sample.
Random samples eliminate bias from the act of
choosing a sample, but they can still be wrong
because of the variability that results when we
choose at random.
●
●
●
●
●●
● ●
●
●
●
● ●●
●
●●
●● ●
●●
●
●
●
●
●
●
●
●● ●
●
●
●
●
●
●
●
●
●
●
●
●
● ● ●
●● ●●
●
●
● ●●
●
●
Review of Basic Statistical Concepts



The first advantage of choosing at random is that
it eliminates bias.
The second advantage is that if we take lots of
random samples of the same size from the same
population, the variation from sample to sample
will follow a predictable pattern.
All statistical inference is based on one idea: to
see how trustworthy a procedure is, ask what
would happen if we repeated it many times.
Review of Basic Statistical Concepts

Sampling Distribution of Statistics




Suppose that exactly 60% of adults find shopping for
cloths frustrating and time consuming.
That is, the truth about the population is that P = 0.6.
(parameter)
We select a SRS (Simple Random Sample) of size 100
from this population and use the sample proportion( P̂ ,
statistic) to estimate the unknown value of the
population proportion P.
What is the distribution of P̂ ?
Review of Basic Statistical Concepts

To answer this question:




Take a large number of samples of size 100
from this population.
Calculate the sample proportion P̂ for each
sample.
Make a histogram of the values of P̂ .
Examine the distribution displayed in the
histogram for shape, center, and spread, as well
as outliers or other deviations.
Review of Basic Statistical Concepts



The result of many SRS have a regular pattern.
Here we draw 1000 SRS of size 100 from the same population.
The histogram shows the distribution of the 1000 sample proportions
Review of Basic Statistical Concepts

Sampling Distribution

The sampling distribution of a statistic is the
distribution of values taken by the statistic in all
possible samples of the same size from the
same population.
Normal Distribution

These curves, called
normal curves, are




Symmetric
Single peaked
Bell shaped
Normal curves
describe normal
distributions.
Normal Density Curve



The exact density curve for a particular
normal distribution is described by giving
its mean  and its standard deviation .
The mean is located at the center of the
symmetric curve and it is the same as the
median.
The standard deviation  controls the spread
of a normal curve.
Normal Density Curve
Standard Normal Distribution

The standard Normal
distribution is the
Normal distribution
N(0, 1) with mean
 = 0 and standard
deviation  =1.
Standard Normal Distribution

If a variable x has any normal distribution
N(, ) with mean  and standard deviation
, then the standardized variable
z
x

has the standard Normal distribution.
The Standard Normal Table

Table A is a table of
the area under the
standard Normal
curve. The table entry
for each value z is the
area under the curve to
the left of z.
The Standard Normal Applet

Or you can use this applet
http:/www.stat.sc.edu~west/applets/normaldemo.html
The Standard Normal Table


What is the area under
the standard normal
curve to the right of
z = - 2.15?
Compact notation:
P  p( Z  2.15)

P = 1 - .0158 =.9842
The Standard Normal Table


What is the area under
the standard normal
curve between z = 0
and z = 2.3?
Compact notation:
P  p (0  Z  2.3)

P = .9893 - .5 =.4893
Example:Annual rate of return on stock
indexes
The annual rate of return on stock indexes (which
combine many individual stocks) is approximately
Normal. Since 1954, the Standard & Poor’s 500
stock index has had a mean yearly return of about
12%, with standard deviation of 16.5%. Take this
Normal distribution to be the distribution of yearly
returns over a long period. The market is down for
the year if the return on the index is less than zero.
In what proportion of years is the market down?
Example:Annual rate of return on stock
indexes

State the problem


Call the annual rate of return for Standard & Poor’s 500-stocks
Index x. The variable x has the N(12, 16.5) distribution. We want
the proportion of years with X < 0.
Standardize

Subtract the mean, then divide by the standard deviation, to turn x
into a standard Normal z:
x0
x  12 0  12

16.5
16.5
z  .73
Example:Annual rate of return on stock
indexes


Draw a picture to show
the standard normal curve
with the area of interest
shaded.
Use the table


The proportion of
observations less than
- 0.73 is .2327.
The market is down on an
annual basis about 23.27%
of the time.
Example:Annual rate of return on stock
indexes

What percent of years have annual return
between 12% and 50%?

State the problem
12  x  50

Standardize
12  12 x  12 50  12


16.5
16.5
16.5
0  z  2.30
Example:Annual rate of return on stock
indexes


Draw a picture.
Use table.

The area between 0
and 2.30 is the area
below 2.30 minus the
area below 0.
0.9893- .50 = .4893
Estimation

So far, we have used our sample estimates
as point estimates of parameters, for
example:
x as an estimate of 
s 2 as an estimate of  2

These estimators have properties.
Estimators

They are both unbiased estimators

The expected value of an unbiased estimator is
equal to the parameter that it is trying to
estimate.
Estimators

For Example:
n
s 2  ˆ 2 
2
(
x

x
)
 i
i 1
n 1
is an unbiased estimator of  2
n
However, the estimator 
 (x
i 1
i
 x )2
n
is not unbiased

It tends to give an answer that is a little too small.
Estimators



x is also a minimum variance estimator of
.
This means that it has the smallest
variability among all estimators of .
What if we want to do more than just
provide a point estimate?
Estimating with Confidence

Suppose we are interested in the value of
some parameter, and we want to construct a
confidence interval for it, with some desired
level of confidence
Estimating with Confidence


Suppose we can estimate this parameter from
sample data, and we know the distribution of this
estimator, then we can use this knowledge and
construct a probability statement involving both
the estimator and the true value of the parameter.
This statement is manipulated mathematically to
produce confidence intervals.
Confidence intervals


The general form of a confidence interval
is:
sample value of estimator 
(Factor)(SE of estimator)
The value of the factor will depend on the
level of confidence desired, and the
distribution of the estimator.
Estimating with Confidence

Community banks are banks with less than a billion dollars
of assets. There are approximately 7500 such banks in the
United States. In many studies of the industry these banks
are considered separately from banks that have more than a
billion dollars of assets. The latter banks are called “large
institutions.” The community bankers Council of the
American bankers Association (ABA) conducts an annual
survey of community banks. For the 110 banks that make
up the sample in a recent survey, the mean assets are X =
220 (in millions of dollars). What can we say about , the
mean assets of all community banks?
Estimating with Confidence

The sample mean X is the natural estimator of the
unknown population mean .
We know that

X is an unbiased estimator of .
 The law of large numbers says that the sample mean
must approach the population mean as the size of the
sample grows.
Therefore, the value X = 220 appears to be a



reasonable estimate of the mean assets  for all
community banks.
But, how reliable is this estimate?
Standard Error of Estimator



An estimate without an indication of its variability
is of limited value.
Questions about variation of an estimator is
answered by looking at the spread of its sampling
distribution.
According to Central Limit theorem:

If the entire population of community bank assets has
mean  and standard deviation , then in repeated
samples of size 110 the sample mean x approximately
follows the N(, 110) distribution
Standard Error of Estimator

Suppose that the true standard deviation  is equal
to the sample standard deviation s = 161.


This is not realistic, although it will give reasonably
accurate results for samples as large as 100. Later on
we will learn how to proceed when  is not known.
Therefore, by Central Limit theorem. In repeated
sampling the sample mean x is approximately
normal, centered at the unknown population mean
,with standard deviation
X 
161
 15 millions of dollars
110
Confidence Interval for the Population Mean




We use the sampling distribution of the sample
mean x to construct a level C confidence interval
for the mean  of a population.
We assume that data are a SRS of size n.


,
The sampling distribution is exactly N(
n )
when the population has the N(, ) distribution.
The Central Limit Theorem says that this same
sampling distribution is approximately correct for
large samples whenever the population mean and
standard deviation are  and .
Confidence Interval for a Population Mean

Choose a SRS of size n from a population having unknown
mean  and known standard deviation . A level C
confidence interval for  is
X  z

n
Here z* is the critical value with area C between –z* and
z* under the standard Normal curve. The quantity
z

n
is the margin of error. The interval is exact when the
population distribution is normal and is approximately
correct when n is large in other cases.
Confidence Interval for a Population Mean

Recall the community bank Example




What is a 90% confidence interval for the mean
assets of all community banks?
Estimator =
Standard error of the sample mean =
Factor =
Example: Banks’ loan –to-deposit ration

The ABA survey of community banks also asked
about the loan-to-deposit ratio (LTDR), a bank’s
total loans as a percent of its total deposits. The
mean LTDR for the 110 banks in the sample is
X  76.7 and the standard deviation is s = 12.3. This
sample is sufficiently large for us to use s as the
population  here. Find a 95% confidence interval
for the mean LTDR for community banks.
Confidence Interval for a Population Mean

What if the sample size is small and the population
standard deviation is not known?



x
Then the sampling distribution of
will be student’s
t.
The Student’s t distribution has a symmetric, bellshaped density centered at zero, and depends on a
parameter called the degrees of freedom.
The number of degrees of freedom depends upon the
sample size.
Students’ T Distribution


The density of the Student’s t distribution differs
from that of the standard normal density.
The distribution of the density in the tails and
flanks is different from the normal distribution:

The tails are higher and wider than that of a standard
normal density, indicating that the standard deviation is
larger than the standard normal, especially for small
sample sizes.0
Students’ T Distribution
N(0, 1)
T distribution
with 1 degree
of freedom
http://www.wordiq.com/definition/Image:T_distribution_1df.png
Students’ T Distribution


As the sample size becomes larger, the
degrees of freedom of the Student's t
distribution also become larger.
As the degrees of freedom become larger,
the Student's t distribution approaches the
standard normal distribution.
Students’ T Distribution
http://www.etfos.hr/~fridl/primjer6.htm
Students’ T Distribution

A level C confidence interval for  when the
sample size is small and the population standard
deviation is not known is:
X  t n*1


s
n
The t-distribution has n-1 degrees of freedom.
*
Given the confidence level. the value t n 1can be
determined from published tables for t-distribution.
Students’ T Distribution

Example:

If the sample size is 15, the critical value for a
95% confidence interval from a t-table is: 2.14.


Note the degrees of freedom is 14.
If the sample size is 25, what is the critical
value for a 90% confidence interval?
Tests of Significance





Confidence intervals are appropriate when our goal is to
estimate a population parameter.
The second type of inference is directed at assessing the
evidence provided by the data in favor of some claim about
the population.
A significance test is a formal procedure for comparing
observed data with a hypothesis whose truth we want to
assess.
The hypothesis is a statement about the parameters in a
population or model.
The results of a test are expressed in terms of a probability
that measures how well the data and the hypothesis agree.
Example: Bank’s net income

The community bank survey described in
previously also asked about net income and
reported the percent change in net income between
the first half of last year and the first half of this
year. The mean change for the 110 banks in the
sample is X  8.1% Because the sample size is
large, we are willing to use the sample standard
deviation s = 26.4% as if it were the population
standard deviation . The large sample size also
makes it reasonable to assume that X is
approximately normal.
Example: Bank’s net income



Is the 8.1% mean increase in a sample good evidence that
the net income for all banks has changed?
The sample result might happen just by chance even if the
true mean change for all banks is  = 0%.
To answer this question we ask another


Suppose that the truth about the population is that = 0% (this is
our hypothesis)
What is the probability of observing a sample mean at least as far
from zero as 8.1%?
Example: Bank’s net income

The answer is:

2  [ p ( X  8.1)]  2  [ P( Z 
8.1  0
)]  2  [ P( Z  3.22)]
26.4 110
 2  [1  .9994]  .0012


Because this probability is so small, we see that the
sample mean X  8.1 is incompatible with a population
mean of  = 0.
We conclude that the income of community banks has
changed since last year.
Example: Bank’s net income

The fact that the calculated probability is very
small leads us to conclude that the average percent
change in income is not in fact zero. Here is why.


If the true mean is  = 0, we would see a sample mean
as far away as 8.1% only six times per 10000 samples.
So there are only two possibilities:


 = 0 and we have observed something very unusual, or
 is not zero but has some other value that makes the
observed data more probable
Example: Bank’s net income


We calculated a probability taking the first
of these choices as true ( = 0 ). That
probability guides our final choice.
If the probability is very small, the data
don’t fit the first possibility and we
conclude that the mean is not in fact zero.
Tests of Significance: Formal details


The first step in a test of significance is to state a
claim that we will try to find evidence against.
Null Hypothesis H0



The statement being tested in a test of significance is
called the null hypothesis.
The test of significance is designed to assess the
strength of the evidence against the null hypothesis.
Usually the null hypothesis is a statement of “no effect”
or “no difference.” We abbreviate “null hypothesis” as
H0.
Tests of Significance: Formal details

A null hypothesis is a statement about a population,
expressed in terms of some parameter or parameters.




The null hypothesis in our bank survey example is
H0 :  = 0
It is convenient also to give a name to the statement we
hope or suspect is true instead of H0.
This is called the alternative hypothesis and is abbreviated
as Ha.
In our bank survey example the alternative hypothesis
states that the percent change in net income is not zero. We
write this as
Ha :   0
Tests of Significance: Formal details



Since Ha expresses the effect that we hope to find evidence
for we often begin with Ha and then set up H0 as the
statement that the Hoped-for effect is not present.
Stating Ha is not always straight forward.
It is not always clear whether Ha should be one-sided or
two-sided.
 The alternative Ha :   0 in the bank net income
example is two-sided.
 In any give year, income may increase or decrease, so
we include both possibilities in the alternative
hypothesis.
Tests of Significance: Formal details

Test statistics

We will learn the form of significance tests in a
number of common situations. Here are some
principles that apply to most tests and that help
in understanding the form of tests:


The test is based on a statistic that estimate the
parameter appearing in the hypotheses.
Values of the estimate far from the parameter value
specified by H0 gives evidence against H0.
Example: bank’s income

The test statistic

In our banking example The null hypothesis is
H0:  = 0, and a sample gave the X  8.1 . The test
statistic for this problem is the standardized version of
X :
z

X  0
 n
This statistic is the distance between the sample mean
and the hypothesized population mean in the standard
scale of z-scores.
z
8.1  0
 3.22
26.4 110
Example: Bank’s net income

p-values


P-value is the probability that the test statistic would
take a value as large or larger than one observed
assuming that H0 is true.
The smaller the p-value, the stronger the evidence
against H0.
p - value  P( Z  3.22)  2  (1  .9994)  .0012
Example: Bank’s net income

Conclusion




One approach is to state in advance how much evidence
against H0 we will require in order to reject it.
The level that says “this evidence is strong enough “ is
called significance level and is denoted by letter .
We compare the p-value with the significance level.
We reject H0 if the p-value is smaller than the
significance level, and say that the data are statistically
significant at level .
One sample t-test


Suppose we have a simple random sample of size
n from a Normally distributed population with
mean  and standard deviation .
The standardized sample mean, or one-sample z
statistic
x
z

0

n
has the standard Normal distribution N(0, 1).
When we substitute the standard deviation of the
mean (standard error) s /n for the /n, the
statistic does not have a Normal distribution.
The t-distribution

t-test

Suppose that a SRS of size n is drawn from a N(, )
population. Then the one sample t statistic
t
x
s n
has the t-distribution with n-1 degrees of freedom.
 There is a different t distribution for each sample size.
 A particular t distribution is specified by giving the
degrees of freedom.
Exploring Relationships between Two
Quantitative Variables

Scatter plots


Represent the relationship between two
different continuous variables measured on the
same subjects.
Each point in the plot represents the values for
one subject for the two variables.
Exploring Relationships between Two
Quantitative Variables

Example:
Data reported by the
organization for Economic
Development and
Cooperation on its 29
member nations in 1998.


Per capita gross domestic
product is on x-axis
Per capita health care
expenditures is on y-axis.
Exploring Relationships between Two
Quantitative Variables

We can describe the overall pattern of
scatter plot by



Form or shape
Direction
strength
Exploring Relationships between Two
Quantitative Variables

Form or shape


The form shown by the scatter plot is linear if
the points lie in a straight-line pattern.
Strength

The relation ship is strong if the points lie close
to a line, with little scatter.
Exploring Relationships between Two
Quantitative Variables

Direction

Positive and negative association


Two variables are positively associated when aboveaverage values of one variable tend to occur in
individuals with above average values for the other
variable, and below average values of both also tend
to occur together.
Two variable are negatively associated when above
average values for one tend to occur in subjects with
below average values of the other, and vice-versa
Exploring Relationships between Two
Quantitative Variables

Per capita health care
example




“subjects” studied are
countries
Form of relationship is
roughly linear
The direction is
positive
The relationship is
strong.
Correlation

It is often useful to have a measure of degree of
association between two variables. For example,
you may believe that sales may be affected by
expenditures on advertising, and want to measure
the degree of association between sales and
advertising.


Correlation coefficient is a numeric measure of the
direction and strength of linear relationship between
two continuous variables
The notation for sample correlation coefficient is r.
Correlation

There are several alternative ways to write the
algebraic expression for the correlation
coefficient. The following is one.
n XY   X  Y
r
n X  ( X ) n  Y  ( Y )
2



2
2
2
X and Y represent the two variables of interest. For
example advertising and sales or per capita gross
domestic product, and per capita health care
expenditure.
n is the number of subjects in the sample
The notation for population correlation coefficient is .
Correlation

Facts about correlation coefficient





r has no unit.
r > 0 indicates a positive association; r < 0
indicates a negative association
r is always between –1 and +1
Values of r near 0 imply a very weak linear
relationship
Correlation measures only the strength of linear
association.
Correlation


We could perform a hypothesis test to
determine whether the value of a sample
correlation coefficient (r) gives us reason to
believe that the population correlation () is
significantly different from zero
The hypothesis test would be
H0:  = 0
Ha:   0
Correlation

The test statistic would be
t


r 0
1 r 2
n2
The test statistic has a t-distribution with n-2 degrees of
freedom.
Reject H0 if
t  t n 2; 2 or
t  tn 2; 2
Example: Do wages rise with experience?

Many factors affect the wages of workers: the industry
they work in, their type of job, their education and their
experience, and changes in general levels of wages. We
will look at a sample of 59 married women who hold
customer service jobs in Indiana banks. The following
table gives their weekly wages at a specific point in time
also their length of service with their employer, in month.
The size of the place of work is recorded simply as “large”
(100 or more workers) or “small.” Because industry, job
type, and the time of measurement are the same for all 59
subjects, we expect to see a clear relationship between
wages and length of service.
Example: Do wages rise with experience?
Example: Do wages rise with experience?
Example: Do wages rise with experience?


The correlation between wages and length of
service for the 59 bank workers is r = 0.3535.
We expect a positive correlation between length of
service and wages in the population of all married
female bank workers. Is the sample result
convincing that this is true?
Example: Do wages rise with experience?

To compute correlation: we need:
 X  23070
Y  4159  X
2
Y
 9461302
 XY 1719430
2
 451031
Replacing these in the formula
n XY   X  Y
59(1719430)  (23070)( 4159)

n X  ( X ) n Y  ( Y )
59(9461302)  (23070) 59(451031)  (4159)

r
2

2
2
2
2
We want to test
H0:  = 0
The test statistic is
t
r n2
1 r
2
Ha:  > 0

0.3535 59  2
1  (0.3535)
2
 2.853
2
 .3535
Example: Do wages rise with experience?


Comparing t = 2.853 with critical values
from the t-table with n - 2 = 57 degrees of
freedom help us to make our decision.
Conclusion:


Since P( t > 2.853) < .005, we reject H0.
There is a positive correlation between wages
and length of service.
T-distribution applet

Tail probability for student’s t-distribution
can computed using the applet at the
following site.
http://www.acs.ucalgary.ca/~nosal/src/Applets/T-TailProb/T-TailProb.html