Introduction

ST2002
Rozenn Dahyot
Lecture hours: Wednesdays, 11.00 – 12.00 (LB01) and 3.00 – 4.00 (107)
Lecture notes: http://www.cs.tcd.ie/Rozenn.Dahyot/
Room 128, Lloyd Institute
Contents
Statistical inference
Analysis of Variance
Regression
Sampling
(Some) References
Stuart, M. (2003). An Introduction to Statistical Analysis for Business and Industry: A Problem Solving Approach.
Hamilton, L. C. (1992). Regression with Graphics: A Second Course in Applied Statistics.
Mann, P. S. Introductory Statistics, 5th edition.
Quantitative analysis
In domains such as business, it is important to:
identify the information requirements for solving the problem;
produce the necessary data;
analyse the data to determine key patterns and relationships;
quantify the uncertainty involved;
report the results in a readily interpretable form;
advise on interpretation and on the implementation of solutions.
What is statistics?
Statistics is a group of methods used to collect, analyse, present, and interpret data, and to make decisions.
Descriptive statistics: methods for organizing, displaying and describing data (using tables, graphs and summary measures).
Inferential statistics: methods that use sample results to help make decisions or predictions about a population.
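As a small illustration of descriptive statistics, the sketch below organizes the five test scores from the exercise later in these notes into a frequency table and computes a few summary measures with Python's standard `statistics` module. The choice of measures (mean, median, mode, population standard deviation) is an assumption made for illustration; inferential statistics would instead use such summaries from a sample to draw conclusions about a larger population.

```python
import statistics
from collections import Counter

# The five test scores used in the exercise later in these notes
scores = [70, 78, 80, 80, 95]

# Organizing: a frequency table of the observed values
frequency_table = Counter(scores)

# Describing: summary measures of centre and spread
summary = {
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "mode": statistics.mode(scores),
    # pstdev treats the five scores as the whole population
    "std (population)": statistics.pstdev(scores),
}

# Displaying: print the table, then the summary measures
for value, count in sorted(frequency_table.items()):
    print(value, count)
print(summary)
```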
Population vs Sample
A population consists of all elements whose characteristics are being studied.
A sample is a portion of the population selected for study.
A survey that includes every member of the population is called a census.
When only a portion of the population is surveyed, we have a sample survey.
A representative sample is a sample that captures as closely as possible the characteristics of the population.
Sampling
Random sampling: each member of the population has an equal and known chance of being selected.
Systematic sampling: a sample is drawn at regular intervals from a list of population members. As long as there is no hidden order in the list, this is as good as random sampling.
Stratified sampling: the population is divided into subsets (strata) and random samples are drawn from each subset.
Convenience sampling: used in (inexpensive) exploratory research.
The samples are selected because they are convenient.
Judgement sampling: the selection of the sample is based on prior knowledge.
Quota sampling: the population is divided into subsets (as in stratified sampling) and the proportion of each subset is assessed. However, the members within each subset are not randomly selected.
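The three probability-based methods above (random, systematic and stratified sampling) can be sketched with Python's standard `random` module. This is a minimal illustration under stated assumptions: the function names, the list of 100 hypothetical employees and the two hypothetical strata ("full-time", "part-time") are invented for the example, and the generator is seeded only so the draws are reproducible.

```python
import random

def simple_random_sample(population, n, rng):
    """Hypothetical helper: every member has the same known chance n/N of selection."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Hypothetical helper: take every k-th member of the list after a random start."""
    k = len(population) // n          # sampling interval
    start = rng.randrange(k)          # random starting point within the first interval
    return population[start::k][:n]

def stratified_sample(strata, sizes, rng):
    """Hypothetical helper: draw a simple random sample of the requested size from each stratum."""
    sample = []
    for name, members in strata.items():
        sample.extend(rng.sample(members, sizes[name]))
    return sample

rng = random.Random(42)               # seeded so the draws are reproducible
employees = list(range(1, 101))       # hypothetical list of 100 population members
print(simple_random_sample(employees, 5, rng))
print(systematic_sample(employees, 5, rng))
print(stratified_sample({"full-time": employees[:70], "part-time": employees[70:]},
                        {"full-time": 7, "part-time": 3}, rng))
```

Quota sampling would differ from `stratified_sample` only in how members are picked within each subset: not at random, but for example the first ones encountered.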
Sampling distribution
The probability distribution of a sample statistic is called its
sampling distribution.
Its parameters, such as its mean and standard deviation, can be computed.
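To make the idea concrete, the sketch below builds the exact sampling distribution of the sample mean for samples of three scores drawn without replacement from the five-student population used in the exercise later in these notes. Treating those five scores as the whole population is an assumption made for illustration. Exact fractions are used so that the check that the mean of the sampling distribution equals the population mean µ holds exactly.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

scores = [70, 78, 80, 80, 95]   # population of five test scores
n = 3                           # sample size

# Every sample of 3 distinct students (indices, because two students both scored 80)
samples = [tuple(scores[i] for i in idx) for idx in combinations(range(len(scores)), n)]
sample_means = [Fraction(sum(s), n) for s in samples]

# Sampling distribution of the mean: each of the C(5, 3) = 10 samples is equally likely
counts = Counter(sample_means)
sampling_distribution = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

mu = Fraction(sum(scores), len(scores))                          # population mean
mean_of_xbar = sum(m * p for m, p in sampling_distribution.items())

for m, p in sampling_distribution.items():
    print(f"{float(m):.2f}  {p}")
print(mu, mean_of_xbar)   # both equal 403/5
```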
Population distribution
The population distribution is the probability distribution of the population
data.
Errors
Sampling error is the difference between the value of a sample statistic
and the value of the corresponding population parameter.
Nonsampling errors: errors occurring in the collection, recording, and tabulation of data.
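The two kinds of error can be computed directly with the data of the exercise that follows (five scores, one sample of three). This is a sketch, not a definitive answer key: it takes the sampling error as x̄ − µ and, as an assumption about the intended convention, measures the nonsampling error as the recorded sample mean minus the correct sample mean.

```python
from fractions import Fraction

population = [70, 78, 80, 80, 95]      # all five test scores
sample = [70, 80, 95]                  # the one sample of three scores selected

mu = Fraction(sum(population), len(population))   # population mean = 80.6
x_bar = Fraction(sum(sample), len(sample))        # correct sample mean
sampling_error = x_bar - mu                       # x̄ − µ

# Same sample, but the score 80 was mistakenly recorded as 82
recorded = [70, 82, 95]
x_bar_recorded = Fraction(sum(recorded), len(recorded))
nonsampling_error = x_bar_recorded - x_bar        # error caused by the recording mistake
total_error = x_bar_recorded - mu                 # sampling error + nonsampling error

print(f"sampling error:    {float(sampling_error):.2f}")
print(f"nonsampling error: {float(nonsampling_error):.2f}")
print(f"total error:       {float(total_error):.2f}")
```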
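Errors: Exercise
Five students took a test and their scores are: 70, 78, 80, 80, 95. Suppose one sample of three scores (70, 80, 95) is selected from this population.
(a) Find the sampling error.
(b) Now suppose that one selected score has been mistakenly recorded as 82 instead of 80. Find the nonsampling error.
Shape of the sampling distribution
Mean and standard deviation of \(\bar{x}\):
\[ \mu_{\bar{x}} = \mu, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]
(The formula for \(\sigma_{\bar{x}}\) applies when the sample is small relative to the population, i.e. \(n/N \le 0.05\).)
The distribution \(f(\bar{x})\) is normal:
• If \(f(x)\) is normal, then \(\bar{x}\) is normal, because the sum of independent normal random variables is normal.
• If \(f(x)\) is not normal, then the central limit theorem justifies the choice of normality for the sampling distribution of \(\bar{x}\) when the sample size is large (a common rule of thumb is \(n \ge 30\)).
Sampling distribution: Exercise
The mean wage per hour for all 5000 employees who work at a large company is 17.5 dollars and the standard deviation is 2.9 dollars. Find the mean and standard deviation (i.e. standard error) of \(\bar{x}\) for a sample size of:
(a) 30
(b) 75
(c) 200
Application of the sampling distribution
You can calculate:
\[ P(\mu_{\bar{x}} - \sigma_{\bar{x}} \le \bar{x} \le \mu_{\bar{x}} + \sigma_{\bar{x}}) = 0.68 \]
\[ P(\mu_{\bar{x}} - 2\sigma_{\bar{x}} \le \bar{x} \le \mu_{\bar{x}} + 2\sigma_{\bar{x}}) = 0.95 \]
\[ P(\mu_{\bar{x}} - 3\sigma_{\bar{x}} \le \bar{x} \le \mu_{\bar{x}} + 3\sigma_{\bar{x}}) = 0.997 \]
Use the standard normal distribution table with:
\[ z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} \]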
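The wage exercise and the 68/95/99.7 probabilities can be checked numerically. The sketch below assumes the n/N ≤ 0.05 condition (true for n = 30, 75 and 200 with N = 5000, so no finite population correction is applied) and replaces the printed standard normal table with the CDF written via `math.erf`. The helper names `standard_error`, `normal_cdf` and `z_score` are hypothetical, chosen for this illustration.

```python
import math

mu, sigma = 17.5, 2.9      # population mean and standard deviation of hourly wage (dollars)
N = 5000                   # population size

def standard_error(sigma, n, N):
    """sigma_xbar = sigma / sqrt(n), valid when n/N <= 0.05."""
    if n / N > 0.05:
        raise ValueError("n/N > 0.05: apply the finite population correction")
    return sigma / math.sqrt(n)

def normal_cdf(z):
    """Standard normal CDF Phi(z), used here instead of a printed table."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def z_score(x_bar, mu, se):
    """z = (x_bar - mu) / sigma_xbar."""
    return (x_bar - mu) / se

# Mean of x-bar equals mu; its standard deviation shrinks as n grows
for n in (30, 75, 200):
    print(f"n = {n:3d}: mean = {mu:.2f}, standard error = {standard_error(sigma, n, N):.4f}")

# Probability that x-bar lies within k standard errors of mu, i.e. -k <= z <= k
for k in (1, 2, 3):
    print(f"P(-{k} <= z <= {k}) = {normal_cdf(k) - normal_cdf(-k):.4f}")
```

The computed probabilities (about 0.6827, 0.9545 and 0.9973) are the unrounded versions of 0.68, 0.95 and 0.997.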