 
STATISTICS
© aSup
 
What is STATISTICS?
• A set of mathematical procedures for organizing, summarizing, and interpreting information (Gravetter, 2004)
• A branch of mathematics which specializes in enumeration data and their relation to metric data (Guilford, 1978)
• Any numerical summary measure based on data from a sample; contrasts with a parameter, which is based on data from a population (Fortune, 1999)
• etc.
 
Two General Purposes of Statistics (Gravetter, 2007)
1. Statistics are used to organize and summarize the information so that the researcher can see what happened in the research study and can communicate the results to others
2. Statistics help the researcher to answer the general questions that initiated the research by determining exactly what conclusions are justified based on the results that were obtained
 
DESCRIPTIVE STATISTICS
The purpose of descriptive statistics is to
organize and to summarize observations so that
they are easier to comprehend
 
INFERENTIAL STATISTICS
The purpose of inferential statistics is to draw an inference about conditions that exist in the population (the complete set of observations) from the study of a sample (a subset) drawn from the population
 
SOME TIPS ON STUDYING STATISTICS
• Is statistics a hard subject?
IT IS and IT ISN’T
• In general, learning how-to-do-it requires attention, care, and arithmetic accuracy, but it is not particularly difficult.
LEARNING THE ‘WHY’ OF THINGS MAY BE HARDER
 
SOME TIPS ON STUDYING STATISTICS
• Some parts will go faster, but others will require concentration and several readings
• Work enough of the questions and problems to feel comfortable
• What you learn in earlier stages becomes the foundation for what follows
• Always try to relate the statistical tools to real problems
 
POPULATIONS and SAMPLES
THE POPULATION is the set of all the individuals of interest in a particular study.
THE SAMPLE is a set of individuals selected from a population, usually intended to represent the population in a research study.
The sample is selected from the population.
The results from the sample are generalized to the population.
 
PARAMETER and STATISTIC
• A parameter is a value, usually a numerical value, that describes a population. A parameter may be obtained from a single measurement, or it may be derived from a set of measurements from the population
• A statistic is a value, usually a numerical value, that describes a sample. A statistic may be obtained from a single measurement, or it may be derived from a set of measurements from the sample
 
SAMPLING ERROR
• It is usually not possible to measure everyone in the population
• A sample is selected to represent the population. By analyzing the results from the sample, we hope to make general statements about the population
• Although samples are generally representative of their population, a sample is not expected to give a perfectly accurate picture of the whole population
• There is usually some discrepancy between a sample statistic and the corresponding population parameter, called sampling error
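A minimal sketch of sampling error, assuming a made-up population of normally distributed scores. The function name `sampling_error`, the population values, and the sample size are all illustrative and do not come from the slides.

```python
import random
import statistics

def sampling_error(population, n, rng):
    """Return (statistic, parameter, sampling error) for one random sample of size n."""
    sample = rng.sample(population, n)        # select n individuals without replacement
    statistic = statistics.mean(sample)       # sample mean: describes the sample
    parameter = statistics.mean(population)   # population mean: describes the population
    return statistic, parameter, statistic - parameter

rng = random.Random(0)                        # fixed seed so the run is repeatable
population = [rng.gauss(100, 15) for _ in range(10_000)]   # hypothetical scores
for _ in range(3):
    m, mu, err = sampling_error(population, 25, rng)
    print(f"M = {m:.2f}, mu = {mu:.2f}, sampling error = {err:+.2f}")
```

Each sample gives a slightly different mean, so the discrepancy from the population mean changes from sample to sample.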
 
TWO KINDS OF NUMERICAL DATA
Numerical data generally fall into two major categories:
1. Counted → frequencies → enumeration data
2. Measured → metric or scale values → measurement or metric data
Statistical procedures deal with both kinds of data
 
DATUM and DATA
• The measurement or observation obtained for each individual is called a datum or, more commonly, a score or raw score
• The complete set of scores or measurements is called the data set or simply the data
• After data are obtained, statistical methods are used to organize and interpret the data
 
VARIABLE
• A variable is a characteristic or condition that changes or has different values for different individuals
• A constant is a characteristic or condition that does not vary but is the same for every individual
• For example, in a research study comparing vocabulary skills for 12-year-old boys, vocabulary skill is a variable, while age and gender are constants
 
QUALITATIVE and QUANTITATIVE Categories
• Qualitative: the classes of objects are different in kind. There is no reason for saying that one is greater or less, higher or lower, better or worse than another.
• Quantitative: the groups can be ordered according to quantity or amount. The cases may vary continuously along a continuum which we recognize.
 
DISCRETE and CONTINUOUS Variables
• A discrete variable consists of separate, indivisible categories. No values can exist between two neighboring categories.
• A continuous variable is divisible into an infinite number of fractional parts
○ It should be very rare to obtain identical measurements for two different individuals
○ Each measurement category is actually an interval that must be defined by boundaries called real limits
 
CONTINUOUS Variables
• Most interval-scale measurements are taken to the nearest unit (foot, inch, cm, mm), depending upon the fineness of the measuring instrument and the accuracy we demand for the purposes at hand.
• And so it is with most psychological and educational measurements. A score of 48 means from 47.5 to 48.5
• We assume that a score is never a point on the scale, but occupies an interval from a half unit below to a half unit above the given number.
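The half-unit rule above can be sketched as a small helper. The name `real_limits` and the `unit` parameter are illustrative choices, not terms from the slides.

```python
def real_limits(score, unit=1.0):
    """Return the (lower, upper) real limits of a score measured to the nearest `unit`."""
    half = unit / 2
    return score - half, score + half

print(real_limits(48))   # (47.5, 48.5)
```

With a finer measuring instrument, passing a smaller `unit` narrows the interval around the reported score.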
 
FREQUENCIES, PERCENTAGES, PROPORTIONS, and RATIOS
• Frequency is defined as the number of objects or events in a category.
• Percentage (P) is defined as the number of objects or events in a category divided by the total number of objects or events, multiplied by 100.
• Proportion (p). Whereas with percentages the base is 100, with proportions the base or total is 1.0
• A ratio is a fraction. The ratio of a to b is the fraction a/b. A proportion is a special ratio, the ratio of a part to a total.
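As a rough illustration of how the three summaries relate, the sketch below counts a made-up sample of eye colors. The data and the helper name `summarize` are assumptions for the example only.

```python
from collections import Counter

def summarize(categories):
    """Map each category to (frequency, proportion, percentage)."""
    counts = Counter(categories)
    total = sum(counts.values())
    return {c: (f, f / total, 100 * f / total) for c, f in counts.items()}

eye_colors = ["brown"] * 12 + ["blue"] * 6 + ["green"] * 2   # hypothetical sample of 20
for color, (f, p, pct) in summarize(eye_colors).items():
    print(f"{color}: f = {f}, p = {p:.2f}, P = {pct:.0f}%")
```

The proportions sum to 1.0 and the percentages sum to 100, which is the difference in base described above.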
 
MEASUREMENTS and SCALES (Stevens, 1946)
Ratio
Interval
Ordinal
Nominal
 
NOMINAL Scale
• Some variables are qualitative in their nature rather than quantitative. For example, the two categories of biological sex are male and female. Eye color, types of hair, and political party affiliation are other examples of qualitative or categorical variables.
• The most limited type of measurement is the distinction of classes or categories (classification).
• Each group can be assigned a number to act as a distinguishing label, thus taking advantage of the property of identity.
• Statistically, we may count the number of cases in each class, which gives us frequencies.
 
ORDINAL Scale
• Corresponds to what was earlier called “quantitative classification”. The classes are ordered on some continuum, and it can be said that one class is higher than another on some defined variable.
• All we have is information about serial arrangement.
• We are not at liberty to operate with these numbers by way of addition or subtraction, and so on.
 
INTERVAL Scale
• This scale has all the properties of an ordinal scale, but with the further refinement that a given interval (distance) between scores has the same meaning anywhere on the scale. Equality of units is the requirement for an interval scale.
• Examples of this type of scale are degrees of temperature. A 10° change in a reading on the Celsius scale represents the same change in heat when going from 15° to 25° as when going from 40° to 50°
 
INTERVAL Scale
• The top of this illustration shows three temperatures in degrees Celsius: 0°, 50°, 100°. It is tempting to think of 100°C as twice as hot as 50°C.
• The value of zero on an interval scale is simply an arbitrary reference point (the freezing point of water) and does not imply an absence of heat.
• Therefore, it is not meaningful to assert that a temperature of 100°C is twice as hot as one of 50°C or that a rise from 40°C to 48°C is a 20% increase
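One way to see why ratios are not meaningful on an interval scale is to re-express the same temperatures on the Kelvin scale, which does have a true zero. This is an illustrative sketch; the Kelvin comparison is an addition and is not part of the slides.

```python
def celsius_to_kelvin(c):
    """Convert degrees Celsius to kelvins (zero kelvin is a true zero point)."""
    return c + 273.15

ratio_celsius = 100 / 50                                        # looks like "twice as hot"
ratio_kelvin = celsius_to_kelvin(100) / celsius_to_kelvin(50)   # same temperatures, true zero
print(ratio_celsius)            # 2.0
print(round(ratio_kelvin, 3))   # 1.155

# Differences, unlike ratios, are the same on both scales.
diff_kelvin = celsius_to_kelvin(25) - celsius_to_kelvin(15)
print(round(diff_kelvin, 2))    # 10.0
```

The ratio changes when the zero point moves, but the 10-degree interval does not, which is the property that defines an interval scale.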
 
INTERVAL Scale
• Some scales in behavioral science are measurements of physical variables, such as temperature, time, or pressure.
• However, one must ask whether the variation in the psychological phenomenon that is being measured indirectly is being scaled with equal units.
• Most measurements in the behavioral sciences cannot possess the advantages of physical scales.
 
RATIO Scale
 One thing is certain: Scales …the kinds just
mentioned HAVE ZERO POINT.
 
Confucius, 451 B.C
What I hear, I forget
What I see, I remember
What I do, I understand
 
Types of descriptive statistics covered so far
• Frequency distribution: shows all the scores present and how often each one occurs (ungrouped and grouped data)
• Normal curve: the normal distribution and probability, proportions, and z-scores
 
Types of descriptive statistics covered so far
• Central tendency: to find the single score that is most typical or most representative of the entire group (Gravetter & Wallnau, 2007)
• Mean, Median, Mode
 
Types of descriptive statistics covered so far
• Variability: measures the dispersion among the scores (or how spread out the data are) around the central measure (Furlong, 2000)
• Provides a quantitative measure of the degree to which scores in a distribution are spread out or clustered together (Gravetter & Wallnau, 2007)
 
Variability
Describes the:
○ variation
○ range
○ heterogeneity or homogeneity
of a group's measurements

Some Measures of Variability
• Range (JT)
• Interquartile range (Q) and semi-interquartile range
• Variance (S²)
• Standard deviation (S)
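A minimal sketch of these measures using Python's standard `statistics` module on made-up scores. Two assumptions are worth flagging: the variance here divides by N (the population form), and the quartiles use the module's default estimation method, which can differ slightly from textbook hand methods.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]                # hypothetical scores

value_range = max(scores) - min(scores)          # range: highest minus lowest score
q1, _, q3 = statistics.quantiles(scores, n=4)    # quartile cut points (default method)
iqr = q3 - q1                                    # interquartile range
semi_iqr = iqr / 2                               # semi-interquartile range
variance = statistics.pvariance(scores)          # mean squared deviation, dividing by N
sd = statistics.pstdev(scores)                   # square root of the variance

print(value_range, iqr, semi_iqr, variance, sd)
```

Use `statistics.variance` and `statistics.stdev` instead if the n - 1 (sample) form is wanted.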
 
PERCENTILES and PERCENTILE RANKS
• The percentile system is widely used in educational measurement to report the standing of an individual relative to the performance of a known group. It is based on the cumulative percentage distribution.
• A percentile is a point on the measurement scale below which a specified percentage of the cases in the distribution falls
• The rank or percentile rank of a particular score is defined as the percentage of individuals in the distribution with scores at or below the particular value
• When a score is identified by its percentile rank, the score is called a percentile

• Suppose, for example, that A has a score of X = 78 on an exam and we know that exactly 60% of the class had scores of 78 or lower.
• Then A's score of X = 78 has a percentile rank of 60%, and A's score would be called the 60th percentile
Percentile rank refers to a percentage
Percentile refers to a score
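The "at or below" definition of percentile rank can be sketched as follows. The exam scores are invented so that the score X = 78 lands at a percentile rank of 60%, matching the example; `percentile_rank` is an illustrative name.

```python
def percentile_rank(scores, x):
    """Percentage of scores in the distribution that are at or below x."""
    at_or_below = sum(1 for s in scores if s <= x)
    return 100 * at_or_below / len(scores)

exam = [52, 60, 65, 70, 74, 78, 81, 85, 90, 95]   # hypothetical class of 10
print(percentile_rank(exam, 78))   # 60.0
```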
 
Initstereng!!
Aoccdrnig to a rscheearch at an Elingsh
uinervtisy, it deosn't mttaer in waht oredr
the ltteers in a wrod are, the olny
iprmoatnt tihng is that frist and lsat ltteer
is at the rghit pclae.
The rset can be a toatl mses and you can
sitll raed it wouthit porbelm. Tihs is
bcuseae we do not raed ervey lteter by it
slef but the wrod as a wlohe.
 
PROBABILITY
 
INTRODUCTION TO PROBABILITY
We introduce the idea that research studies begin with a general question about an entire population, but the actual research is conducted using a sample
[Figure: POPULATION and SAMPLE connected by two arrows, one labeled "Probability" (population to sample) and one labeled "Inferential Statistics" (sample to population)]
 
THE ROLE OF PROBABILITY IN INFERENTIAL STATISTICS
• Probability is used to predict what kinds of samples are likely to be obtained from a population
• Thus, probability establishes a connection between samples and populations
• Inferential statistics rely on this connection when they use sample data as the basis for making conclusions about populations
 
PROBABILITY DEFINITION
When several different outcomes are possible, the probability of any specific outcome is defined as a fraction or proportion of all the possible outcomes
$$\text{probability of } A = \frac{\text{number of outcomes classified as } A}{\text{total number of possible outcomes}}$$
 
EXAMPLE
• If you are selecting a card from a complete deck, there are 52 possible outcomes
○ The probability of selecting the king of hearts?
○ The probability of selecting an ace?
○ The probability of selecting a red spade?
• Tossing dice, coins, etc.
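The card questions can be worked by counting outcomes directly. The sketch below builds a standard 52-card deck and applies the "favorable over total" definition; the helper name `probability` and the deck encoding are illustrative.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = {"hearts": "red", "diamonds": "red", "clubs": "black", "spades": "black"}
deck = list(product(ranks, suits))          # 52 (rank, suit) pairs

def probability(event):
    """Number of outcomes classified as the event over the total number of outcomes."""
    favorable = sum(1 for card in deck if event(card))
    return Fraction(favorable, len(deck))

p_king_hearts = probability(lambda c: c == ("K", "hearts"))
p_ace = probability(lambda c: c[0] == "A")
p_red_spade = probability(lambda c: c[1] == "spades" and suits[c[1]] == "red")
print(p_king_hearts, p_ace, p_red_spade)    # 1/52 1/13 0
```

The red spade has probability 0 because no outcome in the deck is classified that way.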
 
PROBABILITY and THE BINOMIAL DISTRIBUTION
When a variable is measured on a scale consisting of exactly two categories, the resulting data are called binomial (two names), referring to the two categories on the measurement scale
 
PROBABILITY and THE BINOMIAL DISTRIBUTION
• In binomial situations, the researcher often knows the probabilities associated with each of the two categories
• With a balanced coin, for example, p(heads) = p(tails) = ½
 
PROBABILITY and THE BINOMIAL DISTRIBUTION
• The question of interest is the number of times each category occurs in a series of trials or in a sample of individuals.
• For example:
○ What is the probability of obtaining 15 heads in 20 tosses of a balanced coin?
○ What is the probability of obtaining more than 40 introverts in a sample of 50 college freshmen?
 
TOSSING COIN
• Number of heads obtained in 2 tosses of a coin
○ p = p(heads) = ½
○ q = p(tails) = ½
• We are looking at a sample of n = 2 tosses, and the variable of interest is X = the number of heads
[Figure: the binomial distribution showing the probability for the number of heads in 2 coin tosses; x-axis: number of heads in 2 coin tosses (0, 1, 2)]
 
TOSSING COIN
[Figures: binomial distributions for the number of heads in 3 coin tosses and in 4 coin tosses]
 
The BINOMIAL EQUATION
$$(p + q)^n$$
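Expanding the binomial gives one term per possible number of heads, and each term is the probability of that count. A minimal sketch, assuming independent tosses; `binomial_distribution` is an illustrative name and exact fractions are used to keep the values readable.

```python
from fractions import Fraction
from math import comb

def binomial_distribution(n, p):
    """Terms of (p + q)^n: the probability of k successes for k = 0..n, with q = 1 - p."""
    q = 1 - p
    return [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]

half = Fraction(1, 2)                       # balanced coin: p = q = 1/2
for n in (2, 3, 4):
    print(n, [str(x) for x in binomial_distribution(n, half)])
```

For n = 2 the terms are 1/4, 1/2, and 1/4 for 0, 1, and 2 heads.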
 
LEARNING CHECK
• In an examination of 5 true-false problems, what is the probability of answering at least 4 items correctly?
• In an examination of 5 multiple-choice problems with 4 options each, what is the probability of answering at least 2 items correctly?
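One possible way to work these two questions, assuming every answer is a pure guess and the items are independent (the slide does not state this, so treat it as an assumption). The function name `prob_at_least` is illustrative.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(X >= k) for a binomial variable with n trials and success probability p."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k, n + 1))

p_true_false = prob_at_least(4, 5, 0.5)         # guessing: p = 1/2 per item
p_multiple_choice = prob_at_least(2, 5, 0.25)   # guessing among 4 options: p = 1/4
print(p_true_false)          # 0.1875
print(p_multiple_choice)
```

Under the guessing assumption the second probability comes out to roughly 0.37.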
 
PROBABILITY and NORMAL DISTRIBUTION
[Figure: normal curve with the mean μ and the standard deviation σ marked]
In simpler terms, the normal distribution is symmetrical with a single mode in the middle. The frequency tapers off as you move farther from the middle in either direction
 
PROBABILITY and NORMAL DISTRIBUTION
Proportions under the curve → B, C, and D areas
[Figure: normal curve with a score X marked relative to μ]

B and C area
[Figure: normal curve divided at X into areas B and C]

B, C, and D area
[Figures: normal curves with X above μ and with X below μ, showing areas B, C, and D]
B + C = 1
C + D = ½
B – D = ½
 
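The NORMAL DISTRIBUTION following a z-SCORE transformation
[Figure: normal curve on a z-score axis from -2z to +2z, centered at μ (z = 0); 34.13% of the distribution lies between 0 and 1z, 13.59% between 1z and 2z, and 2.28% beyond 2z]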
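[Figure: normal curve with μ = 166 and σ = 7 on a z-score axis from -2z to +2z, divided into 34.13%, 13.59%, and 2.28% sections]
Assume that the population of Indonesian adult heights forms a normal distribution with a mean of μ = 166 cm and a standard deviation of σ = 7 cm
• p(X > 180)?
• p(X < 159)?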
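A sketch of how the two proportions could be checked with Python's standard library, using `statistics.NormalDist` (available from Python 3.8) in place of a unit normal table. The variable names are illustrative.

```python
from statistics import NormalDist

heights = NormalDist(mu=166, sigma=7)   # population stated in the exercise

p_above_180 = 1 - heights.cdf(180)      # X = 180 is z = +2
p_below_159 = heights.cdf(159)          # X = 159 is z = -1
print(round(p_above_180, 4))   # 0.0228
print(round(p_below_159, 4))   # 0.1587
```

Both values agree with the percentages on the curve: 2.28% beyond 2z, and 13.59% + 2.28% = 15.87% below -1z.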
 
[Figure: the same normal curve, μ = 166 and σ = 7]
Assume that the population of Indonesian adult heights forms a normal distribution with a mean of μ = 166 cm and a standard deviation of σ = 7 cm
• What height separates the highest 10% from the rest?
• What heights separate the extreme 10% in the tails?
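Going from a proportion back to a score is the inverse lookup. The sketch below uses `NormalDist.inv_cdf`; it reads "the extreme 10% in the tails" as 5% in each tail, which is an interpretation and not something the slide states.

```python
from statistics import NormalDist

heights = NormalDist(mu=166, sigma=7)

top_10_cutoff = heights.inv_cdf(0.90)   # 90% of heights fall below this value
lower_5 = heights.inv_cdf(0.05)         # assumed split: 5% in the lower tail
upper_5 = heights.inv_cdf(0.95)         # and 5% in the upper tail
print(round(top_10_cutoff, 1), round(lower_5, 1), round(upper_5, 1))
```

Under these assumptions the cutoffs are roughly 175 cm for the highest 10%, and about 154.5 cm and 177.5 cm for the two tails.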
 
[Figure: the same normal curve, μ = 166 and σ = 7]
Assume that the population of Indonesian adult heights forms a normal distribution with a mean of μ = 166 cm and a standard deviation of σ = 7 cm
• p(160 < X < 170)?
• p(170 < X < 175)?
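The proportion between two scores is the difference of two cumulative proportions. A minimal sketch; `p_between` is an illustrative helper name.

```python
from statistics import NormalDist

heights = NormalDist(mu=166, sigma=7)

def p_between(dist, low, high):
    """Proportion of the distribution that lies between low and high."""
    return dist.cdf(high) - dist.cdf(low)

p_160_170 = p_between(heights, 160, 170)
p_170_175 = p_between(heights, 170, 175)
print(round(p_160_170, 3), round(p_170_175, 3))
```

The results are roughly 0.52 and 0.18.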
 
Chapter 8
INTRODUCTION TO
HYPOTHESIS TESTING
 
The Logic of Hypothesis Testing
• It is usually impossible or impractical for a researcher to observe every individual in a population
• Therefore, researchers usually collect data from a sample and then use the sample data to answer questions about the population
• Hypothesis testing is a statistical method that uses sample data to evaluate a hypothesis about the population
 
The Hypothesis Testing Procedure
1. State a hypothesis about a population. Usually the hypothesis concerns the value of a population parameter
2. Before we select a sample, we use the hypothesis to predict the characteristics that the sample should have. The sample should be similar to the population
3. We obtain a sample from the population (sampling)
4. We compare the obtained sample data with the prediction that was made from the hypothesis
 
PROCESS OF HYPOTHESIS TESTING
• It is assumed that the parameter μ is known for the population before treatment
• The purpose of the experiment is to determine whether or not the treatment has an effect on the population mean
[Figure: known population before treatment (μ = 30) → TREATMENT → unknown population after treatment (μ = ?)]
 
EXAMPLE
• It is known from national health statistics that the mean weight for 2-year-old children is μ = 26 pounds, with σ = 4 pounds
• The researcher’s plan is to obtain a sample of n = 16 newborn infants and give their parents detailed instructions for giving their children increased handling and stimulation
• NOTICE that the population after treatment is unknown
 
STEP-1: State the Hypothesis
• H0 : μ = 26 (even with extra handling, the mean at 2 years is still 26 pounds)
• H1 : μ ≠ 26 (with extra handling, the mean at 2 years will be different from 26 pounds)
• In this example we use α = .05, two-tailed
 
STEP-2: Set the Criteria for a Decision
• Sample means that are likely to be obtained if H0 is true; that is, sample means that are close to the null hypothesis
• Sample means that are very unlikely to be obtained if H0 is true; that is, sample means that are very different from the null hypothesis
• The alpha level, or level of significance, is a probability value that is used to define the very unlikely sample outcomes if the null hypothesis is true
 
The location of the critical region boundaries for three different levels of significance
α = .05: z = ±1.96
α = .01: z = ±2.58
α = .001: z = ±3.30
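These boundaries can be recovered from the standard normal distribution by placing α/2 in each tail. A sketch using `statistics.NormalDist`; the function name `two_tailed_critical_z` is illustrative.

```python
from statistics import NormalDist

def two_tailed_critical_z(alpha):
    """z value that leaves alpha/2 in each tail of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.05, 0.01, 0.001):
    print(alpha, round(two_tailed_critical_z(alpha), 2))
```

The computed value for α = .001 is about 3.29, which the slide lists as 3.30.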
 
STEP-3: Collect Data and Compute Sample Statistics
• After obtaining the sample data, compute the appropriate statistic
$$\sigma_M = \frac{\sigma}{\sqrt{n}}, \qquad z = \frac{M - \mu}{\sigma_M}$$
NOTICE
• The top of the z-score formula measures how much difference there is between the data and the hypothesis
• The bottom of the formula measures the standard distance that ought to exist between the sample mean and the population mean
 
STEP-4: Make a Decision
• Whenever the sample data fall in the critical region, reject the null hypothesis
• This indicates that there is a big discrepancy between the sample and the null hypothesis (the sample is in the extreme tail of the distribution)
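The four steps can be sketched as one small function. The population values μ = 26, σ = 4 and the sample size n = 16 come from the example; the sample mean M = 30 is a hypothetical value chosen only to make the sketch runnable, since the slides do not report one. `z_test` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist

def z_test(sample_mean, mu, sigma, n, alpha=0.05):
    """Two-tailed z test: return (z, critical z, whether H0 is rejected)."""
    standard_error = sigma / sqrt(n)                  # sigma_M = sigma / sqrt(n)
    z = (sample_mean - mu) / standard_error           # z = (M - mu) / sigma_M
    critical = NormalDist().inv_cdf(1 - alpha / 2)    # boundary of the critical region
    return z, critical, abs(z) > critical

# M = 30 is hypothetical; mu, sigma, and n are from the example.
z, critical, reject = z_test(sample_mean=30, mu=26, sigma=4, n=16)
print(z, round(critical, 2), reject)   # 4.0 1.96 True
```

With this made-up sample mean, z falls in the critical region, so H0 would be rejected.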
 
LEARNING CHECK
HYPOTHESIS TEST WITH z
• Scores on a standardized test are normally distributed with μ = 65 and σ = 15. The researcher suspects that special training in reading skills will produce a change in scores for individuals in the population. A sample of n = 25 individuals is selected, and the average for this sample is M = 70.
• Is there evidence that the training has an effect on test scores?
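One way to check this exercise, assuming a two-tailed test at α = .05 (the slide does not fix the alpha level, so that choice is an assumption). The sketch reports a two-tailed p-value alongside z.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n, sample_mean = 65, 15, 25, 70

standard_error = sigma / sqrt(n)               # 15 / 5 = 3
z = (sample_mean - mu) / standard_error        # 5 / 3
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
reject_h0 = p_two_tailed < 0.05                # assumed alpha = .05, two-tailed

print(round(z, 2), round(p_two_tailed, 3), reject_h0)
```

Here z is about 1.67, which is inside ±1.96, so under the assumed alpha level the data do not give sufficient evidence of a training effect.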
 
FACTORS THAT INFLUENCE A HYPOTHESIS TEST
$$z = \frac{M - \mu}{\sigma_M}, \qquad \sigma_M = \frac{\sigma}{\sqrt{n}}$$
• The size of the difference between the sample mean and the original population mean
• The variability of the scores, which is measured by either the standard deviation or the variance
• The number of scores in the sample
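To see each factor at work, the sketch below starts from illustrative baseline values (μ = 65, σ = 15, n = 25, M = 70) and changes one input at a time. The altered values are hypothetical.

```python
from math import sqrt

def z_score(sample_mean, mu, sigma, n):
    """z = (M - mu) / (sigma / sqrt(n))"""
    return (sample_mean - mu) / (sigma / sqrt(n))

base = z_score(70, 65, 15, 25)            # baseline
bigger_diff = z_score(75, 65, 15, 25)     # larger mean difference
less_var = z_score(70, 65, 5, 25)         # smaller standard deviation
larger_n = z_score(70, 65, 15, 100)       # more scores in the sample

print(round(base, 2), round(bigger_diff, 2), round(less_var, 2), round(larger_n, 2))
```

A larger mean difference, lower variability, or a larger sample each moves z farther from zero, making it more likely that the sample falls in the critical region.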
 
DIRECTIONAL (ONE-TAILED) HYPOTHESIS TESTS
• Usually a researcher begins an experiment with a specific prediction about the direction of the treatment effect
• For example, a special training program is expected to increase student performance
• In this situation, it is possible to state the statistical hypotheses in a manner that incorporates the directional prediction into the statement of H0 and H1
 
LEARNING CHECK
A psychologist has developed a standardized test for measuring the vocabulary skills of 4-year-old children. The scores on the test form a normal distribution with μ = 60 and σ = 10. A researcher would like to use this test to investigate the hypothesis that children who grow up as an only child develop vocabulary skills at a different rate than children in large families. A sample of n = 25 only children is obtained, and the mean test score for this sample is M = 63.