Statistics
CSCI 115
1/21/2002
The Fields of
Probability and Statistics
Probability
Statistics
Descriptive statistics
Inferential statistics
Probability and Statistics
Probability: We determine the chances of selecting a certain sample from a
known population.
Statistics: We make estimates or projections about a whole population
based on a sample.

Example of Probability

Suppose that there are 13 women and 7
men taking English 101 section 3. A
student is picked at random to give a
presentation. The probability that the
student is a woman is 13/20 and the
probability that the student is a man is
7/20.
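
The arithmetic in this example can be checked with a short Python sketch. The variable names are illustrative and do not come from the slides; `fractions.Fraction` is used only so the results stay exact.

```python
from fractions import Fraction

# Known population from the example: 13 women and 7 men in the class.
women, men = 13, 7
total = women + men

# Probability of picking each group when every student is equally likely.
p_woman = Fraction(women, total)
p_man = Fraction(men, total)

print(p_woman, p_man)        # 13/20 7/20
print(p_woman + p_man == 1)  # True: the two outcomes cover every student
```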
Example of Statistics

While walking across campus we
observe 13 women and 7 men. We
conclude that about 65% (13/20) of all
students at PLU are female and 35%
(7/20) are males.
Two Branches of Statistics
Descriptive statistics
 Inferential statistics

Descriptive Statistics

The methods or techniques designed to
summarize or to describe the main
features of numerical data
Inferential Statistics

Involves those methods and techniques
whereby estimates of a general nature
are made on the basis of knowledge
about a part or sample of the general
population
Example of Descriptive
Statistics

We determine the age of all students in
this class and determine that
the average age is 20.8 years,
the minimum is 18
and
the maximum is 42
Example of Inferential
Statistics

The Mooring Mast calls 100 PLU
students at random and asks them whether
they are satisfied with the current
student government. 48 say yes, 30
say no and 22 are not sure. They
conclude that 48% of all students are
satisfied, 30% are not and 22% are not
sure.
Example of Descriptive
Statistics

The food service interviews every
student with a meal plan and determines
that 39% are very satisfied with their
service.
Example of Inferential
Statistics

The food service picks 50 students with
a meal plan at random and interviews
them. They determine that 34% of the
students in the sample were completely
satisfied. They infer that 34% of all the
students on a meal plan are completely
satisfied.
Other Examples of Descriptive
Statistics
An instructor determines that the
average exam score was 82 and that
there were 5 A’s, 6 B’s, 10 C’s, 2 D’s and
1 E.
 After an election, it is determined that
the winner received 54% of the vote.
 To test a new drug, it is given to 200
patients. It is found that the new drug
helps 82% of those patients.

Other Examples of Inferential
Statistics
Before giving a new SAT test, the writers
give the exam to 2000 students to help
standardize the exam.
 Before an election, a pollster conducts a
survey and predicts that a certain candidate
will win the election with 54% of the vote.
Based on an experiment in which a new drug
helped 80% of the patients in a sample of
200, it is decided that the drug may be useful.

Some Descriptive Statistics
Count: Number of values
 Mean: The sum of the values divided by
the number of values
Mode: The value(s) that occur most
frequently
 Median: The middle value. Half the
values are larger, half are smaller. (If
the number of values is even, it is the
average of the two middle values.)

Example 1:
10 point quiz scores: 8, 3, 8, 9, 6
 Count: 5
 Mean: (8 + 3 + 8 + 9 + 6)/5 = 34/5 = 6.8
 Mode: 8 occurs twice, other values only
once so the mode is 8
 Median: order the values: 3, 6, 8, 8, 9
so median is 8

Example 2:
10 point quiz scores: 8, 3, 9, 6
 Count: 4
 Mean: (8 + 3 + 9 + 6)/4 = 26/4 = 6.5
 Mode: No value occurs more than once.
No mode
 Median: order the values 3, 6, 8, 9
Median = (6 + 8)/2 = 7
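
Examples 1 and 2 can be reproduced with a small Python sketch. The `describe` helper is hypothetical (it is not part of the slides), and it assumes the convention used above that a list in which no value repeats has no mode.

```python
from collections import Counter
import statistics

def describe(scores):
    """Return (count, mean, modes, median) for a list of scores."""
    counts = Counter(scores)
    highest = max(counts.values())
    # Follow the slides' convention: if no value repeats, report no mode.
    modes = sorted(v for v, c in counts.items() if c == highest) if highest > 1 else []
    return len(scores), statistics.mean(scores), modes, statistics.median(scores)

print(describe([8, 3, 8, 9, 6]))  # (5, 6.8, [8], 8)
print(describe([8, 3, 9, 6]))     # (4, 6.5, [], 7.0)
```

The median of the four-value list comes out as 7.0 because it is the average of the two middle values, 6 and 8.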

Three ways to determine the
average
Mean
 Mode
 Median

Why all the different ways of
calculating “Average”?
Example: The assessed values in a
certain neighborhood are as follows:
No. of homes   Value
1              $2,000,000
2              $200,000
3              $150,000
4              $100,000
 Count: 10
 Mean: $3,250,000/10 = $325,000
 Mode: $100,000
 Median: $150,000
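
As a rough check of these three "averages", the sketch below expands the table into one value per home and applies Python's `statistics` module. The `homes` and `values` names are illustrative assumptions, not part of the original example.

```python
import statistics

# (number of homes, assessed value) pairs from the example.
homes = [(1, 2_000_000), (2, 200_000), (3, 150_000), (4, 100_000)]

# Expand the pairs into one value per home so the standard functions apply.
values = [value for count, value in homes for _ in range(count)]

print(f"Count:  {len(values)}")                        # Count:  10
print(f"Mean:   ${statistics.mean(values):,.0f}")      # Mean:   $325,000
print(f"Mode:   ${statistics.mode(values):,.0f}")      # Mode:   $100,000
print(f"Median: ${statistics.median(values):,.0f}")    # Median: $150,000
```

The single $2,000,000 home pulls the mean well above what most homes are worth, while the median and mode are unaffected by it.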

Why all the different ways of
calculating “Average”? (cont.)
Question: In determining the amount of
tax collected from the neighborhood,
which average is most meaningful to the
tax collector?
 In trying to determine what a home
buyer is likely to pay, which is most
important?
 What is the cost of the most common
home in the area?

Calculation of weighted
averages
Use the assessed house value example
Count   Value        Product
1       $2,000,000   $2,000,000
2       200,000      400,000
3       150,000      450,000
4       100,000      400,000
10                   $3,250,000
 Average is $3,250,000/10 = $325,000
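
The weighted-average table can be sketched in a few lines of Python. The variable names are illustrative; the counts and values are the ones from the table.

```python
# (count, value) pairs: how many homes share each assessed value.
homes = [(1, 2_000_000), (2, 200_000), (3, 150_000), (4, 100_000)]

# Sum of the Count column and sum of the Product column.
total_count = sum(count for count, _ in homes)
total_product = sum(count * value for count, value in homes)

# The weighted average divides the total product by the total count.
weighted_average = total_product / total_count

print(total_count, total_product, weighted_average)  # 10 3250000 325000.0
```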

Some Other Useful Statistics
Max: Largest value
 Min: Smallest value
 Range: Max - Min
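
A minimal illustration, reusing the quiz scores from Example 1 (the variable names are assumptions made for this sketch):

```python
scores = [8, 3, 8, 9, 6]  # quiz scores reused from Example 1

largest = max(scores)
smallest = min(scores)
value_range = largest - smallest  # Range: Max - Min

print(largest, smallest, value_range)  # 9 3 6
```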

Ordering and arranging data

Sometimes ordering data in ranges is useful
Data values
81 74 89 85 79 76 72 84 68 66
95 78 69 54 81 72 79 69 63 61
Data ordered and arranged in ranges of 10
50-59:   54
60-69:   61 63 66 68 69 69
70-79:   72 72 74 76 78 79 79
80-89:   81 81 84 85 89
90-100:  95
Histograms

Column charts showing the number of
times a value or range of values
appears
Histogram
Data values
81 74 89 85 79 76 72 84 68 66
95 78 69 54 81 72 79 69 63 61
Frequency Table
Range    Count
50-59    1
60-69    6
70-79    7
80-89    5
90-100   1
[Figure: "Histogram of test scores"; x-axis: Score (ranges 50-59 to 90-100), y-axis: Number in range]
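
The frequency table for these 20 test scores can be rebuilt with a short Python sketch. The `range_label` helper and its `width`, `low` and `high` parameters are hypothetical names; the sketch assumes scores run from 50 to 100 and that a score of exactly 100 belongs to the last range.

```python
from collections import Counter

scores = [81, 74, 89, 85, 79, 76, 72, 84, 68, 66,
          95, 78, 69, 54, 81, 72, 79, 69, 63, 61]

def range_label(score, width=10, low=50, high=100):
    """Return the label of the range a score falls in, e.g. '70-79'."""
    # A score equal to `high` is folded into the last range (e.g. 90-100).
    start = min((score - low) // width * width + low, high - width)
    end = high if start == high - width else start + width - 1
    return f"{start}-{end}"

frequency = Counter(range_label(s) for s in scores)
for label in ["50-59", "60-69", "70-79", "80-89", "90-100"]:
    # One '#' per score gives a rough text histogram.
    print(f"{label:>6} {frequency[label]:2d} {'#' * frequency[label]}")
```

Calling `range_label(s, width=5)` instead groups the same scores into ranges of 5 points.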
Histogram with 5 point range
Data values
81 74 89 85 79 76 72 84 68 66
95 78 69 54 81 72 79 69 63 61
Frequency Table
Range    Count
50-54    1
55-59    0
60-64    2
65-69    4
70-74    3
75-79    4
80-84    3
85-89    2
90-94    0
95-100   1
[Figure: "Frequencies in ranges of 5"; axes: Range (50-54 to 95-100) and Count]
Excel and Histograms
Use the FREQUENCY function to help build
frequency tables
Use column charts to create histograms
from frequency tables
Use Data | Sort or the sort toolbar buttons to
sort data
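
For readers working outside Excel, the behavior of the FREQUENCY function can be approximated in Python. The `frequency` helper below is a hypothetical sketch, not Excel's implementation: it assumes each bin value is an inclusive upper bound, that the bins are sorted in ascending order, and that one extra count is returned for values above the last bin.

```python
from bisect import bisect_left

def frequency(data, bins):
    """Count values per bin, where each bin value is an inclusive upper bound.

    Returns len(bins) + 1 counts; the last one holds values above the final bin.
    `bins` must be sorted in ascending order.
    """
    counts = [0] * (len(bins) + 1)
    for value in data:
        # bisect_left finds the first bin whose upper bound is >= value.
        counts[bisect_left(bins, value)] += 1
    return counts

scores = [81, 74, 89, 85, 79, 76, 72, 84, 68, 66,
          95, 78, 69, 54, 81, 72, 79, 69, 63, 61]
print(frequency(scores, [59, 69, 79, 89]))  # [1, 6, 7, 5, 1]
```

The five counts correspond to the ranges 50-59, 60-69, 70-79, 80-89 and 90-100 for the 20 test scores.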

Comparing Two Sets of
Numbers
Set 1: 5 7 3 5 6 6 5 5 5 3 5 4 5 5 5 6 5 5 4
Mean 4.9474   Median 5   Mode 5
Max 7   Min 3   Range 4
Set 2: 3 5 3 7 7 4 6 4 5 3 6 5 5 3 4 5 6 7 6
Mean 4.9474   Median 5   Mode 5
Max 7   Min 3   Range 4
 Are these sets essentially the same?
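
One way to test the question is to compute the same summary for both sets and compare. The `summary` helper is a hypothetical name used only in this sketch.

```python
import statistics

set1 = [5, 7, 3, 5, 6, 6, 5, 5, 5, 3, 5, 4, 5, 5, 5, 6, 5, 5, 4]
set2 = [3, 5, 3, 7, 7, 4, 6, 4, 5, 3, 6, 5, 5, 3, 4, 5, 6, 7, 6]

def summary(values):
    """Return the summary statistics compared on this slide."""
    return {
        "mean": round(statistics.mean(values), 4),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "max": max(values),
        "min": min(values),
        "range": max(values) - min(values),
    }

print(summary(set1) == summary(set2))  # True: the summaries cannot tell the sets apart
```

Identical summaries do not mean the sets are distributed the same way, which is why the frequencies are examined next.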

Frequencies

Let's arrange the values in order
Set 1         Set 2
33            3333
44            444
55555555555   55555
666           6666
7             777
Frequency Table
Frequency Count
Value   Set 1   Set 2
3       2       4
4       2       3
5       11      5
6       3       4
7       1       3
Set 1 seems bell shaped, centered about 5
Set 2 seems to be dispersed about equally
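
The frequency counts for the two sets can be tallied with `collections.Counter`, as in this minimal sketch (the variable names are illustrative):

```python
from collections import Counter

set1 = [5, 7, 3, 5, 6, 6, 5, 5, 5, 3, 5, 4, 5, 5, 5, 6, 5, 5, 4]
set2 = [3, 5, 3, 7, 7, 4, 6, 4, 5, 3, 6, 5, 5, 3, 4, 5, 6, 7, 6]

# Counter maps each distinct value to the number of times it occurs.
freq1, freq2 = Counter(set1), Counter(set2)

print("Value  Set 1  Set 2")
for value in range(3, 8):
    print(f"{value:>5}  {freq1[value]:>5}  {freq2[value]:>5}")
```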

Histogram
[Figure: "Histogram of 2 data sets"; x-axis: Values (3 to 7), y-axis: Count; series: Set 1 and Set 2]
Histogram

The groups used in histograms may
include a single value or several values.
Sometimes grouping several values in
ranges may help hide “noise”.
Standard Deviation
A way to measure how close the
numbers are to each other
The standard deviation is the square
root of the mean of the squared
deviations of each number from the mean
of the list
This definition assumes we calculate the
standard deviation of the entire
population

Standard Deviation

If the values are x1, x2, x3, ..., xn, and
the mean is m, then the standard deviation is
$$\sqrt{\frac{(x_1 - m)^2 + (x_2 - m)^2 + (x_3 - m)^2 + \cdots + (x_n - m)^2}{n}} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - m)^2}{n}}$$
Variance


The variance is the mean of the squared
deviations
The standard deviation is the square root of
the variance
Calculating Standard
Deviation - Step 1




Example: Calculate the standard deviation of
4, 8, 3, 6, 9
        Values
        4
        8
        3
        6
        9
sum     30
mean    6
Calculating Standard
Deviation - Step 2


Example: Calculate the standard deviation of
4, 8, 3, 6, 9
        Values   Deviations
        4        -2
        8        2
        3        -3
        6        0
        9        3
sum     30
mean    6
Calculating Standard
Deviation - Step 3


Example: Calculate the standard deviation of
4, 8, 3, 6, 9
        Values   Deviations   Squared Deviations
        4        -2           4
        8        2            4
        3        -3           9
        6        0            0
        9        3            9
sum     30                    26
mean    6                     5.2 variance
                              2.28 st. dev.
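
The three steps can be followed in a short Python sketch. The variable names are illustrative; the calculation divides by n because the values are treated as the entire population.

```python
import math

values = [4, 8, 3, 6, 9]

mean = sum(values) / len(values)         # Step 1: 30 / 5 = 6.0
deviations = [x - mean for x in values]  # Step 2: [-2.0, 2.0, -3.0, 0.0, 3.0]
squared = [d ** 2 for d in deviations]   # Step 3: [4.0, 4.0, 9.0, 0.0, 9.0]
variance = sum(squared) / len(values)    # mean of the squared deviations: 26 / 5
std_dev = math.sqrt(variance)            # square root of the variance

print(mean, variance, round(std_dev, 2))  # 6.0 5.2 2.28
```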
Another example


Example: Calculate the standard deviation of
2, 4, 6
        Values   Deviations   Squared Deviations
        2        -2           4
        4        0            0
        6        2            4
sum     12                    8
mean    4                     2.667 variance
                              1.633 st. dev.
Standard Deviation of a Sample



The initial formulas assumed we calculated
the standard deviation of the entire population
If we calculate the standard deviation on only
a sample, we divide by n-1 instead of n
Dividing by n-1 allows our calculation of the
variance to be “unbiased”
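
A small simulation can make the "unbiased" claim more concrete. This is only a sketch under stated assumptions: the population (the six faces of a fair die), the sample size of 5, the number of trials and the random seed are all arbitrary choices made for illustration and do not come from the slides.

```python
import random
import statistics

random.seed(115)  # fixed seed so the sketch is repeatable (the value is arbitrary)

population = [1, 2, 3, 4, 5, 6]                   # hypothetical population: one fair die
true_variance = statistics.pvariance(population)  # 35/12, about 2.92

trials, n = 20_000, 5
sum_div_n = sum_div_n_minus_1 = 0.0
for _ in range(trials):
    # Draw a random sample of n values from the population.
    sample = [random.choice(population) for _ in range(n)]
    m = sum(sample) / n
    squared = sum((x - m) ** 2 for x in sample)
    sum_div_n += squared / n                # variance estimate dividing by n
    sum_div_n_minus_1 += squared / (n - 1)  # variance estimate dividing by n-1

avg_div_n = sum_div_n / trials                  # tends to fall below the true variance
avg_div_n_minus_1 = sum_div_n_minus_1 / trials  # tends to land near the true variance
print(round(true_variance, 2), round(avg_div_n, 2), round(avg_div_n_minus_1, 2))
```

Averaged over many samples, the estimate that divides by n tends to come out too small, while the one that divides by n-1 tends to sit close to the population variance.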
Standard Deviation of a Sample
$$\sqrt{\frac{(x_1 - m)^2 + (x_2 - m)^2 + (x_3 - m)^2 + \cdots + (x_n - m)^2}{n - 1}} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - m)^2}{n - 1}}$$
Standard Deviation of a Sample


Example: Calculate the standard deviation of the
sample 4, 8, 3, 6, 9 randomly picked from 2000
        Values   Deviations   Squared Deviations
        4        -2           4
        8        2            4
        3        -3           9
        6        0            0
        9        3            9
sum     30                    26
mean    6        “mean”       6.5 variance (26 divided by n-1 = 4)
                              2.55 st. dev.
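
The only change from the population calculation is the divisor, which the following sketch makes explicit. The `std_dev` function and its `sample` flag are hypothetical names introduced for illustration.

```python
import math

def std_dev(values, sample=False):
    """Standard deviation; divides by n-1 when `sample` is True, else by n."""
    n = len(values)
    mean = sum(values) / n
    squared_sum = sum((x - mean) ** 2 for x in values)
    divisor = n - 1 if sample else n
    return math.sqrt(squared_sum / divisor)

sample_values = [4, 8, 3, 6, 9]
print(round(std_dev(sample_values, sample=True), 2))  # 2.55
print(round(std_dev(sample_values), 2))               # 2.28
```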
Standard Deviation and Excel


Use STDEVP and VARP if the values
represent the entire population
Use STDEV and VAR if the values are for a
sample from the entire population
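
For comparison, Python's standard `statistics` module draws the same population/sample distinction. The pairing with the Excel functions below is an informal analogy for this sketch, not a statement about how Excel computes its results.

```python
import statistics

values = [4, 8, 3, 6, 9]

# Whole population (divide by n), the role STDEVP and VARP play in Excel.
print(statistics.pvariance(values), round(statistics.pstdev(values), 2))  # 5.2 2.28

# Sample from a larger population (divide by n-1), the role of STDEV and VAR.
print(statistics.variance(values), round(statistics.stdev(values), 2))    # 6.5 2.55
```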