Download CS1512

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Probability wikipedia , lookup

Probability interpretations wikipedia , lookup

Foundations of statistics wikipedia , lookup

History of statistics wikipedia , lookup

Statistics wikipedia , lookup

Transcript
CS1512
Foundations of
Computing Science 2
(Theoretical part)
Kees van Deemter
Probability and statistics
Propositional Logic
Elementary set theory
1
© J R W Hunter, 2006; C J van Deemter 2007
What this is going to be about
1. Suppose you know that the statement p is true
and that the statement q is true.
What can you say about the statement that p and q ?
2
What this is going to be about
1. Suppose you know that the statement p is true
and that the statement q is true.
What can you say about the statement that p and q ?
In this case, you know that p and q is also true.
2. Suppose you know that the statement p has a probability of .5
and the statement q has a probability of .5.
What can you say about the statement that p and q ?
3
What this is going to be about
1. Suppose you know that the statement p is true
and the statement q is true.
What can you say about the statement that p and q ?
In this case, p and q is also true.
2. Suppose you know that the statement p has a probability of .5
and the statement q has a probability of .5.
What can you say about the statement that p and q ?
It depends! If p and q are independent of each other then you know
that p and q has a probability of .25 But suppose (1)
p = It will snow (some time) tomorrow and
q = It will be below zero (some time) tomorrow
Then p and q has a probability >.25
4
What this is going to be about
1. Suppose you know that the statement p is true
and the statement q is true.
What can you say about the statement that p and q ?
In this case, p and q is also true.
2. Suppose you know that the statement p has a probability of .5
and the statement q has a probability of .5.
What can you say about the statement that p and q ?
It depends! If p and q are independent of each other then you know
that p and q has a probability of .25 But suppose (2)
p = It will snow (sometime) tomorrow and
q = It will not snow (any time) tomorrow
Then p and q has a probability 0
5
Before we get there ...
Some basic concepts in statistics
• different kinds of data
• ways of representing data
• ways of summarising data
This will be useful later in your CS career. For example to
• assess whether a computer simulation is accurate
• assess whether one user interface is more user friendly than another
• estimate the expected run time of a computer program (on typical
data)
6
Lecture slides on statistics and probability
are based on originals by Professor Jim Hunter.
7
CS1512
Foundations of
Computing Science 2
Lecture 1
Probability and statistics (1)
8
© J R W Hunter, 2006; C J van Deemter 2007
Sources
Text book (parts of chapters 1-6):
Essential Statistics (Fourth Edition)
D.G.Rees
Chapman and Hall
2001
(Blackwells, ~£28)
Courses:
ST1505
Mathematical Sciences
http://maths.abdn.ac.uk/~ap/st1505/
CS1012
Sets
9
Some definitions
Sample space (population)
• Set of entities of interest, also called elements
• this set may be infinite
• entities can be physical objects, events, etc. ...
Sample
• subset of the sample space
10
More definitions
Variable
• an attribute of an element which has a value (e.g., its
height, weight, etc.)
Observation
• the value of a variable as recorded for a particular
element
• an element will have variables with values but they are
not observations until we record it
Sample data
• set of observations derived from a sample
11
Descriptive and Inferential
Statistics
Descriptive statistics:
• Summarising the sample data (as a number, graphic ...)
Inferential statistics:
• Using data from a sample to infer properties of the sample space
• Chose a ‘representative sample’
(properties of sample match those of sample space – difficult)
• In practice, use a ‘random sample’
(each element has the same likelihood of being chosen)
12
Variable types
Qualitative:
• Nominal/Categorical (no ordering in values)
 e.g. sex, occupation
• Ordinal (ranked)
 e.g. class of degree (1, 2.1, 2.2,...)
Quantitative:
• Discrete (countable) – [integer]
 e.g. number of people in a room
• Continuous – [double]
 e.g. height
13
Examples
1.
2.
3.
4.
A person’s marital status
The length of a CD
The size of a litter of piglets
The temperature in degrees centigrade
14
Examples
1. A person’s marital status
Nominal/categorical
2. The length of a CD
Quantitative; continuous or discrete?
This depends on how you model length (minutes or bits)
3. The size of a litter of piglets
Quantitative, discrete (if we mean the number of pigs)
4. The temperature in degrees centigrade
Quantitative, continuous
(Even though it does not make sense to say that 200
is twice as warm as 100)
Footnote: We us the term `Continuous` a bit loosely: For us a variable is
continuous/dense (as opposed to discrete) if between any values x
and y, there lies a third value z.
15
Summarising data
Categorical (one variable):
• X is a categorical variable with values: a1, a2, a3, ... ak, ... aK
(k = 1, 2, 3, ... K)
• fk = number of times that ak appears in the sample
fk is the frequency of ak
• if we have n observations then:
relative frequency = frequency / n
• percentage relative frequency = relative frequency  100
16
Frequency
sample of 572 patients (n = 572)
Blood Type
A
AB
B
O
Totals
Frequency
210
35
93
234
572
Relative Frequency
0.37
0.06
0.16
0.41
1.00
Percentage RF
37%
6%
16%
41%
100%
sum of frequencies = n
17
Bar Chart
18
Summarising data
Categorical (two variables):
• contingency table
• number of patients with blood type A who are female is 108
Blood Type
Sex
Totals
male
female
A
AB
B
O
102
12
46
120
108
23
47
114
210
35
93
234
Totals
280
292
572
19
Summarising data
Categorical (two variables):
• contingency table
• number of patients with blood type A who are female is 108
Blood Type
Sex
Totals
male
female
A
AB
B
O
102
12
46
120
108
23
47
114
210
35
93
234
Totals
280
292
572
% Blood Type by sex
male
female
49%
34%
50%
51%
51%
66%
50%
49%
20
Bar Chart
21
Ordinal data
• X is an ordinal variable with values: a1, a2, a3, ... ak, ... aK
• ‘ordinal’ means that:
a1 ≤ a2 ≤ a3 ≤ ... ≤ ak ≤ ... ≤ aK
• cumulative frequency at level k:
ck = sum of frequencies of values less than or equal to ak
ck = f1 + f2 + f3 + ... + fk
= (f1 + f2 + f3 + ... + fk-1 ) + fk
= ck-1 + fk
• Can be applied to quantitative data as well ...
22
Cumulative frequencies
Number of piglets
in a litter: (discrete data)
c1=f1=1,
c2=f1+f2=1,
c3=f1+f2+f3=3,
c4=f1+f2+f3+f4=6,
etc.
cK = n
Litter size
Frequency=f
5
6
7
8
9
10
11
12
13
14
Total
1
0
2
3
3
9
8
5
3
2
Cum. Freq =c
1
1
3
6
9
18
26
31
34
36
36
23
Plotting
frequency
cumulative frequency
24
Continuous data
• A way to obtain discrete numbers from continuous data:
Divide range of observations into non-overlapping intervals (bins)
• Count number of observations in each bin
• Enzyme concentration data in 30 observations:
121
95
85
119
62
Range: 25 to 201
25
81
145
57
104
83
123
100
64
139
110
67
70
151
201
60
113
93
48
68
101
78
118
92
95
For example, you can use 10 bins of width 20:
25
Enzyme concentrations
Concentration
19.5 ≤ c < 39.5
39.5 ≤ c < 59.5
59.5 ≤ c < 79.5
79.5 ≤ c < 99.5
99.5 ≤ c < 119.5
119.5 ≤ c < 139.5
139.5 ≤ c < 159.5
159.5 ≤ c < 179.5
179.5 ≤ c < 199.5
199.5 ≤ c < 219.5
Freq.
1
2
7
7
7
3
2
0
0
1
Totals
30
Rel.Freq.
0.033
0.067
0.233
0.233
0.233
0.100
0.067
0.000
0.000
0.033
% Cum. Rel. Freq.
3.3%
10.0%
33.3%
56.6%
79.9%
89.9%
96.6%
96.6%
96.6%
100.0%
1.000
26
Cumulative histogram
27