Introduction
Probability Theory
● Probability theory helps us compensate for our lack of knowledge
● It deals with non-deterministic events
● The theory originated in problems of gambling
● It is used in many fields
  – Medicine
  – Economics
  – Computer Science
  – Linguistics
  – Psychology
  – Management
Statistics
● Statistics is concerned with problems of data
  – Collection
  – Analysis
  – Interpretation
  – Presentation
  – Organization
● A simple example:
  – We have two boxes: one red and one blue
  – The red box contains 2 apples and 6 oranges
  – The blue box contains 3 apples and 1 orange
  – We randomly pick one of the boxes and then randomly select a fruit from that box, with replacement
  – We assume that the red box is picked 40% of the time and the blue box 60% of the time
● We want to answer questions such as:
  – What is the overall probability that the selection procedure will pick an apple?
  – Given that we have chosen an orange, what is the probability that the box we chose was the blue one?
● How do we solve these problems?
  – We choose a random variable corresponding to the identity of the box
  – This random variable (B) takes one of two values: r (for red) or b (for blue)
  – Another random variable is chosen for the identity of the fruit
  – This random variable (F) takes the values a (for apple) or o (for orange)
  – We formulate the problems of interest in terms of the values of the random variables
  – We define a real-valued function that assigns a probability value in [0,1] to the values of the random variables
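The two questions about the boxes can be worked through numerically. The sketch below is one possible way to do it, assuming the standard sum rule and Bayes' rule; the function names (`p_fruit_given_box`, `p_fruit`, `p_box_given_fruit`) are illustrative and not part of the slides.

```python
from fractions import Fraction

# Probability of picking each box (values taken from the example).
p_box = {"r": Fraction(4, 10), "b": Fraction(6, 10)}

# Fruit counts per box; the conditional probabilities P(F | B) follow from them.
contents = {"r": {"a": 2, "o": 6}, "b": {"a": 3, "o": 1}}

def p_fruit_given_box(fruit, box):
    counts = contents[box]
    return Fraction(counts[fruit], sum(counts.values()))

def p_fruit(fruit):
    # Sum rule: P(F) = sum over boxes of P(F | B) * P(B).
    return sum(p_fruit_given_box(fruit, b) * p_box[b] for b in p_box)

def p_box_given_fruit(box, fruit):
    # Bayes' rule: P(B | F) = P(F | B) * P(B) / P(F).
    return p_fruit_given_box(fruit, box) * p_box[box] / p_fruit(fruit)

print(p_fruit("a"))                 # 11/20
print(p_box_given_fruit("b", "o"))  # 1/3
```

Under these assumptions the overall probability of an apple is 11/20, and the probability that the box was blue given an orange is 1/3.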
Probability and statistics in computational linguistics
● Language is a complex system
● We need a theory to model the uncertainty involved in language structures
● The uncertainty is rooted in our lack of knowledge about language, its nature, its development, and its variation
● We have access to a large amount of data
Machine Learning
● Machine learning studies how to fit a numerical model to a set of data
● It uses probability theory and statistics to give a machine the ability to learn from data
● In our field we have a lot of unlabeled data and some labeled data
● Machine learning allows us to use these data to solve many problems, such as
  – Question answering
  – Machine translation
  – Text summarization
  – Document classification
  – ...
Combinatorial Analysis
Sample Space
● The set S consisting of all possible outcomes of an experiment is called the sample space
● Examples:
  – Tossing a coin
    S = {H, T}
  – Tossing two coins
    S = {HH, HT, TH, TT}
  – Tossing a coin until an H is obtained
    S = {H, TH, TTH, TTTH, ...}
  – Choosing an English letter
    S = {a, A, b, B, c, C, ...}
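The coin-tossing sample spaces can be enumerated directly. The sketch below is only an illustration: the names `two_coins` and `toss_until_head` are made up here, and the infinite sample space is represented by a generator from which only the first few outcomes are taken.

```python
from itertools import count, islice, product

# Finite sample spaces: one coin and two coins.
one_coin = {"H", "T"}
two_coins = {"".join(p) for p in product("HT", repeat=2)}

def toss_until_head():
    # Infinite sample space: H, TH, TTH, ... (k tails followed by one head).
    for k in count():
        yield "T" * k + "H"

print(sorted(two_coins))                   # ['HH', 'HT', 'TH', 'TT']
print(list(islice(toss_until_head(), 4)))  # ['H', 'TH', 'TTH', 'TTTH']
```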
Statement
● A declarative sentence that is true for some outcomes and false for the other outcomes of a sample space
● Examples:
  – A toss of a given coin results in a head
  – A letter selected at random is 'z'
  – A word selected at random starts with 'z'
● We are interested in statements whose truth or falsehood is determined for each possible outcome
● A statement is a function from the sample space S to {true, false}
  f : S → {true, false}
Experiment
● A statement or several statements that are true for some outcomes in a sample space and false for the other outcomes in the sample space
● Examples:
  – A toss of a given coin results in a head (H)
  – The letter appearing at the beginning of a word is 'z'
Event
● The set of sample points for which a statement is true
● Every subset of a sample space is an event
● Examples:
  – The event corresponding to “a head is seen when tossing two coins”
    {HH, HT, TH}
  – The event corresponding to “the word starts with 'z'”
    {zone, zero, zambia, ...}
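The link between a statement and its event can be made concrete. In this minimal sketch a statement is modelled as a Python function from outcomes to `True`/`False`, and the helper `event` (a hypothetical name) collects the outcomes for which it holds.

```python
from itertools import product

# Sample space for tossing two coins.
S = {"".join(p) for p in product("HT", repeat=2)}

def head_is_seen(outcome):
    # A statement: maps each outcome to True or False.
    return "H" in outcome

def event(statement, sample_space):
    # The event is the subset of outcomes for which the statement is true.
    return {s for s in sample_space if statement(s)}

P = event(head_is_seen, S)
print(sorted(P))  # ['HH', 'HT', 'TH']
```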
Compound Statements
● Given two statements p and q, statements of the following forms are compound statements
  – p and q
  – p or q
  – not p
● Examples:
  – A word selected at random starts with 'z' and ends with 'o'
  – A word selected at random is a VERB or a NOUN
  – A letter selected at random is not a vowel
Compound Statements and Events
● If P and Q are the events corresponding to statements p and q, then
  – the event corresponding to p and q is P∩Q
  – the event corresponding to p or q is P∪Q
  – the event corresponding to not p is ¬P
● Example:
  – p: a head is seen when tossing two coins
    P = {HH, HT, TH}
  – q: a tail is seen when tossing two coins
    Q = {HT, TH, TT}
  – p and q: a head and a tail are seen ...
    P∩Q = {HT, TH}
  – p or q: a head or a tail is seen ...
    P∪Q = {HH, HT, TH, TT}
  – not p: no head is seen ...
    ¬P = {TT}
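The correspondence between compound statements and set operations can be checked with Python's built-in sets. This is a small sketch for the two-coin example; the variable names are chosen here for illustration, and the complement is taken relative to the sample space S.

```python
S = {"HH", "HT", "TH", "TT"}      # sample space: two coin tosses
P = {s for s in S if "H" in s}    # p: a head is seen
Q = {s for s in S if "T" in s}    # q: a tail is seen

p_and_q = P & Q   # intersection of P and Q
p_or_q = P | Q    # union of P and Q
not_p = S - P     # complement of P relative to S

print(sorted(p_and_q))  # ['HT', 'TH']
print(sorted(p_or_q))   # ['HH', 'HT', 'TH', 'TT']
print(sorted(not_p))    # ['TT']
```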
Counting
● The number of outcomes of an event P is equal to the number of elements of P, n(P)
● Principles of counting
  – Addition principle
  – Multiplication principle
  – Permutation
  – Combination
Addition Principle
● For any two sets A and B
  – n(A∪B) = n(A) + n(B) − n(A∩B)
● A and B are disjoint sets if A∩B = ∅
● If A and B are two disjoint sets, then
  n(A∪B) = n(A) + n(B)
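The addition principle is easy to verify on small sets. The sets below (vowels, the letters a to j, and the letters x, y, z) are arbitrary examples picked for this sketch, not taken from the slides.

```python
import string

A = set("aeiou")                      # vowels
B = set(string.ascii_lowercase[:10])  # letters a..j
C = set("xyz")                        # shares no element with A

# General case: subtract the overlap so it is not counted twice.
assert len(A | B) == len(A) + len(B) - len(A & B)

# Disjoint case: the intersection is empty, so the sizes simply add.
assert A & C == set()
assert len(A | C) == len(A) + len(C)

print(len(A | B))  # 12
```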
Multiplication Principle
● If an experiment is performed in m steps and step i has n_i possible outcomes (i = 1, 2, ..., m), then the total number of possible outcomes of the whole experiment is
  ∏_{i=1}^{m} n_i = n_1 × n_2 × n_3 × ⋯ × n_m
Multiplication Principle
● Examples
  – If we toss a coin three times, the total number of possible outcomes is 2*2*2 = 8
  – If we randomly choose three letters from the English alphabet with replacement, the total number of possible outcomes is 26*26*26 = 17576
  – If we randomly choose three letters from the English alphabet without replacement, the total number of possible outcomes is 26*25*24 = 15600
  – The total number of subsets of a set of m elements is 2^m
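The counts in these examples can be confirmed by brute-force enumeration. The sketch below assumes Python 3.8 or later (for `math.prod`) and treats the selections as ordered sequences; the choice m = 4 for the subset check is arbitrary.

```python
from itertools import combinations, permutations, product
from math import prod
import string

letters = string.ascii_lowercase  # the 26 English letters

# Three coin tosses: 2 outcomes at each of 3 steps.
n_tosses = len(list(product("HT", repeat=3)))

# Three letters with replacement: 26 choices at every step.
n_with = sum(1 for _ in product(letters, repeat=3))

# Three letters without replacement (order matters): 26 * 25 * 24.
n_without = sum(1 for _ in permutations(letters, 3))

# Subsets of an m-element set: each element is either in or out, so 2^m.
m = 4
subsets = [c for r in range(m + 1) for c in combinations(range(m), r)]

print(n_tosses, prod([2, 2, 2]))       # 8 8
print(n_with, 26 ** 3)                 # 17576 17576
print(n_without, prod([26, 25, 24]))   # 15600 15600
print(len(subsets), 2 ** m)            # 16 16
```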
Permutation
● Any arrangement of objects in a list is a permutation
● NOTE: the order of objects in a permutation is important
● Example:
  – How many different arrangements of the letters a, b, and c are possible?
    ● {abc, acb, bac, bca, cba, cab}
    ● 3*2*1 = 6
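The six arrangements can be listed with the standard library. This is a minimal sketch using `itertools.permutations`, which yields the arrangements in lexicographic order when the input is sorted.

```python
from itertools import permutations

arrangements = ["".join(p) for p in permutations("abc")]

print(arrangements)       # ['abc', 'acb', 'bac', 'bca', 'cab', 'cba']
print(len(arrangements))  # 6
```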
Permutation
● The total number of permutations of a list consisting of n different objects is
  n! = n (n−1) (n−2) ⋯ (1)
  0! = 1
● The total number of permutations of r different objects out of n different objects is
  P(n, r) = n (n−1) (n−2) ⋯ (n−r+1)
  P(n, r) = n! / (n−r)!
  P(n, n) = n!
Permutation
● Examples
  – In how many ways can we order a deck of 52 cards?
    52! = 52 (52−1) (52−2) ⋯ (1) ≈ 8.0658e+67
  – In how many ways can we select an ordered list of four letters out of the 6 letters a, b, c, d, e, and f?
    P(6, 4) = 6! / (6−4)! = 360
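The formula P(n, r) = n! / (n−r)! can be evaluated directly. The helper `n_permutations` below is a hypothetical name used only for this sketch; it is compared against `math.perm`, available in Python 3.8 and later. It prints P(6, 3) alongside P(6, 4) to show how the count depends on r.

```python
from math import factorial, perm

def n_permutations(n, r):
    # P(n, r) = n! / (n - r)!, valid for 0 <= r <= n.
    return factorial(n) // factorial(n - r)

print(f"{factorial(52):.4e}")  # 8.0658e+67
print(n_permutations(6, 3))    # 120
print(n_permutations(6, 4))    # 360
print(n_permutations(6, 6) == factorial(6))  # True
```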