DATA ANALYSIS I
Data: Probabilistic View
Sources
• Leskovec, J., Rajaraman, A., & Ullman, J. D. (2014). Mining of Massive Datasets. Cambridge University Press. [pp. 5-7]
• Zaki, M. J., & Meira Jr., W. (2014). Data Mining and Analysis: Fundamental Concepts and Algorithms. Cambridge University Press. [pp. 13-25]
Assumption: The Data is Random
• Suppose you have a certain amount of data, and you look
for events of a certain type within that data.
• You can expect events of this type to occur even if the data is completely random, and the number of such occurrences will grow as the size of the data grows.
• This essentially means that an algorithm or method you think is useful for finding the events may return mostly false positives: occurrences that would arise just as readily in purely random data.
Example
• Suppose there are believed to be some “evil-doers” out there, and we
want to detect them.
• We have reason to believe that periodically, evil-doers gather at a hotel
to plot their evil.
– There are one billion people who might be evil-doers.
– Everyone goes to a hotel one day in 100.
– A hotel holds 100 people, so there are 100,000 hotels.
– We shall examine hotel records for 1000 days.
• Under these assumptions there are about 250,000 pairs of people who look like evil-doers, even though they are not (see the calculation sketched below).
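To see where the 250,000 figure comes from, here is a minimal back-of-the-envelope check in Python; the variable names are our own, and the reasoning follows Mining of Massive Datasets.

```python
from math import comb

n_people = 10**9     # candidate evil-doers
n_hotels = 10**5     # hotels, each holding 100 people
n_days = 1000        # days of hotel records examined
p_visit = 0.01       # a given person visits some hotel on a given day

# Two specific people are both at hotels on a given day with probability
# p_visit**2; they pick the same hotel with probability 1/n_hotels.
p_same_hotel_one_day = p_visit**2 / n_hotels        # 1e-9

# They coincide on two specific days with probability (1e-9)**2 = 1e-18.
p_same_hotel_two_days = p_same_hotel_one_day**2

# Expected number of (pair of people, pair of days) coincidences.
expected_suspicious_pairs = (
    comb(n_people, 2) * comb(n_days, 2) * p_same_hotel_two_days
)
print(round(expected_suspicious_pairs))
# ~2.5e5; the slides' figure of 250,000 uses the approximation C(n, 2) ≈ n²/2
```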
Bonferroni’s Principle
• Calculate the expected number of occurrences of the events
you are looking for, on the assumption that data is random.
• If this number is significantly larger than the number of real
instances you hope to find, then you must expect almost
anything you find to be bogus, i.e., a statistical artifact rather
than evidence of what you are looking for.
• Bonferroni’s principle says that we can credibly detect the events we are looking for only if we look for events that are so rare that they are unlikely to occur in random data.
Random Variable
• The probabilistic view of the data assumes that each numeric attribute X
is a random variable, defined as a function that assigns a real number to
each outcome of an experiment.
• Formally, X is a function X: O → R, where O, the domain of X, is the set of all possible outcomes of the experiment, and R, the range of X, is the set of real numbers.
• If the outcomes are numeric and represent the observed values of the random variable, then X: O → R is simply the identity function: X(v) = v for all v ∈ O.
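As a concrete illustration of this definition (our own example, not taken from the slides), a random variable can be written down as an explicit mapping from outcomes to real numbers:

```python
# A random variable is a function from outcomes to real numbers.
# Example experiment: two coin tosses; X counts the number of heads.
outcomes = ["HH", "HT", "TH", "TT"]

def X(outcome: str) -> float:
    return outcome.count("H")

print({o: X(o) for o in outcomes})   # {'HH': 2, 'HT': 1, 'TH': 1, 'TT': 0}

# When the outcomes are already numeric (e.g. observed sepal lengths),
# the random variable is simply the identity function X(v) = v.
def identity(v: float) -> float:
    return v

print(identity(4.3), identity(7.9))  # 4.3 7.9
```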
Discrete vs. Continuous
• Discrete random variable: A variable can take on only a
finite or countably infinite number of values in its range.
• Continuous random variable: A variable can take on any
value in its range.
Iris Dataset: Sepal Length
Random Variable: Sepal Length
• All n = 150 values of this attribute lie in the range [4.3,7.9],
with centimeters as the unit of measurement.
• Let us assume that these constitute the set of all possible
outcomes O.
• We can consider the attribute X1 to be a continuous random
variable, given as the identity function X1(v) = v.
Probability Mass Function
• Probability mass function: If X is discrete, the probability mass function of X is defined as
f(x) = P(X = x) for all x ∈ R
• The function f gives the probability P(X = x) that the random
variable X has the exact value x.
Bernoulli distribution
• Short and long sepal lengths? We can define a discrete
random variable A as follows:
• A(v) =
– 0 if v < 7
– 1 if v ≥ 7
• Probabilities:
– f (1) = P(A = 1) = 13 / 150 = 0.087 = p
– f (0) = P(A = 0) = 137 / 150 = 0.913 = 1− p
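As a quick cross-check of these probabilities, the following sketch recomputes them from the data; it assumes scikit-learn's bundled copy of the Iris dataset (sklearn.datasets.load_iris), which is an assumption about the data source rather than something stated in the slides.

```python
from sklearn.datasets import load_iris

# Sepal length is the first column of the Iris data matrix.
sepal_length = load_iris().data[:, 0]

# Indicator random variable A: 1 for long sepals (>= 7 cm), 0 otherwise.
a = (sepal_length >= 7.0).astype(int)

p = a.mean()                                    # f(1) = P(A = 1)
print(f"f(1) = {a.sum()}/{a.size} = {p:.3f}")   # 13/150 = 0.087
print(f"f(0) = {1 - p:.3f}")                    # 0.913
```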
Binomial Distribution
• Bernoulli trial is a random experiment with exactly two possible
outcomes, "success" and "failure", in which the probability of success is
the same every time the experiment is conducted.
• Let us consider another discrete random variable B, denoting the number of Irises with long sepal length in m independent Bernoulli trials, each with probability of success p. Then B has a binomial distribution with probability mass function
f(k) = P(B = k) = C(m, k) · p^k · (1 − p)^(m−k), where C(m, k) = m! / (k! (m − k)!)
Example
• The probability of observing exactly k = 2 Irises with long sepal length in m = 10 trials is given as
f(2) = P(B = 2) = C(10, 2) · (0.087)^2 · (0.913)^8 ≈ 0.164
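The same value can be verified directly with the binomial formula above; this is a minimal sketch using only the Python standard library.

```python
from math import comb

p, m, k = 0.087, 10, 2

# Binomial probability mass function: C(m, k) * p^k * (1 - p)^(m - k)
prob = comb(m, k) * p**k * (1 - p)**(m - k)
print(f"P(B = {k}) = {prob:.3f}")   # ~0.164
```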
Probability Density Function
• Probability density function: If X is continuous, the probability density function of X, which specifies the probability that the variable X takes on values in any interval [a, b] ⊂ R, is defined as
P(X ∈ [a, b]) = ∫_a^b f(x) dx
• Probability mass is spread so thinly over the range of values
that it can be measured only over intervals [a,b] ⊂ R.
Probability Density Function
• The density function f must satisfy the basic laws of probability:
– f(x) ≥ 0 for all x ∈ R
– ∫_{−∞}^{+∞} f(x) dx = 1
Sepal Length: Distribution?
Normal Distribution
• A random variable X has a normal distribution, with the parameters mean μ and variance σ², if the probability density function of X is given as follows:
f(x | μ, σ²) = (1 / √(2πσ²)) · exp(−(x − μ)² / (2σ²))
Sepal Length [μ = 5.84, σ² = 0.681]
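The quoted parameters are the sample mean and the (biased) sample variance of the sepal length values. A minimal sketch to reproduce them and evaluate the fitted density, again assuming scikit-learn's copy of the Iris data:

```python
import numpy as np
from sklearn.datasets import load_iris

x = load_iris().data[:, 0]            # sepal length values

mu = x.mean()                         # ~5.84
var = x.var()                         # biased estimate, ~0.681

def normal_pdf(v, mu, var):
    """Normal density f(v | mu, var)."""
    return np.exp(-(v - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

print(f"mu = {mu:.2f}, var = {var:.3f}")
print(f"f(mu) = {normal_pdf(mu, mu, var):.3f}")   # density at the mean
```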
Cumulative Distribution Function
• Cumulative distribution function: For any random variable X, whether discrete or continuous, we can define the cumulative distribution function F: R → [0,1], which gives the probability of observing a value at most some given value x:
F(x) = P(X ≤ x) for all −∞ < x < ∞
Discrete vs. Continuous
• When X is discrete, F is the running sum of the mass function:
F(x) = P(X ≤ x) = Σ_{u ≤ x} f(u)
• When X is continuous, F is the integral of the density:
F(x) = P(X ≤ x) = ∫_{−∞}^x f(u) du
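For the normal model fitted above (μ = 5.84, σ² = 0.681), the continuous CDF can be evaluated with the error function; the evaluation point x = 7 cm is our own illustrative choice.

```python
from math import erf, sqrt

def normal_cdf(x, mu, var):
    """F(x) = P(X <= x) for a normal random variable, via the error function."""
    return 0.5 * (1 + erf((x - mu) / sqrt(2 * var)))

mu, var = 5.84, 0.681
# Probability of observing a sepal length of at most 7 cm under this model.
print(f"F(7) = {normal_cdf(7.0, mu, var):.3f}")
# Complement: the model's probability of a 'long' sepal (>= 7 cm).
print(f"1 - F(7) = {1 - normal_cdf(7.0, mu, var):.3f}")
```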
Cumulative Distribution Function [Normal Distribution]
Bivariate Random Variables
• Instead of considering each attribute as a random variable, we can also
perform pair-wise analysis by considering a pair of attributes, X1 and X2,
as a bivariate random variable.
• Joint probability mass function (discrete random variables):
f(x) = f(x1, x2) = P(X1 = x1, X2 = x2) = P(X = x)
Bivariate Distributions
• Consider the sepal length and sepal width attributes in the Iris dataset. Let A denote the random variable corresponding to long sepal length (at least 7 cm) and B the random variable corresponding to long sepal width (at least 3.5 cm).
• Let X = (A, B)^T be a discrete bivariate random variable.
Probability Mass Function
• X1 = A (long sepal length), X2 = B (long sepal width); the joint probabilities f(x1, x2) are obtained by counting how many of the n = 150 Irises fall into each of the four combinations of values, as in the sketch below.
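A minimal sketch that reproduces the four-cell joint mass function by counting, again assuming scikit-learn's copy of the Iris data matches the one used in the slides:

```python
import numpy as np
from sklearn.datasets import load_iris

data = load_iris().data
a = (data[:, 0] >= 7.0).astype(int)   # A: long sepal length
b = (data[:, 1] >= 3.5).astype(int)   # B: long sepal width

n = len(data)
# Empirical joint probability mass function f(x1, x2) = P(A = x1, B = x2).
for x1 in (0, 1):
    for x2 in (0, 1):
        count = np.sum((a == x1) & (b == x2))
        print(f"f({x1},{x2}) = {count}/{n} = {count / n:.3f}")
```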
Density Function
• Joint probability density function (continuous random variables): it specifies the probability that the pair (X1, X2) falls in any region W ⊂ R²,
P(X ∈ W) = ∫∫_{(x1, x2) ∈ W} f(x1, x2) dx1 dx2
Bivariate Normal Density
• For a bivariate normal with mean vector μ and 2×2 covariance matrix Σ, the joint density is
f(x | μ, Σ) = (1 / (2π √|Σ|)) · exp(−(x − μ)^T Σ^{-1} (x − μ) / 2)
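A short numeric sketch of this density fitted to (sepal length, sepal width); the parameter estimates and the evaluation point are our own illustration, assuming scikit-learn's Iris data:

```python
import numpy as np
from sklearn.datasets import load_iris

# Bivariate random variable: (sepal length, sepal width).
xy = load_iris().data[:, :2]

mu = xy.mean(axis=0)                           # 2-dimensional mean vector
sigma = np.cov(xy, rowvar=False, bias=True)    # 2x2 covariance matrix (biased)

def bivariate_normal_pdf(x, mu, sigma):
    """f(x | mu, sigma) = exp(-(x-mu)^T sigma^-1 (x-mu) / 2) / (2*pi*sqrt(|sigma|))."""
    diff = x - mu
    quad = diff @ np.linalg.inv(sigma) @ diff
    return np.exp(-quad / 2) / (2 * np.pi * np.sqrt(np.linalg.det(sigma)))

print("mu =", np.round(mu, 2))
print("f(mu) =", round(float(bivariate_normal_pdf(mu, mu, sigma)), 3))
```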
Multivariate Random Variables
• A d-dimensional multivariate random variable (vector random variable) X = (X1, X2, . . . , Xd)^T is defined as a function that assigns a vector of real numbers to each outcome in the sample space.
• Joint probability mass function (discrete random variables):
f(x) = f(x1, x2, . . . , xd) = P(X1 = x1, X2 = x2, . . . , Xd = xd)
• Joint probability density function (continuous random variables): for any region W ⊂ R^d,
P(X ∈ W) = ∫ · · · ∫_{x ∈ W} f(x) dx
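To make the d-dimensional case concrete, the sketch below builds a hypothetical d = 4 discrete random vector by thresholding each Iris attribute at its mean (the thresholds are our own choice, not from the slides) and estimates its joint mass function by counting:

```python
from collections import Counter

from sklearn.datasets import load_iris

data = load_iris().data                  # n x 4 matrix: four numeric attributes

# Hypothetical discretization: indicator of "above the attribute mean",
# giving a d = 4 dimensional discrete random vector X = (X1, X2, X3, X4).
x = (data >= data.mean(axis=0)).astype(int)

n = len(data)
counts = Counter(map(tuple, x.tolist()))
# Empirical joint probability mass function f(x) = P(X1 = x1, ..., X4 = x4).
for outcome, count in sorted(counts.items()):
    print(outcome, f"{count}/{n} = {count / n:.3f}")
```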