Linear dependence (correlation)

Suppose we observe $n$ pairs of values $\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$.

The covariance of $x$ and $y$ is the value
$$\sigma_{xy} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$

The correlation of $x$ and $y$ is the value
$$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$
where $\sigma_x$ and $\sigma_y$ are the standard deviations of $x$ and $y$.

Correlation

- Takes values in $[-1, 1]$.
- Values close to 1 (resp. -1) indicate an almost perfect linear relation between $x$ and $y$ with positive (negative) slope.
- Values close to 0 indicate the absence of a linear relation between the two variables.
- Warning: zero correlation does not exclude any form of relation between $x$ and $y$, only the absence of a linear relation.

[Figure: SP100 vs. NASDAQ percentage daily returns, 2.1.1998 to 3.6.2011; Corr(SP100, NASDAQ) = 0.87]

Quantiles / percentiles

The $k$-th percentile is the value that separates the smallest $k\%$ of the data from the greatest $(100 - k)\%$. Formally, with $k$ expressed as a fraction, $q$ is the $k$-th percentile if
$$\frac{\#\{x_i < q\}}{n} \leq k \quad \text{and} \quad \frac{\#\{x_i > q\}}{n} \leq (1 - k)$$

- The median is the quantile 0.5, or the 50th percentile.
- The quartiles are the quantiles that divide the ordered data into 4 equal parts: percentiles 25, 50, 75.
- The deciles are the quantiles that divide the ordered data into 10 equal parts: percentiles 10, 20, ..., 90.

Example with numeric data

Find the quartiles in {800, 223, 417, 114, 413, 299, 415, 830, 796}.

Observation   #{x_i < q}/n   #{x_i > q}/n
114           0              .889
223           .111           .778
299           .222           .667
413           .333           .556
415           .444           .444
417           .556           .333
796           .667           .222
800           .778           .111
830           .889           0

Reading off the table: 299 is the first quartile (.222 ≤ .25 and .667 ≤ .75), 415 is the median (.444 ≤ .5 on both sides), and 796 is the third quartile (.667 ≤ .75 and .222 ≤ .25).
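These definitions translate almost line by line into code. The following is a minimal Python sketch (the function names are mine, not from the slides): it implements the population covariance and correlation, and a percentile routine that returns the smallest observation satisfying the two inequalities, checked against the quartile example above.

```python
# Minimal sketch of the covariance, correlation, and percentile
# definitions above (population versions, dividing by n).
from math import sqrt

def covariance(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n

def correlation(x, y):
    # sigma_xy / (sigma_x * sigma_y), with population standard deviations
    return covariance(x, y) / (sqrt(covariance(x, x)) * sqrt(covariance(y, y)))

def percentile(data, k):
    # smallest observation q with #{x_i < q}/n <= k and #{x_i > q}/n <= 1 - k
    n = len(data)
    for q in sorted(data):
        if sum(x < q for x in data) / n <= k and sum(x > q for x in data) / n <= 1 - k:
            return q

data = [800, 223, 417, 114, 413, 299, 415, 830, 796]
print([percentile(data, k) for k in (0.25, 0.50, 0.75)])   # [299, 415, 796]
print(correlation(data, [2 * v + 5 for v in data]))        # ~1.0: perfect positive linear relation
```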
Elements of probability

Probability space

- $\Omega$: sample space, the set of elementary events. For example, a die has 6 elementary events.
- $\mathcal{F}$: sigma-field, the subsets of $\Omega$ (events) that we are interested in associating with a probability.
- $P$: probability measure, a value in $[0, 1]$ associated to each event in $\mathcal{F}$.

Sample space

Any non-empty set. Examples:
- Coin toss: heads, tails.
- Die toss: the 6 faces.
- Throwing a piece of chalk against a blackboard: any two-dimensional point on the blackboard.

Sigma-field (collection of events)

A set $\mathcal{F}$ of subsets of the sample space $\Omega$ such that:
- The set $\Omega$ belongs to $\mathcal{F}$.
- If $A$ belongs to $\mathcal{F}$, then its complement with respect to $\Omega$ also belongs to $\mathcal{F}$.
- If $A_1, A_2, \ldots$ belong to $\mathcal{F}$, then their union $A_1 \cup A_2 \cup \ldots$ also belongs to $\mathcal{F}$.

Consequences: the empty set $\emptyset$ and intersections of events are in $\mathcal{F}$. Try to build examples using the sample spaces of the previous slide.

Probability measure

A set function that maps any element of $\mathcal{F}$ to a number in $[0, 1]$ such that:
- $P(\Omega) = 1$.
- If $A_1, A_2, \ldots$ are disjoint sets (empty intersection), then $P(A_1 \cup A_2 \cup \ldots) = P(A_1) + P(A_2) + \ldots$

There are different approaches to the construction of probabilities: classic, frequentist, Bayesian.

Approaches to probabilities

- Classic. Tied to gambling. The probability of each elementary event equals 1/(number of elementary events). For example, each face of the die has probability 1/6.
- Frequentist. The probability of an event is the relative frequency of that event as the number of trials goes to infinity. For example, I throw a die $n$ times and track the relative frequency of each face as $n$ diverges.
- Bayesian. Subjectivist: a probability based on one's own opinion, grounded in past experience.

Probability of «A or B»

We have seen that if $A$ and $B$ are disjoint,
$$P(A \text{ or } B) = P(A \cup B) = P(A) + P(B)$$
If $A$ and $B$ intersect, the probability of the intersection must be subtracted from the total (otherwise it is counted twice):
$$P(A \text{ or } B) = P(A \cup B) = P(A) + P(B) - P(A \text{ and } B)$$

Probability of «A and B» and independence

Two events are dependent if there is some connection between the two. In this case, knowing that $A$ happened tells us something about the occurrence of $B$, and vice versa. For example, if I know that throwing a die an even number was extracted ($A$), this tells me something about the extraction of the number 3 ($B$). The probability that $B$ occurs knowing that $A$ happened is termed the conditional probability and is denoted $P(B \mid A)$:
$$P(A \text{ and } B) = P(A)\,P(B \mid A) = P(B)\,P(A \mid B)$$

Independence

Two events $A$ and $B$ are said to be independent when knowing that $A$ happened gives no information on the occurrence of $B$, and vice versa. For example, knowing that in a die toss "an even number was extracted" ($A$) provides no information on the event "the extracted number is in the interval 1-4" ($B$). In this case
$$P(B \mid A) = P(B) \quad \text{and} \quad P(A \text{ and } B) = P(A)\,P(B)$$

Exercises

- Compute the probability that in one die toss one gets {1 or 2 or 3}. A: 1/6 + 1/6 + 1/6 = 1/2.
- Compute the probability that in one die toss one gets {1 or 2 or 3}, knowing that an even number was extracted. A: P(1 or 2 or 3 | 2 or 4 or 6) = P(2)/P(2 or 4 or 6) = (1/6)/(1/2) = 1/3.
- Compute the probability that, tossing two dice, the sum of the results is 3. A: P({D1=1} and {D2=2}) + P({D1=2} and {D2=1}) = P{D1=1}P{D2=2} + P{D1=2}P{D2=1} = 2/36 = 1/18.

These answers can be checked by enumeration, as in the sketch below.
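A quick way to verify the exercises is to enumerate the equally likely sample space, in the spirit of the classic approach. A minimal Python sketch (the helper `prob` is my own naming, not from the slides):

```python
# Sketch: verifying the dice exercises by enumerating the sample space.
from fractions import Fraction
from itertools import product

faces = range(1, 7)

def prob(event, space):
    # classic approach: favourable outcomes / total outcomes
    outcomes = list(space)
    return Fraction(sum(event(w) for w in outcomes), len(outcomes))

# P(1 or 2 or 3) on one die
print(prob(lambda w: w in {1, 2, 3}, faces))                    # 1/2

# P(1 or 2 or 3 | even) = P({1,2,3} and {2,4,6}) / P({2,4,6})
joint = prob(lambda w: w in {1, 2, 3} and w % 2 == 0, faces)
even = prob(lambda w: w % 2 == 0, faces)
print(joint / even)                                             # 1/3

# P(sum of two dice = 3), enumerating all 36 ordered pairs
print(prob(lambda w: w[0] + w[1] == 3, product(faces, faces)))  # 1/18
```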
Random variables (r.v.)

Given a probability space $(\Omega, \mathcal{F}, P)$, a (measurable) function that associates numbers to the elements of $\Omega$ is called a random variable. If we call $\omega$ the generic element of $\Omega$, we can denote a r.v. by $X(\omega)$, but for brevity usually only $X$ is used. For example, for the die toss:

ω:     ⚀  ⚁  ⚂  ⚃  ⚄  ⚅
X(ω):  1  2  3  4  5  6

Cumulative probability function

A non-decreasing function that returns the probability of the r.v. $X$ being smaller than or equal to a given value $x$:
$$F_X(x) = \Pr(X \leq x)$$

[Figure: cumulative distribution function for the die toss]

Discrete vs. continuous r.v.

When $X$ ranges only over a finite or countable set of values, $X$ is discrete. When $X$ ranges over a set with the cardinality of the continuum (real numbers or intervals), $X$ is continuous. There are also «intermediate» situations in which $X$ can be seen as the sum of a continuous and a discrete r.v.; we will not see any example of this kind in this course.

Probability function

For a discrete r.v. it is possible to define the function $p(x)$ that maps each outcome to its probability. E.g., for the die toss we have
$$p(x) = \begin{cases} 1/6 & \text{for } x = 1, 2, 3, 4, 5, 6 \\ 0 & \text{otherwise} \end{cases}$$
Of course
$$F(x) = \sum_{i \leq x} p(i)$$

Bernoulli r.v.

Takes only the two values $\{0, 1\}$. The probability function, with parameter $p$ in $[0, 1]$, is
$$p(x) = \begin{cases} p^x (1 - p)^{1 - x} & \text{for } x = 0, 1 \\ 0 & \text{otherwise} \end{cases}$$

[Figure: Bernoulli probability function, p = 0.7]

Binomial r.v.

Takes the values $\{0, 1, \ldots, n\}$. It is the sum of $n$ independent Bernoulli r.v. It has two parameters, $p$ and $n$:
$$p(x) = \begin{cases} \dfrac{n!}{x!\,(n - x)!}\, p^x (1 - p)^{n - x} & \text{for } x = 0, 1, \ldots, n \\ 0 & \text{otherwise} \end{cases}$$

[Figure: binomial probability function with p = 0.7, n = 10]

Poisson r.v.

Takes the values $\{0, 1, \ldots\}$. Used for modeling the probability of occurrence of a number of events in a given time span (e.g., the number of calls in a call center). It has only one parameter, $\lambda > 0$:
$$p(x) = \begin{cases} \dfrac{\lambda^x \exp(-\lambda)}{x!} & \text{for } x = 0, 1, \ldots \\ 0 & \text{otherwise} \end{cases}$$

[Figure: Poisson probability function with λ = 3.5]

Expectation

The mean or expectation of a discrete r.v. is
$$E(X) = \sum_{x \in \mathcal{X}} x \cdot p(x)$$
where $\mathcal{X}$ represents the range of $X$, that is, the support of $X$. Accordingly, the variance is given by
$$\mathrm{Var}(X) = \sum_{x \in \mathcal{X}} [x - E(X)]^2 \cdot p(x)$$

Exercises

Compute the mean and variance of the Bernoulli, Binomial and Poisson r.v. Solutions:

r.v.        E(X)   Var(X)
Bernoulli   p      p(1 − p)
Binomial    np     np(1 − p)
Poisson     λ      λ

Probability density function

The probability that a continuous random variable hits a given number is zero, so no probability function can be defined. For continuous r.v. one uses the probability density function $f(x)$: the area underneath the curve over an interval gives the probability that the r.v. takes a value in that interval. The relation with the cumulative probability function is
$$F(x) = \int_{-\infty}^{x} f(z)\, dz, \qquad f(x) = \frac{dF(x)}{dx}$$

[Figure: example of a density where the area over the interval [5, 10], i.e. the probability of that interval, is 0.25]

Mean and variance

$$E(X) = \int_{-\infty}^{\infty} x \cdot f(x)\, dx \qquad \mathrm{var}(X) = \int_{-\infty}^{\infty} [x - E(X)]^2 \cdot f(x)\, dx$$

One notation for the expectation

Note that one can compute the expectation of any function $g(X)$ of a r.v. (if it exists), and the notation can be made uniform as
$$E[g(X)] = \int_{-\infty}^{\infty} g(x)\, dF(x)$$
This is a Riemann-Stieltjes integral, but those who do not know this definition can just read it as
$$\int_{-\infty}^{\infty} g(x) \cdot f(x)\, dx \;\text{ for continuous r.v.}, \qquad \sum_{x \in \mathcal{X}} g(x) \cdot p(x) \;\text{ for discrete r.v.}$$

Normal or Gaussian r.v.

Density function with two parameters, $\mu$ and $\sigma$:
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right]$$
- Mean = $\mu$, Variance = $\sigma^2$.
- In the standard normal, $\mu = 0$, $\sigma = 1$:
  - the interval (−1.64, 1.64) contains 90% of the probability;
  - the interval (−1.96, 1.96) contains 95% of the probability;
  - the interval (−2.58, 2.58) contains 99% of the probability.

Note: standardization

Let $X$ be a r.v. with mean $\mu$ and standard deviation $\sigma$. Then
$$Y = \frac{X - \mu}{\sigma}$$
is a r.v. with mean 0 and standard deviation 1. If $X$ is normal, then $Y$ is standard normal.

Chi square r.v.

It has one parameter $\kappa > 0$, named the degrees of freedom (dof). Takes only positive values. Summing the squares of $k$ independent standard normal r.v. gives a chi square r.v. with $\kappa = k$ dof. Mean = $\kappa$, Variance = $2\kappa$.

Student's t r.v.

It has one parameter $\kappa > 0$, named the degrees of freedom (dof). Takes values in $\mathbb{R}$ (real numbers). If $Z$ is standard normal and $S^2$ is chi square with $\kappa$ dof, independent of $Z$, then
$$\frac{Z}{\sqrt{S^2 / \kappa}}$$
is Student's t with $\kappa$ dof. For $\kappa \to \infty$, Student's t converges to a standard normal. Mean = 0, Variance = $\kappa / (\kappa - 2)$ if $\kappa > 2$, infinite otherwise.

[Figure: Student's t density]

Moments

The $p$-th moment of a r.v. is defined as
$$\mu_p = E(X^p) = \int_{-\infty}^{\infty} x^p\, dF(x)$$
The $p$-th central moment of a r.v. is defined as
$$m_p = E[(X - \mu_1)^p] = \int_{-\infty}^{\infty} (x - \mu_1)^p\, dF(x)$$
Obviously, $\mu_1$ is the mean and $m_2$ the variance.

Symmetry

A r.v. $X$ is symmetric when, for all $x$,
$$\Pr\{X - \mathrm{med} < -x\} = \Pr\{X - \mathrm{med} > x\}$$
where med stands for the median. In other words, a random variable is symmetric when its density is mirrored with respect to the vertical axis through the median. Odd central moments are zero in symmetric distributions.

Skewness

It is the standardized third moment:
$$\gamma_1 = E\left[\left(\frac{X - \mu}{\sigma}\right)^3\right] = \frac{m_3}{\sigma^3}$$

Kurtosis or «tail thickness»

The fourth central standardized moment is a measure of «tail thickness»:
$$\mathrm{kurt} = E\left[\left(\frac{X - \mu}{\sigma}\right)^4\right] = \frac{m_4}{\sigma^4}$$
In the normal distribution the kurtosis is 3, whence the kurtosis index (or excess kurtosis)
$$\gamma_2 = \frac{m_4}{\sigma^4} - 3$$

[Figure: densities with the same mean and same variance: leptokurtic (kurt > 3), mesokurtic (kurt = 3), platykurtic (kurt < 3); the same picture in log scale makes the tail differences visible]

SP100 returns vs. normal density

[Figure: density of SP100 returns vs. the normal density (mean = 0.0005, st. dev. = 1.34); kurt = 10.28]
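To make the last definitions concrete, here is a minimal Python sketch of sample skewness and excess kurtosis computed directly from the formulas above (the function names and the simulated samples are assumptions of this illustration, not material from the slides; a 50/50 scale mixture of normals plays the role of a leptokurtic distribution):

```python
# Sketch: sample skewness and excess kurtosis from the definitions above.
import random

def central_moment(xs, p):
    # m_p = average of (x - mean)^p
    m = sum(xs) / len(xs)
    return sum((x - m) ** p for x in xs) / len(xs)

def skewness(xs):
    # gamma_1 = m_3 / sigma^3
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5

def excess_kurtosis(xs):
    # gamma_2 = m_4 / sigma^4 - 3 (zero for the normal)
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3

random.seed(0)
normal = [random.gauss(0, 1) for _ in range(100_000)]
heavy = [random.gauss(0, 1) * random.choice([0.5, 2.0]) for _ in range(100_000)]

print(skewness(normal), excess_kurtosis(normal))  # both close to 0
print(excess_kurtosis(heavy))                     # clearly positive: leptokurtic
```

The heavy-tailed sample shows the same qualitative excess kurtosis that the SP100 returns exhibit in the last figure.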