Basic statistics
Usman Roshan

Basic probability and stats
• Random variable
• Probability of an event
• Coin toss example
• Independent random variables
• Mean and variance of a random variable
• Correlation between random variables
• Probability distributions
• Central limit theorem

Random variable
• An ordinary variable takes on different values
• A random variable takes on its values with different probabilities
• Coin toss example
• Dice example
• Probabilities must sum to 1

Probability of event
• Sample space: set of all possible outcomes
• Event space: set of outcomes of interest
• Probability of an event (when all outcomes are equally likely) is
  – (size of event space)/(size of sample space)
• Counting: how many ways to pick k unique items from a set of n items?
• Probability and counting
• Bernoulli trials
• Coin tossing example
• R function: rbinom

Basic stats
• Independent events: coin toss example
• Expected value of a random variable
  – example of Bernoulli and Binomial
• Variance of a random variable
• Correlation coefficient (same as Pearson correlation coefficient)
• Formulas:
  – $\mathrm{Covariance}(X,Y) = E\big((X-\mu_X)(Y-\mu_Y)\big)$
  – $\mathrm{Correlation}(X,Y) = \mathrm{Covariance}(X,Y)/(\sigma_X \sigma_Y)$
  – Pearson correlation

Probability distributions
• Binomial distribution
  – sum of Bernoulli trials
  – converges to Gaussian (after centering and scaling) as number of trials approaches infinity
• R functions
  – Reference card
  – rbinom
  – Repeated executions of rbinom
• Lists
• for loops
• sum, length
• Gaussian distribution
• Chi-square distribution

Limit theorems
• Law of large numbers: empirical mean converges to true mean as we do more trials (follows from Chebyshev’s and Markov’s inequalities)

Law of large numbers
• Markov’s inequality (for nonnegative X and a > 0): $P(X \ge a) \le \frac{E(X)}{a}$
• Chebyshev’s inequality (for a > 0): $P(|X - E(X)| \ge a) \le \frac{\mathrm{Var}(X)}{a^2}$
• Law of large numbers: sample mean of n i.i.d. random variables $X_i$ converges to the true mean in probability
• Can be proved by applying Chebyshev’s inequality to the sample mean:
  $\bar{X} = \frac{\sum_i X_i}{n}, \quad P(|\bar{X} - E(\bar{X})| \ge \epsilon) \le \frac{\mathrm{Var}(\bar{X})}{\epsilon^2} = \frac{\sigma^2}{n\epsilon^2}$

Limit theorems
• Central limit theorem: the distribution of the sample mean converges to a normal distribution as we do more trials. Specifically, for large n it is approximately normally distributed with mean equal to the true mean μ and standard deviation equal to σ/sqrt(n), where n is the number of trials and σ is the true standard deviation
• How is this useful? Consider modeling the mean height of NJ residents. Can we assume it is normally distributed due to the Central Limit Theorem?
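
The counting question and the Bernoulli trial bullets can be sketched in code. The slides point to R's rbinom. The block below uses only Python's standard library as a stand-in, and `rbinom_like` is a hypothetical helper written for this illustration, not an R or Python library function. The parameter values (10 tosses, a fair coin, 10000 repetitions) are assumptions chosen for the example.

```python
import math
import random

def n_choose_k(n, k):
    """Number of ways to pick k unique items from n items, ignoring order."""
    return math.comb(n, k)

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent Bernoulli(p) trials."""
    return n_choose_k(n, k) * p**k * (1 - p) ** (n - k)

def rbinom_like(count, size, prob, rng):
    """Hypothetical stand-in for R's rbinom(count, size, prob).

    Returns `count` draws; each draw is the number of successes
    in `size` independent Bernoulli(prob) trials.
    """
    return [
        sum(1 for _ in range(size) if rng.random() < prob)
        for _ in range(count)
    ]

# Assumed example: 10 tosses of a fair coin, repeated 10000 times.
rng = random.Random(0)  # fixed seed so the run is repeatable
draws = rbinom_like(10000, 10, 0.5, rng)
empirical_mean = sum(draws) / len(draws)  # expected value is size * prob = 5

print("ways to pick 2 of 5:", n_choose_k(5, 2))
print("P(exactly 2 heads in 4 tosses):", binomial_pmf(2, 4, 0.5))
print("empirical mean of draws:", empirical_mean)
```

The empirical mean should land close to size × prob, which is the expected value of a Binomial random variable.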
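
The covariance and correlation formulas from the Basic stats slide can be computed directly from paired data. This is a minimal sketch that assumes the population form (dividing by n rather than n - 1). The function names and the small data lists are made up for illustration.

```python
import math

def mean(xs):
    """Arithmetic mean of a list of numbers."""
    return sum(xs) / len(xs)

def covariance(xs, ys):
    """Population covariance: average of (x - mean_x) * (y - mean_y)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def variance(xs):
    """Population variance, i.e. the covariance of xs with itself."""
    return covariance(xs, xs)

def correlation(xs, ys):
    """Pearson correlation: Covariance(X,Y) / (sigma_X * sigma_Y)."""
    return covariance(xs, ys) / (math.sqrt(variance(xs)) * math.sqrt(variance(ys)))

# Illustrative data (assumed): ys rises with xs, zs falls with xs.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]
zs = [5.0, 4.0, 3.0, 2.0, 1.0]

print("Covariance(xs, ys):", covariance(xs, ys))
print("Correlation(xs, ys):", correlation(xs, ys))
print("Correlation(xs, zs):", correlation(xs, zs))
```

A perfectly linear increasing relationship gives a correlation of 1 and a perfectly linear decreasing one gives -1, up to floating-point rounding.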
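
The central limit theorem statement above can be checked with a small simulation. The sketch below assumes a fair six-sided die as the underlying random variable, with true mean 3.5 and true variance 35/12. The sample size n = 100, the 2000 repetitions, and the helper name `sample_means` are choices made for this example only.

```python
import math
import random
import statistics

def sample_means(n, repeats, rng):
    """Return `repeats` sample means, each averaged over n fair die rolls."""
    return [
        sum(rng.randint(1, 6) for _ in range(n)) / n
        for _ in range(repeats)
    ]

TRUE_MEAN = 3.5                # mean of a fair die
TRUE_SD = math.sqrt(35 / 12)   # standard deviation of a fair die

rng = random.Random(1)  # fixed seed so the run is repeatable
n = 100
means = sample_means(n, 2000, rng)

observed_mean = statistics.fmean(means)
observed_sd = statistics.pstdev(means)
predicted_sd = TRUE_SD / math.sqrt(n)  # sigma / sqrt(n) from the theorem

# For a normal distribution, roughly 68% of values fall within one
# standard deviation of the mean.
within_one_sd = sum(abs(m - TRUE_MEAN) <= predicted_sd for m in means) / len(means)

print("mean of sample means:", observed_mean)
print("sd of sample means:", observed_sd, "predicted:", predicted_sd)
print("fraction within one predicted sd:", within_one_sd)
```

Increasing n should shrink the spread of the sample means in proportion to 1/sqrt(n), which is also what the Chebyshev bound for the law of large numbers relies on.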
