Download Basic statistics

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

History of statistics wikipedia , lookup

Statistics wikipedia , lookup

Probability wikipedia , lookup

Ars Conjectandi wikipedia , lookup

Randomness wikipedia , lookup

Transcript
Basic statistics
Usman Roshan
Basic probability and stats
•
•
•
•
•
•
•
•
Random variable
Probability of an event
Coin toss example
Independent random variables
Mean and variance of a random variable
Correlation between random variables
Probability distributions
Central limit theorem
Random variable
• A variable normally takes on different values
• Random variable has values with different
probabilities
• Coin toss example
• Dice example
• Probabilities must sum to 1
Probability of event
• Sample space: set of total possible outcomes
• Event space: set of outcomes of interest
• Probability of an event is
– (size of event space)/(size of sample space)
• Counting: how many ways to pick k unique items
from a set of n items?
• Probability and counting
• Bernoulli trials
• Coin tossing example
• R function: rbinom
Basic stats
• Independent events: coin toss example
• Expected value of a random variable –
– example of Bernoulli and Binomal
• Variance of a random variable
• Correlation coefficient (same as Pearson
correlation coefficient)
• Formulas:
– Covariance(X,Y) = E((X-μX)(Y-μY))
– Correlation(X,Y)= Covariance(X,Y)/σXσY
– Pearson correlation
Probability distributions
• Binomial distribution
– sum of Bernoulli trials
– converges to Gaussian as number of trials approaches infinity
• R functions
– Reference card
– rbinom
– Repeated executions of rbinom
• Lists
• for loops
• sum, length
• Gaussian distribution
• Chi-square distribution
Limit theorems
• Law of large numbers: empirical mean
converges to true mean as we do more trials
(follows from Chebyshev’s and Markov’s
inequalities)
Law of large numbers
E(X)
P(X ³ a) £
a
• Markov’s inequality
Var(X)
• Chebyshev’s inequality P( X - E(X) ³ a) £ 2
a
• Law of large numbers: sample mean of n i.i.d.
random variables Xi converges to true one in
probability
• Can be proved by applying Chebyshev’s
inequality
X
Var(X) s
å
X=
, P( X - E(X) ³ e ) £
=
i
n
e2
2
ne 2
Limit theorems
• Central limit theorem: average of sampling
distribution converges to a normal distribution as
we do more trials. Specifically, it is normally
distributed with mean equal to the true mean μ
and standard deviation equal to σ/sqrt(n) where
n is number of trials and σ is true standard
deviation
• How is this useful? Consider modeling the mean
height of NJ residents. Can we assume it is
normally distributed due to Central Limit
Theorem?