Ch. 17 Basic Statistical Models
CIS 2033: Computational Probability and Statistics
Prof. Longin Jan Latecki
Prepared by: Nouf Albarakati

Basic Statistical Models
- Random samples
- Statistical models
- Distribution features and sample statistics
- Estimating features of the "true" distribution
- Linear regression model

Random Samples
A random sample is a collection of random variables $X_1, \ldots, X_n$ that have the same probability distribution and are mutually independent.
If $F$ is the distribution function of each random variable $X_i$ in a random sample, we speak of a random sample from $F$. Similarly, we speak of a random sample from a density $f$, a random sample from an $N(\mu, \sigma^2)$ distribution, etc.

An Example of a Random Sample
From the properties of the Poisson process, the inter-failure times are independent and have the same exponential distribution.
Hence the software data is modeled as the realization of a random sample from an exponential distribution.
In some cases we may not be able to specify the type of distribution.

Statistical Models for Repeated Measurements
A dataset consisting of values $x_1, x_2, \ldots, x_n$ of repeated measurements of the same quantity is modeled as the realization of a random sample $X_1, X_2, \ldots, X_n$.
The model may include a partial specification of the model distribution, that is, the probability distribution of each $X_i$.

A Sample Statistic
A sample statistic is a random object $h(X_1, X_2, \ldots, X_n)$ that depends only on the random sample $X_1, X_2, \ldots, X_n$, e.g., the sample mean, the sample median, etc.
- An object $h(x_1, x_2, \ldots, x_n)$ is a realization of the corresponding sample statistic $h(X_1, X_2, \ldots, X_n)$, since the dataset $x_1, x_2, \ldots, x_n$ is modeled as a realization of the random sample $X_1, X_2, \ldots, X_n$.

Sample Statistics
The sample statistics corresponding to the empirical summaries should reflect the corresponding features of the model distribution.
The law of large numbers:
$$\lim_{n \to \infty} P\left(|\bar{X}_n - \mu| > \varepsilon\right) = 0 \quad \text{for every } \varepsilon > 0.$$
For large sample size $n$, the sample mean of most realizations of the random sample is close to the expectation of the corresponding distribution.
For instance, in a physical experiment, one usually thinks of each measurement as
measurement = quantity of interest + measurement error

Distribution Features and Sample Statistics
Let $X_1, X_2, \ldots, X_n$ be a random sample from distribution function $F$. The empirical distribution function of the sample is:
$$F_n(a) = \frac{\text{number of } X_i \text{ in } (-\infty, a]}{n}.$$
By the law of large numbers, for every $\varepsilon > 0$,
$$\lim_{n \to \infty} P\left(|F_n(a) - F(a)| > \varepsilon\right) = 0.$$
This means that for most realizations of the random sample, the empirical distribution function $F_n$ is close to $F$.

Distribution Features and Sample Statistics
The histogram and the kernel density estimate: another consequence of the law of large numbers. For a bin $(x - h, x + h]$ of width $2h$, the height of the histogram is
$$H_n(x) = \frac{\text{number of } X_i \text{ in } (x - h, x + h]}{2hn},$$
and for large $n$ and small $h$,
$$H_n(x) \approx f(x).$$
Similarly, the kernel density estimate of a random sample approximates the corresponding probability density $f$.
Note that with a smaller dataset the similarity can be much worse.

Distribution Features and Sample Statistics
- The sample mean, sample median, and empirical quantiles (according to the law of large numbers):
  expectation: $\bar{X}_n \approx \mu$
  the $p$th empirical quantile approximates the $p$th quantile of $F$
- The sample variance and standard deviation, and the MAD
- Relative frequencies

Estimating Features of the "True" Distribution
We have a dataset of $n$ elements that is modeled as the realization of a random sample with a probability distribution that is unknown to us. Our goal is to use the dataset to estimate a certain feature of this distribution that represents the quantity of interest.

Linear Regression Model
hardness = g(density of timber)
hardness = g(density of timber) + random fluctuation
hardness = α + β · (density of timber) + random fluctuation
This is a loose description of a simple linear regression model.
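
The relation hardness = α + β · (density of timber) + random fluctuation can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the values of α, β, the noise level, and the density range are made up, the random fluctuation is assumed to be normal with mean zero, and the intercept and slope are recovered with ordinary least squares, a standard fitting method that these slides do not themselves describe. The function names `simulate_hardness` and `least_squares_line` are hypothetical.

```python
import random


def simulate_hardness(densities, alpha, beta, sigma, rng):
    """Return hardness = alpha + beta * density + random fluctuation.

    Assumption: the fluctuation is normal with mean 0 and std. dev. sigma.
    """
    return [alpha + beta * x + rng.gauss(0.0, sigma) for x in densities]


def least_squares_line(xs, ys):
    """Estimate intercept and slope of a straight line by least squares."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta_hat = sxy / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat


# Made-up "true" parameters, chosen only for illustration.
rng = random.Random(0)
densities = [rng.uniform(25.0, 70.0) for _ in range(200)]
hardness = simulate_hardness(densities, alpha=-1000.0, beta=50.0,
                             sigma=150.0, rng=rng)

alpha_hat, beta_hat = least_squares_line(densities, hardness)
print(f"alpha_hat = {alpha_hat:.1f}, beta_hat = {beta_hat:.2f}")
```

With 200 simulated points the estimates should land near the made-up values of α and β. Rerunning with a smaller sample or a larger `sigma` shows how the random fluctuation blurs the underlying line.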