Single Samples
Harry R. Erwin, PhD
School of Computing and Technology
University of Sunderland

Resources
• Crawley, MJ (2005) Statistics: An Introduction Using R. Wiley.
• Gentle, JE (2002) Elements of Computational Statistics. Springer.
• Gonick, L., and Woollcott Smith (1993) A Cartoon Guide to Statistics. HarperResource (for fun).

Questions of Interest About Single Samples
• What is the mean value?
• Is the mean value significantly different from expectation or theory?
• What is the level of uncertainty associated with the estimate of the mean value?

Facts Needed for Answers
• Are the values normally distributed (bell-shaped) or not?
• Are there outliers in the data?
• If the data were collected over a period of time, is there evidence for serial correlation?
To use standard parametric tests, you need normal data, without outliers, and without serial correlation.

Data Summary
    > data <- read.table("das.txt", header=TRUE)
    > names(data)
    [1] "y"
    > attach(data)
    > summary(y)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      1.904   2.241   2.414   2.419   2.568   2.984
    > plot(y)

Querying your data
    > y[50] <- 21.79386      # set value 50 to a misplaced-decimal outlier
    > plot(y)
    > which(y > 10)          # locate the outlier by index
    [1] 50
    > y[50] <- 2.179386      # correct the value
    > boxplot(y, ylab="data values")

Results
[Figures: output of the plotting commands above]

Normal Distribution
• The Central Limit Theorem implies that anything produced by adding a large number of independent random samples (such as the mean) is approximately normally distributed.
  – dnorm(z) is the density of the standard normal distribution, with mean 0.0 and standard deviation (i.e., √variance) of 1.0. (z here is the standard unit for the normal distribution.)
  – pnorm(x) is the probability of a z value of x or less.
  – qnorm(c(p1,p2)) gives the values of z that correspond to the cumulative probabilities p1 and p2.

Plots for Testing Normality
• The simplest and often the best test of normality is the quantile-quantile plot
  – qqnorm(y)
  – qqline(y, lty=2)
• If the resulting plot shows a marked S-shape, it indicates non-normality. You've already seen this demonstrated.
• If the data are non-normal, use Wilcoxon's signed rank test (wilcox.test) rather than Student's t-test (t.test).

Inference
• Demonstration with speed of light data
• Another way to test this is bootstrapping
  – Demonstration
• Demonstration of Student's t
  – dt(z,df): density
  – pt(z,df): cumulative probability
  – qt(c(p,q),df): quantiles
• Comparison between Student's t and normal distributions.

Skew
• Dimensionless version of the third moment about the mean.
  $m_3 = \frac{\sum (y - \bar{y})^3}{n}$
  $s_3 = \left(\sqrt{s^2}\right)^3$
  $\text{skew} = \gamma_1 = \frac{m_3}{s_3}$
• Measures the extent to which the distribution has a tail on one side or the other.
• Demo of skew test.

Kurtosis
• Dimensionless version of the fourth moment about the mean.
  $m_4 = \frac{\sum (y - \bar{y})^4}{n}$
  $s_4 = \left(s^2\right)^2$
  $\text{kurtosis} = \gamma_2 = \frac{m_4}{s_4} - 3$
• Measures the extent to which the distribution is peaky or flat-topped.
• Demo of kurtosis test.

Conclusions
• A generalisation of these individual tests is the Kolmogorov-Smirnov test (ks.test), which is usually used to compare two distributions.
• If the variance is ill-behaved, the skew and kurtosis estimates will be worse still.
• We've seen ways of testing for normality and outliers. Serial correlation will be discussed when we learn about analysis of variance.
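
Worked sketch: skew and kurtosis in R
The skew and kurtosis definitions given earlier can be turned into a few lines of R. This is only a minimal sketch, not the demonstration used in the lecture: the helper names skew() and kurtosis() are hypothetical (they are not base R functions), the data are simulated rather than read from das.txt, and the standard errors √(6/n) and √(24/n) together with the n − 2 degrees of freedom are assumed large-sample approximations.

```r
# Illustrative helpers; the names skew() and kurtosis() are assumptions,
# not functions provided by base R.
skew <- function(y) {
  m3 <- sum((y - mean(y))^3) / length(y)  # third moment about the mean
  s3 <- sqrt(var(y))^3                    # (sqrt(s^2))^3
  m3 / s3                                 # gamma_1
}

kurtosis <- function(y) {
  m4 <- sum((y - mean(y))^4) / length(y)  # fourth moment about the mean
  s4 <- var(y)^2                          # (s^2)^2
  m4 / s4 - 3                             # gamma_2 (zero for a normal distribution)
}

# Simulated right-skewed data (an assumption, used only for illustration)
set.seed(1)
y <- rexp(200)
n <- length(y)

g1 <- skew(y)
g2 <- kurtosis(y)

# Approximate test statistics: estimate divided by its assumed standard error
t_skew <- g1 / sqrt(6 / n)
t_kurt <- g2 / sqrt(24 / n)

# Two-sided p-values from Student's t with n - 2 degrees of freedom (assumed)
p_skew <- 2 * pt(-abs(t_skew), df = n - 2)
p_kurt <- 2 * pt(-abs(t_kurt), df = n - 2)

print(c(skew = g1, t = t_skew, p = p_skew))
print(c(kurtosis = g2, t = t_kurt, p = p_kurt))
```

A small p-value suggests that the sample departs from normality in the corresponding direction; in that case the Wilcoxon signed rank test is the safer choice for inference about the centre of the distribution.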