Introduction
[Figure: Osborn]

“Legal” Science
• Daubert is a benchmark!
  • Daubert (1993): Judges are the “gatekeepers” of scientific evidence.
  • They must determine if the science is reliable:
    • Has empirical testing been done? (Falsifiability)
    • Has the science been subject to peer review?
    • Are there known error rates?
    • Is there general acceptance? (Essentially the Frye standard, 1923)
• The Federal Government and 26(-ish) states are “Daubert States”

Measurement and Randomness
• Any time an observation is made, one is making a “measurement”
1. Experimental error is inherent in every measurement
  • Refers to variation in observations between repetitions of the same experiment.
  • It is unavoidable, and many sources contribute to it
2. Error in a statistical context is a technical term (BHH)

Measurement and Randomness
• Experimental error is a form of randomness
• Randomness: inherent unpredictability in a process
  • The outcomes of the process follow a probability distribution
• Statistical tools are used to both:
  • Describe the randomness
  • Make inferences taking into account the randomness
• Careful!
  • Bad data, assumptions and models lead to garbage (GIGO: garbage in, garbage out)

Probability
• Frequency: ratio of the number of observations of interest (n_i) to the total number of observations (N)
    frequency of observation i = n_i / N
• Probability (frequentist): frequency of observation i in the limit of a very large number of observations
  • We will almost always use this definition
  • It is EMPIRICAL!

Frequency
Roll a “fair” die 20 times (N = 20). How many times do we obtain a 2 (n_2 = ?), and what is its frequency?
Let’s do this with simulation (Monte Carlo).
Result of one run in R: n_2 = 2, freq_2 = n_2 / N = 2/20 = 0.1

What is Statistics?
• Study of relationships in data
• Descriptive Statistics: techniques to summarize data
  • E.g. mean, median, mode, range, standard deviation, stem-and-leaf plots, histograms, box-and-whisker plots, etc.
• Inferential Statistics: techniques to draw conclusions from a given data set, taking into account inherent randomness
  • E.g. confidence intervals, hypothesis testing, Bayes’ theorem, forecasting, etc.

Data
• Random variables: all measurements have an associated “randomness” component
• Randomness: patternless, unstructured, typical, total ignorance (Chaitin, Claude)
• Any experiment/observation recorded is a random variate

T.I.C. of Gasoline
• GC-MS instrument output for a gasoline: the T.I.C. (total ion chromatogram)
• Observations from the T.I.C.
[Figure: T.I.C. of a gasoline sample]

Population and Sample
• Almost all of statistics is based on a sample drawn from a population.
• Population: the totality of observations that might occur as a result of repeatedly performing an experiment
• Why not measure the whole population?
  • Usually impossible
  • Likely wasteful
• Population should be relevant.
  • Part logic
  • Part guess
  • Part philosophy….

Data and Sampling
• Sample Representations:
[Figure: a representative sample and biased samples, each drawn from a population]

Parameters and Statistics
• Parameter: any function of the population
• Statistic: any function of a sample from the population
  • Statistics are used to estimate population parameters
  • Statistics can be biased or unbiased
  • The sample average is an unbiased estimator of the population mean
• We may construct distributions for statistics
  • Populations have distributions for observations
  • Samples have distributions for observations and statistics

What is R?
• R: a powerful platform for statistical analysis
• Why bother learning R?
  • Basic Graphing
  • Basic Data Summary and Analysis Tools
  • Basic Statistical Inference Tools
• We will learn R and RStudio
  • Getting Help
  • Basic input/output and calculating
  • Visualizing with Graphing

Finding our way around R/RStudio

Handy Commands:
• Basic Input and Output
    x <- 4                        # numeric input
    x <- "text goes in quotes"    # text (character) input
  • Variables store information
  • <- is the assignment operator

Handy Commands:
• Get help on an R command:
  • If you know the name: ?command name
    • ?plot brings up the HTML help page for the plot command
  • If you don’t know the name:
    • Use Google (my favorite)
    • ??key word

Handy Commands:
• R is driven by functions:
    func(argument1, argument2)
  • func is the function name
  • Input to the function goes in parentheses
    x <- func(arg1, arg2)
  • The function returns something; it gets dumped into x

Handy Commands:
• Matrices: X
  • X[,1] returns column 1 of matrix X
  • X[3,] returns row 3 of matrix X
• Handy functions for data frames and matrices:
  • dim, nrow, ncol, rbind, cbind
• User-defined function syntax:
    func.name <- function(arguments) {
      do something
      return(output)
    }
  • To use it: func.name(values)

R commands not to forget for today
• <- (assignment or “gets”)
• ? (to get help with a command)
• : (range operator)
• c (“collect”)
• sample
• seq (generate a sequence)
• plot
• library
• install.packages (to install libraries you don’t have)
• For matrices and vectors: x[,3] vs. x[3,] vs. x[,] vs. x[3,3] vs. x[] vs. x[1:3] etc…
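
The commands listed above can be tried together in one short session. What follows is a minimal sketch, not the code from the original slides: it repeats the die-rolling frequency experiment using `sample`, then touches `:`, `c`, `seq`, matrix indexing and `plot`. The helper name `roll.freq`, the seed value and the chosen numbers of rolls are assumptions introduced here for illustration only.

```r
# Illustrative sketch only; roll.freq, the seed and the sizes are hypothetical choices.
set.seed(1)                      # make the random rolls reproducible

faces <- 1:6                     # ":" builds the sequence 1, 2, ..., 6
rolls <- sample(faces, size = 20, replace = TRUE)   # 20 rolls of a fair die

# User-defined function: frequency n_i / N of one face in a vector of rolls
roll.freq <- function(rolls, face) {
  n.i <- sum(rolls == face)      # number of observations of interest
  return(n.i / length(rolls))    # divide by the total number of observations
}

freq2 <- roll.freq(rolls, 2)     # frequency of obtaining a 2 in this run
print(freq2)

# Frequencies of all six faces, gathered into one vector
freqs <- sapply(faces, function(f) roll.freq(rolls, f))

# Matrices: bind faces and frequencies as columns, then index
X <- cbind(face = faces, freq = freqs)
dim(X)        # 6 rows, 2 columns
X[, 2]        # column 2: the six frequencies
X[3, ]        # row 3: face 3 and its frequency
X[3, 2]       # a single element
freqs[1:3]    # first three elements of a vector

# seq generates a sequence; c collects values into one vector
sizes <- c(20, seq(from = 100, to = 1000, by = 100))

# Frequency of a 2 for increasing numbers of rolls
freq.by.N <- sapply(sizes, function(N) {
  roll.freq(sample(faces, size = N, replace = TRUE), 2)
})
plot(sizes, freq.by.N, type = "b",
     xlab = "Number of rolls (N)", ylab = "Frequency of a 2")
abline(h = 1/6, lty = 2)         # long-run value for a fair die
```

Changing the seed changes the individual rolls, so the frequency from 20 rolls will vary from run to run; with more rolls it should tend to settle near 1/6, which is the frequentist definition of probability at work.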