Introduction and Some R

Introduction
Osborn
“Legal” Science
• Daubert is a benchmark!
• Daubert (1993): judges are the “gatekeepers” of scientific evidence
  • They must determine whether the science is reliable:
    • Has empirical testing been done? (falsifiability)
    • Has the science been subject to peer review?
    • Are there known error rates?
    • Is there general acceptance? (essentially the Frye standard, 1923)
• The federal courts and about 26 states are “Daubert states”
Measurement and Randomness
• Any time an observation is made, one is making a “measurement”
1. Experimental error is inherent in every measurement
  • It refers to variation in observations between repetitions of the same experiment
  • It is unavoidable, and many sources contribute to it
2. “Error” in a statistical context is a technical term; it does not imply a mistake (BHH)
• Experimental error is a form of randomness
  • Randomness: inherent unpredictability in a process
  • The outcomes of the process follow a probability distribution
• Statistical tools are used both to:
  • Describe the randomness
  • Make inferences that take the randomness into account
• Careful!
  • Bad data, assumptions, and models lead to garbage results (GIGO: garbage in, garbage out)
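A small R simulation can make experimental error concrete. The sketch below assumes a hypothetical true value of 10 and normally distributed error with standard deviation 0.5 (both are illustrative choices, not properties of any real instrument), then describes the resulting randomness with a few summary statistics.

```r
# Simulate repeated measurements of one quantity (all values hypothetical)
set.seed(1)                  # make the random draws reproducible
true.value <- 10             # assumed "true" value of the quantity
n.reps     <- 25             # repetitions of the same experiment

# Each observation = true value + experimental error
# (assumption: the error is normal with mean 0 and sd 0.5)
measurements <- true.value + rnorm(n.reps, mean = 0, sd = 0.5)

# Describe the randomness: the repetitions do not agree exactly
mean(measurements)           # near, but generally not equal to, 10
sd(measurements)             # spread of the experimental error
range(measurements)          # smallest and largest observation
```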
Probability
• Frequency: ratio of the number of observations of interest ($n_i$) to the total number of observations ($N$)
$$\text{frequency of observation } i = \frac{n_i}{N}$$
• Probability (frequentist): the frequency of observation i in the limit of a very large number of observations
$$P(i) = \lim_{N \to \infty} \frac{n_i}{N}$$
  • We will almost always use this definition
  • It is EMPIRICAL!
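The frequentist definition can be checked empirically by simulation. This minimal R sketch rolls a simulated fair die for increasing N and records the frequency of a 2; as N grows, the frequencies should settle near the probability 1/6. The seed and the particular values of N are arbitrary choices.

```r
set.seed(42)                         # reproducible random numbers
N.values <- c(20, 200, 2000, 20000)  # increasing numbers of observations
freq2 <- numeric(length(N.values))   # storage for the frequencies

for (k in seq_along(N.values)) {
  N <- N.values[k]
  rolls <- sample(1:6, size = N, replace = TRUE)  # N rolls of a fair die
  freq2[k] <- sum(rolls == 2) / N                 # n_2 / N
}

data.frame(N = N.values, freq2 = freq2)  # frequency of rolling a 2 for each N
1/6                                      # the probability the frequencies approach
```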
Frequency
Roll a “fair” die 20 times (N = 20). What is the frequency of obtaining a 2 (n2 = ?)?
Let’s do this with simulation (Monte Carlo):
In R:
Result:
n2 = 2
freq2 = 2/20 = 0.1
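The R code for this slide is not reproduced in the transcript. One way to run such a Monte Carlo experiment, sketched here with base R's sample() (not necessarily the code used on the slide), is:

```r
set.seed(123)                                    # any seed; results change with the seed
rolls <- sample(1:6, size = 20, replace = TRUE)  # roll a fair die 20 times
table(rolls)                                     # how often each face came up
n2 <- sum(rolls == 2)                            # count the rolls that came up 2
freq2 <- n2 / length(rolls)                      # frequency = n2 / N
n2
freq2
```

Because the rolls are random, n2 will usually differ from run to run (and from the n2 = 2 in the result above); change or remove set.seed() to see this variation.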
What is Statistics?
• The study of relationships in data
• Descriptive statistics: techniques to summarize data
  • E.g. mean, median, mode, range, standard deviation, stem-and-leaf plots, histograms, box-and-whisker plots, etc.
• Inferential statistics: techniques to draw conclusions from a given data set, taking into account inherent randomness
  • E.g. confidence intervals, hypothesis testing, Bayes’ theorem, forecasting, etc.
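Most of the descriptive tools listed above are one-line calls in base R, and t.test() gives a confidence interval, one of the inferential tools. The sketch below uses a hypothetical sample of 30 normally distributed values; hist(x) and boxplot(x) would draw the histogram and box-and-whisker plot but are left out here because they open a graphics device.

```r
set.seed(7)
x <- rnorm(30, mean = 50, sd = 5)   # hypothetical sample of 30 measurements

# Descriptive statistics: summarize the data
mean(x); median(x); sd(x); range(x)
summary(x)                          # min, quartiles, median, mean, max
stem(x)                             # text stem-and-leaf plot
# Note: base R has no built-in statistical mode; mode(x) returns the
# storage mode ("numeric"), not the most frequent value.

# Inferential statistics: a 95% confidence interval for the population mean
ci <- t.test(x, conf.level = 0.95)$conf.int
ci
```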
Data
• Random variables: all measurements have an associated “randomness” component
• Randomness: patternless, unstructured, typical, total ignorance (Chaitin, Claude)
• Any experiment/observation recorded is a random variate
T.I.C. of Gasoline
[Figure: observations from the T.I.C. (total ion chromatogram): GC-MS instrument output for a gasoline sample]
Population and Sample
• Almost all of statistics is based on a sample drawn from a population.
• Population: the totality of observations that might occur as a result of repeatedly performing an experiment
• Why not measure the whole population?
  • Usually impossible
  • Likely wasteful
• The population should be relevant:
  • Part logic
  • Part guess
  • Part philosophy…
Data and Sampling
• Sample representations:
[Figure: a representative sample drawn from a population vs. biased samples drawn from a population]
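The difference between a representative and a biased sample can be simulated. In this sketch the population is hypothetical (10,000 draws from a skewed exponential distribution), the representative sample is a simple random sample, and the biased sample is drawn only from the upper half of the population, standing in for any selection process that favors some members.

```r
set.seed(2024)
population <- rexp(10000, rate = 1/20)   # hypothetical skewed population (mean about 20)

# Representative: simple random sample, every member equally likely
srs <- sample(population, size = 100)

# Biased: sample only from the largest half of the population
# (a stand-in for, e.g., measuring only the easiest items to find)
top.half <- population[population > median(population)]
biased <- sample(top.half, size = 100)

# The biased sample systematically overestimates the population mean
c(population = mean(population), srs = mean(srs), biased = mean(biased))
```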
Parameters and Statistics
• Parameter: any function of the population
• Statistic: any function of a sample from the population
• Statistics are used to estimate population parameters
  • Statistics can be biased or unbiased
  • The sample average is an unbiased estimator of the population mean
• We may construct distributions for statistics
  • Populations have distributions for observations
  • Samples have distributions for observations and statistics
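Unbiasedness is a statement about the distribution of a statistic over repeated samples, so it can be checked by simulation. This sketch assumes a hypothetical normal population (mean 5, sd 2), draws many samples of size 10, and compares the average of three statistics with the parameters they estimate. The sample mean and var() (which divides by n - 1) should average close to the true values; the variance that divides by n comes out systematically low, so it is biased.

```r
set.seed(99)
mu <- 5; sigma <- 2      # hypothetical population parameters
n <- 10                  # size of each sample
n.samples <- 5000        # number of repeated samples

# Each column of sims holds three statistics computed from one sample
sims <- replicate(n.samples, {
  x <- rnorm(n, mean = mu, sd = sigma)
  c(mean   = mean(x),                 # sample average
    var.n1 = var(x),                  # sample variance, divides by n - 1
    var.n  = sum((x - mean(x))^2) / n) # divides by n
})

# Each row is the sampling distribution of one statistic;
# compare its average with mu = 5 and sigma^2 = 4
rowMeans(sims)
# hist(sims["mean", ]) would show the distribution of the sample average
```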
What is R?
• R: a powerful platform for statistical analysis
• Why bother learning R?
• Basic Graphing
• Basic Data Summary and Analysis Tools
• Basic Statistical Inference Tools
• We will learn R and RStudio
• Getting Help
• Basic input/output and calculating
• Visualizing with Graphing
Finding our way around R/RStudio
Handy Commands:
• Basic input and output
  • Numeric input: x <- 4
  • Text (character) input: x <- "text goes in quotes"
  • Variables store information; <- is the assignment operator
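A short R session illustrating numeric and character input; the variable name x follows the slide, and the printed results are shown in comments. R code needs straight quotes ("), not typographic quotes.

```r
x <- 4                       # numeric input: x now stores 4
x                            # typing a variable's name prints it: [1] 4
x * 2                        # use the stored value in a calculation: [1] 8
class(x)                     # [1] "numeric"

x <- "text goes in quotes"   # character input; assigning again overwrites x
x
class(x)                     # [1] "character"
nchar(x)                     # number of characters: [1] 19
```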
• Get help on an R command:
  • If you know the name: ?commandname (e.g. ?plot brings up the HTML help page for the plot command)
  • If you don’t know the name:
    • Use Google (my favorite)
    • ??keyword
• R is driven by functions:
    x <- func(arg1, arg2)
  • func is the function name
  • The input to the function (its arguments) goes in parentheses
  • The function returns something, which gets stored in x
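As a concrete instance of the x <- func(arg1, arg2) pattern, here are two base R functions, round() and seq(). The sketch also shows that arguments can be passed by name or by position.

```r
# round() is the function; 3.14159 and digits = 2 are its arguments
x <- round(3.14159, digits = 2)
x                                  # [1] 3.14

# Arguments can be given by name...
y <- seq(from = 0, to = 10, by = 2)
# ...or by position, in the order listed on the help page (?seq)
z <- seq(0, 10, 2)
y                                  # [1]  0  2  4  6  8 10
identical(y, z)                    # [1] TRUE
```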
• Matrices: X
• X[,1] returns column 1 of matrix X
• X[3,] returns row 3 of matrix X
• Handy functions for data frames and matrices:
• dim, nrow, ncol, rbind, cbind
• User-defined function syntax:
    func.name <- function(arguments) {
      # do something
      return(output)
    }
• To use it: func.name(values)
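The sketch below exercises the matrix indexing and helper functions from this slide on a small example matrix, then defines a user-defined function. The function name freq.of is hypothetical (made up for this example); it computes the frequency n_i / N from the Probability slide.

```r
X <- matrix(1:12, nrow = 4, ncol = 3)  # 4 x 3 matrix, filled column by column
X[, 1]          # column 1: [1] 1 2 3 4
X[3, ]          # row 3:    [1]  3  7 11
dim(X)          # [1] 4 3
nrow(X); ncol(X)

X2 <- rbind(X, c(0, 0, 0))   # append a row    -> 5 x 3
X3 <- cbind(X, 13:16)        # append a column -> 4 x 4

# Hypothetical user-defined function: frequency of one value in the data
freq.of <- function(observations, value) {
  n.i <- sum(observations == value)  # number of observations of interest
  N <- length(observations)          # total number of observations
  return(n.i / N)
}
freq.of(c(2, 5, 2, 6, 1), value = 2)  # [1] 0.4
```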
R commands not to forget for today
• <- (assignment or “gets”)
• ? (to get help with a command)
• : (range operator)
• c (“collect”)
• sample
• seq (generate a sequence)
• plot
• library
• install.packages (to install libraries you don’t have)
• For matrices and vectors: x[,3] vs. x[3,] vs. x[,] vs. x[3,3] vs. x[] vs. x[1:3], etc.
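A quick R sketch touching most of the commands above. The plot() and install.packages() calls are commented out so the script runs without opening a graphics window or downloading anything; the package name "ggplot2" is only an example.

```r
library(stats)                 # load a package (stats is attached by default anyway)
# install.packages("ggplot2")  # would download and install a package; not run here

v <- c(4, 8, 15, 16, 23, 42)   # c(): collect values into a vector
1:5                            # range operator: [1] 1 2 3 4 5
s <- seq(0, 1, by = 0.25)      # sequence: 0.00 0.25 0.50 0.75 1.00
s
set.seed(10)
sample(1:6, size = 3)          # 3 distinct values at random (no replacement by default)

v[2]        # second element: [1] 8
v[1:3]      # first three elements
v[]         # all elements

M <- matrix(1:9, nrow = 3)     # columns are 1:3, 4:6, 7:9
M[, 3]      # column 3: [1] 7 8 9
M[3, ]      # row 3:    [1] 3 6 9
M[3, 3]     # one element: [1] 9
M[, ]       # the whole matrix
M[1:3]      # indexed like a vector (column-major order): [1] 1 2 3
# plot(v)   # would draw a scatter plot of v against its index
```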