Types of data, distributions - Department of Environmental Sciences

Christine Mayer
Contact Info:
Office: 158 LEC; 3086 C Bowman-Oddy
(419) 530-8377 LEC, -5470 B.O.
[email protected]
Office Hours:
by appointment any time on Wed & Fri on campus or
Mon, Tue, or Thur at LEC
SAS available
3rd floor computer lab BO, one common computer @
LEC
Rumor has it that, except for Darwin's The origin of the species,
Biometry was the most cited text in evolutionary biology. A check
using the online version of the Science Citation Index revealed
that citations to Biometry far exceeded those for The origin of the
species. From 1976 to mid 1997 the following counts were
obtained: Darwin (all publications) = 7,111. Sokal and Rohlf
Biometry = 31,757.
From the textbook web page
Class Schedule is tentative
Some topics may change or move due to time constraints
GRADING
Three exams: 20% apiece (60% total): essay, fill in the
blank, short answer, diagrams & graph interpretation,
other. Open book, but no sharing of any materials during
the exam.
In-class presentations, quizzes (unannounced), and in-class assignments: 20%
Homework: 20% (each assignment grade reduced by
50% per day late, unless prior arrangement made)
Student Introductions
Name
Discipline / Department
Year in grad school
Project (if known)
Prior statistics classes / experience
Computer / programming experience
Are you “quantaphobic”?
Statistics: analysis and interpretation of data (plural)
with a view toward objective evaluation of the
reliability of the conclusions based on the data.
The important part is the conclusion or answer to the question
Term originally derived from study of government
and state affairs
Probabilistic vs. Deterministic World
Karl Popper's book The Open Universe: An Argument For
Indeterminism defines scientific determinism as the claim that
...any event can be rationally predicted, with any desired degree
of precision, if we are given a sufficiently precise description of
past events, together with all the laws of nature.
In theology and philosophy, probabilism (from Latin
probare, to test, approve) holds that in the absence of
certainty, probability is the best criterion.
Probabilism is summed up in the Latin phrase ‘ubi dubium
libertas’ (where there is doubt there is freedom).
Biology = Variation
Many causal factors, some unknown.
But... if we knew all the factors, would biological
responses really be completely explainable and follow
laws like physics?
Statistics measures variable (biological and other)
phenomena with predictable error and explains
differences.
Individual observation
(not always the same as individual animal, plant, etc…)
Sample (collection of individual observations)
Population
(usually what we want to know about)
Sampling: so we don’t have to count or measure them all!!
Population/universe: the entire group about which you want
to draw conclusions (very important; remember this for the discussion of pseudoreplication)
ex. All the yellow perch in Lake Erie, all women in the
USA
Sample: subset that you measure to draw conclusions on
the population
Random: each member of population has an
equal and independent chance of being selected
Simple random, stratified random, cluster, etc.
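A minimal Python sketch of the first two sampling schemes named above. The population of numbered fish tags and the two strata names are hypothetical, chosen only for illustration; `random.sample` draws without replacement, so each member has the same chance of selection.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement; every member is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def stratified_random_sample(strata, n_per_stratum, seed=None):
    """Draw a simple random sample of fixed size from within each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical population: 1000 tagged fish, identified by tag number.
population = list(range(1000))
sample = simple_random_sample(population, 25, seed=1)

# Hypothetical strata: the same fish split by where they were tagged.
strata = {
    "west_basin": list(range(0, 400)),
    "central_basin": list(range(400, 1000)),
}
stratified = stratified_random_sample(strata, 10, seed=1)
```

Stratifying guarantees that every stratum is represented in the sample, which a simple random sample of the whole population does not.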
Variable: A property that varies among the
individuals sampled
Variate: single observation of a variable
Types of Data
Continuous: a value is possible between any two other values
ex. Height, weight, volume
Discrete (meristic): only certain values possible
ex. Number of leaves on a plant, eggs in a nest,
ratio of wings/legs on an insect
Types of Data
Ranked variable: ordered or ranked measurements
ex. Small-med-large; A, B, C, D, F
less information than ratio or interval data
amenable to certain stat. techniques
Attribute variable: non-ordered categories
ex. Blonde-brunette-redhead; spruce-pine-fir
few stat. techniques
 can be combined with frequencies
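As a small illustration of combining an attribute variable with frequencies, the sketch below tallies how often each category occurs. The list of tree observations is made up for the example.

```python
from collections import Counter

# Hypothetical attribute data: species recorded for six trees.
observations = ["spruce", "pine", "fir", "pine", "spruce", "pine"]

# Count how many times each non-ordered category was observed.
frequencies = Counter(observations)

for category, count in sorted(frequencies.items()):
    print(f"{category}: {count}")
```

The categories themselves have no order or magnitude; only the counts are numerical.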
Accuracy: closeness of measurement to “true” value
Precision: closeness of repeated measurements to each other
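One way to make the two definitions concrete, as a sketch rather than the textbook's method: treat accuracy as the difference between the mean of repeated measurements and the true value, and precision as the standard deviation of those measurements. The two sets of scale readings and the true mass are invented for illustration.

```python
import statistics

def accuracy_error(measurements, true_value):
    """Mean measurement minus the true value (0 means accurate on average)."""
    return statistics.mean(measurements) - true_value

def precision_spread(measurements):
    """Sample standard deviation (smaller means more precise)."""
    return statistics.stdev(measurements)

true_mass = 10.0                      # hypothetical "true" value, in grams
scale_a = [10.9, 11.0, 11.1, 11.0]    # tightly clustered, but off target
scale_b = [9.0, 11.0, 10.5, 9.5]      # centered on target, but scattered

for name, readings in (("A", scale_a), ("B", scale_b)):
    print(name,
          round(accuracy_error(readings, true_mass), 3),
          round(precision_spread(readings), 3))
```

Scale A is precise but not accurate; scale B is accurate on average but not precise.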
Read S & R 2.3 and 2.4
Frequency Distributions
(quantitative or qualitative classes)
Bar graph (from table)
Categories not numerical
Axis labels
Scale
[Figure: two bar graphs of the same data, both titled "Survey of 40 cafeteria diners". x-axis: Food being consumed (Cheerios, mac and cheese, mystery meat, salad); y-axis: Number of People. One graph's y-axis runs from 0 to 16, the other's from 6 to 15.]
- With continuous data, must select categories (~10-20)
- Can show midpoint or range
- Pg 25 for rules
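A rough sketch of grouping continuous data into classes, reporting both the range and the midpoint of each class. It assumes equal-width classes spanning the observed minimum to maximum, which is a common choice but not necessarily the rules given on Pg 25; the function name and the example values are hypothetical.

```python
def frequency_table(values, n_classes):
    """Return (lower, upper, midpoint, count) for equal-width classes."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; cannot form classes")
    width = (hi - lo) / n_classes
    counts = [0] * n_classes
    for v in values:
        # The maximum value would fall just past the last class, so clamp it.
        idx = min(int((v - lo) / width), n_classes - 1)
        counts[idx] += 1
    return [(lo + i * width, lo + (i + 1) * width, lo + (i + 0.5) * width, counts[i])
            for i in range(n_classes)]

# Hypothetical measurements, grouped into 5 classes to keep the output short.
table = frequency_table([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 5)
for lower, upper, midpoint, count in table:
    print(f"{lower:5.1f} - {upper:5.1f}  (midpoint {midpoint:5.1f}): {count}")
```

With real data the number of classes would be chosen in the 10 to 20 range suggested above; too few classes hide the shape of the distribution, and too many leave most classes nearly empty.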
http://www.nwfsc.noaa.gov/publications/survey/2001/2001fig1.html