Data Analysis for AP Biology
Data Analysis of the New AP Test
• You will have at least one “Lab set” of questions for data analysis in the multiple-choice section of your AP test, usually 5 questions
• You will also have 6 grid-in questions in the multiple-choice section of your test that require mathematical computations
• All AP questions are tied to Learning Objectives and Science Practices.
There are 7 Science Practices. Two of these pertain to Data
Analysis.
Science Practice 2: The student can use
mathematics appropriately.
Science Practice 5: The student can perform data
analysis and evaluation of evidence.
A practice is a way to coordinate knowledge and skills in order to
accomplish a goal or task. The science practices enable students to
establish lines of evidence and use them to develop and refine
testable explanations and predictions of natural phenomena.
Ways to analyze data (evidence) include performing mathematical functions such as
• Graphing
• Statistical analysis of data
• Evaluating the experimental design or data set (quantitative reasoning)
Application of Quantitative Reasoning
• Requires skills such as
• Mathematical routines
• Concepts
• Methods
• Operations used to interpret data,
solve problems, make decisions
• Application of your math skills
The Counting/Measuring/ Calculating portion of
Quantitative Reasoning includes simple
calculations
• Percentages
• Ratios
• Averages
• Means
Percentages
Percent change in mass: used to standardize the comparison, since starting masses may vary between groups

$$\%\ \text{change} = \frac{\text{final mass} - \text{initial mass}}{\text{initial mass}} \times 100$$

This was from the diffusion and osmosis lab.
| Bag | Initial Mass (g) | Final Mass (g) | % Change in Mass |
|---|---|---|---|
| Red Bag | 3.8 | 4.0 | |
| Orange Bag | 4.1 | 4.3 | |
| Yellow Bag | 4.6 | 4.8 | |
| Blue Bag | 4.5 | 4.5 | |
| Purple Bag | 4.3 | 4.4 | |
| Bag | Initial Mass (g) | Final Mass (g) | % Change in Mass |
|---|---|---|---|
| Red Bag | 3.8 | 4.0 | 5.3% |
| Orange Bag | 4.1 | 4.3 | 4.9% |
| Yellow Bag | 4.6 | 4.8 | 4.3% |
| Blue Bag | 4.5 | 4.5 | 0% |
| Purple Bag | 4.3 | 4.4 | 2.3% |
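The percent-change calculation for the dialysis bags can be checked with a short script. This is only a sketch: the function name `percent_change` and the dictionary layout are illustrative choices, and results are rounded to one decimal place (so 5.26% prints as 5.3%).

```python
def percent_change(initial_mass, final_mass):
    """Percent change in mass relative to the starting mass."""
    return (final_mass - initial_mass) / initial_mass * 100


# (initial mass, final mass) in grams, taken from the diffusion and osmosis table
bags = {
    "Red": (3.8, 4.0),
    "Orange": (4.1, 4.3),
    "Yellow": (4.6, 4.8),
    "Blue": (4.5, 4.5),
    "Purple": (4.3, 4.4),
}

for color, (initial, final) in bags.items():
    print(f"{color} bag: {percent_change(initial, final):.1f}%")
```

Dividing by the initial mass is what standardizes the comparison: every bag gained about 0.1 to 0.2 g, but the same gain is a larger percentage for a lighter bag.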
Analyze this graph.
It had always been assumed that eukaryotic genes were similar in organization to prokaryotic genes.
However, modern techniques of molecular analysis indicated that there are additional DNA sequences
that lie within the coding region of genes. Exons are the DNA sequences that code for proteins while introns
are the intervening sequences that have to be removed.
The graph shows the number of exons found in genes for three different groups of eukaryotes.
[Graph: percentage of genes (y-axis) versus number of exons (x-axis, 1 to 20, then <30, <40, <60, >60), in three panels: Saccharomyces cerevisiae (a yeast), Drosophila melanogaster (fruit fly), and mammals.]
Calculate the percentage of genes that have five or fewer exons in mammals.
Percentage of genes = 7 + 7 + 10 + 10 + 15 = 49%
Ratios
Can appear as probabilities in Genetics problems
Law of multiplication
Independent events in sequence
“and”
In a cross between AaBbCc x AaBBCC, what is
the probability that the offspring will be AaBbCC?
This is an “and” question (all events happening at the same time), so multiply:
½ x ½ x ½ = 1/8
Look at each cross separately, as though you were using only one trait at a time:
Step 1: If you cross Aa x Aa, what is the probability that you will get Aa?
2/4, which reduces to ½

|   | A | a |
|---|---|---|
| A | AA | Aa |
| a | Aa | aa |
Step 2: What is the probability that you will get Bb when you cross Bb with BB?
2/4, which reduces to ½

|   | B | b |
|---|---|---|
| B | BB | Bb |
| B | BB | Bb |
Step 3: What is the probability that you get CC from a cross between Cc x CC?
2/4, which reduces to ½

|   | C | c |
|---|---|---|
| C | CC | Cc |
| C | CC | Cc |
Putting it together: in a cross between AaBbCc x AaBBCC, the probability that the offspring will be AaBbCC is an “and” question (all events happening at the same time), so multiply:
½ x ½ x ½ = 1/8
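The one-trait-at-a-time approach can be sketched in code by listing the four boxes of each single-gene Punnett square and multiplying the results. The helper name `genotype_probability` is hypothetical, and the sketch assumes the genes assort independently, as the worked example does.

```python
from fractions import Fraction
from itertools import product


def genotype_probability(parent1, parent2, target):
    """Probability of one single-gene genotype, e.g. 'Aa' from 'Aa' x 'Aa'.

    Each parent is a two-letter string of alleles; every pairing of one
    allele from each parent is one box of the Punnett square.
    """
    offspring = [a + b for a, b in product(parent1, parent2)]
    # Compare sorted letters so that 'aA' and 'Aa' count as the same genotype
    matches = sum(1 for g in offspring if sorted(g) == sorted(target))
    return Fraction(matches, len(offspring))


p_Aa = genotype_probability("Aa", "Aa", "Aa")  # Step 1
p_Bb = genotype_probability("Bb", "BB", "Bb")  # Step 2
p_CC = genotype_probability("Cc", "CC", "CC")  # Step 3

# "and" question: multiply the independent probabilities
p_AaBbCC = p_Aa * p_Bb * p_CC
print(p_Aa, p_Bb, p_CC, p_AaBbCC)
```

Using `Fraction` keeps the answers as exact fractions (½, 1/8) rather than decimals, which matches how genetics problems are usually written.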
Law of addition
Mutually exclusive events “or” statements
If there are 2 ways to get the answer, add the
probabilities
Cross between two Pp parents to produce Pp offspring
There are 2 ways to get Pp from the parents:
½ chance of getting P from mom and ½ chance of getting p from dad: ½ x ½ = ¼
½ chance of getting p from mom and ½ chance of getting P from dad: ½ x ½ = ¼
¼ + ¼ = 2/4 or ½
What is the chance that a cross between AaBbCc x AaBBCC will produce offspring that are AABbCc or AABBCC?
To get AABbCc :
¼ x ½ x ½ = 1/16
To get AABBCC:
¼ x ½ x ½ = 1/16
So 1/16 + 1/16 = 2/16 or 1/8
If expressed as 1:8 then it is a ratio.
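The “or” calculation can be sketched the same way: work out each complete genotype with the multiplication rule, then add the two mutually exclusive results. The helper names below are hypothetical, and independent assortment of the three genes is assumed.

```python
from fractions import Fraction
from itertools import product


def genotype_probability(parent1, parent2, target):
    """Probability of one single-gene genotype from a single-gene cross."""
    offspring = [a + b for a, b in product(parent1, parent2)]
    matches = sum(1 for g in offspring if sorted(g) == sorted(target))
    return Fraction(matches, len(offspring))


def multi_gene_probability(parent1, parent2, target):
    """Multiplication rule: multiply the per-gene probabilities."""
    p = Fraction(1)
    for g1, g2, t in zip(parent1, parent2, target):
        p *= genotype_probability(g1, g2, t)
    return p


parent1 = ["Aa", "Bb", "Cc"]
parent2 = ["Aa", "BB", "CC"]

p_first = multi_gene_probability(parent1, parent2, ["AA", "Bb", "Cc"])
p_second = multi_gene_probability(parent1, parent2, ["AA", "BB", "CC"])

# Addition rule: the two genotypes are mutually exclusive, so add
p_either = p_first + p_second
print(p_first, p_second, p_either)
```

The same helper also reproduces the Pp example: `genotype_probability("Pp", "Pp", "Pp")` counts both ways of getting Pp and returns ½.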
The Second tier of Quantitative Reasoning is
Graphing/Mapping/Ordering
Graphs are used to recognize patterns or trends in data.
The most common graph types
• Bar graph is for distinct classes of data
• Line graph is for progressive series of data
• Scatterplots (also known as scattergrams) show the degree or tendency with which the variables occur in association with each other
What type of graph? When do I draw a bar graph?
1. Histogram/Bar Graphs: the data are organized into bins
• You determine the number of bins and their range
• Used to compare two samples of categorical or count data
• May be used to plot the means, with error bars, of normal data
• Use ONLY when presented with categorical data (which in AP Bio is almost NEVER)
• Examples of categorical data
  • Size of a population by age range
  • Number of deaths by cause of death
  • Size of different populations in an ecosystem
Let’s try one.
HIV Prevalence in ages 15-49 (%)

| Country | 1990 | 2009 |
|---|---|---|
| Algeria | 0.06 | 0.10 |
| Brazil | 0.45 | 0.45 |
| Hungary | 0.10 | 0.06 |
| Guatemala | 0.10 | 0.60 |
[Bar graph: HIV Prevalence in Ages 15-49. y-axis: % HIV in ages 15-49 (0 to 0.7); x-axis: Algeria, Brazil, Hungary, Guatemala; paired bars for 1990 and 2009.]
Scatter Plots
Suppose that we want to graph the heights and weights of a group
of people.
Since both height and weight are variables, we use the phrase
bivariate data, meaning that there are two variables.
Bivariate data are best displayed on a scatter plot or scattergram.
• Each data point represents both an x value and a y value, so its coordinates are (x, y). In our example, the coordinates of a point are (weight, height).
• Do NOT connect the points. This is because each point represents a particular fact. In our example, the “fact” is one person.
• After you plot all the points, look at them to see if there is a trend, a pattern.
• If the points form a pattern that tends to rise, we say that there is a positive correlation.
• If the points form a pattern that tends to fall, we say that there is a negative correlation.
• If the points do not show any organized pattern, there is no correlation.
Let’s do it. Graph these data.
2. Line Graphs
Appendix B of your lab Manual is entitled
Constructing Line Graphs
Let’s try one.
[Line graph: Average Cricket Chirps per Minute at Various Temperatures. y-axis: number of chirps per minute (0 to 140); x-axis: temperature (degrees C, 0 to 35); two lines: Snowy Tree Cricket and Common Field Cricket.]
YOU KNOW HOW TO GRAPH…BUT
• AP requires specific items and procedures in graphing
• Reviewing these few concepts will help you get these typically EASY POINTS!
• So, let’s check it out!
Preparing Graphs
• Provide a title for the graph that states exactly what is being measured
• Label each axis
• Indicate on each axis what is being measured and in what units
  • Time (min)
  • Distance (meters)
  • Water loss (mL/m²)
• Provide values along each axis at regular, uniform intervals
• Select values and spacing that will allow your graph to take up most of the space available
• Use the x-axis for the independent variable and the y-axis for the dependent variable
• Plot your points and connect them. Do not use a best-fit curve unless told to do so!
• If you are asked to extrapolate beyond the known data points, use a different line, such as dashed or dotted.
• If you are plotting more than one condition or data set, use different lines or symbols for each data set, such as circles, squares, triangles
• The graph should clarify whether the data start at the origin (0,0) or not. The line should not be extended to the origin if it did not start there
What is wrong with this graph?
You will be able to draw from your AP
Lab experience in which questions and
problems are raised and solved during
your investigations.
Problem solving involves a complex
interplay among observation, theory,
and inference.
Data analysis describes your data quantitatively.
Descriptive statistics help to paint a picture of the variation in your data:
• central tendencies
• standard error
• best-fit functions
• confidence that you have collected enough data
Analyzing Data can be accomplished
in several ways
1. You can look for relationships,
patterns, and trends
2. Often you may have to subject
your data to statistical analysis
EXPERIMENTAL ERROR
There is always error in any procedure
• More than likely the source is sample size
• It is hard to collect a large sample in a school setting, so this is a limitation to your data analysis
• You may not see the normal distribution that would appear if you had more data to analyze
MEAN, SD AND SE
If the data have a normal distribution, we can find the mean, SD and SE
• Mean: summarizes the entire sample
• If a large enough sample size is used, it may estimate the actual population’s mean
SD – STANDARD DEVIATION
• Measures the spread (variation) in the sample
• A large SD indicates that the data have a lot of variability
• A small SD indicates that the data are clustered close to the sample mean

Equation

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$

• $\bar{x}$ = mean
• $n$ = sample size
• $x_i$ = individual value
SE – STANDARD ERROR
Allows us to make an inference about how well the sample mean matches up to the true population mean.

$$SE = \frac{s}{\sqrt{n}}$$

• $s$ = the sample SD
• $n$ = the sample size

The larger the sample, the smaller the SE, and the closer the sample mean is likely to be to the actual population mean.
TO SUM IT UP…
• standard error is an estimate of how close your sample mean is to the actual population’s mean
• standard deviation is the degree to which individuals within the sample differ from the sample mean
• you will not have to calculate this value
• you should understand what it tells you and where it comes from
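To see where these numbers come from, here is a small sketch using Python's standard `statistics` module. As an assumption made only for illustration, it reuses the five initial bag masses from the diffusion and osmosis table as the sample; five values is far too small a sample to judge normality.

```python
import math
import statistics

# Illustrative sample: initial bag masses (g) from the diffusion and osmosis table
masses = [3.8, 4.1, 4.6, 4.5, 4.3]

n = len(masses)
mean = statistics.mean(masses)
sd = statistics.stdev(masses)  # sample SD: divides by n - 1
se = sd / math.sqrt(n)         # standard error of the mean

print(f"n = {n}, mean = {mean:.2f} g, SD = {sd:.3f} g, SE = {se:.3f} g")
```

Because SE divides the SD by the square root of n, quadrupling the sample size halves the SE if the SD stays about the same.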

Statistical tests, such as chi-square, can be used to determine
the probability that your data are significantly different from a
theoretical population.
Statistical testing should be included in your experimental
design.
Chi-Square
• How well do experimental data fit what is expected?
• Used in many experiments where there are at least 2 experimental groups
• In genetics, the hypothesis is the expected ratio of a genetic cross
• If it is an F1 monohybrid cross, then the F2 will have a 3:1 phenotypic ratio (dominant : recessive)
• Other applications of chi-square use the mean or another known value for the expected
Now we need an Ho or null hypothesis
The null hypothesis states
“There is no difference between the
expected (ratio) and the observed (ratio)”
A χ² analysis will help determine if the difference between what you observed and what you expected is statistically significant or not.

FORMULA
• Determine the expected ratio and the expected numbers for each group.
• Collect the number observed in each group.
• Calculate the chi-square statistic using this formula.
• Use the number of individuals and NOT proportions, ratios, or frequencies.

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
So what does it mean???
O = observed data
E = expected data
Σ = sum of…….
The equation is used for each group in the experiment, and
the values are added together
Example
F2 offspring: 290 purple and 110 white, for a total of 400 (290 + 110) offspring.
We expect a 3:1 ratio.
Calculate the expected numbers by multiplying the total offspring by the expected proportions.
Thus we expect 400 x 3/4 = 300 purple, and 400 x 1/4 = 100 white.

purple: obs = 290 and exp = 300
white: obs = 110 and exp = 100
Now it's just a matter of plugging into the formula:

$$\chi^2 = \frac{(290 - 300)^2}{300} + \frac{(110 - 100)^2}{100} = \frac{(-10)^2}{300} + \frac{(10)^2}{100} = \frac{100}{300} + \frac{100}{100} = 0.333 + 1.000 = 1.333$$

This is our chi-square value: now we need to see what it means and how to use it.
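The plug-in arithmetic for the purple and white example can be reproduced with a few lines of code. This is a minimal sketch; the function name `chi_square` is an illustrative choice, and it works on counts of individuals, not proportions.

```python
def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all groups."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [290, 110]                       # purple, white
total = sum(observed)                       # 400 offspring
expected = [total * 3 / 4, total * 1 / 4]   # 3:1 ratio gives 300 and 100

chi2 = chi_square(observed, expected)
print(f"expected = {expected}, chi-square = {chi2:.3f}")
```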
WHAT DO WE COMPARE OUR COMPUTED CHI-SQUARE TO?
• If the difference between the observed results and the expected results is small enough that it would be seen at least 1 time in 20 over thousands of experiments, we “fail to reject” the null hypothesis.
• For technical reasons, we use “fail to reject” instead of “accept”.
• “1 time in 20” can be written as a probability value p = 0.05, because 1/20 = 0.05.
• Another way of putting this: if the null hypothesis is true, a chi-square value at least as large as the critical value would arise by chance only 5% of the time.
Degrees Of Freedom
• To look up a critical value, use the “degrees of freedom”: the number of independent random variables involved.
• Degrees of freedom is simply the number of classes of offspring minus 1.
• For our example, there are 2 classes of offspring: purple and white.
• Degrees of freedom (df) = 2 - 1 = 1.
Critical Value
• Find critical values for chi-square on tables; use p = 0.05 and the correct df.
• If your calculated chi-square value is greater than the critical value from the table, you “reject the null hypothesis”.
• If your chi-square value is less than the critical value, you “fail to reject” the null hypothesis (that is, your data are consistent with your genetic theory about the expected ratio).
Chi-Square Table
USING THE TABLE
• In our example of 290 purple to 110 white, we calculated a chi-square value of 1.333, with 1 degree of freedom.
• Looking at the table, 1 d.f. is the first row, and p = 0.05 is the sixth column. Here we find the critical chi-square value, 3.841.
• Since our calculated chi-square, 1.333, is less than the critical value, 3.841, we “fail to reject” the null hypothesis. Thus, an observed ratio of 290 purple to 110 white is a good fit to a 3/4 to 1/4 ratio.
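The table lookup and decision can be sketched as a small function. The dictionary holds only the p = 0.05 critical values for 1 to 3 degrees of freedom from a standard chi-square table; the names `CRITICAL_05` and `decide` are hypothetical.

```python
# Critical chi-square values at p = 0.05, keyed by degrees of freedom
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815}


def decide(chi2, n_classes):
    """Return (df, critical value, decision) for a computed chi-square."""
    df = n_classes - 1  # number of classes of offspring minus 1
    critical = CRITICAL_05[df]
    if chi2 > critical:
        decision = "reject the null hypothesis"
    else:
        decision = "fail to reject the null hypothesis"
    return df, critical, decision


# Purple and white example: chi-square = 1.333 with 2 classes of offspring
df, critical, decision = decide(1.333, 2)
print(f"df = {df}, critical value = {critical}: {decision}")
```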
ANOTHER EXAMPLE: FROM MENDEL

| phenotype | observed | expected proportion | expected number |
|---|---|---|---|
| round yellow | 315 | 9/16 | 312.75 |
| round green | 101 | 3/16 | 104.25 |
| wrinkled yellow | 108 | 3/16 | 104.25 |
| wrinkled green | 32 | 1/16 | 34.75 |
| total | 556 | 1 | 556 |
Find the Expected Numbers
• You are given the observed numbers, and you determine the expected proportions from a Punnett square.
• To get the expected numbers of offspring, first add up the observed offspring to get the total number of offspring. In this case, 315 + 101 + 108 + 32 = 556.
• Then multiply total offspring by the expected proportion:
--expected round yellow = 9/16 x 556 = 312.75
--expected round green = 3/16 x 556 = 104.25
--expected wrinkled yellow = 3/16 x 556 = 104.25
--expected wrinkled green = 1/16 x 556 = 34.75
These add up to 556, the observed total offspring.
CALCULATING THE CHI-SQUARE VALUE
Use the formula.

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

$$\chi^2 = \frac{(315 - 312.75)^2}{312.75} + \frac{(101 - 104.25)^2}{104.25} + \frac{(108 - 104.25)^2}{104.25} + \frac{(32 - 34.75)^2}{34.75} = 0.016 + 0.101 + 0.135 + 0.218 = 0.470$$

df = 3
Critical value = 7.815
χ² = 0.470
χ² < 7.815, so we fail to reject our null hypothesis.
There is no statistically significant difference between our expected and our observed numbers, so the data are consistent with the hypothesis that we used to form our Punnett square.
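Mendel's dihybrid numbers can be run through the same steps from start to finish. The sketch below uses the observed counts and the 9:3:3:1 proportions given above; the variable names are illustrative, and the critical value is the p = 0.05 table entry for 3 degrees of freedom.

```python
from fractions import Fraction

observed = {
    "round yellow": 315,
    "round green": 101,
    "wrinkled yellow": 108,
    "wrinkled green": 32,
}
proportions = {
    "round yellow": Fraction(9, 16),
    "round green": Fraction(3, 16),
    "wrinkled yellow": Fraction(3, 16),
    "wrinkled green": Fraction(1, 16),
}

total = sum(observed.values())
# Expected number = total offspring x expected proportion
expected = {k: float(total * p) for k, p in proportions.items()}

chi2 = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)
df = len(observed) - 1
critical = 7.815  # chi-square table, p = 0.05, df = 3

print(f"chi-square = {chi2:.3f}, df = {df}, critical value = {critical}")
print("reject" if chi2 > critical else "fail to reject", "the null hypothesis")
```

Keeping the expected numbers unrounded (312.75 rather than 312) matters: rounded values would no longer add up to the 556 offspring observed.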
Compare your computed chi-square to the critical value
• Critical values for chi-square are found on tables, sorted by degrees of freedom and probability levels.
• Use p = 0.05.
• If your calculated chi-square value is greater than the critical value from the table, you “reject the null hypothesis”.
• If your chi-square value is less than the critical value, you “fail to reject” the null hypothesis (that is, your data are consistent with your genetic theory about the expected ratio).
For any lab scenario you should be able to
• Identify the IV and DV and know the appropriate units
• Describe the experimental treatment
• Identify the control or controls
• Know that replicates should exist for each treatment and that subjects should be randomly chosen
• Identify what the constants are
• Be able to form a hypothesis or identify the hypothesis
• Draw a conclusion from a data set
IV and DV
You are investigating how the crustacean Daphnia responds to changes in temperature. You expose Daphnia to temperatures of 5 °C, 10 °C, 15 °C, 20 °C, and 30 °C. You count the number of heartbeats/sec in each case.
• Temperature is the independent variable (you are manipulating it)
• Number of heartbeats/sec is the dependent variable (you observe how it changes in response to different temperatures).
Use only one independent variable.
• Only one independent variable can be tested at a time.
• If you manipulate two independent variables at the same time, you cannot determine which is responsible for the effect you measure in the dependent variable.
• In the physiological experiment, if the subject also drinks coffee in addition to exercising, you cannot determine which treatment, coffee or exercise, causes a change in blood pressure.
IV and DV
You design an experiment to investigate the effect of exercise on pulse rate and blood pressure.
The physiological conditions (independent variable, or variable you manipulate) include sitting, exercising, and recovery at various intervals following exercise.
You make two kinds of measurements (two dependent variables) to evaluate the effect of the physiological conditions
• Pulse rate and blood pressure are the dependent variables
• Time is the independent variable.
Identify a control treatment.
The control treatment, or control, is the
independent variable at some normal or
standard value.
The results of the control are used for
comparison with the results of the
experimental treatments.
• In the Daphnia experiment, you choose the temperature of 20 °C as the control because that is the average temperature of the pond where you obtained the culture.
• In the experiment on physiological conditions, the control is sitting, when the subject is not influenced by exercising.
Describe the experimental treatment.
The experimental treatment (or treatments)
are the various values that you assign to the
independent variable.
The experimental treatments describe how you
are manipulating the independent variable.
In the Daphnia experiment, the experimental treatments were the temperatures of 5 °C, 10 °C, 15 °C, 20 °C, and 30 °C.
Random sample of subjects.
• You must choose the subjects for your experiments randomly.
• Since you cannot evaluate every Daphnia, you must choose a subpopulation to study.
• If you choose only the largest Daphnia to study, it is not a random sample, and you introduce another variable (size) for which you cannot account.
Identify the constants
The characteristics that remain the same are constants.
For example, the number of donuts in a dozen is always 12.
Describe the procedure.
• Describe how you will set up the experiment.
• Identify equipment and chemicals to be used and why you are choosing to use them.
• If appropriate, provide a labeled drawing of the setup.
SEEDS GIVEN VARIOUS TREATMENTS ARE PLANTED IN SMALL POTS. THE GRAPH BELOW IS AN ILLUSTRATION OF THE DATA OBTAINED.
At day 8 of the experiment, all of the following statements are correct EXCEPT:
A. The T, DG, and DGA seedlings are very similar in height.
B. The eventual greater height of the T seedlings over the DG seedlings is already predictable.
C. The D seedlings are less than half as tall as the other seedlings.
D. The DG seedlings are taller than the T seedlings.
Key:
D = Dwarf pea plant seeds, no treatment
DG = Dwarf pea plant seeds soaked in gibberellins
DGA = Dwarf pea plant seeds soaked in gibberellins and auxin
T = Tall, nondwarf pea plant seeds, no treatment
IN COUNTRY 1, APPROXIMATELY WHAT PERCENTAGE OF THE INDIVIDUALS WERE YOUNGER THAN FIFTEEN YEARS OF AGE?
A. 10%
B. 21%
C. 42%
D. 52%
WHICH OF THE FOLLOWING BEST APPROXIMATES THE RATIO OF MALES TO FEMALES AMONG INDIVIDUALS BELOW FIFTEEN YEARS OF AGE?

| | Country 1 | Country 2 |
|---|---|---|
| A. | 1 : 1 | 1 : 1 |
| B. | 0.75 : 1 | 0.75 : 1 |
| C. | 0.5 : 1 | 0.5 : 1 |
| D. | 1 : 1 | 0.5 : 1 |
IF, IN COUNTRY 1, INFANT MORTALITY DECLINED AND THE BIRTH RATE REMAINED THE SAME, THEN INITIALLY THE POPULATION WOULD BE EXPECTED TO
A. be more evenly distributed among the age classes
B. be even more concentrated in the young age classes
C. stabilize at the illustrated level for all age classes
D. increase in the oldest age classes
OVER THE NEXT 10-15 YEARS, THE STABILIZATION OF COUNTRY 1’S POPULATION AT ITS CURRENT SIZE WOULD REQUIRE THAT
A. infant mortality be reduced to about half the present level
B. the death rate be reduced drastically
C. each couple produce fewer children than the number required to replace themselves
D. about 15 years be added to the life expectancy of each person
A wild-type fruit fly (heterozygous for gray body color and normal wings) was mated with a black fly with vestigial wings. The offspring had the following phenotypic distribution: wild type, 778; black-vestigial, 785; black-normal, 158; gray-vestigial, 162. What is the recombination frequency between these genes for body color and wing type?
First count the total number of offspring: 778 + 785 + 158 + 162 = 1883
In all dihybrid test crosses (a cross between a known heterozygote for two wild-type traits and a homozygous recessive individual for both traits), the expected ratio of phenotypes, if the genes are on separate chromosomes, must be:
wild type, 25%; black-vestigial, 25%; black-normal, 25%; gray-vestigial, 25%.
These results do not fit the experimental data above (778, 785, 158, 162).
In fact the black-normal (158) and gray-vestigial (162) offspring represent recombinant individuals.
Calculation of recombination frequency:
Recombination frequency = (158 + 162) / 1883 x 100 = 17%
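The recombination frequency is the number of recombinant offspring divided by the total number of offspring, times 100. A minimal sketch using the counts from the fruit fly cross (variable names are illustrative):

```python
offspring = {
    "wild type": 778,
    "black-vestigial": 785,
    "black-normal": 158,
    "gray-vestigial": 162,
}
# The two phenotype classes that differ from both parents are the recombinants
recombinant_classes = ["black-normal", "gray-vestigial"]

total = sum(offspring.values())
recombinants = sum(offspring[name] for name in recombinant_classes)
frequency = recombinants / total * 100

print(f"{recombinants} / {total} x 100 = {frequency:.1f}%")
```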