Data Analysis for AP Biology

Data Analysis on the New AP Test
• You will have at least one "lab set" of data analysis questions in the multiple-choice section of your AP test, usually 5 questions.
• You will also have 6 grid-in questions in the multiple-choice section of your test that require mathematical computations.
• All AP questions are tied to Learning Objectives and Science Practices.

There are 7 Science Practices. Two of these pertain to data analysis.
• Science Practice 2: The student can use mathematics appropriately.
• Science Practice 5: The student can perform data analysis and evaluation of evidence.

A practice is a way to coordinate knowledge and skills in order to accomplish a goal or task. The science practices enable students to establish lines of evidence and use them to develop and refine testable explanations and predictions of natural phenomena.

Ways to analyze data (evidence) include performing mathematical functions such as
• Graphing
• Statistical analysis of data
• Evaluating the experimental design or data set (quantitative reasoning)

Application of Quantitative Reasoning
• Requires skills such as
  • Mathematical routines
  • Concepts
  • Methods
  • Operations used to interpret data, solve problems, and make decisions
• Application of your math skills

The Counting/Measuring/Calculating portion of Quantitative Reasoning includes simple calculations
• Percentages
• Ratios
• Averages
• Means

Percentages
Percent change in mass: used to standardize the comparison, since starting masses may vary between groups.

$\%\ \text{change} = \dfrac{\text{final mass} - \text{initial mass}}{\text{initial mass}} \times 100$

This was from the diffusion and osmosis lab.
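The percent change formula can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `percent_change` and the dictionary layout are assumptions, and the masses are the bag values from the diffusion and osmosis lab. Results are rounded to one decimal place, so they can differ in the last digit from values that were truncated instead of rounded.

```python
def percent_change(initial_mass, final_mass):
    """Percent change in mass relative to the starting mass."""
    return (final_mass - initial_mass) / initial_mass * 100


# (initial mass, final mass) in grams for each dialysis bag
bags = {
    "Red": (3.8, 4.0),
    "Orange": (4.1, 4.3),
    "Yellow": (4.6, 4.8),
    "Blue": (4.5, 4.5),
    "Purple": (4.3, 4.4),
}

# Round to one decimal place for reporting
results = {
    name: round(percent_change(initial, final), 1)
    for name, (initial, final) in bags.items()
}

for name, pct in results.items():
    print(f"{name} Bag: {pct}%")
```

Dividing by the initial mass is what makes bags with different starting masses comparable.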
              Initial Mass (g)   Final Mass (g)   % Change in Mass
Red Bag       3.8                4.0
Orange Bag    4.1                4.3
Yellow Bag    4.6                4.8
Blue Bag      4.5                4.5
Purple Bag    4.3                4.4

              Initial Mass (g)   Final Mass (g)   % Change in Mass
Red Bag       3.8                4.0              5.3%
Orange Bag    4.1                4.3              4.9%
Yellow Bag    4.6                4.8              4.3%
Blue Bag      4.5                4.5              0%
Purple Bag    4.3                4.4              2.3%

Analyze this graph.
It had always been assumed that eukaryotic genes were similar in organization to prokaryotic genes. However, modern techniques of molecular analysis indicated that there are additional DNA sequences that lie within the coding region of genes. Exons are the DNA sequences that code for proteins, while introns are the intervening sequences that have to be removed. The graph shows the number of exons found in genes for three different groups of eukaryotes.

[Figure: percentage of genes (y-axis) versus number of exons (x-axis), in three panels: Saccharomyces cerevisiae (a yeast), Drosophila melanogaster (fruit fly), and mammals.]

Calculate the percentage of genes that have five or fewer exons in mammals.
Percentage of genes = 7 + 7 + 10 + 10 + 15 = 49%

Ratios
• Can appear as probabilities in genetics problems

Law of multiplication
• Independent events in sequence
• "and" statements

In a cross between AaBbCc x AaBBCC, what is the probability that the offspring will be AaBbCC?
This is an "and" question (all events happening at the same time), so multiply: ½ x ½ x ½ = 1/8

Look at each cross separately, as though you were using only one trait at a time.

Step 1: If you cross Aa x Aa, what is the probability that you will get Aa? 2/4, which reduces to ½.

        A     a
   A    AA    Aa
   a    Aa    aa
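The one-gene-at-a-time approach and the law of multiplication can be checked with a short Python sketch. The function names `offspring_probability` and `cross_probability` are hypothetical, and the sketch assumes each genotype is written as two letters per gene with the genes in the same order in both parents and in the target.

```python
from fractions import Fraction
from itertools import product


def offspring_probability(parent1, parent2, target):
    """Probability of one single-gene genotype, e.g. 'Aa' x 'Aa' giving 'Aa'."""
    # Each parent passes on one of its two alleles with equal chance,
    # so the four Punnett-square boxes are equally likely.
    boxes = ["".join(sorted(a + b)) for a, b in product(parent1, parent2)]
    return Fraction(boxes.count("".join(sorted(target))), len(boxes))


def cross_probability(parent1, parent2, target):
    """Law of multiplication: multiply the per-gene probabilities."""
    def genes(genotype):
        # Split 'AaBbCc' into ['Aa', 'Bb', 'Cc']
        return [genotype[i:i + 2] for i in range(0, len(genotype), 2)]

    probability = Fraction(1)
    for g1, g2, t in zip(genes(parent1), genes(parent2), genes(target)):
        probability *= offspring_probability(g1, g2, t)
    return probability


print(offspring_probability("Aa", "Aa", "Aa"))            # 1/2
print(cross_probability("AaBbCc", "AaBBCC", "AaBbCC"))    # 1/8
```

Using `Fraction` keeps the answers as exact fractions, which matches how these problems are usually worked by hand.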
Step 2: What is the probability that you will get Bb when you cross Bb with BB? 2/4, which reduces to ½.

        B     b
   B    BB    Bb
   B    BB    Bb

Step 3: What is the probability that you get CC from a cross between Cc x CC? 2/4, which reduces to ½.

        C     c
   C    CC    Cc
   C    CC    Cc

In a cross between AaBbCc x AaBBCC, what is the probability that the offspring will be AaBbCC?
This is an "and" question (all events happening at the same time), so multiply: ½ x ½ x ½ = 1/8

Law of addition
• Mutually exclusive events
• "or" statements
• If there are 2 ways to get the answer, add the probabilities

Cross between two Pp parents to produce Pp offspring: there are 2 ways to get the Pp alleles from the parents.
• ½ chance of getting P from mom and ½ chance of getting p from dad: ½ x ½ = ¼
• ½ chance of getting p from mom and ½ chance of getting P from dad: ½ x ½ = ¼
• ¼ + ¼ = 2/4 or ½

What is the chance that a cross between AaBbCc x AaBBCC will produce offspring AABbCc or AABBCC?
• To get AABbCc: ¼ x ½ x ½ = 1/16
• To get AABBCC: ¼ x ½ x ½ = 1/16
• So 1/16 + 1/16 = 2/16 or 1/8
If expressed as 1:8, then it is a ratio.

The second tier of Quantitative Reasoning is Graphing/Mapping/Ordering
Graphs are used to recognize patterns or trends in data.

The most common graph types
• Bar graph is for distinct classes of data
• Line graph is for progressive series of data
• Scatterplots, also known as scattergrams, show the degree or tendency with which the variables occur in association with each other

What type of graph? When do I draw a bar graph?
1.
Histogram/Bar Graphs: the data are organized into bins
• You determine the number of the bins and their range
• Used to compare two samples of categorical or count data
• May be used to calculate the means with error bars of normal data
• ONLY when presented with categorical data (which in AP Bio is almost NEVER)
• Examples of categorical data
  • Size of a population by age range
  • Number of deaths by cause of death
  • Size of different populations in an ecosystem

Let's try one.

HIV Prevalence (%) in Ages 15-49
Country      1990    2009
Algeria      0.06    0.10
Brazil       0.45    0.45
Hungary      0.10    0.06
Guatemala    0.10    0.60

[Figure: bar graph titled "HIV Prevalence in Ages 15-49"; y-axis: % HIV in ages 15-49; x-axis: Algeria, Brazil, Hungary, Guatemala; paired bars for 1990 and 2009.]

Scatter Plots
Suppose that we want to graph the heights and weights of a group of people. Since both height and weight are variables, we use the phrase bivariate data, meaning that there are two variables. Bivariate data are best displayed on a scatter plot or scattergram.
• Each data point represents both an x value and a y value, so its coordinates are (x, y). In our example, the coordinates of a point are (weight, height).
• Do NOT connect the points. This is because each point represents a particular fact. In our example, the "fact" is one person.
• After you plot all the points, look at them to see if there is a trend or pattern.
• If the points form a pattern that tends to rise, we say that there is a positive correlation.
• If the points form a pattern that tends to fall, we say that there is a negative correlation.
• If the points do not show any organized pattern, there is no correlation.

Let's do it. Graph these data.

2. Line Graphs
Appendix B of your lab manual is entitled Constructing Line Graphs
..\..\AP Biology\AP Labs new\B_Construction Graphs.pdf

Let's try one.
[Figure: line graph titled "Average Cricket Chirps per Minute at Various Temperatures"; y-axis: number of chirps per minute; x-axis: temperature (degrees C); series: Snowy Tree Cricket and Common Field Cricket.]

YOU KNOW HOW TO GRAPH… BUT
AP requires specific items and procedures in graphing. Reviewing these few concepts will help you get these typically EASY POINTS! So, let's check it out!

Preparing Graphs
• Provide a title for the graph that states exactly what is being measured
• Label each axis
• Indicate on each axis what is being measured and in what units, for example Time (min), Distance (meters), Water loss (mL/m²)
• Provide values along each axis at regular, uniform intervals
• Select values and spacing that will allow your graph to take up most of the space available
• Use the x-axis for the independent variable and the y-axis for the dependent variable
• Plot your points and connect them. Do not use a best-fit curve unless told to do so!
• If you are asked to extrapolate beyond the known data points, use a different line, such as dashed or dotted
• If you are plotting more than one condition or data set, use different lines or symbols for each data set, such as circles, squares, or triangles
• The graph should clarify whether the data start at the origin (0,0) or not. The line should not be extended to the origin if it did not start there

What is wrong with this graph?

You will be able to draw from your AP lab experience, in which questions and problems are raised and solved during your investigations. Problem solving involves a complex interplay among observation, theory, and inference.

Data analysis describes your data quantitatively. Descriptive statistics help to paint a picture of the variation in your data.
• central tendencies
• standard error
• best-fit functions
• confidence that you have collected enough data

Analyzing data can be accomplished in several ways
1. You can look for relationships, patterns, and trends
2.
Often you may have to subject your data to statistical analysis

EXPERIMENTAL ERROR
• There is always error in any procedure
• More than likely the source is sample size
• Large samples are hard to collect in a school setting, so sample size is a limitation on your data analysis
• You may not see the normal distribution that would appear if you had more data to analyze

MEAN, SD AND SE
If the data have a normal distribution, we can find the mean, SD and SE.
• Mean: summarizes the entire sample
• If a large enough sample size is used, it may estimate the actual population's mean

SD - STANDARD DEVIATION
• Measures the spread (variance) in the sample
• Large SD indicates that the data have a lot of variability
• Small SD indicates that the data are clustered close to the sample mean

Equation

$s = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}}$

$\bar{x}$ = mean
n = sample size
$x_i$ = individual value

SE - STANDARD ERROR
• Allows us to make an inference about how well the sample mean matches up to the true population mean

$SE_{\bar{x}} = \dfrac{s}{\sqrt{n}}$

s = the sample SD
n = the sample size
• The larger the sample, the smaller the SE, and the closer the sample mean is likely to be to the actual population mean

TO SUM IT UP…
• Standard error is an estimate of how close your sample mean is to the actual population's mean
• Standard deviation is the degree to which individuals within the sample differ from the sample mean
• You will not have to calculate this value
• You should understand what it tells you and where it comes from

Statistical tests, such as chi-square, can be used to determine the probability that your data are significantly different from a theoretical population. Statistical testing should be included in your experimental design.
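To see where the mean, SD, and SE come from, here is a minimal Python sketch. It assumes the usual sample formulas, s = sqrt(Σ(x_i − x̄)² / (n − 1)) and SE = s / sqrt(n). The function names are hypothetical, and the sample values are the initial bag masses from the diffusion and osmosis lab, reused here purely as illustrative numbers.

```python
import math


def mean(values):
    """Arithmetic mean of the sample."""
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    squared_deviations = sum((x - m) ** 2 for x in values)
    return math.sqrt(squared_deviations / (len(values) - 1))


def standard_error(values):
    """Standard error of the mean: SD divided by the square root of n."""
    return sample_sd(values) / math.sqrt(len(values))


# Illustrative sample only (grams); any list of measurements works
sample = [3.8, 4.1, 4.6, 4.5, 4.3]

print(f"mean = {mean(sample):.2f}")
print(f"SD   = {sample_sd(sample):.3f}")
print(f"SE   = {standard_error(sample):.3f}")
```

Because SE divides the SD by the square root of n, collecting more data shrinks the SE even when the spread of individual values stays the same.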
Chi-Square
• How well do experimental data fit what is expected?
• Used in many experiments where there are at least 2 experimental groups
• In genetics the hypothesis is the expected ratio of a genetic cross
• If it is an F1 monohybrid cross, then the F2 will have a 3:1 phenotypic ratio (dominant : recessive)
• Other applications of chi-square use the mean or another known value for the expected

Now we need an Ho, or null hypothesis
The null hypothesis states: "There is no difference between the expected (ratio) and the observed (ratio)."
A $\chi^2$ analysis will help determine if the difference between what you observed and what you expected is statistically significant or not.

FORMULA
• Determine the expected ratio and the expected numbers for each group.
• Collect the number observed in each group.
• Calculate the chi-square statistic using this formula. Use the number of individuals and NOT proportions, ratios, or frequencies.

$\chi^2 = \sum \dfrac{(O - E)^2}{E}$

So what does it mean???
O = observed data
E = expected data
Σ = sum of
The equation is used for each group in the experiment, and the values are added together.

Example
F2 offspring: 290 purple and 110 white, for a total of 400 (290 + 110) offspring. We expect a 3:1 ratio.
Calculate the expected numbers by multiplying the total offspring by the expected proportions. Thus we expect 400 x 3/4 = 300 purple, and 400 x 1/4 = 100 white.
purple: obs = 290 and exp = 300
white: obs = 110 and exp = 100
Now it's just a matter of plugging into the formula:

$\chi^2 = \dfrac{(290 - 300)^2}{300} + \dfrac{(110 - 100)^2}{100} = \dfrac{(-10)^2}{300} + \dfrac{(10)^2}{100} = \dfrac{100}{300} + \dfrac{100}{100} = 0.333 + 1.000 = 1.333$

This is our chi-square value. Now we need to see what it means and how to use it.

WHAT DO WE COMPARE OUR COMPUTED CHI-SQUARE TO?
If the difference between the observed results and the expected results is small enough that it would be seen at least 1 time in 20 over thousands of experiments, we "fail to reject" the null hypothesis. For technical reasons, we use "fail to reject" instead of "accept".
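The worked example can be reproduced with a short Python sketch. The function name `chi_square` and its argument layout are assumptions made for illustration. It takes observed counts (not proportions) plus the expected ratio, derives the expected counts from the total, and sums (O - E)² / E over the groups.

```python
def chi_square(observed, expected_ratio):
    """Chi-square statistic from observed counts and an expected ratio."""
    total = sum(observed)
    ratio_sum = sum(expected_ratio)
    # Expected counts: total offspring times each expected proportion
    expected = [total * part / ratio_sum for part in expected_ratio]
    # Add up (O - E)^2 / E across all groups
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# 290 purple and 110 white offspring, expected 3:1
chi2 = chi_square([290, 110], [3, 1])
print(round(chi2, 3))  # 1.333
```

Passing the ratio as whole-number parts (3 and 1) avoids typing fractions and guarantees the expected counts add up to the observed total.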
"1 time in 20" can be written as a probability value p = 0.05, because 1/20 = 0.05. Another way of putting this is that a difference this large would arise by chance alone only 5% of the time.

Degrees of Freedom
• Use "degrees of freedom": the number of independent random variables involved.
• Degrees of freedom is simply the number of classes of offspring minus 1.
• For our example, there are 2 classes of offspring: purple and white. Degrees of freedom (df) = 2 - 1 = 1.

Critical Value
• Find critical values for chi-square on tables; use p = 0.05 and the correct df
• If your calculated chi-square value is greater than the critical value from the table, you "reject the null hypothesis".
• If your chi-square value is less than the critical value, you "fail to reject" the null hypothesis (that is, your data are consistent with your genetic theory about the expected ratio).

Chi-Square Table

USING THE TABLE
In our example of 290 purple to 110 white, we calculated a chi-square value of 1.333, with 1 degree of freedom.
Looking at the table, 1 d.f. is the first row, and p = 0.05 is the sixth column. Here we find the critical chi-square value, 3.841.
Since our calculated chi-square, 1.333, is less than the critical value, 3.841, we "fail to reject" the null hypothesis. Thus, an observed ratio of 290 purple to 110 white is a good fit to a 3/4 to 1/4 ratio.

ANOTHER EXAMPLE: FROM MENDEL

phenotype          observed   expected proportion   expected number
round yellow       315        9/16                  312.75
round green        101        3/16                  104.25
wrinkled yellow    108        3/16                  104.25
wrinkled green     32         1/16                  34.75
total              556        1                     556

Find the Expected Numbers
You are given the observed numbers, and you determine the expected proportions from a Punnett square.
To get the expected numbers of offspring, first add up the observed offspring to get the total number of offspring. In this case, 315 + 101 + 108 + 32 = 556.
Then multiply total offspring by the expected proportion:
• expected round yellow = 9/16 x 556 = 312.75
• expected round green = 3/16 x 556 = 104.25
• expected wrinkled yellow = 3/16 x 556 = 104.25
• expected wrinkled green = 1/16 x 556 = 34.75
These add up to 556, the observed total offspring.

CALCULATING THE CHI-SQUARE VALUE
Use the formula.

$\chi^2 = \dfrac{(315 - 312.75)^2}{312.75} + \dfrac{(101 - 104.25)^2}{104.25} + \dfrac{(108 - 104.25)^2}{104.25} + \dfrac{(32 - 34.75)^2}{34.75} = 0.016 + 0.101 + 0.135 + 0.218 = 0.470$

df = 3
Critical value = 7.815
$\chi^2$ = 0.470
$\chi^2$ < 7.815, so we fail to reject our null hypothesis. There is no statistically significant difference between our expected and our observed numbers, so the data are consistent with the hypothesis that we used to form our Punnett square.

Compare your computed chi-square to the critical value
• Critical values for chi-square are found on tables, sorted by degrees of freedom and probability levels. Use p = 0.05.
• If your calculated chi-square value is greater than the critical value from the table, you "reject the null hypothesis".
• If your chi-square value is less than the critical value, you "fail to reject" the null hypothesis (that is, your data are consistent with your genetic theory about the expected ratio).

For any lab scenario you should be able to
• Identify the IV and DV and know the appropriate units
• Describe the experimental treatment
• Identify the control or controls
• Know that replicates should exist for each treatment and that subjects should be randomly chosen
• Identify what the constants are
• Form a hypothesis or identify the hypothesis
• Draw a conclusion from a data set

IV and DV
You are investigating how the crustacean Daphnia responds to changes in temperature. You expose Daphnia to temperatures of 5°C, 10°C, 15°C, 20°C, and 30°C. You count the number of heartbeats/sec in each case.
• Temperature is the independent variable (you are manipulating it).
• Number of heartbeats/sec is the dependent variable (you observe how it changes in response to different temperatures).
Use only one independent variable.
Only one independent variable can be tested at a time. If you manipulate two independent variables at the same time, you cannot determine which is responsible for the effect you measure in the dependent variable. In the physiological experiment, if the subject also drinks coffee in addition to exercising, you cannot determine which treatment, coffee or exercise, causes a change in blood pressure.

IV and DV
You design an experiment to investigate the effect of exercise on pulse rate and blood pressure. The physiological conditions (independent variable, or variable you manipulate) include sitting, exercising, and recovery at various intervals following exercise. You make two kinds of measurements (two dependent variables) to evaluate the effect of the physiological conditions.
• Pulse rate and blood pressure are the dependent variables.
• Time (the physiological condition at each interval: sitting, exercising, and recovery) is the independent variable.

Identify a control treatment.
The control treatment, or control, is the independent variable at some normal or standard value. The results of the control are used for comparison with the results of the experimental treatments.
• In the Daphnia experiment, you choose the temperature of 20°C as the control because that is the average temperature of the pond where you obtained the culture.
• In the experiment on physiological conditions, the control is sitting, when the subject is not influenced by exercising.

Describe the experimental treatment.
The experimental treatment (or treatments) are the various values that you assign to the independent variable. The experimental treatments describe how you are manipulating the independent variable. In the Daphnia experiment, the experimental treatments were the temperatures of 5°C, 10°C, 15°C, 20°C, and 30°C.

Random sample of subjects.
You must choose the subjects for your experiments randomly. Since you cannot evaluate every Daphnia, you must choose a subpopulation to study.
If you choose only the largest Daphnia to study, it is not a random sample, and you introduce another variable (size) for which you cannot account.

Identify the constants
The characteristics that remain the same are constants. For example, the number of donuts in a dozen is always 12.

Describe the procedure.
Describe how you will set up the experiment. Identify equipment and chemicals to be used and why you are choosing to use them. If appropriate, provide a labeled drawing of the setup.

SEEDS GIVEN VARIOUS TREATMENTS ARE PLANTED IN SMALL POTS. THE GRAPH BELOW IS AN ILLUSTRATION OF THE DATA OBTAINED.
D = Dwarf pea plant seeds, no treatment
DG = Dwarf pea plant seeds soaked in gibberellins
DGA = Dwarf pea plant seeds soaked in gibberellins and auxin
T = Tall, nondwarf pea plant seeds, no treatment

At day 8 of the experiment, all of the following statements are correct EXCEPT:
A. The T, DG, and DGA seedlings are very similar in height.
B. The eventual greater height of the T seedlings over the DG seedlings is already predictable.
C. The D seedlings are less than half as tall as the other seedlings.
D. The DG seedlings are taller than the T seedlings.

IN COUNTRY 1, APPROXIMATELY WHAT PERCENTAGE OF THE INDIVIDUALS WERE YOUNGER THAN FIFTEEN YEARS OF AGE?
A. 10%
B. 21%
C. 42%
D. 52%

WHICH OF THE FOLLOWING BEST APPROXIMATES THE RATIO OF MALES TO FEMALES AMONG INDIVIDUALS BELOW FIFTEEN YEARS OF AGE?
      Country 1    Country 2
A.    1:1          1:1
B.    0.75:1       0.75:1
C.    0.5:1        0.5:1
D.    1:1          0.5:1

IF, IN COUNTRY 1, INFANT MORTALITY DECLINED AND THE BIRTH RATE REMAINED THE SAME, THEN INITIALLY THE POPULATION WOULD BE EXPECTED TO
A. be more evenly distributed among the age classes
B. be even more concentrated in the young age classes
C. stabilize at the illustrated level for all age classes
D. increase in the oldest age classes

OVER THE NEXT 10-15 YEARS, THE STABILIZATION OF COUNTRY 1'S POPULATION AT ITS CURRENT SIZE WOULD REQUIRE THAT
A. infant mortality be reduced to about half the present level
B. the death rate be reduced drastically
C. each couple produce fewer children than the number required to replace themselves
D. about 15 years be added to the life expectancy of each person

A wild-type fruit fly (heterozygous for gray body color and normal wings) was mated with a black fly with vestigial wings. The offspring had the following phenotypic distribution: wild type, 778; black-vestigial, 785; black-normal, 158; gray-vestigial, 162. What is the recombination frequency between these genes for body color and wing type?

First count the total number of offspring: 778 + 785 + 158 + 162 = 1883

In all dihybrid test crosses (a cross between a known heterozygote for two wild-type traits and an individual homozygous recessive for both traits), the expected ratio of phenotypes if the genes are on separate chromosomes must be: wild type, 25%; black-vestigial, 25%; black-normal, 25%; gray-vestigial, 25%.

These results do not fit the experimental data above (778 + 785 + 158 + 162). In fact, the black-normal (158) and gray-vestigial (162) offspring represent recombinant individuals.

Calculation of recombination frequency:
Recombination frequency = (158 + 162) / 1883 x 100 = 320 / 1883 x 100 ≈ 17%
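The recombination frequency calculation can be sketched in Python as recombinant offspring divided by total offspring, times 100. The function name `recombination_frequency` and the split into two count lists are assumptions made for illustration; the counts are the ones from the fruit fly cross above.

```python
def recombination_frequency(parental_counts, recombinant_counts):
    """Percentage of offspring that are recombinants."""
    recombinants = sum(recombinant_counts)
    total = recombinants + sum(parental_counts)
    return recombinants / total * 100


# Parental types: wild type (778) and black-vestigial (785)
# Recombinant types: black-normal (158) and gray-vestigial (162)
rf = recombination_frequency([778, 785], [158, 162])
print(f"{rf:.0f}%")  # 17%
```

The two smallest phenotype classes are the recombinants, which is why they are passed in separately from the two large parental classes.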