The Integration of …
Modeling, Statistics,
Computation and Calculus at
East Tennessee State University
Jeff Knisley — East Tennessee State University
Project Mosaic Kickoff Event – June 28, 2010
Integrative Projects at ETSU
 Focus on What we have been up to
     The Symbiosis Project
     General Education Statistics Course
     Quantitative Modeling Track of the Math Major
 Later today / this week: Focus on How we’ve done what we’ve done
 (And just as valuable: What we’ve learned from what hasn’t worked)
Symbiosis: An Introductory
Integrated Mathematics and Biology
Curriculum for the 21st Century
(HHMI 52005872)
 Team-taught by Biologists (6), Mathematicians (3), and Statisticians (1)
     Biologists work up to a need for analyses, models, or related concepts (e.g., optimization)
     A complete intro stats and calculus curriculum delivered through the needs and contexts provided by the biologists
 More recently: extensive computational activities featuring R, Maple, and NetLogo
Goals of the Symbiosis Project
 Implement a large subset of the recommendations of the BIO2010 report in an introductory lab science sequence
     Semester 1: Statistics + Precalculus, Limits, Continuity
     Semester 2: Calculus I course + Statistics
      (Our focus on Semesters 1 and 2)
     Semester 3: Modeling, BioInformatics, reinforcement of previous ideas, More Statistics
Goals of the Symbiosis Project
 Use Biological contexts to motivate mathematical and statistical concepts and tools
     Analysis of data used to inform and interpret
     Models and inference used to predict and explain
 Use Mathematical concepts and Statistical Inference to produce biological insights
     Insights often need to be quantified, if only to predict the scale on which the insight is valid
     Especially useful are insights that cannot be obtained without resorting to mathematics or statistics
Table of Contents
 Symbiosis I and II
     List of “modules” with topics selected by biologists
     Mathematical and Statistical Highlights included
     (Not enough time to explore Symbiosis III)
 Logistics: 5 + 1 format, student populations between 7 and 30, and 3 or 4 faculty per course
Symbiosis I
1. The Scientific Method: Numbers, models, binomial, Randomization Test, Intro to Statistical Inference
2. The Cell: Descriptive Statistics and Correlation
3. Size and Scale: Lines, power laws, fractals, Poisson, exponentials, logarithms, and linear regression
4. Mendelian Genetics: Chi-Square, Normal, Goodness of Fit Test, Test of Independence
5. DNA: Conditional Probability, the Markov Property, Sampling distributions
6. Proteins and Evolution: Limits, continuity, approximations, and the t-test
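
The Randomization Test listed in module 1 can be sketched in a few lines. The course itself used R, so the Python below is only an illustrative stand-in, and the measurements, group names, and the function name `randomization_test` are hypothetical. It pools two groups, reshuffles the group labels many times, and reports how often a shuffled difference in means is at least as large as the observed one.

```python
import random


def randomization_test(group_a, group_b, n_shuffles=10_000, seed=0):
    """Two-sided randomization test for a difference in group means.

    Returns the fraction of label shuffles whose absolute difference in
    means is at least as large as the observed absolute difference.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n_a = len(group_a)
    n_b = len(group_b)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)

    pooled = list(group_a) + list(group_b)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        # small tolerance guards against floating-point round-off in ties
        if diff >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_shuffles


# Hypothetical measurements, invented purely for illustration.
treated = [12.1, 13.4, 11.8, 14.2, 13.0]
control = [10.2, 11.1, 10.8, 11.5, 10.9]
p_value = randomization_test(treated, control)
print(f"approximate p-value: {p_value:.4f}")
```

A small p-value says that a difference this large rarely arises from relabeling alone. Whether that bears on the scientific hypothesis is a separate question.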
Symbiosis II
7. Population Ecology: Derivatives, Rates of Change, Power, Product, Quotient rules, Differential Equations
8. Species-Species Interactions: Chain rule, Properties of the Derivative, Differential Equations Qualitatively, Equilibria, Parameter Estimation
9. Behavioral Ecology: Optimization, curve-sketching, L’Hôpital’s rule
10. Chronobiology: Trigonometric functions and their derivatives, Periodograms
11. Integration and Plant Growth: Antiderivatives, Definite Integrals, and the Fundamental Theorem
12. Energy and Enzymes: Applications of the Integral, differential equations methods, Nonlinear Regression
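
The equilibrium ideas in the Population Ecology and Species-Species Interactions modules can be illustrated with the standard logistic model dN/dt = rN(1 - N/K). The sketch below is an assumption-laden illustration, not course material: the parameter values, the function names, and the use of a simple forward-Euler step are all choices made here. It checks the sign of the growth rate's slope at each equilibrium and confirms by simulation that populations approach K.

```python
def logistic_rate(n, r, k):
    """Growth rate dN/dt = r * N * (1 - N / K) of the logistic model."""
    return r * n * (1.0 - n / k)


def rate_slope(n, r, k):
    """Derivative of the growth rate with respect to N: r * (1 - 2N / K)."""
    return r * (1.0 - 2.0 * n / k)


def simulate(n0, r, k, dt=0.01, steps=5000):
    """Integrate the logistic model with forward-Euler steps of size dt."""
    n = n0
    for _ in range(steps):
        n += dt * logistic_rate(n, r, k)
    return n


# Hypothetical parameters: intrinsic growth rate r and carrying capacity K.
r, k = 0.5, 100.0

# A positive slope at an equilibrium means nearby populations move away
# from it (unstable); a negative slope means they return to it (stable).
print("slope at N = 0:", rate_slope(0.0, r, k))  # positive: unstable
print("slope at N = K:", rate_slope(k, r, k))    # negative: stable

# Populations started below or above K both settle near K.
print(simulate(5.0, r, k), simulate(150.0, r, k))
```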
Major Outcomes
 Complete and/or Comprehensive Biological Investigations
     Traditional Bio Curriculum: Biological questions pursued to a point short of quantitative analysis
     Symbiosis: Data and Models used to explore biological questions and predict answers
         Mendelian genetics via chi-square analysis of data
         r/K strategists based on logistic model and importance/stability of equilibria
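
As one way to picture "Mendelian genetics via chi-square analysis of data", here is a minimal goodness-of-fit sketch in Python (the course used R and Minitab, so this is a stand-in). The counts are illustrative example input for a 9:3:3:1 dihybrid cross, not course data, and the helper name `chi_square_statistic` is hypothetical. The critical value 7.815 is the standard 5% cutoff for 3 degrees of freedom.

```python
def chi_square_statistic(observed, ratios):
    """Pearson chi-square statistic for counts against an expected ratio."""
    total = sum(observed)
    ratio_sum = sum(ratios)
    # expected count in each class if the hypothesized ratio holds
    expected = [total * part / ratio_sum for part in ratios]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Illustrative phenotype counts, listed in the same order as the ratio.
observed_counts = [315, 108, 101, 32]
mendel_ratio = [9, 3, 3, 1]

statistic = chi_square_statistic(observed_counts, mendel_ratio)

# Four classes give 4 - 1 = 3 degrees of freedom.
CRITICAL_5_PERCENT_DF3 = 7.815
consistent = statistic < CRITICAL_5_PERCENT_DF3

print(f"chi-square = {statistic:.3f}")
print("consistent with 9:3:3:1 at the 5% level:", consistent)
```

A statistic well below the cutoff means the data give no reason to reject the 9:3:3:1 hypothesis. It does not prove the hypothesis.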
Aspects of Integration
 Biologists need or can use almost all the math and stats we can provide
     But their goals are radically different
         Statistical inference as a tool for justifying classification of organisms into different categories
         Models as a means of separating different phenomena
     And the results are used to address their (often non-quantitative) questions
         E.g.: Simple epidemiological models used to suggest whether or not mosquitoes can carry the AIDS virus
Aspects of Integration
 Statisticians and Mathematicians can contribute to biology in a variety of ways
     But transparency is paramount
         Examples of concepts/techniques “Transparent” to our biologists: The Randomization test, p-values, normal distribution, Chi-square, Periodograms, logarithms, power laws, Nonlinear Regression, phase-plane analysis
         Examples of concepts/techniques that are NOT “Transparent” to our biologists: the limit concept, the exponential function, Poisson distribution, conditional probability, t-test, degrees of freedom
Aspects of Integration
 Statisticians and Mathematicians can contribute to biology in a variety of ways
     And time/effort must be devoted to important subtleties – within biological contexts
         Example: Logarithms and exponentials with base e. (Why not just use base 10 for everything?)
         Example: Number of offspring, which is an important bio-quantity – as Poisson-distributed
         Example: The approximation $(1+x)^n \approx e^{nx}$ occurs in numerous applications and contexts in biology, but it takes a long time before it “sinks in”
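
One way to see where the last approximation comes from (a standard argument, sketched here as an aid rather than taken from the slides) is to rewrite the power with the natural logarithm and expand. The condition that x is small, and more precisely that n x^2 is small, is an assumption the slide leaves implicit.

```latex
\begin{align*}
(1+x)^n &= e^{\,n\ln(1+x)}, \\
\ln(1+x) &= x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots \qquad (|x| < 1), \\
(1+x)^n &= e^{\,nx \,-\, n x^2/2 \,+\, \cdots} \;\approx\; e^{nx}
  \qquad \text{when } n x^2 \ll 1 .
\end{align*}
```

The same steps also suggest why base e is the natural choice: it is the base for which the leading term of the logarithm of 1 + x is exactly x.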
Observation
 Issues preventing “downstream” usage of math and stats
     Start as small issues at the most elementary levels
         Nearly all of module 1 addresses the difference between a scientific hypothesis and a statistical hypothesis
         Surface area to volume ratio: First we must agree on notation (e.g., A or S or SA or … ).
     And grow into major obstacles
         If insufficient time is spent developing the hypotheses, the result may be students “doing the test” without really knowing what they are testing.
         E.g.: If time is not spent exploring what a biologist means by a population density, ecological models may become impossible to interpret biologically.
Further Insights
 Computing and Computational Science have emerged as major components
     Informatics, genetics, proteomics, …
     And Even in Ecology!
 Programming in R
     Need is for math/stat informed algorithms
     Not for elaborate structures or sophisticated programming languages
Further Insights
 Logistics are a challenge
     Transcripts are important!!!
     Course sizes / delivery methods differ significantly
         Biology lectures can be huge
         Biology labs are typically smaller than math/stat sections
         (I had never had to consider how to combine a lab grade with a lecture grade)
 Communication is very important, especially about the “little issues” that tend to grow
Future Directions for Symbiosis
 More emphasis on computation
     Algorithms as a method to address biological inquiries
     Algorithms as statistical tools
         Inference via bootstrapping
         Predictions via clustering
     Informatics
     Avoiding reliance on “off-the-shelf” approaches
 Symbiosis IV: A Gen Ed “Intro to Computational Science” course for math and bio majors
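
"Inference via bootstrapping" is a good example of an algorithm serving as a statistical tool. The sketch below shows one common variant, the percentile bootstrap for a mean, in Python (the course language was R, so treat this as a stand-in). The sample values and the function name `bootstrap_mean_ci` are hypothetical.

```python
import random


def bootstrap_mean_ci(sample, n_resamples=5000, level=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `sample`."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(sample)
    # Resample with replacement, recording the mean of each resample.
    means = sorted(
        sum(rng.choices(sample, k=n)) / n for _ in range(n_resamples)
    )
    # Read the interval endpoints off the sorted resampled means.
    lower_index = round((1 - level) / 2 * n_resamples)
    upper_index = round((1 + level) / 2 * n_resamples) - 1
    return means[lower_index], means[upper_index]


# Hypothetical measurements, invented purely for illustration.
sample = [4.1, 5.3, 3.8, 6.0, 5.5, 4.7, 5.1, 4.4]
low, high = bootstrap_mean_ci(sample)
print(f"95% bootstrap interval for the mean: ({low:.2f}, {high:.2f})")
```

No distributional formula is needed here. The interval comes entirely from repeated resampling, which is what keeps the method from being an "off-the-shelf" black box.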
General Education Statistics
 In 1996, ETSU began requiring every non-calculus student to take an introductory statistics course in their first year
     To enable students to understand and participate in a data-driven world
     To prepare students for the stats they would see in their respective majors
 In 2001, the Gen Ed Stats course moved into the “Stat Cave” – a 45-station computer lab
     To make the course technology-driven and data-intensive
     Approximately 1200 students per semester (100 in summer) continuously using Minitab, applets, etc.
     math.etsu.edu/stats/
Some Features of the Course
 Teaching multiple sections
     Extensive training of instructors
     Highly structured course content
     Online/Off-campus sections may use calculators for some activities
 Two-part Final Exam
     A comprehensive data analysis project due the week before the in-class Final Exam
     A standardized multiple-choice final exam common to all sections of the course
Quantitative Modeling Track in the
Math Major
 In conjunction with our Statistical Literacy and Quantitative Biology emphases
     Features many different modeling courses
         Statistical modeling
         Mathematical modeling
         Predictive modeling (data mining, machine learning)
         Survival models (with computational emphasis)
         Computational/Discrete Modeling
         (students take 2 to 4 of these)
 Future: Integrate with other sciences, Public Health, Medicine, Pharmacy, etc.
Thank you!
Any questions?