Biostatistics
www.AssignmentPoint.com
Biostatistics (or biometry) is the application of statistics to a wide range of
topics in biology. The science of biostatistics encompasses the design of
biological experiments, especially in medicine, pharmacy, agriculture and
fishery; the collection, summarization, and analysis of data from those
experiments; and the interpretation of, and inference from, the results. A major
branch of this is medical biostatistics, which is exclusively concerned with
medicine and health.
History
Biostatistical reasoning and modeling were of critical importance to the
foundational theories of modern biology. In the early 1900s, after the rediscovery
of Gregor Mendel's work on inheritance, the gaps in understanding
between genetics and evolutionary Darwinism led to vigorous debate among
biometricians, such as Walter Weldon and Karl Pearson, and Mendelians, such
as Charles Davenport, William Bateson and Wilhelm Johannsen. By the 1930s,
statisticians and models built on statistical reasoning had helped to resolve these
differences and to produce the neo-Darwinian modern evolutionary synthesis.
The leading figures in the establishment of population genetics and this
synthesis all relied on statistics and developed its use in biology.
• Ronald Fisher developed several basic statistical methods in support of
his work studying the field experiments at Rothamsted Research,
including in his 1930 book The Genetical Theory of Natural Selection.
• Sewall G. Wright developed F-statistics and methods of computing them.
• J. B. S. Haldane's book, The Causes of Evolution, reestablished natural
selection as the premier mechanism of evolution by explaining it in terms
of the mathematical consequences of Mendelian genetics.
These individuals and the work of other biostatisticians, mathematical
biologists, and statistically inclined geneticists helped bring together
evolutionary biology and genetics into a consistent, coherent whole that could
begin to be quantitatively modeled.
In parallel to this overall development, the pioneering work of D'Arcy
Thompson in On Growth and Form also helped to add quantitative discipline to
biological study.
Despite the fundamental importance and frequent necessity of statistical
reasoning, there may nonetheless have been a tendency among biologists to
distrust or deprecate results which are not qualitatively apparent. One anecdote
describes Thomas Hunt Morgan banning the Friden calculator from his
department at Caltech, saying "Well, I am like a guy who is prospecting for gold
along the banks of the Sacramento River in 1849. With a little intelligence, I can
reach down and pick up big nuggets of gold. And as long as I can do that, I'm
not going to let any people in my department waste scarce resources in placer
mining."
Scope and training programs
Almost all educational programmes in biostatistics are at postgraduate level.
They are most often found in schools of public health, affiliated with schools of
medicine, forestry, or agriculture, or as a focus of application in departments of
statistics.
In the United States, where several universities have dedicated biostatistics
departments, many other top-tier universities integrate biostatistics faculty into
statistics or other departments, such as epidemiology. Thus, departments
carrying the name "biostatistics" may exist under quite different structures. For
instance, relatively new biostatistics departments have been founded with a
focus on bioinformatics and computational biology, whereas older departments,
typically affiliated with schools of public health, will have more traditional lines
of research involving epidemiological studies and clinical trials as well as
bioinformatics. In larger universities where both a statistics and a biostatistics
department exist, the degree of integration between the two departments may
range from the bare minimum to very close collaboration. In general, the
difference between a statistics program and a biostatistics program is twofold:
(i) statistics departments will often host theoretical/methodological research,
which is less common in biostatistics programs, and (ii) statistics departments
have lines of research that may include biomedical applications but also other
areas such as industry (quality control), business and economics, and biological
areas other than medicine.
Recent developments in modern biostatistics
The advent of modern computer technology and relatively cheap computing
resources have enabled computer-intensive biostatistical methods such as
bootstrapping and other resampling methods. Furthermore, new biomedical
technologies like microarrays, next-generation sequencers (for genomics) and
mass spectrometry (for proteomics) generate enormous amounts of (often
redundant) data that can only be analyzed with biostatistical methods. For
example, a microarray can measure the expression of all the genes of the human
genome simultaneously, but only a fraction of them will be differentially
expressed in diseased vs. non-diseased states. One might encounter the problem
of multicollinearity: due to high intercorrelation between the predictors (in
this case, genes), the information of one predictor might be contained in
another one. It could be that only 5% of the predictors are responsible for 90%
of the variability of the response. In such a case, one would apply the
biostatistical technique of dimension reduction (for example via principal
component analysis). Classical statistical techniques like linear or logistic
regression and linear discriminant analysis do not work well for
high-dimensional data (i.e. when the number of observations n is smaller than
the number of features or predictors p: n < p). As a matter of fact, one can
get quite high R² values on the training data despite very low predictive
power of the statistical model, because with n < p the model can fit the
training observations almost exactly. These classical statistical techniques
(esp. least squares linear regression) were developed for low-dimensional data
(i.e. where the number of observations n is much larger than the number of
predictors p: n >> p). In cases of high dimensionality, one should always
consider an independent validation test set and the corresponding residual sum
of squares (RSS) and R² of the validation test set, not those of the training
set.
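The gap between training and validation R² when n < p can be illustrated with a small simulation. The following is only a sketch under stated assumptions: the data are synthetic pure noise (the response is unrelated to any predictor), the model has no intercept, and the minimum-norm least-squares solution is used as one of the infinitely many exact fits that exist when n < p. All function and variable names are illustrative and do not come from the text.

```python
import random

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def r_squared(y, y_hat):
    """R^2 = 1 - RSS / TSS for observed y and predictions y_hat."""
    mean_y = sum(y) / len(y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    tss = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - rss / tss

rng = random.Random(0)
n_train, n_test, p = 10, 200, 30  # fewer observations than predictors

def noise_data(n):
    """Synthetic data: predictors and response are independent noise."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return X, y

X_train, y_train = noise_data(n_train)
X_test, y_test = noise_data(n_test)

# Minimum-norm least squares for n < p: beta = X^T (X X^T)^{-1} y
gram = [[sum(a * b for a, b in zip(ri, rj)) for rj in X_train] for ri in X_train]
alpha = solve(gram, y_train)
beta = [sum(alpha[i] * X_train[i][j] for i in range(n_train)) for j in range(p)]

def predict(X):
    return [sum(b * x for b, x in zip(beta, row)) for row in X]

r2_train = r_squared(y_train, predict(X_train))
r2_test = r_squared(y_test, predict(X_test))
print("training R^2:  ", round(r2_train, 4))
print("validation R^2:", round(r2_test, 4))
```

Because the fit interpolates the training responses, the training R² is essentially 1 even though there is nothing to learn; the validation R² is the figure that reflects predictive power.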
In recent times, random forests have gained popularity. This technique, invented
by the statistician Leo Breiman, grows many decision trees on random subsets of
the observations and of the predictors and combines their votes for
classification (in classification the response is on a nominal or ordinal
scale, as opposed to regression, where the response is on an interval or ratio
scale). Decision trees have the advantage that one can draw them and
interpret them (even with a very basic understanding of mathematics and
statistics). Random forests have thus been used for clinical decision support
systems.
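As a rough illustration of the idea, the sketch below grows a small forest of shallow classification trees, each on a bootstrap sample and with a random subset of predictors considered at every split, and lets the trees vote. It is a minimal toy version, not Breiman's reference implementation: the synthetic "patients", the single informative marker, the tree depth and all names are assumptions made for the example.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def build_tree(rows, labels, n_features, rng, depth=0, max_depth=3):
    """Return a class label (leaf) or a tuple (feature, threshold, left, right)."""
    if depth == max_depth or len(set(labels)) == 1:
        return majority(labels)
    best = None
    # Only a random subset of the predictors is considered at each split.
    for f in rng.sample(range(len(rows[0])), n_features):
        for t in {r[f] for r in rows}:
            left = [i for i, r in enumerate(rows) if r[f] <= t]
            right = [i for i, r in enumerate(rows) if r[f] > t]
            if not right:
                continue
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:
        return majority(labels)
    _, f, t, left, right = best

    def subtree(idx):
        return build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                          n_features, rng, depth + 1, max_depth)

    return (f, t, subtree(left), subtree(right))

def predict_tree(tree, row):
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def build_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    n_features = max(1, int(len(rows[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], n_features, rng))
    return forest

def predict_forest(forest, row):
    """Majority vote over all trees."""
    return majority([predict_tree(tree, row) for tree in forest])

def make_patients(n, rng):
    """Synthetic data: one informative marker plus three noise predictors."""
    rows, labels = [], []
    for i in range(n):
        diseased = i % 2 == 0
        marker = rng.gauss(4.0 if diseased else 0.0, 1.0)
        rows.append([marker] + [rng.gauss(0.0, 1.0) for _ in range(3)])
        labels.append("diseased" if diseased else "healthy")
    return rows, labels

data_rng = random.Random(1)
train_rows, train_labels = make_patients(60, data_rng)
test_rows, test_labels = make_patients(40, data_rng)
forest = build_forest(train_rows, train_labels)
predictions = [predict_forest(forest, row) for row in test_rows]
accuracy = sum(p == y for p, y in zip(predictions, test_labels)) / len(test_labels)
print("held-out accuracy:", accuracy)
```

Each individual tree here can be drawn and read as a sequence of threshold rules; the forest as a whole trades some of that readability for a more stable vote.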
Gene Set Enrichment Analysis (GSEA) is a method for analyzing
biological high-throughput experiments. With this method, one does not
consider the perturbation of single genes but of whole (functionally related)
gene sets. These gene sets might be known biochemical pathways or otherwise
functionally related genes. The advantage of this approach is that it is more
robust: It is more likely that a single gene is found to be falsely perturbed than it
is that a whole pathway is falsely perturbed. Furthermore, one can integrate the
accumulated knowledge about biochemical pathways (like the JAK-STAT
signaling pathway) using this approach.
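The set-level idea can be sketched with a simplified enrichment score: walk down a list of genes ranked by their association with the phenotype, step up at members of the gene set and down otherwise, and take the most extreme value of the running sum. This is the unweighted, Kolmogorov-Smirnov-like variant with a gene-permutation p-value, offered as an assumption-laden toy rather than the published GSEA procedure (which, among other things, can weight steps by the ranking statistic). The gene names G1 to G20 and the gene set are hypothetical.

```python
import random

def enrichment_score(ranked_genes, gene_set):
    """Most extreme value of the running sum over the ranked gene list."""
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    running, extreme = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(extreme):
            extreme = running
    return extreme

def permutation_p_value(ranked_genes, gene_set, n_perm=1000, seed=0):
    """Two-sided p-value from shuffling the gene ranking (gene permutation)."""
    rng = random.Random(seed)
    observed = abs(enrichment_score(ranked_genes, gene_set))
    shuffled = list(ranked_genes)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(enrichment_score(shuffled, gene_set)) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)

# Hypothetical ranking: G1 is most strongly associated with the phenotype.
ranked = ["G%d" % i for i in range(1, 21)]
# Hypothetical pathway whose members cluster near the top of the ranking.
pathway = {"G1", "G2", "G3", "G5"}

es = enrichment_score(ranked, pathway)
p_value = permutation_p_value(ranked, pathway)
print("enrichment score:", es)
print("permutation p-value:", p_value)
```

A gene set scattered evenly through the ranking would keep the running sum close to zero, which is why a single falsely perturbed gene has little effect on the set-level result.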