Thematic Research Area NOREVENT
Faculty of Medicine, University of Oslo, http://www.med.uio.no/imb/stat/norevent/
Norevent’s 3rd Lysebu meeting
Monday 8th September 2003
The meeting is sponsored by the University of Oslo, Faculty of Medicine and the
Norwegian Cancer Society.
The purpose of the meeting is to collect people interested in statistical
problems in genetics, including the new genomics. The theme is:
"Statistical challenges from genetics"
Program:
08.30-09.00: Registration
Chairman: Odd Aalen
09.00-09.25:
Arnoldo Frigessi: Bayesian inference for cDNA
microarrays
09.25-09.50:
Marit Holden: Estimation of absolute mRNA
concentrations from cDNA microarrays
09.50-10.15:
Mette Langaas: Statistical analysis of DNA microarray
data: experimental design, linear mixed effects models and
multiple testing
10.15-10.40:
Coffee / tea
Chairman: Ørnulf Borgan
10.40-11.25:
Peter Donnelly: Statistical inference in molecular
population genetics
11.25-12.10:
Juni Palmgren: Statistics and mapping of genes for
complex traits
12.10-13.10:
Lunch
13.10-13.35:
Ole Christian Lingjærde: Handling many covariates in
proportional hazard regression applied to microarray
survival data
Chairman: Arnoldo Frigessi
13.35-14.00:
Hege Edvardsen: Challenges of genetic diversity and the
use of high throughput genotyping in genetic epidemiology
14.00-14.25:
Thore Egeland: Statistical methods in forensic genetics:
New challenges
14.25-14.50:
Coffee / Tea
14.50-15.15:
Rolv Terje Lie: Estimation of genetic effects and gene-environment
interaction from case-parent triad data
15.15-15.40:
Håkon Gjessing: Log-linear models for case-parent triad
data with multiple alleles and haplotype information
15.40-16.00:
Summing up. Discussion.
Organizers: Odd Aalen, Ørnulf Borgan, Harald Fekjær, Arnoldo Frigessi, Tron
Anders Moger, Ida Scheel
Abstracts
Bayesian inference for cDNA microarrays
Arnoldo Frigessi
Section of Medical Statistics
University of Oslo
After introducing cDNA microarrays, I will list the typical scientific questions
that this type of data is used to investigate. Among the statistical methodologies that are applied
to microarray data, I will describe the ANOVA approach, focusing on advantages and
difficulties. I will then move to the Bayesian approach, and report on currently used models.
Finally I will discuss the advantages of estimating absolute concentrations. Joint work with
Ingrid Glad, Marit Holden, Heidi Lyng, Ida Scheel and Mark van de Wiel.
Estimation of absolute mRNA concentrations from cDNA
microarrays
Marit Holden
Norwegian Computing Center
The aim of a microarray experiment is to measure the gene expression levels of cDNA
samples. We describe a Bayesian model based on the idea of interpreting the microarray
experiment as a selection procedure, where the mRNA molecules initially present in a tissue
solution are stepwise selected, until eventually the luminosities of the hybridized ones are
measured. Starting from the unknown number of mRNA molecules for each gene, each step
of the microarray protocol is represented as a random selection, where each molecule has a
certain probability of being kept in the experiment. This probability is different for each step
and is modulated by target and probe related covariates and experimental settings. We follow
the mRNA molecules from transcription to hybridization and imaging. Given the luminosity
of the final remaining molecules per spot, we estimate backwards all parameters related to the
various covariates and come closer to an estimate of the mRNA counts per gene. Posterior
estimates for the parameters of the model, including the mRNA counts, are obtained via
MCMC.
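
The forward direction of the selection idea described above can be sketched as a chain of binomial thinnings. This is only an illustration under simplifying assumptions: the function names, the three retention probabilities and the starting count are hypothetical, and each step uses a fixed probability rather than one modulated by covariates as in the abstract. No attempt is made here at the backward (posterior) estimation.

```python
import random

def thin(count, keep_prob, rng):
    """Keep each of `count` molecules independently with probability keep_prob."""
    return sum(1 for _ in range(count) if rng.random() < keep_prob)

def simulate_protocol(initial_count, keep_probs, rng):
    """Apply one random selection per protocol step.

    Returns the molecule counts before the first step and after each step.
    """
    counts = [initial_count]
    for p in keep_probs:
        counts.append(thin(counts[-1], p, rng))
    return counts

rng = random.Random(0)
# Hypothetical per-step retention probabilities (not taken from the abstract).
keep_probs = [0.8, 0.5, 0.3]
counts = simulate_protocol(10_000, keep_probs, rng)
print(counts)
```

With fixed probabilities, the final count is binomial with success probability equal to the product of the step probabilities (0.12 in this sketch), which is why the intermediate steps cannot be separated from the final count alone without further information such as covariates.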
Statistical analysis of DNA microarray data: experimental design,
linear mixed effects models and multiple testing
Mette Langaas
Department of Mathematical Sciences
Norwegian University of Science and Technology
In functional genomics the goal is to understand the function of the genes. This can indirectly
be done by monitoring the expression of genes in different tissue samples using the DNA
microarray technology. Today a typical DNA microarray data set consists of on the order of
10-100 samples and 15000-30000 genes.
With the aim of identifying differentially expressed genes between tissue samples, we can
define a multiple hypothesis testing situation. We can use statistical design of experiments to
select a set of microarray experiments to be performed in the laboratory. Using linear mixed
effects models (or other models and methods) parameter estimates and p-values for the
defined hypotheses can be computed. Genes with small p-values are declared to be
differentially expressed. In addition, these p-values can be used in estimating the proportion of
true null hypotheses, and thus the number of genes not differentially expressed between the
different tissues. Data from cDNA microarray experiments at NTNU will be used in the
presentation.
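
One way the proportion of true null hypotheses can be estimated from p-values is sketched below. The abstract does not say which estimator is used, so this swaps in a simple threshold-based estimator of the Storey type: p-values above a cut-off lam are assumed to come almost entirely from true nulls, which are uniform on [0, 1]. The simulated p-values, the cut-off 0.5 and all names are assumptions made for illustration.

```python
import random

def estimate_pi0(p_values, lam=0.5):
    """Estimate the proportion of true nulls as #{p > lam} / (m * (1 - lam))."""
    m = len(p_values)
    above = sum(1 for p in p_values if p > lam)
    return min(1.0, above / (m * (1.0 - lam)))

rng = random.Random(1)
n_genes, n_null = 20_000, 16_000  # hypothetical: 80% of genes not differentially expressed
# Null p-values are uniform; non-null p-values are simulated close to zero.
p_values = [rng.random() for _ in range(n_null)]
p_values += [rng.betavariate(0.5, 20.0) for _ in range(n_genes - n_null)]

pi0_hat = estimate_pi0(p_values)
print(round(pi0_hat, 3), round(pi0_hat * n_genes))
```

Multiplying the estimated proportion by the number of genes gives the estimated number of genes that are not differentially expressed.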
Statistical inference in molecular population genetics
Peter Donnelly
Department of Statistics
University of Oxford
There has been an explosion in data documenting genetic variation, at the level of DNA
sequences, between individuals of the same species. The patterns in such data potentially shed
light on the underlying evolutionary processes, on the demographic history of the populations,
and if combined with data on disease status, on the genetic basis of diseases. But the data has
a complicated correlation structure, and statistical inference is not straightforward.
Reasonably sophisticated stochastic models for the genetic evolution of the population have
been developed, but inference within these models is challenging. We illustrate the ideas in
the context of one current application of interest, namely estimating recombination rates over
short scales, and detecting recombination hot-spots, in humans.
Statistics and mapping of genes for complex traits
Juni Palmgren
Department of Medical Epidemiology and Biostatistics
Karolinska Institutet
Statistical mapping techniques for identifying disease susceptibility genes have been around
and developing for close to 50 years. The genetic basis for hundreds of simple Mendelian
traits is now well understood, and the challenge has turned to the etiology of complex
diseases, which involve actions and interactions of multiple genes and lifestyle and
demographic factors. I will discuss extensions of traditional methods of linkage and linkage
disequilibrium mapping, emphasizing that ‘modern’ statistical tools such as GEE, Bayesian
MCMC and complex sampling schemes play a role in ‘modern’ human molecular genetics.
Handling many covariates in proportional hazard regression applied
to microarray survival data
Ole Christian Lingjærde
Institute of Informatics
University of Oslo
The talk will focus on the problem of linking gene expression profiles to censored survival
data such as patients' overall survival or time to relapse after treatment. Typically in this type
of problem, the number of covariates (genes) greatly outnumbers the number of cases. A
brief survey of existing approaches to this problem in the context of proportional
hazard regression is given. I then move on to describe some further work along these
lines currently being conducted by Knut Liestøl and myself.
Challenges of genetic diversity and the use of high throughput
genotyping in genetic epidemiology
Hege Edvardsen
Institute for Cancer Research
The Norwegian Radium Hospital
Molecular epidemiology faces the task of explaining complex phenotypes resulting from the
interaction of complex genotypes with numerous environmental factors. One of the greatest
challenges in this matter is the vast complexity of genetic variation. For instance, there are
perhaps as many as 10 million single nucleotide polymorphisms (SNPs) distributed across the
genome, occurring at a frequency of approximately 1-3 SNP/kb. Though this represents less
than 0.1% of the total DNA sequence it accounts for the myriad of factors that contribute to
the uniqueness of any single individual. A more detailed understanding of the function of the
human genome will be achieved as we identify sequence variations that influence gene
expression or protein-protein interactions. One of the promises of the genomic revolution is
that genetic variation can be applied to the study of disease susceptibility and to understanding
inter-individual differences in drug response. The abundance of SNPs and the ease with which
they can be measured make these genetic variations significant and they promise to advance
our ability to understand and treat human diseases, such as cancer. Our main problem today is
that we are ill equipped to handle the information streaming out of high-throughput
laboratories and currently we can generate more information than we can analyze utilizing the
classical tools of analysis, biostatistics and epidemiological analyses.
Statistical methods in forensic genetics: New challenges
Thore Egeland
Rikshospitalet and Section of Medical Statistics
University of Oslo
Traditionally, the main focus of statistical methods of forensic genetics has been on the
assessment of evidential strength. A classical example is the paternity index, which
statisticians recognize as a likelihood ratio. There has been less attention on crime
investigation. In this talk I pose a question in this framework: Can we indicate where a person
may come from based on biological evidence? The answer to this question, or a more precise
version of it, is of obvious interest and some relevant papers have been published.
A wide range of statistical methods, including classification procedures, appears to be useful.
The talk is based on joint work with Antonia Salas, Spain.
Estimation of genetic effects and gene-environment interaction
from case triad data
Rolv Terje Lie
Section of Medical Statistics
University of Bergen
A review of methods for estimation of allelic effects and gene-environment interaction for
di-allelic markers from case triad data is provided. The alleles are referred to as the wild type
and the variant allele. Estimation is based on log-linear and logistic regression models. The
case triads are categorized into 15 different triad types by the number of variant alleles in each
of the case triad members, the mother, the father and the child. By fitting a multinomial model
to the counts of the 15 categories, parameters (relative risks) of a variety of genetic models
may be estimated. The models may include effects of the alleles of the child, direct effects of
alleles of the mother and imprinting effects.
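
The count of 15 triad types can be checked by enumeration. The sketch below assumes a di-allelic marker and Mendelian transmission (each parent passes on exactly one of its two alleles); the function names are illustrative only.

```python
from itertools import product

def transmissible(n_variant):
    """Numbers of variant alleles (0 or 1) that a parent carrying
    n_variant copies of the variant allele can pass on to the child."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[n_variant]

def triad_types():
    """List all (mother, father, child) variant-allele counts that are
    compatible with Mendelian transmission."""
    types = []
    for mother, father in product(range(3), repeat=2):
        children = {m + f
                    for m in transmissible(mother)
                    for f in transmissible(father)}
        for child in sorted(children):
            types.append((mother, father, child))
    return types

types = triad_types()
print(len(types))
for triad in types:
    print(triad)
```

A multinomial model is then fitted to the observed counts in these categories.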
By further stratification of the case triads by environmental exposures, the models may be
extended to estimate differences in allelic effects between exposed and non-exposed cases.
The information about interactions is, however, limited because the main effect of exposure
cannot be estimated from case-triads.
The information contained in the case triads is surprisingly rich and allows more detailed
analyses than the hypothesis tests (e.g. the TDT) commonly used to analyze such data.
When the genetic information is more complex, for example with more than two alleles per
locus, the number of triad types may become intractable.
Log-linear models for case-parent triad data with multiple alleles
and haplotype information
Håkon Gjessing
Norwegian Institute of Public Health
Testing for association between disease outcome and candidate genes is frequently done with
the transmission disequilibrium test (TDT). The test utilizes parental genotypes of a case child
to test whether specific alleles are transmitted more frequently than others, thus providing
control for population stratification. A shortcoming of the TDT is that no estimates of risk
associated with specific alleles are provided.
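
For a di-allelic locus, the TDT is commonly computed as a McNemar-type statistic on transmissions from heterozygous parents. The sketch below uses that standard form, with a chi-square reference distribution on one degree of freedom. The counts 60 and 40 are hypothetical and the function name is illustrative.

```python
import math

def tdt(transmitted, not_transmitted):
    """TDT statistic (b - c)^2 / (b + c) and its chi-square (1 df) p-value.

    transmitted / not_transmitted: number of heterozygous parents who did /
    did not pass the variant allele to the affected child.
    """
    total = transmitted + not_transmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) parents")
    stat = (transmitted - not_transmitted) ** 2 / total
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

stat, p_value = tdt(60, 40)  # hypothetical transmission counts
print(stat, round(p_value, 4))
```

The output is a test statistic and a p-value only, which illustrates the shortcoming noted above: the test itself yields no estimate of the risk associated with an allele.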
One possible alternative to the TDT for a di-allelic locus is a log-linear model (Wilcox,
Weinberg, Lie, 1998), (Weinberg, Wilcox, Lie, 1998). This model provides estimates for the
effect of both the child’s and the mother’s variant alleles. Direct effect of maternal alleles is
particularly relevant in perinatal epidemiology.
We show how the log-linear model extends to situations with multiple alleles in a single
locus. The extension requires careful consideration of the parameterization. Furthermore, we
look at the use of this model to study the relative risk associated with single nucleotide
polymorphism (SNP) haplotypes, also when phase is unknown. Finally, we discuss some
technical issues of implementation, in particular a near-explicit solution, which reduces the
computational burden.
Participants at the 3rd Lysebu meeting of Norevent
Name                        Institution                                     E-mail
Odd Aalen                   University of Oslo
Trygve Almøy                Agricultural University of Norway
Steinar Bjerve              University of Oslo
Ørnulf Borgan               University of Oslo
Christian Brinch            University of Oslo
Hege Bøvelstad              University of Oslo
Birgitte Deblasio           University of Oslo
Peter Donnelly              University of Oxford
Hege Edvardsen              Norwegian Radium Hospital
Thore Egeland               National University Hospital
Anders Engeland             National Institute of Public Health
Harald Fekjær               Institute of Population based Cancer Research
Guri Feten                  Agricultural University of Norway
Egil Ferkingstad            University of Oslo
Turid Follestad             Norwegian University of Science and Technology  Turid.Follestad @ math.ntnu.no
Johan Fosen                 University of Oslo
Arnoldo Frigessi            University of Oslo
Haakon K. Gjessing          University of Oslo
Jon Ketil Grønnesby         Norwegian Public Service Pension Fund
Nina Gunnes                 University of Oslo
Tor Haldorsen               Institute of Population based Cancer Research
Ivar Heuch                  University of Bergen
Marit Holden                Norwegian Computing Center
Ole Klungsøyr               University of Oslo
Arne Kolstad                National Office for Social Insurance
Anja Bråthen Kristoffersen  University of Oslo
Mette Langaas               Norwegian University of Science and Technology
Rolv Terje Lie              University of Bergen
Ole Christian Lingjærde     University of Oslo
Per Magnus                  National Institute of Public Health
Tron Anders Moger           University of Oslo
Bjørn Møller                Institute of Population based Cancer Research
Jan Frank Nygård            Institute of Population based Cancer Research
Ståle Nygård                University of Oslo
Juni Palmgren               Karolinska Institutet
Pia Rendedal                National Office for Social Insurance
Anne Sagsveen               National Office for Social Insurance
Sven Ove Samuelsen          University of Oslo
Berit Sandstad              University of Oslo
Ida Scheel                  University of Oslo
Arnfinn Schjalm             Statistics Norway
Randi Selmer                National Institute of Public Health
Joe Sexton                  University of Oslo
Hans Julius Skaug           Institute of Marine Research
Lars Snipen                 Agricultural University of Norway
Hein Stigum                 National Institute of Public Health
Hege Leite Størvold         University of Oslo
Solve Sæbø                  Agricultural University of Norway
Ragnhild Sørum              University of Oslo
Magne Thoresen              University of Oslo
Ingunn Fride Tvete          University of Oslo
Marit Veierød               University of Oslo
Torbjørn Fosen Wisløff      National Institute of Public Health
Department
Section of Medical Statistics
Division of Statistics
Division of Statistics
Division of Statistics
Department of Physics
Department of Statistics
Institute for Cancer Research
Biostatistics
Division of Epidemiology
Section of Medical Statistics
Department of Mathematical Sciences
Section of Medical Statistics
Section of Medical Statistics
Section of Medical Statistics
Section of Medical Statistics
Department of Mathematics
Department of Medical Behavior
Department of Informatics
Department of Mathematical Sciences
Section of Medical Statistics
Department of Informatics
Division of Epidemiology
Section of Medical Statistics
Section of Medical Statistics
Department of Medical Epidemiology and Biostatistics
Division of Statistics
Institute for Nutrition Research
Section of Medical Statistics
Division of Epidemiology
Institute for Nutrition Research
Division of Epidemiology
Section of Statistics
Division of Statistics
Section of Medical Statistics
Division of Statistics
Section of Medical Statistics
Division of Epidemiology