Thematic Research Area NOREVENT
Faculty of Medicine, University of Oslo, http://www.med.uio.no/imb/stat/norevent/

Norevent’s 3rd Lysebu meeting
Monday 8th September 2003

The meeting is sponsored by the University of Oslo, Faculty of Medicine and the Norwegian Cancer Society. The purpose of the meeting is to bring together people interested in statistical problems in genetics, including the new genomics. The theme is: "Statistical challenges from genetics"

Program:

08.30-09.00: Registration

Chairman: Odd Aalen
09.00-09.25: Arnoldo Frigessi: Bayesian inference for cDNA microarrays
09.25-09.50: Marit Holden: Estimation of absolute mRNA concentrations from cDNA microarrays
09.50-10.15: Mette Langaas: Statistical analysis of DNA microarray data: experimental design, linear mixed effects models and multiple testing
10.15-10.40: Coffee / tea

Chairman: Ørnulf Borgan
10.40-11.25: Peter Donnelly: Statistical inference in molecular population genetics
11.25-12.10: Juni Palmgren: Statistics and mapping of genes for complex traits
12.10-13.10: Lunch
13.10-13.35: Ole Christian Lingjærde: Handling many covariates in proportional hazard regression applied to microarray survival data

Chairman: Arnoldo Frigessi
13.35-14.00: Hege Edvardsen: Challenges of genetic diversity and the use of high throughput genotyping in genetic epidemiology
14.00-14.25: Thore Egeland: Statistical methods in forensic genetics: New challenges
14.25-14.50: Coffee / tea
14.50-15.15: Rolv Terje Lie: Estimation of genetic effects and gene-environment interaction from case-parent triad data
15.15-15.40: Håkon Gjessing: Log-linear models for case-parent triad data with multiple alleles and haplotype information
15.40-16.00: Summing up. Discussion.
Organizers: Odd Aalen, Ørnulf Borgan, Harald Fekjær, Arnoldo Frigessi, Tron Anders Moger, Ida Scheel

Abstracts

Bayesian inference for cDNA microarrays
Arnoldo Frigessi
Section of Medical Statistics
University of Oslo

After introducing cDNA microarrays, I will list the typical scientific questions that this type of data is used to investigate. Among the statistical methodologies applied to microarray data, I will describe the ANOVA approach, focusing on advantages and difficulties. I will then move to the Bayesian approach, and report on currently used models. Finally I will discuss the advantages of estimating absolute concentrations. Joint work with Ingrid Glad, Marit Holden, Heidi Lyng, Ida Scheel and Mark van de Wiel.

Estimation of absolute mRNA concentrations from cDNA microarrays
Marit Holden
Norwegian Computing Center

The aim of a microarray experiment is to measure the gene expression levels of cDNA samples. We describe a Bayesian model based on the idea of interpreting the microarray experiment as a selection procedure, where the mRNA molecules initially present in a tissue solution are selected stepwise, until eventually the luminosities of the hybridized ones are measured. Starting from the unknown number of mRNA molecules for each gene, each step of the microarray protocol is represented as a random selection, where each molecule has a certain probability of being kept in the experiment. This probability is different for each step and is modulated by target- and probe-related covariates and experimental settings. We follow the mRNA molecules from transcription to hybridization and imaging. Given the luminosity of the final remaining molecules per spot, we estimate backwards all parameters related to the various covariates and come closer to an estimate of the mRNA counts per gene. Posterior estimates for the parameters of the model, including the mRNA counts, are obtained via MCMC.
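
The selection idea in the abstract above can be illustrated with a small simulation. This is only a minimal sketch and not the authors' model: it assumes each protocol step keeps every molecule independently with a fixed, known probability (the values in `keep_probs` are hypothetical), and it replaces the Bayesian MCMC estimation by a naive moment estimate that divides the final count by the product of the keep probabilities.

```python
import math
import random


def binomial(n, p, rng):
    """Draw from Binomial(n, p) by summing n independent Bernoulli trials."""
    return sum(1 for _ in range(n) if rng.random() < p)


def simulate_selection(initial_count, keep_probs, rng):
    """Follow molecules through successive selection steps.

    Returns the list of molecule counts: the initial count followed by
    the count remaining after each step.
    """
    counts = [initial_count]
    for p in keep_probs:
        counts.append(binomial(counts[-1], p, rng))
    return counts


rng = random.Random(0)           # fixed seed for reproducibility
true_count = 10_000              # hypothetical number of mRNA molecules for one gene
keep_probs = [0.9, 0.5, 0.2]     # hypothetical per-step keep probabilities

counts = simulate_selection(true_count, keep_probs, rng)

# Repeated independent thinning is itself a binomial thinning with
# probability prod(keep_probs), so E[final] = true_count * prod(keep_probs).
overall_keep = math.prod(keep_probs)
naive_estimate = counts[-1] / overall_keep

print("counts after each step:", counts)
print("naive back-estimate of initial count:", round(naive_estimate))
```

In the model described in the abstract the keep probabilities are not known in advance; they depend on covariates and are estimated jointly with the counts, which is why a posterior simulation method is needed rather than the simple division used here.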

Statistical analysis of DNA microarray data: experimental design, linear mixed effects models and multiple testing
Mette Langaas
Department of Mathematical Sciences
Norwegian University of Science and Technology

In functional genomics the goal is to understand the function of the genes. This can be done indirectly by monitoring the expression of genes in different tissue samples using DNA microarray technology. Today a typical DNA microarray data set consists of on the order of 10-100 samples and 15000-30000 genes. With the aim of identifying differentially expressed genes between tissue samples, we can define a multiple hypothesis testing situation. We can use statistical design of experiments to select a set of microarray experiments to be performed in the laboratory. Using linear mixed effects models (or other models and methods), parameter estimates and p-values for the defined hypotheses can be computed. Genes with small p-values are declared to be differentially expressed. In addition, these p-values can be used to estimate the proportion of true null hypotheses, and thus the number of genes not differentially expressed between the different tissues. Data from cDNA microarray experiments at NTNU will be used in the presentation.

Statistical inference in molecular population genetics
Peter Donnelly
Department of Statistics
University of Oxford

There has been an explosion in data documenting genetic variation, at the level of DNA sequences, between individuals of the same species. The patterns in such data potentially shed light on the underlying evolutionary processes, on the demographic history of the populations, and, if combined with data on disease status, on the genetic basis of diseases. But the data have a complicated correlation structure, and statistical inference is not straightforward. Reasonably sophisticated stochastic models for the genetic evolution of the population have been developed, but inference within these models is challenging.
We illustrate the ideas in the context of one current application of interest, namely estimating recombination rates over short scales, and detecting recombination hot-spots, in humans.

Statistics and mapping of genes for complex traits
Juni Palmgren
Department of Medical Epidemiology and Biostatistics
Karolinska Institutet

Statistical mapping techniques for identifying disease susceptibility genes have been around and developing for close to 50 years. The genetic basis for hundreds of simple Mendelian traits is now well understood, and the challenge has turned to the etiology of complex diseases, which involve actions and interactions of multiple genes and lifestyle and demographic factors. I will discuss extensions of traditional methods of linkage and linkage disequilibrium mapping, emphasizing that ‘modern’ statistical tools such as GEE, Bayesian MCMC and complex sampling schemes play a role in ‘modern’ human molecular genetics.

Handling many covariates in proportional hazard regression applied to microarray survival data
Ole Christian Lingjærde
Institute of Informatics
University of Oslo

The talk will focus on the problem of linking gene expression profiles to censored survival data such as patients' overall survival or time to relapse after treatment. Typically in this type of problem, the number of covariates (genes) greatly exceeds the number of cases. A brief survey of existing approaches to this problem in the context of proportional hazard regression is given. I then move on to describe some further work along these lines currently being conducted by Knut Liestøl and myself.

Challenges of genetic diversity and the use of high throughput genotyping in genetic epidemiology
Hege Edvardsen
Institute for Cancer Research
The Norwegian Radium Hospital

Molecular epidemiology faces the task of explaining complex phenotypes resulting from the interaction of complex genotypes with numerous environmental factors.
One of the greatest challenges in this matter is the vast complexity of genetic variation. For instance, there are perhaps as many as 10 million single nucleotide polymorphisms (SNPs) distributed across the genome, occurring at a frequency of approximately 1-3 SNPs/kb. Though this represents less than 0.1% of the total DNA sequence, it accounts for the myriad of factors that contribute to the uniqueness of any single individual. A more detailed understanding of the function of the human genome will be achieved as we identify sequence variations that influence gene expression or protein-protein interactions. One of the promises of the genomic revolution is that genetic variation can be applied to the study of disease susceptibility and to understanding inter-individual differences in drug response. The abundance of SNPs and the ease with which they can be measured make these genetic variations significant, and they promise to advance our ability to understand and treat human diseases, such as cancer. Our main problem today is that we are ill equipped to handle the information streaming out of high-throughput laboratories: we can currently generate more information than we can analyze using the classical tools of biostatistics and epidemiological analysis.

Statistical methods in forensic genetics: New challenges
Thore Egeland
Rikshospitalet and Section of Medical Statistics
University of Oslo

Traditionally, the main focus of statistical methods in forensic genetics has been on the assessment of evidential strength. A classical example is the paternity index, which statisticians recognize as a likelihood ratio. There has been less attention on crime investigation. In this talk I pose a question in this framework: Can we indicate where a person may come from based on biological evidence? The answer to this question, or a more precise version of it, is of obvious interest, and some relevant papers have been published.
A wide range of statistical methods, including classification procedures, appears to be useful. The talk is based on joint work with Antonia Salas, Spain.

Estimation of genetic effects and gene-environment interaction from case triad data
Rolv Terje Lie
Section of Medical Statistics
University of Bergen

A review of methods for estimation of allelic effects and gene-environment interaction for diallelic markers from case triad data is provided. The alleles are referred to as the wild type and the variant allele. Estimation is based on log-linear and logistic regression models. The case triads are categorized into 15 different triad types by the number of variant alleles in each of the case triad members: the mother, the father and the child. By fitting a multinomial model to the counts of the 15 categories, parameters (relative risks) of a variety of genetic models may be estimated. The models may include effects of the alleles of the child, direct effects of alleles of the mother and imprinting effects. By further stratification of the case triads by environmental exposures, the models may be extended to estimate differences in allelic effects between exposed and non-exposed cases. The information about interactions is, however, limited because the main effect of exposure cannot be estimated from case triads. The information contained in the case triads is surprisingly rich and allows more detailed analyses than the hypothesis tests (e.g. the TDT) commonly used to analyze such data. When the genetic information is more complex, for example with more than two alleles per locus, the number of triad types may become intractable.

Log-linear models for case-parent triad data with multiple alleles and haplotype information
Håkon Gjessing
Norwegian Institute of Public Health

Testing for association between disease outcome and candidate genes is frequently done with the transmission disequilibrium test (TDT).
The test utilizes parental genotypes of a case child to test whether specific alleles are transmitted more frequently than others, thus providing control for population stratification. A shortcoming of the TDT is that no estimates of risk associated with specific alleles are provided. One possible alternative to the TDT for a di-allelic locus is a log-linear model (Wilcox, Weinberg, Lie, 1998), (Weinberg, Wilcox, Lie, 1998). This model provides estimates for the effect of both the child’s and the mother’s variant alleles. Direct effect of maternal alleles is particularly relevant in perinatal epidemiology. We show how the log-linear model extends to situations with multiple alleles in a single locus. The extension requires careful consideration of the parameterization. Furthermore, we look at the use of this model to study the relative risk associated with single nucleotide polymorphism (SNP) haplotypes also when phase is unknown. Finally we discuss some technical issues of implementation, in particular a near-explicit solution, which reduces the computational burden.

Participants at the 3rd Lysebu meeting of Norevent

Name, Institution
Odd Aalen, University of Oslo
Trygve Almøy, Agricultural University of Norway
Steinar Bjerve, University of Oslo
Ørnulf Borgan, University of Oslo
Christian Brinch, University of Oslo
Hege Bøvelstad, University of Oslo
Birgitte Deblasio, University of Oslo
Peter Donnelly, University of Oxford
Hege Edvardsen, Norwegian Radium Hospital
Thore Egeland, National University Hospital
Anders Engeland, National Institute of Public Health
Harald Fekjær, Institute of Population based Cancer Research
Guri Feten, Agricultural University of Norway
Egil Ferkingstad, University of Oslo
Turid Follestad (Turid.Follestad@math.ntnu.no), Norwegian University of Science and Technology
Johan Fosen, University of Oslo
Arnoldo Frigessi, University of Oslo
Haakon K. Gjessing, University of Oslo
Jon Ketil Grønnesby, Norwegian Public Service Pension Fund
Nina Gunnes, University of Oslo
Tor Haldorsen, Institute of Population based Cancer Research
Ivar Heuch, University of Bergen
Marit Holden, Norwegian Computing Center
Ole Klungsøyr, University of Oslo
Arne Kolstad, National Office for Social Insurance
Anja Bråthen Kristoffersen, University of Oslo
Mette Langaas, Norwegian University of Science and Technology
Rolv Terje Lie, University of Bergen
Ole Christian Lingjærde, University of Oslo
Per Magnus, National Institute of Public Health
Tron Anders Moger, University of Oslo
Bjørn Møller, Institute of Population based Cancer Research
Jan Frank Nygård, Institute of Population based Cancer Research
Ståle Nygård, University of Oslo
Juni Palmgren, Karolinska Institutet
Pia Rendedal, National Office for Social Insurance
Anne Sagsveen, National Office for Social Insurance
Sven Ove Samuelsen, University of Oslo
Berit Sandstad, University of Oslo
Ida Scheel, University of Oslo
Arnfinn Schjalm, Statistics Norway
Randi Selmer, National Institute of Public Health
Joe Sexton, University of Oslo
Hans Julius Skaug, Institute of Marine Research
Lars Snipen, Agricultural University of Norway
Hein Stigum, National Institute of Public Health
Hege Leite Størvold, University of Oslo
Solve Sæbø, Agricultural University of Norway
Ragnhild Sørum, University of Oslo
Magne Thoresen, University of Oslo
Ingunn Fride Tvete, University of Oslo
Marit Veierød, University of Oslo
Torbjørn Fosen Wisløff, National Institute of Public Health

Departments, in the order listed: Section of Medical Statistics; Division of Statistics; Division of Statistics; Division of Statistics; Department of Physics; Department of Statistics; Institute for Cancer Research; Biostatistics; Division of Epidemiology; Section of Medical Statistics; Department of Mathematical Sciences; Section of Medical Statistics; Section of Medical Statistics; Section of Medical Statistics; Section of Medical Statistics; Department of Mathematics; Department of Medical Behavior; Department of Informatics; Department of Mathematical Sciences; Section of Medical Statistics; Department of Informatics; Division of Epidemiology; Section of Medical Statistics; Section of Medical Statistics; Department of Medical Epidemiology and Biostatistics; Division of Statistics; Institute for Nutrition Research; Section of Medical Statistics; Division of Epidemiology; Institute for Nutrition Research; Division of Epidemiology; Section of Statistics; Division of Statistics; Section of Medical Statistics; Division of Statistics; Section of Medical Statistics; Division of Epidemiology