Best Practices in Statistical Data Analysis
Valedictory Symposium for Willem J. Heiser
Leiden, January 30th, 2014

Program

13.00 David Hand: What's the problem: Answering the statistical question
13.30 Henk Kelderman: Improving latent variable models by using collateral variables
–
14.30 Serge Rombouts: Best practices in Functional Magnetic Resonance Imaging of the Brain
15.00 Richard Gill: Worst practices in statistical data analysis
–
16.00 Leland Wilkinson: Anomalies
16.30 Lawrence Hubert: Henry A. Wallace (1888-1965): Agricultural Statistician (Econometrician) Extraordinaire

Abstracts

What's the problem: Answering the statistical question
David Hand

George Box once pointed out that statistical analysis involves two approximations: a big one, which is the approximation to the problem you want to solve, and a small one, leading to finding the solution to the approximate problem. I give four examples showing that we sometimes devote too little attention to the big approximation, so that we end up finding precise answers to questions which are irrelevant to our objectives.

Improving latent variable models by using collateral variables
Henk Kelderman

There are several reasons why latent variable models are not always appropriate for analyzing test and questionnaire data. Since by definition not much is known a priori about latent variables, one needs either strong assumptions or a lot of data to estimate latent variable models. For example, for quality-of-life or clinical questionnaires the latent variables may not be normally distributed, and one usually needs a large sample to estimate a latent variable model with less restrictive distributional assumptions. Another evil afflicts the assumption of independence of measurement errors. Administering a test can be seen as a psychological experiment with sequential item trials. When subjects answer a questionnaire item, they read and try to understand the question, retrieve information from memory, make a judgment, and give a response. It would be hard to convince a seasoned experimentalist that these processes do not influence those of the next item. Both problems can be tackled by adding many additional variables from the nomological network around the latent variable of interest to the model. In this paper we show how statistical learners can be employed to improve structural equation models with latent variables.

Best practices in Functional Magnetic Resonance Imaging of the Brain
Serge Rombouts

Functional Magnetic Resonance Imaging (FMRI) of the brain is a technique to image brain activation. FMRI data in one individual consist of approximately 100,000 brain 'voxels' (voxels are 3D pixels), each with a few hundred time points. Each voxel's time course represents the dynamic FMRI brain activity of a specific region in the brain. Usually one or more groups of subjects are studied. FMRI applications include studying brain function in normal controls, in psychiatric and neurologic patients, in brain development and aging, after pharmacologic manipulations, for pre-surgical planning, and for associating genetic information with regional brain function. Two sorts of FMRI studies can be distinguished: 'task-FMRI' and 'resting-state FMRI connectivity'. In task-FMRI, brain activation is manipulated using a task, and analyses are aimed at finding task-related brain activation. In studies of resting-state FMRI connectivity, spontaneous changes in brain activity are studied without the application of an externally controlled task; here, functional connectivity of spontaneous FMRI activity in different brain regions is studied using correlation and regression techniques.

In each individual, preprocessing of the FMRI data includes temporal and spatial filtering and motion correction. Next, a general linear model is applied for statistical analysis in each individual. For task-FMRI, the statistical model has at least one regressor representing the expected task-related behavior; for resting-state FMRI connectivity, the regressors are the spontaneous FMRI signal in one or more regions of interest. The analysis results in 3D images containing betas that represent the association of each voxel's time course with the temporal regressors of the model. For group analysis, individual data are registered to a standard brain, and group statistics are applied to the individual 3D images resulting from the individual analyses. Statistical thresholding at the group level requires correction for multiple testing across the different brain regions. I will discuss the various analysis steps in FMRI research and best practices for a number of challenges that one encounters in these analyses.
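The individual-level analysis described above is, in its simplest form, an ordinary least-squares fit of a general linear model at every voxel. As a minimal, hypothetical sketch of that step (not the speaker's actual pipeline), the following Python/NumPy fragment fits an intercept plus a single regressor to each voxel's time course; the array sizes, the random placeholder data, and all variable names are assumptions made for illustration only.

import numpy as np

# Illustrative dimensions only: real data have roughly 100,000 voxels and a few hundred scans.
n_timepoints, n_voxels = 200, 1_000
rng = np.random.default_rng(0)
Y = rng.standard_normal((n_timepoints, n_voxels))    # stand-in for preprocessed voxel time courses

# Design matrix: an intercept plus one regressor for the expected task-related
# response (for resting-state connectivity this column would instead hold the
# signal extracted from a region of interest).
task_regressor = rng.standard_normal(n_timepoints)   # placeholder regressor
X = np.column_stack([np.ones(n_timepoints), task_regressor])

# Ordinary least-squares fit of the general linear model to all voxels at once:
# solve X @ betas ~= Y; row 1 of `betas` holds the per-voxel association with the regressor.
betas, _, _, _ = np.linalg.lstsq(X, Y, rcond=None)
beta_map = betas[1]           # one beta per voxel; reshaped to 3D this is the "beta image"
print(beta_map.shape)         # (1000,)

In a real analysis the regressor would be the expected task-related response (or a region-of-interest signal for resting-state connectivity), and the resulting beta images would then enter the group-level statistics and multiple-testing correction that the abstract mentions.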
Worst practices in statistical data analysis
Richard Gill

After a long and bitter conflict between the authors, the paper Geraerts, McNally, Jelicic, Merckelbach & Raymaekers (2008), "Linking thought suppression and recovered memories of childhood sexual abuse", was finally retracted from the journal Memory. Quite extraordinary errors seem to have been made in the preparation of the data for statistical analysis. How could this happen, and how could those errors have been prevented? And why did it take so long to put this right? As well as statistical issues, I will discuss the role of the media and the role of university administrators in the affair. I argue that when anomalies are discovered in scientific work, the right way to proceed is to discuss the anomalies openly in the scientific community. Possibly the worst way to proceed, and I will try to explain why, is to bring the conflict into the realm of judicial or disciplinary investigations by committees on scientific integrity. Let's focus in the first place on the integrity of the results, not on the integrity of the persons.

Anomalies
Leland Wilkinson

Statisticians usually consider anomalies to be identifiable through a process of looking for outliers. Anomalies, however, are more general than outliers. Even in the restrictive case of points in a real coordinate space, outliers can be considered anomalies, but anomalies are not necessarily outliers. In this talk, I will explore this distinction and present strategies for recognizing anomalies in pointwise data. Some of these strategies are motivated by the kinds of methods favored by Willem Heiser and others at Leiden.

Henry A. Wallace (1888-1965): Agricultural Statistician (Econometrician) Extraordinaire
Lawrence Hubert

This talk is about Henry A. Wallace, who was (among other positions he held) the U.S. Secretary of Agriculture under Roosevelt and the New Deal (1933-1940) and Vice-President under Roosevelt (1941-1945). Wallace demonstrated the absolute best practices in (agricultural) statistics and econometrics, including during his time as the New Deal Secretary of the USDA. He demanded an evidence-based set of agricultural policies and practices (through statistics and data gathering) that helped pull the U.S. out of the Great Depression.
Wallace is arguably the single person responsible for the first Department of Statistics in the U.S., owing to his connections with George Snedecor. Finally, Wallace was the lead architect in the development of numerical procedures (in the 1920s) for solving the normal equations in large-scale multiple regression.