Integrating grouped and ungrouped data: the point process case
Irma Hernández Magallanes
Institute of Applied Mathematics and Computer Science (IAMCS), Texas A&M University

Grouped data is a topic that goes back at least to the end of the nineteenth century. Kulldorff (1968) refers to grouping as a special case of a more general procedure called partial grouping. A partially grouped sample refers to the case where the available information is associated with a collection of disjoint sets partitioning a domain: the sample space is divided into non-overlapping sets, and in some of these sets only the counts of observations are recorded (grouped data), while in the other sets the individual values of the observations are recorded (ungrouped data). This work is motivated by an interest in modeling the probability of wildfire ignition in the continental United States in 1990. The data cover fires that occurred on federal and non-federal lands; the federal data consisted of each fire's point location (latitude and longitude), while the non-federal fires were aggregated by county. Wildfire occurrences can be considered a spatial point process; for example, Brillinger, Preisler and Benoit (2003) approximate a point process by a binary process. We propose integrating the two levels of aggregation, points and counts, by modeling the fires as a binary-valued process on space. The sample space is partitioned into small pixels arranged in a regular two-dimensional grid, and each pixel either has a fire or not. Under the assumption that the wildfire rate is a smoothly varying function of space, we propose a spatial smoothing method for partially grouped data. The smoother is based on locally weighted likelihood analysis, using a binary-valued process to approximate the partially grouped data. Based on the binary-valued approximation, a logit model is fitted with the National Fire Danger Rating System fuel model as an explanatory variable. The estimated probabilities are presented in a map together with the associated uncertainty levels.
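As a rough illustration of the locally weighted likelihood smoother described in the abstract above, the following Python sketch fits a kernel-weighted logistic regression at a single target pixel of a synthetic binary fire grid. The grid, the fuel covariate, the Gaussian kernel, the bandwidth, and the use of scikit-learn are all assumptions made here for illustration; this is not the author's data or implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic 50 x 50 pixel grid: coordinates, a binary fuel-class covariate,
# and a fire/no-fire indicator per pixel (all invented for illustration).
n = 50
xx, yy = np.meshgrid(np.linspace(0, 1, n), np.linspace(0, 1, n))
coords = np.column_stack([xx.ravel(), yy.ravel()])
fuel = rng.integers(0, 2, size=coords.shape[0])
true_logit = -3.0 + 2.0 * coords[:, 0] + 1.0 * fuel        # smooth spatial trend
fire = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_logit)))  # pixel has a fire or not

def local_fire_probability(target, bandwidth=0.15):
    """Kernel-weighted logit fit centred at a target location (locally weighted likelihood)."""
    d2 = ((coords - target) ** 2).sum(axis=1)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))   # Gaussian kernel weights
    X = np.column_stack([coords, fuel])
    # A very large C makes the default ridge penalty negligible, so this is
    # essentially a weighted maximum-likelihood logistic fit.
    model = LogisticRegression(C=1e6).fit(X, fire, sample_weight=w)
    nearest = np.argmin(d2)                    # evaluate at the pixel closest to the target
    return model.predict_proba(X[nearest][None, :])[0, 1]

print("estimated ignition probability near (0.5, 0.5):",
      local_fire_probability(np.array([0.5, 0.5])))
```

Repeating the fit over a grid of target pixels would produce the kind of smoothed probability map the abstract describes.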
Coupling Multiple Hypothesis Testing with Proportion Estimation in Heterogeneous Categorical Sensor Signal Networks
Christopher Calderon
Numerica Corporation
This is joint work with A. Jones, S. Lundberg, and R. Paffenroth.

False alarms generated by sensors pose a substantial problem for a variety of fusion applications. We focus on situations where a genuine alarm is "rare" but the false alarm rate is relatively high. The work is motivated by chemical and biological threat detection applications. The goal is to mitigate false alarms while retaining high power to detect true events (missing a true signal is considered much more detrimental than declaring a false alarm in the applications of interest). Furthermore, we would like to "fuse information" by utilizing a multiple testing framework. Problems facing our application include: 1) the frequency of a genuine rare attack is not easy to quantify; 2) the misclassification rates are often unknown (or are not accurately described by nominal false alarm rates); and 3) the statistical properties differ substantially from sensor to sensor. We propose to utilize data streams contaminated by false alarms (generated in the field) to compute statistics on sensor misclassification rates. The nominal misclassification rate of a deployed sensor is often suspect because it is unlikely that these estimates were tuned to the specific environmental conditions in which the sensor was deployed (i.e., sensor performance can have nontrivial spatial and temporal effects). Recent categorical measurement error methods will be applied to the collection of data streams to "train" the sensors and provide point estimates, along with confidence intervals, for the misclassification rates and the estimated prevalence. Open questions remain as to how best to combine these estimated signals to make a decision about the presence of a chemical or biological threat. There are also questions about how to efficiently assess and detect changes in population parameters statistically. Directions explored to date include false discovery rate methods that aim to roughly incorporate correlation effects into the computed false discovery rate statistics via "empirical nulls". We have also started preliminary work investigating resampling-based approaches applied to "dimension-reduced" sensor output, with the hope that a more precise estimate of the correlation between the reduced dimensions can be obtained empirically and used in testing and decision making.

MIXED-EFFECTS MODELS FOR MODELING CARDIAC FUNCTIONS AND TESTING TREATMENT EFFECTS
Maiying Kong and Hyejeong Jang
Department of Bioinformatics and Biostatistics, SPHIS, University of Louisville, Louisville, KY 40202

The mixed-effects model is an efficient tool for analyzing longitudinal data. The random effects in a mixed model can be used to capture the correlations between repeated measurements within a subject. The time points need not be fixed, and all available data can be used in a mixed-effects model, provided the data are missing at random. For this reason, we focus on applying mixed-effects models to repeated measurements of different aspects of cardiac function, such as heart rate, left ventricular developed blood pressure, and coronary blood flow, in glutathione S-transferase P (GSTP) gene knockout and wild-type mice that experienced ischemia/reperfusion injury. Each aspect of cardiac function consists of measurements from three time periods: the preischemic, ischemic, and reperfusion periods. We develop piecewise nonlinear functions to describe the different aspects of cardiac function. We apply nonlinear mixed-effects models and a change-point model to examine cardiac function under ischemia/reperfusion injury and to compare group differences.
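As a simplified stand-in for the nonlinear mixed-effects analysis described in the preceding abstract, the sketch below fits a linear mixed model with a mouse-specific random intercept and period-specific time slopes to synthetic heart-rate data. The data, variable names, effect sizes, and the use of statsmodels are illustrative assumptions; the authors' piecewise nonlinear and change-point models are not reproduced here.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)

# Synthetic longitudinal data: heart rate for 12 mice (two genotypes),
# measured repeatedly in the preischemic, ischemic, and reperfusion periods.
rows = []
for mouse in range(12):
    genotype = "knockout" if mouse < 6 else "wildtype"
    level = 400.0 + rng.normal(0.0, 15.0)            # mouse-specific baseline level
    for period, shift in [("pre", 0.0), ("ischemic", -120.0), ("reperfusion", -40.0)]:
        for t in range(5):
            hr = level + shift + 3.0 * t + rng.normal(0.0, 10.0)
            rows.append(dict(mouse=mouse, genotype=genotype,
                             period=period, time=t, hr=hr))
data = pd.DataFrame(rows)

# Random intercept for each mouse; fixed effects give each period its own mean
# level and time slope, plus a genotype (treatment group) effect.
model = smf.mixedlm("hr ~ C(period) * time + genotype", data, groups=data["mouse"])
result = model.fit()
print(result.summary())
```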
Fast and Accurate Inference for the Smoothing Parameter in Semiparametric Models
Alex Trindade
Department of Mathematics and Statistics, Texas Tech University

A fast and accurate method of confidence interval construction for the smoothing parameter in penalized spline and partially linear models is proposed. The method is akin to a parametric percentile bootstrap in which Monte Carlo simulation is replaced by saddlepoint approximation, and it can therefore be viewed as an approximate bootstrap. It is applicable in a quite general setting, requiring only that the underlying estimator be the root of an estimating equation that is a quadratic form in normal random variables. This is the case under a variety of optimality criteria, such as those commonly denoted by ML, REML, GCV, and AIC. Simulation studies reveal that under the ML and REML criteria the method delivers near-exact performance with computational speeds an order of magnitude faster than existing exact methods and two orders of magnitude faster than a classical bootstrap. Perhaps most importantly, the proposed method also offers a computationally feasible alternative when no exact or asymptotic methods are known, e.g. under GCV and AIC. The methodology is illustrated on the well-known fossil data; giving a range of plausible smooths in this instance can help answer questions about the statistical significance of apparent features in the data.

Confidence Limits for Lognormal Percentiles and for Lognormal Mean based on Samples with Multiple Detection Limits
K. Krishnamoorthy
Department of Mathematics, University of Louisiana at Lafayette, Lafayette, LA 70508-1010, USA

The problem of assessing occupational exposure using the mean or an upper percentile of a lognormal distribution is addressed. Inferential methods are proposed for constructing an upper confidence limit for an upper percentile of a lognormal distribution and for finding confidence intervals for a lognormal mean based on samples with multiple detection limits. The proposed methods are based on maximum likelihood estimates. They perform well with respect to coverage probabilities as well as power, and are satisfactory even for small samples. The proposed approaches are also applicable for finding confidence limits for the percentiles of a gamma distribution. An advantage of the proposed approach is its ease of computation and implementation. An illustrative example with real data sets is given.

Finitely Inflated Poisson Distribution
Santanu Chakraborty
University of Texas - Pan American

Zero-Inflated Poisson and Zero-Inflated Negative Binomial distributions are well known in the literature. They are used to model count data sets that have more zeros than a usual Poisson or Negative Binomial data set, so the corresponding probability distributions have inflated masses at zero and deflated masses at the nonzero points. However, there are also count data sets in the literature with not only more zeros than a usual Poisson or Negative Binomial, but possibly more 1s, 2s and 3s as well. In that case, it makes more sense to inflate the original Poisson or Negative Binomial at 0, 1, 2 and 3. From this consideration, we introduce Finitely Inflated Poisson and Finitely Inflated Negative Binomial distributions into the literature. In this talk, we discuss the Finitely Inflated Poisson (FIP) distribution. For example, an FIP with inflations at the points 0, 1, ..., k is defined by

P(X = x) = \sum_{i=0}^{k} \pi_i \, I_{\{x = i\}} + \Bigl(1 - \sum_{i=0}^{k} \pi_i\Bigr) \frac{e^{-\lambda} \lambda^x}{x!}, \qquad x = 0, 1, 2, \ldots,

where \pi_i is the inflator at i for i = 0, 1, 2, ..., k and \lambda is the rate of the underlying Poisson component. We discuss moments, the moment generating function, convolutions of independent identically distributed FIP distributions, and some parametric and Bayesian inferential issues for this distribution.
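To make the FIP definition above concrete, here is a minimal Python sketch of the probability mass function and a sampler for a Finitely Inflated Poisson; the parameter values (a Poisson rate of 4 with inflators at 0, 1 and 2) are invented for illustration and are not taken from the talk.

```python
import numpy as np
from math import exp, factorial

def fip_pmf(x, lam, pi):
    """P(X = x) for a Finitely Inflated Poisson; pi[i] is the inflator at point i."""
    poisson_mass = exp(-lam) * lam ** x / factorial(x)
    inflation = pi[x] if x < len(pi) else 0.0
    return inflation + (1.0 - sum(pi)) * poisson_mass

def fip_sample(size, lam, pi, rng=None):
    """Sample from the mixture: point mass at i with probability pi[i], else Poisson(lam)."""
    rng = np.random.default_rng() if rng is None else rng
    m = len(pi)                                     # number of inflated points (0, ..., m-1)
    probs = list(pi) + [1.0 - sum(pi)]              # mixture component probabilities
    comp = rng.choice(m + 1, size=size, p=probs)
    return np.where(comp < m, comp, rng.poisson(lam, size=size))

# Illustrative parameters: inflate 0, 1 and 2 on top of a Poisson(4) component.
lam, pi = 4.0, [0.20, 0.10, 0.05]
print([round(fip_pmf(x, lam, pi), 4) for x in range(6)])
print(fip_sample(10, lam, pi, rng=np.random.default_rng(2)))
```

The masses returned by fip_pmf sum to one over x = 0, 1, 2, ..., since the inflators contribute \sum_i \pi_i and the Poisson component contributes the remaining 1 - \sum_i \pi_i.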