
Statistical approaches for detecting unexplained clusters of disease.
Spatial Aggregation
Thomas Talbot
New York State Department of Health, Environmental Health Surveillance Section
Albany School of Public Health, GIS & Public Health Class
March 3, 2009

Cluster
• A number of similar things grouped closely together (Webster's Dictionary)
• Unexplained concentrations of health events in space and/or time (public health definition)

• Occupation
• Sex, Age
• Socioeconomic class
• Behavior (smoking)
• Race
• Time
• Space

Spatial Autocorrelation
"Everything is related to everything else, but near things are more related than distant things." - Tobler's first law of geography
[Figures: positive autocorrelation; negative autocorrelation]

Moran's I
• A test for spatial autocorrelation in disease rates.
• When nearby areas tend to have similar rates of disease, Moran's I is greater than 0 (more precisely, greater than its expected value of -1/(n-1) under no autocorrelation): positive spatial autocorrelation.
• When nearby areas are dissimilar, Moran's I is less than 0: negative spatial autocorrelation.

Detecting Clusters
• Consider scale
• Consider zone
• Control for multiple testing

Cluster Questions
• Does a disease cluster in space?
• Does a disease cluster in both time and space?
• Where is the most likely cluster?
• Where is the most likely cluster in both time and space?

More Cluster Questions
• At what geographic or population scale do clusters appear?
• Are cases of disease clustered in areas of high exposure?

Nearest Neighbor Analysis: Cuzick & Edwards Method
• Count the number of cases whose nearest neighbors are cases and not controls.
• When cases are clustered, the nearest neighbor to a case will tend to be another case, and the test statistic will be large.
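The Cuzick & Edwards count described above can be sketched in a few lines. This is a minimal illustration, not the published implementation: it uses only the single nearest neighbor (k = 1), assumes planar coordinates, and judges significance by randomly relabeling cases and controls. The function names `cuzick_edwards_t1` and `permutation_p_value` are hypothetical.

```python
import numpy as np

def cuzick_edwards_t1(coords, is_case):
    """Count cases whose single nearest neighbor is also a case (k = 1)."""
    coords = np.asarray(coords, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    # Pairwise squared Euclidean distances between all locations.
    diff = coords[:, None, :] - coords[None, :, :]
    d2 = (diff ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)      # a point is not its own neighbor
    nearest = d2.argmin(axis=1)       # ties go to the lowest index
    return int(np.sum(is_case & is_case[nearest]))

def permutation_p_value(coords, is_case, n_perm=999, seed=0):
    """Monte Carlo p value: shuffle case/control labels over fixed locations."""
    rng = np.random.default_rng(seed)
    is_case = np.asarray(is_case, dtype=bool)
    observed = cuzick_edwards_t1(coords, is_case)
    as_large = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(is_case)
        if cuzick_edwards_t1(coords, shuffled) >= observed:
            as_large += 1
    return observed, (as_large + 1) / (n_perm + 1)
```

Because the controls are drawn from the same population as the cases, shuffling the labels keeps the geographic variation in population density fixed, which is the advantage the method claims.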
Nearest Neighbor Analyses
Advantages
• Accounts for the geographic variation in population density
• Accounts for confounders through judicious selection of controls
• Can detect clustering with many small clusters
Disadvantages
• Must have spatial locations of cases & controls
• Doesn't show location of the clusters

Spatial Scan Statistic (Martin Kulldorff)
• Identifies the location with an elevated rate that is statistically significant.
• Adjusts for multiple testing of the many possible locations and area sizes of clusters.
• Uses Monte Carlo testing techniques.

The Space-Time Scan Statistic
• Cylindrical window with a circular geographic base and a height corresponding to time.
• Cylindrical window is moved in space and time.
• P value for each cylinder calculated.

Knox Method: test for space-time interaction
• When space-time interaction is present, cases near in space will also be near in time, and the test statistic will be large.
• Test statistic: the number of pairs of cases that are near in both time and space.

Focal tests for clustering
• Cross-sectional or cohort approach: Is there a higher rate of disease in populations living in contaminated areas compared to populations in uncontaminated areas? (Relative risk)
• Case/control approach: Are there more cases than controls living in a contaminated area? (Odds ratio)

Focal Case-Control Design
[Figure: cases and controls plotted with 250 m and 500 m distance rings]

Regression Analysis
• Control for known risk factors before analyzing for spatial clustering.
• Analyze for unexplained clusters.
• Follow up in areas with large regression residuals with traditional case-control or cohort studies.
• Obtain additional risk factor data to account for the large residuals.

At what geographic or population scale do clusters appear?
Multiresolution mapping. A cluster of cases in a neighborhood has a different epidemiological meaning than a cluster of cases across several adjacent counties. Results can change dramatically with the scale of analysis.
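As an illustration of the Knox method described earlier in this section, the sketch below counts pairs of cases that are close in both space and time and compares that count with counts obtained after randomly reassigning the onset times. The distance and time cutoffs (`space_cut`, `time_cut`), the permutation-based p value, and the function names are assumptions made for this example; the slides do not specify them.

```python
import numpy as np

def knox_statistic(xy, t, space_cut, time_cut):
    """Number of case pairs that are near in both space and time."""
    xy = np.asarray(xy, dtype=float)
    t = np.asarray(t, dtype=float)
    i, j = np.triu_indices(len(t), k=1)          # every unordered pair once
    dist = np.hypot(xy[i, 0] - xy[j, 0], xy[i, 1] - xy[j, 1])
    dt = np.abs(t[i] - t[j])
    return int(np.sum((dist <= space_cut) & (dt <= time_cut)))

def knox_test(xy, t, space_cut, time_cut, n_perm=999, seed=0):
    """Monte Carlo p value: permute times while locations stay fixed."""
    rng = np.random.default_rng(seed)
    t = np.asarray(t, dtype=float)
    observed = knox_statistic(xy, t, space_cut, time_cut)
    as_large = sum(
        knox_statistic(xy, rng.permutation(t), space_cut, time_cut) >= observed
        for _ in range(n_perm)
    )
    return observed, (as_large + 1) / (n_perm + 1)
```

Results depend on the chosen cutoffs, so in practice it is worth checking how stable the conclusion is across a few plausible values.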
[Figure: 1995-1999 map with interactive selections by rate, population and p value]

References
• Talbot TO, Kulldorff M, Forand SP, Haley VB. Evaluation of Spatial Filters to Create Smoothed Maps of Health Data. Statistics in Medicine. 2000, 19:2451-2467.
• Forand SP, Talbot TO, Druschel C, Cross PK. Data Quality and the Spatial Analysis of Disease Rates: Congenital Malformations in New York. Health and Place. 2002, 8:191-199.
• Haley VB, Talbot TO. Geographic Analysis of Blood Lead Levels in New York State Children Born 1994-1997. Environmental Health Perspectives. 2004, 112(15):1577-1582.
• Kulldorff M, National Cancer Institute. SaTScan User Guide. www.satscan.org

Geographic Aggregation of Health Data
Thomas Talbot
NYS Department of Health, Environmental Health Surveillance Section

Health data can be shown at different geographic scales
• Residential address
• Census blocks and tracts
• Towns
• Counties
• State

Concerns about release of small area data
• Risk of disclosure of confidential information.
• Rates of disease are unreliable due to small numbers.

Rate maps with small numbers provide very little information.
http://www.nyhealth.gov/statistics/ny_asthma/hosp/zipcode/hamil_t2.htm
http://www.nyhealth.gov/statistics/ny_asthma/hosp/zipcode/pdf/hamil_m2.pdf

Disclosure of confidential information
[Figure: census blocks]

Smoothed or Aggregated Count & Rate Maps
• Protect confidentiality so data can be shared.
• Reduce random fluctuations in rates due to small numbers.

Smoothed Rate Maps
• Borrow data from neighboring areas to provide more stable rates of disease.
  – Shareware tools available
  – Empirical or Hierarchical Bayesian approaches
  – Adaptive Spatial Filters
  – Head banging
  – etc.
[Figure from Talbot et al., Statistics in Medicine, 2000]

Problems with smoothing
• Does not provide counts & rates for defined geographic areas.
• Not clear how to link risk factor data with smoothed health data.
• Methods are sometimes difficult to understand - "black boxes"
• Does not meet requirements of some recent New York policies & legislation.

Environmental Facilities & Cancer Incidence Map Law, 2008 § 3-0317
• Plot cancer cases by census block, except in cases where such plotting could make it possible to identify any cancer patient.
• Census blocks shall be aggregated to protect confidentiality.

Environmental Justice & Permitting, NYSDEC Commissioner Policy 29
• Incorporate existing human health data into the environmental review process.
• Data will be made available at a fine geographic scale (ZIP Code or ZIP Code Groups).

Aggregated Count or Rate Maps
• Merge small areas with neighboring areas to provide more stable rates of disease and/or protect confidentiality.
  – Aggregation can be done manually.
  – Existing automated tools were difficult to use.

[Maps: 3 years of low birth weight incidence ratios, shown for original ZIP Codes and aggregated to 250 births per ZIP Code Group]

Our Goal: Tool Requirements
• Aggregate small areas into larger ones.
• User decides how much aggregation is needed.
• Works with various levels of geography.
  – census blocks, tracts, towns, ZIP Codes etc.
  – can nest one level of geography in another
• Uses software which is readily available in NYSDOH (SAS)
• Can output results for use in mapping programs.

Aggregation Tool
Original block data and the regions assigned by the SAS tool (simulated data)

Block          Cases   Region
122300/2004    10      A
122300/2005    11      A
014500/3005     3      B
014500/3007     4      B
014500/3008     0      B
014500/3009     1      B
014500/3010    15      B
103202/2001     8      C
103202/2002     6      C

Region totals produced by the SAS tool (simulated data)

Region   Cases
A        21
B        23
C        14

Aggregation Process
• Populated blocks with the fewest cases are merged first.
• If there is a tie, the program starts with the block with the fewest neighbors.
• Selected block is then merged with the closest neighbor in the same census block group.
• After merging the first block, the list of neighbors is updated.
• Process repeats until all regions have a minimum number of cases.
  – program can also merge to a user-specified population

Special Situations
• Tool tries to avoid merging blocks in different census areas:
  – Census block groups
  – Census tracts (homogeneous population characteristics)
  – Counties
• Tool tries to avoid merging blocks across major water bodies, e.g. Finger Lakes, Hudson River, Atlantic Ocean

[Map sequence, simulated data: step-by-step merging of blocks; labels include "Water" and a region with 9 cases and a population of 98]

New York State Descriptive Statistics
Year 2000 populated census blocks
(statistics calculated using populated regions only)

                                          New regions: level of aggregation
Statistic               Original blocks   6 cases   12 cases   24 cases
Number of regions       225,167           39,748    21,525     11,381
Average population      84                477       882        1,667
Median cases            1                 10        20         38
Median census blocks    1                 4         7          14

NY number of cases: 470,000
NY population: 18,976,457

Performance Measures
• Compactness
• Homogeneity with respect to demographic factors (measured as index of dissimilarity)
• Similar population sizes.
• Number of aggregated areas.
• Aggregated zones are completely contained within larger areas.
  – e.g. block aggregation areas contained within tracts

Index of dissimilarity
The percentage of one group that would have to move to a different area in order to have an even distribution (Wikipedia).

$$D = \frac{1}{2} \sum_{i} \left| \frac{b_i}{B} - \frac{w_i}{W} \right|$$

• b_i = the minority population of the ith area, e.g. census tract
• B = the total minority population of the large geographic entity for which the index of dissimilarity is being calculated
• w_i = the non-minority population of the ith area
• W = the total non-minority population of the large geographic entity for which the index of dissimilarity is being calculated

The End
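As a small worked illustration of the index of dissimilarity, the sketch below applies the standard formula, half the sum over areas of |b_i/B - w_i/W|, to per-area population counts. The function name `index_of_dissimilarity` and the input layout (two parallel lists of counts) are assumptions made for this example, not part of the aggregation tool.

```python
def index_of_dissimilarity(minority, non_minority):
    """Return D = 0.5 * sum(|b_i/B - w_i/W|) over areas.

    minority[i] is b_i and non_minority[i] is w_i for the ith area.
    D ranges from 0 (even distribution) to 1 (complete separation).
    """
    if len(minority) != len(non_minority):
        raise ValueError("both lists must cover the same areas")
    B = sum(minority)        # total minority population
    W = sum(non_minority)    # total non-minority population
    if B == 0 or W == 0:
        raise ValueError("both groups need a non-zero total population")
    return 0.5 * sum(abs(b / B - w / W)
                     for b, w in zip(minority, non_minority))
```

Computed within each aggregated region over its component blocks, a low value suggests that the merged blocks are demographically similar.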
