Introduction to statistics and data

Looking at numbers…
Group exercise: What’s the math problem in each of the four examples I’ve given you?

EXAMPLE 1.
[Figure legend: Experimental treatment; Standard treatment]
Table 2. Outcome volume for the experimental and standard groups; mean (SD).

Location | Week 0, experimental | Week 0, standard | Week 12, experimental | Week 12, standard | Change (Week 0 - Week 12), experimental | Change (Week 0 - Week 12), standard
Affected side | 3135 (748)* | 3333 (1368)* | 2982 (715)* | 3331 (1383)* | -154 (168) | -2 (306)
Contralateral side | 2595 (672) | 2654 (761) | 2553 (606) | 2631 (736) | -42 (193) | -23 (219)
* p < .05 greater than the contralateral side

EXAMPLE 2.
Objective: The study objective is to determine the efficacy of a new treatment cream as a therapeutic option for eczema.
Methods: Prospective study under institutional review board approval of ten patients with eczema, who were all treated with the experimental cream. Three blinded independent investigators evaluated overall improvement, as well as changes in scaliness and redness, graded on a quartile (0-3) scale: 0=none, 1=mild (1-33%), 2=moderate (34-66%), 3=excellent (67-100%).
Results: All patients showed overall improvement as measured by blinded investigators. Of patients showing overall improvement, 78% were graded as having either excellent or moderate improvement. Ninety-six percent of subjects demonstrated improvements in scaliness and redness.
Limitations: Small sample size

EXAMPLE 3.
Table 1 -- Baseline characteristics by height and follow-up for incident cancer in the Million Women Study

Height in cm* | <155 | 155 | 160 | 165 | 170 | ≥175 | All women
Mean measured height (SD) | 152·8 (4·1) | 156·5 (2·3) | 160·4 (2·9) | 164·9 (2·9) | 169·0 (2·9) | 173·8 (4·3) | 160·9 (6·4)‡
Characteristics at recruitment
Number of women | 233 516 | 196 773 | 388 515 | 288 893 | 143 289 | 46 138 | 1 297 124
Mean age, years (SD) | 56·3 (4·9) | 56·2 (4·9) | 56·2 (4·9) | 56·0 (4·8) | 56·0 (4·8) | 55·8 (4·8) | 56·1 (4·9)
Socioeconomic status, n (%) in lowest quintile | 59 220 (26%) | … | … | … | … | … | …
Current smokers, n (%) | 50 775 (23%) | 40 500 (22%) | … | … | … | … | …
Alcohol intake, n (%) ≥7 units per week | 47 138 (20%) | … | … | … | … | … | …
Body-mass index, n (%) BMI ≥30 | 54 550 (25%) | 38 493 (20%) | … | … | … | … | …
Strenuous exercise, n (%) once a week or more | 76 917 (35%) | … | … | … | … | … | …
Age at menarche, n (%) ≥14 years | 79 858 (35%) | 69 718 (36%) | … | … | … | … | …
Parity, n (%) nulliparous | 22 827 (10%) | 19 149 (10%) | … | … | … | … | …
Number of full-term pregnancies, n (%) with three or more | … | … | … | … | … | … | …
Age at first birth, n (%) ≥25 years | 67 250 (33%) | 61 042 (35%) | 129 031 (38%) | 103 017 (41%) | 52 677 (43%) | 17 492 (46%) | …
Postmenopausal, n (%) | 162 551 (81%) | 136 544 (81%) | 269 384 (81%) | 197 618 (80%) | 97 855 (80%) | 30 900 (79%) | …
Ever use of oral contraceptives, n (%) | 133 979 (58%) | 114 105 (59%) | 228 669 (60%) | 173 520 (61%) | 85 522 (60%) | 27 571 (60%) | …
Current use of HRT, n (%) | 75 151 (33%) | 63 865 (33%) | 128 891 (34%) | 98 086 (34%) | 48 516 (34%) | 15 637 (34%) | …
Follow-up for cancer incidence
Woman-years, millions | 2·1 | 1·8 | 3·5 | 2·6 | 1·3 | 0·4 | …
Number of incident cancers | 15 792 | 14 213 | 28 806 | 22 571 | 11 902 | 4 092 | …
* The categories of height are those reported at recruitment, and mean values are those measured in a randomly selected sample.
‡ Standardised to the distribution of categories of self-reported height in our whole analysis population.

EXAMPLE 4.
Original data:
Data re-use:

Clinical Data Example
1. Kline et al. (2002)
The researchers analyzed data from 934 emergency room patients with suspected pulmonary embolism (PE). Only about 1 in 5 actually had PE. The researchers wanted to know what clinical factors predicted PE.
I will use four variables from their dataset today:
  Pulmonary embolism (yes/no)
  Age (years)
  Shock index = heart rate/systolic BP
  Shock index categories = take shock index and divide it into 10 groups (lowest to highest shock index)

Descriptive Statistics

Types of Variables: Overview
Categorical
  binary: 2 categories
  nominal: + more categories
  ordinal: + order matters
Quantitative
  discrete: + numerical
  continuous: + uninterrupted

Categorical Variables
Also known as “qualitative.”
Categories.
  treatment groups
  exposure groups
  disease status

Categorical Variables
Dichotomous (binary): two levels
  Dead/alive
  Treatment/placebo
  Disease/no disease
  Exposed/Unexposed
  Heads/Tails
  Pulmonary Embolism (yes/no)
  Male/female

Categorical Variables
Nominal variables: named categories. Order doesn’t matter!
  The blood type of a patient (O, A, B, AB)
  Marital status
  Occupation

Categorical Variables
Ordinal variable: ordered categories. Order matters!
  Staging in breast cancer as I, II, III, or IV
  Birth order: 1st, 2nd, 3rd, etc.
  Letter grades (A, B, C, D, F)
  Ratings on a scale from 1-5
  Ratings on: always; usually; many times; once in a while; almost never; never
  Age in categories (10-20, 20-30, etc.)
  Shock index categories (Kline et al.)

Quantitative Variables
Numerical variables; may be arithmetically manipulated.
  Counts
  Time
  Age
  Height

Quantitative Variables
Discrete numbers: a limited set of distinct values, such as whole numbers.
  Number of new AIDS cases in CA in a year (counts)
  Years of school completed
  The number of children in the family (cannot have a half a child!)
  The number of deaths in a defined time period (cannot have a partial death!)
  Roll of a die

Quantitative Variables
Continuous variables: can take on any number within a defined range.
  Time-to-event (survival time)
  Age
  Blood pressure
  Serum insulin
  Speed of a car
  Income
  Shock index (Kline et al.)

Review Question 1
Which of the following variables would be considered a continuous variable?
  a. Favorite fruit
  b. Gender
  c. Decade of birth
  d. Age at first birth
  e. Parity

Review Question 2
Which of the following variables would be considered a nominal (categorical) variable?
  a. Favorite fruit
  b. Gender
  c. Decade of birth
  d. Age at first birth
  e. Parity

Looking at Data
How are the data distributed?
  Where is the center?
  What is the range?
  What’s the shape of the distribution (e.g., Gaussian, binomial, exponential, skewed)?
Are there “outliers”?
Are there data points that don’t make sense?

The first rule of statistics: USE COMMON SENSE!
90% of the information is contained in the graph.

Frequency Plots (univariate)
Categorical variables
  Bar Chart
Continuous variables
  Box Plot
  Histogram

Bar Chart
Used for categorical variables to show frequency or proportion in each category.
Translate the data from frequency tables into a pictorial representation…

Bar Chart: categorical variables
[Figure: bar chart of a binary variable with categories “no” and “yes”]

Bar Chart for SI categories
[Figure: bar chart of Number of Patients (0 to 200) by Shock Index Category (1 to 10)]
Note how much easier it is to extract information from a bar chart than from a table!

Box plot and histograms
To show the distribution (shape, center, range, variation) of continuous variables.

Shape of a Distribution
Describes how data are distributed
Measures of shape: symmetric or skewed
  Left-Skewed: Mean < Median
  Symmetric: Mean = Median
  Right-Skewed: Median < Mean

Box Plot: Shock Index
[Figure: box plot of SI, y-axis “Shock Index Units” from 0.0 to 2.0, annotated as follows]
  maximum (1.7)
  Outliers
  “whisker”: Q3 + 1.5 IQR = .8 + 1.5(.25) = 1.175
  75th percentile (0.8)
  median (.66)
  25th percentile (0.55)
  interquartile range (IQR) = .8 - .55 = .25
  minimum (or Q1 - 1.5 IQR)

Histogram of SI
[Figure: histogram of SI (0.0 to 2.0), y-axis Percent]
Bins of size 0.1 (automatically generated)
Note the “right skew”

Histogram
[Figure: histogram of SI (0.0 to 2.0), y-axis Percent]
100 bins (too much detail)

Histogram
[Figure: histogram of SI (0.0 to 2.0), y-axis Percent]
2 bins (too little detail)

Box Plot: Shock Index
[Figure: box plot of SI, y-axis “Shock Index Units” from 0.0 to 2.0]
Also shows the “right skew”

Distribution Shape and Box-and-Whisker Plot
[Figure: box plots marking Q1, Q2, Q3 for Left-Skewed, Symmetric, and Right-Skewed distributions]

Box Plot: Age
[Figure: box plot of AGE, y-axis Years from 0.0 to 100.0, marking maximum, 75th percentile, median, 25th percentile, minimum, and the interquartile range]
More symmetric

Histogram: Age
[Figure: histogram of AGE (Years, 0.0 to 100.0), y-axis Percent]
Not skewed, but not bell-shaped either…

Some histograms from your class (n=25)
  Starting with politics…
  Health Care Law
  Feelings about math and writing…
  Optimism…
  Diet…
  Habits…
  Homework and optimism? (bivariate)

Review Question 3
Which of the following graphics should be used for categorical variables?
  a. Histogram
  b. Box plot
  c. Bar Chart
  d. Stem-and-leaf plot

Review Question 4
What is the first thing you should do when you get new data?
  a. Run a t-test
  b. Calculate a p-value
  c. Plot your data
  d. Run multivariate regression

Review Question 5
Approximately what percent of subjects had pulses between 80 and 90?
[Figure: histogram of PULSE_OX (60.0 to 120.0), y-axis Percent from 0.0 to 40.0]
  a. 200%
  b. 100%
  c. 90%
  d. 50%
  e. 10%

Review Question 6
What is the maximum pulse that any subject had?
[Figure: histogram of PULSE_OX (60.0 to 120.0), y-axis Percent from 0.0 to 40.0]
  a. =100
  b. <=100
  c. >100
  d. >=100

Review Question 7
This distribution of the variable (pulse) would be described as?
[Figure: histogram of PULSE_OX (60.0 to 120.0), y-axis Percent from 0.0 to 40.0]
  a. Symmetric
  b. Right-skewed
  c. Left-skewed

Measures of central tendency
  Mean
  Median
  Mode

Central Tendency
Mean: the average; the balancing point
Calculation: the sum of values divided by the sample size n
In math shorthand:
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$

Mean: example
Some data:
Age of participants: 17 19 21 22 23 23 23 38
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{17 + 19 + 21 + 22 + 23 + 23 + 23 + 38}{8} = 23.25$

Mean of age in Kline’s data
Descriptive Statistics Report
Page/Date/Time: 1, 3/30/2006 10:25:14 AM
Database: C:\Program Files\NCSS97\Data\Dawson\kline.S0
Means Section of AGE
  Mean: 50.19334
  Median: 49
  Geometric Mean: 46.66865
  Harmonic Mean: 43.00606
  Sum: 46730
  Mode: 49
[Figure: histogram of AGE (0.0 to 100.0), y-axis Percent]

Mean of age in Kline’s data
[Figure: histogram of AGE (0.0 to 100.0), y-axis Percent, with the mean marked as “The balancing point”]

Mean of Pulmonary Embolism? (Binary variable?)
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{181 \times 1 + 750 \times 0}{931} = \frac{181}{931} = .1944$
[Figure: histogram of PE (0.0 to 1.0), y-axis Percent; 80.56% (750) at 0 and 19.44% (181) at 1]

Mean
The mean is affected by extreme values (outliers)
[Figure: two dot plots on a 0 to 10 axis]
  Data 1, 2, 3, 4, 5: Mean = 3, since $\frac{1 + 2 + 3 + 4 + 5}{5} = \frac{15}{5} = 3$
  Data 1, 2, 3, 4, 10: Mean = 4, since $\frac{1 + 2 + 3 + 4 + 10}{5} = \frac{20}{5} = 4$

Central Tendency
Median: the exact middle value
Calculation:
  If there are an odd number of observations, find the middle value
  If there are an even number of observations, find the middle two values and average them.
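The mean and median calculations above can be checked with a short script. This is only a sketch using Python's standard `statistics` module; the variable names (`ages`, `clean`, `with_outlier`, `pe`) are illustrative and not part of the original slides, and the PE list is rebuilt from the counts on the slide (181 ones, 750 zeros) rather than from the actual dataset.

```python
from statistics import mean, median

# Ages of the eight participants from the "Mean: example" slide.
ages = [17, 19, 21, 22, 23, 23, 23, 38]
age_mean = mean(ages)      # sum of values divided by n
age_median = median(ages)  # even n: average of the two middle values
print(age_mean, age_median)  # 23.25 22.5

# The outlier example: replacing 5 with 10 moves the mean but not the median.
clean = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]
print(mean(clean), mean(with_outlier))      # 3 4
print(median(clean), median(with_outlier))  # 3 3

# A binary variable coded 0/1: the mean is the proportion of 1s.
# Assumption: list rebuilt from the slide's counts (181 with PE, 750 without).
pe = [1] * 181 + [0] * 750
pe_mean = mean(pe)
print(round(pe_mean, 4))  # 0.1944
```

Coding a yes/no variable as 1/0 is what makes its mean interpretable as a proportion, which is why the slide reports 19.44% for pulmonary embolism.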
Median: example
Some data:
Age of participants: 17 19 21 22 23 23 23 38
Median = (22+23)/2 = 22.5

Median of age in Kline’s data
Means Section of AGE
  Mean: 50.19334
  Median: 49
  Geometric Mean: 46.66865
  Harmonic Mean: 43.00606
  Sum: 46730
  Mode: 49
[Figure: histogram of AGE (Years, 0.0 to 100.0), y-axis Percent]

Median of age in Kline’s data
[Figure: histogram of AGE (0.0 to 100.0), y-axis Percent, with 50% of mass on each side of the median]

Does PE have a median?
Yes, if you line up the 0’s and 1’s, the middle number is 0.

Median
The median is not affected by extreme values (outliers).
[Figure: two dot plots on a 0 to 10 axis; Median = 3 in both]

Central Tendency
Mode: the value that occurs most frequently

Mode: example
Some data:
Age of participants: 17 19 21 22 23 23 23 38
Mode = 23 (occurs 3 times)

Mode of age in Kline’s data
Means Section of AGE
  Mean: 50.19334
  Median: 49
  Geometric Mean: 46.66865
  Harmonic Mean: 43.00606
  Sum: 46730
  Mode: 49

Mode of PE?
0 appears more than 1, so 0 is the mode.

Mode
Not affected by extreme values
Used for either numerical or categorical data
There may be no mode
There may be several modes
[Figure: dot plot on a 0 to 14 axis, Mode = 9; dot plot on a 0 to 6 axis, No Mode]

Which measure of central tendency is “best”?
Mean is generally used, unless extreme values (outliers) exist
Then median is often used, since the median is not sensitive to extreme values.
Example: Median home prices may be reported for a region, since the median is less sensitive to outliers

Measures of Variation/Dispersion
  Range
  Percentiles/quartiles
  Interquartile range
  Standard deviation/Variance

Range
Difference between the largest and the smallest observations.
Range of age: 94 years - 15 years = 79 years
[Figure: histogram of AGE (Years, 0.0 to 100.0), y-axis Percent]

Range of PE?
1 - 0 = 1

Quartiles
[Figure: data split into four groups of 25% each by Q1, Q2, and Q3]
The first quartile, Q1, is the value for which 25% of the observations are smaller and 75% are larger
Q2 is the same as the median (50% are smaller, 50% are larger)
Only 25% of the observations are greater than the third quartile

Interquartile Range
Interquartile range = 3rd quartile - 1st quartile = Q3 - Q1

Interquartile Range: age
  minimum: 15
  Q1: 35
  Median (Q2): 49
  Q3: 65
  maximum: 94
  (25% of the observations fall in each of the four intervals)
Interquartile range = 65 - 35 = 30

Sample Variance
Average (roughly) of squared deviations of values from the mean
$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n - 1}$

Why squared deviations?
Adding deviations will yield a sum of 0.
Absolute values are tricky!
Squares eliminate the negatives.
Result: Increasing contribution to the variance as you go farther from the mean.

Standard Deviation
Most commonly used measure of variation
Shows variation about the mean
Has the same units as the original data
$S = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n - 1}}$

Calculation Example: Sample Standard Deviation
Age data (n=8): 17 19 21 22 23 23 23 38
n = 8, Mean = $\bar{X}$ = 23.25
$S = \sqrt{\frac{(17 - 23.25)^2 + (19 - 23.25)^2 + \cdots + (38 - 23.25)^2}{8 - 1}} = \sqrt{\frac{281.5}{7}} \approx 6.3$

Std. dev is a measure of the “average” scatter around the mean.
[Figure: histogram of AGE (Years, 0.0 to 100.0), y-axis Percent]
Estimation method: if the distribution is bell shaped, the range is around 6 SD, so here the rough guess for SD is 79/6 = 13

Std. Deviation age
Variation Section of AGE
  Variance: 333.1884
  Standard Deviation: 18.25345

Std Dev of Shock Index
Std. dev is a measure of the “average” scatter around the mean.
[Figure: histogram of SI (0.0 to 2.0), y-axis Count from 0.0 to 250.0]
Estimation method: if the distribution is bell shaped, the range is around 6 SD, so here the rough guess for SD is 1.4/6 = .23

Std. Deviation SI
Variation Section of SI
  Variance: 4.155749E-02
  Standard Deviation: 0.2038566
  Std Error of Mean: 6.681129E-03
  Interquartile Range: 0.2460432
  Range: 1.430856

Std. Dev of binary variable, PE
$S = \sqrt{\frac{181 \times (1 - .1944)^2 + 750 \times (0 - .1944)^2}{931 - 1}} = \sqrt{\frac{145.8}{930}} = .3959$
Std. dev is a measure of the “average” scatter around the mean.
[Figure: histogram of PE; 80.56% at 0 and 19.44% at 1]

Std. Deviation PE
Variation Section of PE
  Variance: 0.156786
  Standard Deviation: 0.3959621

Comparing Standard Deviations
[Figure: three dot plots on an 11 to 21 axis]
  Data A: Mean = 15.5, S = 3.338
  Data B: Mean = 15.5, S = 0.926
  Data C: Mean = 15.5, S = 4.570
Slide from: Statistics for Managers Using Microsoft® Excel 4th Edition, 2004 Prentice-Hall

Bienaymé-Chebyshev Rule
Regardless of how the data are distributed, a certain percentage of values must fall within k standard deviations from the mean:
Note use of μ (mu) to represent “mean”.
Note use of σ (sigma) to represent “standard deviation.”
  At least (1 - 1/1²) = 0% within k=1 (μ ± 1σ)
  At least (1 - 1/2²) = 75% within k=2 (μ ± 2σ)
  At least (1 - 1/3²) = 89% within k=3 (μ ± 3σ)

Symbol Clarification
  S = Sample standard deviation (example of a “sample statistic”)
  σ = Standard deviation of the entire population (example of a “population parameter”) or from a theoretical probability distribution
  $\bar{X}$ = Sample mean
  µ = Population or theoretical mean

**The beauty of the normal curve:
No matter what µ and σ are, the area between µ-σ and µ+σ is about 68%; the area between µ-2σ and µ+2σ is about 95%; and the area between µ-3σ and µ+3σ is about 99.7%. Almost all values fall within 3 standard deviations.

68-95-99.7 Rule
[Figure: normal curve with shaded regions: 68% of the data, 95% of the data, 99.7% of the data]

Summary of Symbols
  S² = Sample variance
  S = Sample standard dev
  σ² = Population (true or theoretical) variance
  σ = Population standard dev.
  $\bar{X}$ = Sample mean
  µ = Population mean
  IQR = interquartile range (middle 50%)

Review Question 8
All of the following are measures of data variation EXCEPT:
  a. Variance
  b. Interquartile range
  c. Standard deviation
  d. Range
  e. Mean

Review Question 9
All of the following are influenced by outliers EXCEPT:
  a. Variance
  b. Interquartile range
  c. Standard deviation
  d. Range
  e. Mean

Review Question 10
If you have right-skewed data, which of the following will be true?
  a. Mean > median
  b. Mean >= median
  c. Median >= mean
  d. Median > mean
  e. Mean = median

Review Question 11
How much of your data is guaranteed to fall within 2 standard deviations of the mean?
  a. None; there are no guarantees.
  b. 95%
  c. 99%
  d. 75%
  e. 89%

Examples of bad graphics

What’s wrong with this graph?
from: ER Tufte. The Visual Display of Quantitative Information. Graphics Press, Cheshire, Connecticut, 1983, p.69

Notice the X-axis
From: Visual Revelations: Graphical Tales of Fate and Deception from Napoleon Bonaparte to Ross Perot. Wainer, H. 1997, p.29.

Correctly scaled X-axis…

Report of the Presidential Commission on the Space Shuttle Challenger Accident, 1986 (vol 1, p. 145)
The graph excludes the observations where no O-rings failed.

Smooth curve at least shows the trend toward failure at high and low temperatures…
http://www.math.yorku.ca/SCS/Gallery/

Even better: graph all the data (including non-failures) using a logistic regression model
Tappin, L. (1994). "Analyzing data relating to the Challenger disaster". Mathematics Teacher, 87, 423-426

What’s wrong with this graph?
from: ER Tufte. The Visual Display of Quantitative Information. Graphics Press, Cheshire, Connecticut, 1983, p.74

What’s the message here?
Diagraphics II, 1994

Diagraphics II, 1994

For more examples…
http://www.math.yorku.ca/SCS/Gallery/

Class exercise
What’s wrong with these graphs?
From: Johnson R. Just the Essentials of Statistics. Duxbury Press, 1995.
From: Johnson R. Just the Essentials of Statistics. Duxbury Press, 1995.

“Lying” with statistics
More accurately, misleading with statistics…

Example 1: projected statistics
Lifetime risk of melanoma:
  1935: 1/1500
  1960: 1/600
  1985: 1/150
  2000: 1/74
  2006: 1/60
http://www.melanoma.org/mrf_facts.pdf

Example 1: projected statistics
How do you think these statistics are calculated?
How do we know what the lifetime risk of a person born in 2006 will be?
Example 1: projected statistics
Interestingly, a clever clinical researcher recently went back and calculated (using SEER data) the actual lifetime risk (or risk up to 70 years) of melanoma for a person born in 1935.
The answer? Closer to 1/150 (one order of magnitude off)
(Martin Weinstock of Brown University, AAD conference 2006)

Example 2: propagation of statistics
In many papers and reviews of eating disorders in women athletes, authors cite the statistic that 15 to 62% of female athletes have disordered eating.
I’ve found that this statistic is attributed to about 50 different sources in the literature and cited all over the place with or without citations...

For example…
In a recent review (Hobart and Smucker, The Female Athlete Triad, American Family Physician, 2000):
“Although the exact prevalence of the female athlete triad is unknown, studies have reported disordered eating behavior in 15 to 62 percent of female college athletes.”
No citations given.

And…
Fact Sheet on eating disorders:
“Among female athletes, the prevalence of eating disorders is reported to be between 15% and 62%.”
Citation given: Costin, Carolyn. (1999) The Eating Disorder Source Book: A comprehensive guide to the causes, treatment, and prevention of eating disorders. 2nd edition. Lowell House: Los Angeles.

And…
From a Fact Sheet on disordered eating from a college website:
“Eating disorders are significantly higher (15 to 62 percent) in the athletic population than the general population.”
No citation given.

And…
“Studies report between 15% and 62% of college women engage in problematic weight control behaviors (Berry & Howe, 2000).” (in The Sport Journal, 2004)
Citation: Berry, T.R. & Howe, B.L. (2000, Sept). Risk factors for disordered eating in female university athletes. Journal of Sport Behavior, 23(3), 207-219.

And…
1999 NY Times article:
“But informal surveys suggest that 15 percent to 62 percent of female athletes are affected by disordered behavior that ranges from a preoccupation with losing weight to anorexia or bulimia.”

And…
“It has been estimated that the prevalence of disordered eating in female athletes ranges from 15% to 62%.” (in Journal of General Internal Medicine 15 (8), 577-590.)
Citations:
  Steen SN. The competitive athlete. In: Rickert VI, ed. Adolescent Nutrition: Assessment and Management. New York, NY: Chapman and Hall; 1996:223-47.
  Tofler IR, Stryer BK, Micheli LJ. Physical and emotional problems of elite female gymnasts. N Engl J Med. 1996;335:281-3.

Where did the statistics come from?
The 15%: Dummer GM, Rosen LW, Heusner WW, Roberts PJ, and Counsilman JE. Pathogenic weight-control behaviors of young competitive swimmers. Physician Sportsmed 1987; 15: 75-84.
The “to”: Rosen LW, McKeag DB, Hough DO, Curley VC. Pathogenic weight-control behaviors in female athletes. Physician Sportsmed. 1986; 14: 79-86.
The 62%: Rosen LW, Hough DO. Pathogenic weight-control behaviors of female college gymnasts. Physician Sportsmed 1988; 16: 140-146.

Where did the statistics come from?
Study design? Control group?
  Cross-sectional survey (all)
  No non-athlete control groups
Population/sample size? Convenience samples
  Rosen et al. 1986: 182 varsity athletes from two midwestern universities (basketball, field hockey, golf, running, swimming, gymnastics, volleyball, etc.)
  Dummer et al. 1987: 486 9-18 year old swimmers at a swim camp
  Rosen et al. 1988: 42 college gymnasts from 5 teams at an athletic conference

Where did the statistics come from?
Measurement?
Instrument: Michigan State University Weight Control Survey
Disordered eating = at least one pathogenic weight control behavior:
  Self-induced vomiting
  Fasting
  Laxatives
  Diet pills
  Diuretics
In the 1986 survey, they required use 1/month; in the 1988 survey, they required use twice-weekly
In the 1988 survey, they added fluid restriction

Where did the statistics come from?
Findings?
  Rosen et al. 1986: 32% used at least one “pathogenic weight-control behavior” (ranges: 8% of 13 basketball players to 73.7% of 19 gymnasts)
  Dummer et al. 1987: 15.4% of swimmers used at least one of these behaviors
  Rosen et al. 1988: 62% of gymnasts used at least one of these behaviors

Citation Tree…
Figure 4A from: Smith N P et al. J Exp Biol 2007;210:1576-1583.
Figure 4B from: Smith N P et al. J Exp Biol 2007;210:1576-1583.

Homework
Problem Set 1
Reading: Chapters 1-6 Vickers.
Read weekly journal article
Fill out a “Journal Article Review Sheet” (on class website).
Who wants to lead journal article discussion next week?

References
http://www.math.yorku.ca/SCS/Gallery/
Kline et al. Annals of Emergency Medicine 2002; 39: 144-152.
Statistics for Managers Using Microsoft® Excel 4th Edition, 2004 Prentice-Hall
Tappin, L. (1994). "Analyzing data relating to the Challenger disaster". Mathematics Teacher, 87, 423-426
Tufte. The Visual Display of Quantitative Information. Graphics Press, Cheshire, Connecticut, 1983.
Wainer, H. Visual Revelations: Graphical Tales of Fate and Deception from Napoleon Bonaparte to Ross Perot. 1997.
Johnson R. Just the Essentials of Statistics. Duxbury Press, 1995.