Data Analysis - freshmanclinic

Dr. Hong Zhang

• Tables and Graphs
• Populations and Samples
• Mean, Median, and Standard Deviation
• Standard Error & 95% Confidence Interval (CI)
• Error Bars
• Comparing Means of Two Data Sets
• Linear Regression (LR)
• Coefficient of Correlation

• Statistics is a huge field; I’ve simplified considerably here. For example:
  ◦ Mean, Median, and Standard Deviation
    ▪ There are alternative formulas
  ◦ Standard Error and the 95% Confidence Interval
    ▪ There are other ways to calculate CIs (e.g., the z statistic instead of t; the difference between two means rather than a single mean…)
  ◦ Error Bars
    ▪ Don’t go beyond the interpretations I give here!
  ◦ Comparing Means of Two Data Sets
    ▪ We cover only the t test for two means when the variances are unknown but equal; there are other tests
  ◦ Linear Regression
    ▪ We only look at simple LR and only calculate the intercept, slope, and R². There is much more to LR!

• Population: all of the possible outcomes of an experiment or observation
  ◦ US population
  ◦ Cars in the market
• A large population may be impractical and costly to study. It might be impossible to collect data from every member of the population.
  ◦ Weight and height of every US citizen
  ◦ Quality of every car in the market

• Sample: the part of the population that we actually measure or observe, and from which we draw conclusions
  ◦ 1000 US citizens
  ◦ 100 cars
• We use samples to estimate population properties
  ◦ Use 1000 US citizens to estimate the height of the entire US population
  ◦ Use 100 cars to estimate the quality of all Toyota Corolla cars under 3 years old

• A sample should fully represent the entire population.
  ◦ Good
    ▪ Randomly select 1000 names from a phone book to represent the region
    ▪ Randomly select 100 cars from DMV records
  ◦ Bad
    ▪ Use a college campus to represent the country
    ▪ Use cars in a dealer's lot to represent cars in the market
    ▪ Reporters randomly stop 3 people on the street for opinions

• Mean: the sum of the values divided by the number of samples; also called the average
• Example:
  ◦ Data: 3, 8, 5, 10, 4, 6
  ◦ Sum = 3 + 8 + 5 + 10 + 4 + 6 = 36
  ◦ Number of samples (data points) = 6
  ◦ Mean = 36 / 6 = 6
• Exercise
  ◦ Mean height of the entire class
  ◦ Average commute time of the students

• Bill Gates comes to give a presentation to 100 students in Rowan Auditorium.
• Suppose the personal wealth of Bill Gates is $50 billion.
• The personal wealth of each student is $0.
• What is the mean personal wealth of the entire population in the room?

• Median: the value of the middle item of the data arranged in increasing or decreasing order of magnitude
• Example:
  ◦ Data: 3, 8, 5, 10, 4, 6
  ◦ Rearrange: 3, 4, 5, 6, 8, 10
  ◦ The middle two are 5 and 6; the average of the two is 5.5
  ◦ The median of the data set is 5.5
• Exercise:
  ◦ Median height of the class
  ◦ Median commute time of the class
  ◦ Median personal wealth in the room with Bill Gates
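The mean and median examples above can be checked in a few lines. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative, and the room is assumed to hold exactly 101 people (the speaker plus the 100 students).

```python
from statistics import mean, median

# Data set used in the mean and median examples above.
data = [3, 8, 5, 10, 4, 6]
data_mean = mean(data)      # sum of the values divided by their count
data_median = median(data)  # average of the two middle values when n is even

# Room from the Bill Gates exercise.
# Assumption: the population in the room is the speaker plus 100 students.
wealth = [50_000_000_000] + [0] * 100
room_mean = mean(wealth)
room_median = median(wealth)

print(f"data: mean = {data_mean}, median = {data_median}")
print(f"room: mean = ${room_mean:,.0f}, median = ${room_median:,.0f}")
```

A single extreme value pulls the mean far away from what is typical for the room, while the median stays at the value most people actually have.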
[Figure: data points 3, 8, 5, 10, 4, 6 plotted with their mean and median]

• Standard deviation of the mean (standard error)
  ◦ Sample of size n
  ◦ taken from a population with standard deviation s

    $$s_{\bar{X}} = \frac{s}{\sqrt{n}}$$

  ◦ The estimate of the mean depends on the sample selected
  ◦ As n increases, the variance of the mean estimate goes down, i.e., the estimate of the population mean improves
  ◦ As n increases, the distribution of the mean estimate approaches normal, regardless of the population distribution

• Standard deviation

  $$\sigma = \left[ \frac{\sum (x_i - \mu)^2}{n} \right]^{1/2}$$

  μ: mean, n: sample size, x_i: data point

  $$s = \left[ \frac{\sum (x_i - \bar{x})^2}{n} \right]^{1/2} \quad \text{for } n > 30$$

  $$s = \left[ \frac{\sum (x_i - \bar{x})^2}{n - 1} \right]^{1/2} \quad \text{for } n < 30$$

  $$S = s^2$$

• Data: 70, 69, 60, 65, 72, 80, 75, 64, 68, 85, 66, 72

[Figure: histogram of frequency by bin: <60, 60~65, 65~70, 70~75, 75~80]

• Flip a coin: the chances of landing face up and face down are equal. (This is also called a binomial distribution.)

[Figure: 50% up, 50% down]

• Normal distribution
  ◦ Women’s shoe sizes sold by a shoe store
  ◦ Chemical distribution of a well-mixed compound

  $$Y = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$

  where x is a value of the normal random variable X, μ is the mean, σ is the standard deviation, π is approximately 3.14159, and e is approximately 2.71828.
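As a numerical check of the standard deviation and standard error formulas above, the sketch below applies them to the 12-point data set (70, 69, 60, …). It uses the n − 1 form because n = 12 is below 30, and it leans on the standard `statistics` module; the variable names are illustrative only.

```python
import math
from statistics import mean, pstdev, stdev

data = [70, 69, 60, 65, 72, 80, 75, 64, 68, 85, 66, 72]
n = len(data)
x_bar = mean(data)

# Sum of squared deviations from the sample mean.
sum_sq = sum((x - x_bar) ** 2 for x in data)

# Standard deviation written out by hand with the n - 1 divisor (n < 30).
s_manual = math.sqrt(sum_sq / (n - 1))

# Library equivalents: stdev divides by n - 1, pstdev divides by n.
s_sample = stdev(data)
s_divide_by_n = pstdev(data)

# Standard deviation of the mean (standard error): s / sqrt(n).
std_error = s_sample / math.sqrt(n)

print(f"mean = {x_bar}")
print(f"s (n - 1) = {s_sample:.3f}, s (n) = {s_divide_by_n:.3f}")
print(f"standard error = {std_error:.3f}")
```

The two divisors give noticeably different results at this sample size, which is why the choice of formula matters for small n.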
Nσ    Confidence interval    Errors per million
1     0.682689492137         317310.5079
2     0.954499736104         45500.2639
3     0.997300203937         2699.796063
4     0.999936657516         63.342484
5     0.999999426697         0.573303
6     0.999999998027         0.001973

• 6 sigma: N = 6 in the table above

• Zipf distribution: rank k has a frequency roughly proportional to 1/k, or more accurately $P_n = a / n^b$
• Developed by George Kingsley Zipf
• Occurs naturally in many situations
  ◦ City population
  ◦ Colors in images
  ◦ Call center
  ◦ Website traffic

Rank   Word   Freq     % Freq   Theoretical
1      the    69970    6.8872   69970
2      of     36410    3.5839   36470
3      and    28854    2.8401   24912
4      to     26154    2.5744   19009
5      a      23363    2.2996   15412
6      in     21345    2.1010   12985
7      that   10594    1.0428   11233
8      is     10102    0.9943   9908
9      was    9815     0.9661   8870
10     he     9542     0.9392   8033
11     for    9489     0.9340   7345
12     it     8760     0.8623   6768
13     with   7290     0.7176   6277
14     as     7251     0.7137   5855
15     his    6996     0.6886   5487
16     on     6742     0.6636   5164
17     be     6376     0.6276   4878
18     at     5377     0.5293   4623
19     by     5307     0.5224   4394
20     I      5180     0.5099   4187

[Figure: Zipf Distribution]

• If a distribution gives us a straight line on a log-log scale, then we can say that it is a Zipf distribution.
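Both tables above can be reproduced approximately with the standard library. The sketch below is illustrative rather than the method used to build the slides: the ±Nσ coverage is computed from the error function, and the Zipf column from $P_n = a / n^b$. The exponent b = 0.94 is an assumption inferred from the Theoretical column (it is not stated in the slides), and the function names are hypothetical.

```python
import math


def coverage(n_sigma: float) -> float:
    """Probability that a normal variable lies within n_sigma of its mean."""
    return math.erf(n_sigma / math.sqrt(2))


def errors_per_million(n_sigma: float) -> float:
    """Expected number of values outside +/- n_sigma per million samples."""
    # erfc avoids the loss of precision in 1 - erf(...) for large n_sigma.
    return math.erfc(n_sigma / math.sqrt(2)) * 1e6


def zipf_frequency(rank: int, a: float, b: float) -> float:
    """Zipf model P_n = a / n**b."""
    return a / rank ** b


A = 69970  # frequency of the rank-1 word ("the") in the table
B = 0.94   # assumption: inferred from the Theoretical column, not given

for n in range(1, 7):
    print(n, f"{coverage(n):.12f}", f"{errors_per_million(n):.6f}")

for rank in (1, 2, 3, 10, 20):
    print(rank, round(zipf_frequency(rank, A, B)))
```

Taking logarithms of the Zipf model gives log P_n = log a − b log n, which is why the data fall on a straight line with slope −b on a log-log plot.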
• Count the vehicles in Rowan parking lots
  ◦ Distribution of colors
  ◦ Distribution of cars and trucks
  ◦ Distribution of the last letter (digit) of the license number

• Select a parking lot
• Design a strategy to count
• Design a method to record data
• Design a method to represent the result
• Write a one-page report per group

• White: 2
• Black: 1
• Red: 2
• Blue: 2
• Silver: 4
• Gold: 1
• Beige: 1

Result for Pressure Transducer Calibration

Voltage (V)   Height (in)
2.34          8.69
2.56          11.88
2.79          15.19
2.98          17.88
3.13          19.94
3.27          22.06
3.47          25.00
3.62          27.06

[Figure: Pressure Transducer Calibration; Height (in) vs. Output Voltage (V)]

[Figure: Pressure Transducer Calibration with linear fit; Height (in) vs. Output Voltage (V); y = 14.361x - 24.908, R² = 0.9999]

Time (s)   Voltage (V)
0          10
1          6.1
2          3.7
3          2.2
4          1.4
5          0.8
6          0.5
7          0.3
8          0.2
9          0.1
10         0.07
12         0.03

Time (s)   Voltage (V)   log(Voltage)
0          10            1.00
1          6.1           0.79
2          3.7           0.57
3          2.2           0.34
4          1.4           0.15
5          0.8           -0.1
6          0.5           -0.3
7          0.3           -0.52
8          0.2           -0.7
9          0.1           -1
10         0.07          -1.15
12         0.03          -1.52

[Figure: Capacitor Discharge Rate: Semilog Coordinates; Voltage (V), logarithmic axis, vs. Time (s)]

Reaction Rate for Polymer Production

Concentration (mol/ft³)   Reaction Rate (mol/s)
100.0                     2.8500
80.0                      2.0000
60.0                      1.2500
40.0                      0.6700
20.0                      0.2200
10.0                      0.0720
5.0                       0.0240
1.0                       0.0018

[Figure: Reaction Rate for Polymer Production; Reaction Rate (mol/s) vs. Concentration (mol/ft³)]

Concentration   Reaction Rate   log (concentration)   log (reaction rate)
100.0           2.8500          2.00                  0.45
80.0            2.0000          1.90                  0.30
60.0            1.2500          1.78                  0.10
40.0            0.6700          1.60                  -0.17
20.0            0.2200          1.30                  -0.66
10.0            0.0720          1.00                  -1.14
5.0             0.0240          0.70                  -1.62
1.0             0.0018          0.00                  -2.74

[Figure: Polymer Reaction Rate: log plot; log [Reaction Rate (mol/s)] vs. log [Concentration (mol/ft³)]]

[Figure: Polymer Reaction Rate: Cartesian Coordinates; Reaction Rate (mol/s) vs. Concentration (mol/ft³)]

Table 1: Average Turbidity and Color of Water Treated by Portable Water Filter

Water        Turbidity (NTU)   True Color (Pt-Co)   Apparent Color (Pt-Co)
(1)          (2)               (3)                  (4)
Pond Water   10                13                   30
Sweetwater   4                 5                    12
Hiker        3                 8                    11
MiniWorks    2                 3                    5
Standard     5 (a)             15                   15

(a) Level at which humans can visually detect turbidity

• Table: Consistent Format, Title, Units, Big Fonts, Differentiate Headings, Number Columns
• Figure: Consistent Format, Title, Units, Good Axis Titles, Big Fonts

[Figure 1: Turbidity of Pond Water, Treated and Untreated; Turbidity (NTU) by Filter: Pond Water, Sweetwater, Miniworks, Hiker, Pioneer, Voyager]
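Returning to the three data sets tabulated above (pressure transducer calibration, capacitor discharge, and polymer reaction rate), the sketch below fits a straight line to each one by simple least squares. It is a minimal illustration, not necessarily how the slide figures were produced: `linear_fit` is a hypothetical helper, the capacitor data are fitted as log10(voltage) against time, and the polymer data as log10(rate) against log10(concentration).

```python
import math


def linear_fit(xs, ys):
    """Least-squares line y = slope * x + intercept, returned with R^2."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    r_squared = s_xy ** 2 / (s_xx * s_yy)
    return slope, intercept, r_squared


# Pressure transducer calibration: height (in) against output voltage (V).
voltage = [2.34, 2.56, 2.79, 2.98, 3.13, 3.27, 3.47, 3.62]
height = [8.69, 11.88, 15.19, 17.88, 19.94, 22.06, 25.00, 27.06]
cal_slope, cal_intercept, cal_r2 = linear_fit(voltage, height)

# Capacitor discharge: a straight line in semilog coordinates.
time_s = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12]
volts = [10, 6.1, 3.7, 2.2, 1.4, 0.8, 0.5, 0.3, 0.2, 0.1, 0.07, 0.03]
log_volts = [math.log10(v) for v in volts]
cap_slope, cap_intercept, cap_r2 = linear_fit(time_s, log_volts)

# Polymer reaction rate: a straight line in log-log coordinates.
conc = [100.0, 80.0, 60.0, 40.0, 20.0, 10.0, 5.0, 1.0]
rate = [2.85, 2.0, 1.25, 0.67, 0.22, 0.072, 0.024, 0.0018]
pol_slope, pol_intercept, pol_r2 = linear_fit(
    [math.log10(c) for c in conc], [math.log10(r) for r in rate]
)

print(f"calibration: y = {cal_slope:.3f}x + {cal_intercept:.3f}, R^2 = {cal_r2:.4f}")
print(f"capacitor:   log10(V) slope = {cap_slope:.3f} per s, R^2 = {cap_r2:.4f}")
print(f"polymer:     log-log slope = {pol_slope:.3f}, R^2 = {pol_r2:.4f}")
```

A straight line after taking the log of one axis suggests exponential behavior, and a straight line after taking the log of both axes suggests a power law, with the fitted slope as the exponent.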
