Dr. S. Nishan Silva (MBBS)

My weight

Daily weight (lbs), days 1-84:
Days 1-10: 140, 140.1, 139.8, 140.6, 140, 139.8, 139.6, 140, 140.8, 139.7
Days 11-20: 140.2, 141.7, 141.9, 141.4, 142.3, 142.3, 141.9, 142.1, 142.5, 142.3
Days 21-30: 142.1, 142.5, 143.5, 143, 143.2, 143, 143.4, 143.5, 142.7, 143.7
Days 31-40: 143.9, 144, 142.5, 142.9, 142.8, 143.9, 144, 144.8, 143.9, 144.5
Days 41-50: 143.9, 144, 144.2, 143.8, 143.5, 143.8, 143.2, 143.5, 143.6, 143.4
Days 51-60: 143.9, 143.6, 144, 143.8, 143.6, 143.8, 144, 144.2, 144, 143.9
Days 61-70: 144, 144.2, 144.5, 144.2, 143.9, 144.2, 144.5, 144.3, 144.2, 144.9
Days 71-80: 144, 143.8, 144, 143.8, 144, 144.5, 143.7, 143.9, 144, 144.2
Days 81-84: 144, 144.4, 143.8, 144.1

Plot the data as a function of the time at which they were acquired.
[Figure: weight (lbs) vs. day]
Comments:
• Use a white background (less ink).
• Use a font size larger than the Excel default (14 or 16).
• Do not use curved lines to connect data points – that assumes you know more about the relationship of the data than you really do.

Assume my weight is a single, random set of similar data.
Make a frequency chart (histogram) of the data.
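A frequency chart is just a count of how many observations fall into each class. The sketch below, written in Python rather than Excel, counts the 84 weights using upper class boundaries, the same convention the Excel Histogram tool uses for its bin range. The 1-lb boundaries from 139 to 145 are an arbitrary illustrative choice, not values from the slides.

```python
from bisect import bisect_left

# Daily weights (lbs) for days 1-84, copied from the "My weight" table.
WEIGHTS = [
    140, 140.1, 139.8, 140.6, 140, 139.8, 139.6, 140, 140.8, 139.7,
    140.2, 141.7, 141.9, 141.4, 142.3, 142.3, 141.9, 142.1, 142.5, 142.3,
    142.1, 142.5, 143.5, 143, 143.2, 143, 143.4, 143.5, 142.7, 143.7,
    143.9, 144, 142.5, 142.9, 142.8, 143.9, 144, 144.8, 143.9, 144.5,
    143.9, 144, 144.2, 143.8, 143.5, 143.8, 143.2, 143.5, 143.6, 143.4,
    143.9, 143.6, 144, 143.8, 143.6, 143.8, 144, 144.2, 144, 143.9,
    144, 144.2, 144.5, 144.2, 143.9, 144.2, 144.5, 144.3, 144.2, 144.9,
    144, 143.8, 144, 143.8, 144, 144.5, 143.7, 143.9, 144, 144.2,
    144, 144.4, 143.8, 144.1,
]


def frequency_table(data, upper_bounds):
    """Count observations per class, treating each bound as an upper class boundary.

    A value x goes into the first class whose upper boundary is >= x;
    values above the last boundary are counted under "More".
    """
    bounds = sorted(upper_bounds)
    counts = [0] * (len(bounds) + 1)  # final slot is the "More" class
    for x in data:
        counts[bisect_left(bounds, x)] += 1
    labels = [str(b) for b in bounds] + ["More"]
    return list(zip(labels, counts))


if __name__ == "__main__":
    # Illustrative (assumed) 1-lb classes spanning the observed range.
    for label, count in frequency_table(WEIGHTS, [139, 140, 141, 142, 143, 144, 145]):
        name = "More" if label == "More" else f"<= {label}"
        print(f"{name:>7} | {'#' * count} ({count})")
```

Printing one `#` per observation gives a rough sideways histogram; in Excel the same counts come from the Histogram tool with Chart Output selected.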
[Figure: weight (lbs) vs. day, and histogram of the weights (# of observations vs. weight (lbs))]

Create a "model" of my weight and determine the average weight and how consistent my weight is
• Average = 143.11 lbs
• s = 1.4 lbs, read at the inflection point of the fitted curve
• s = standard deviation = a measure of the consistency, or similarity, of the weights
[Figure: histogram of weights with a fitted normal curve; average 143.11, inflection point at s = 1.4 lbs]

Width of a normal peak
• s is the half-width measured at the inflection point (the inflection points lie at ±1s from the center)
• Widths: 2s at the inflection points, about 2.35s at half height (W1/2), and 4s at the base of the triangulated peak, so 2s < W < 4s
[Figure: normal curve, amplitude vs. s from -5 to 5]

Areas under the normal curve
• Area within ±1s = 68.3%
• Area within ±2s = 95.4%
• Area within ±3s = 99.73%
• pp = peak to peak, i.e. the largest separation between measurements; pp ≈ 6s

Peak to peak is sometimes easier to "see" on the data vs. time plot
• pp ≈ 6s, so s ≈ pp/6 = (144.9 - 139.5)/6 ≈ 0.9 lbs (compare the calculated s = 1.4 lbs)
[Figure: weight vs. day with the peak to peak range marked from 139.5 to 144.9 lbs, beside the histogram]

Inferential Statistics
• Used to determine the likelihood that a conclusion based on data from a sample is true
Terms
• p value: the probability that a difference at least as large as the one observed could have occurred by chance alone

Standardised Normal Distribution
• Formula: $Z = \frac{X - \mu}{\sigma}$
  – Z: standardised normal deviate (SND)
  – X: variable
  – µ: mean
  – σ: standard deviation
• Z is looked up in an SND table of values

Regression and Correlation
• Correlation – to analyze the relationship between two variables
• Regression – the dependence of variable y on variable x
  – In this course we consider only two variables
  – In real life, interactions among multiple variables are possible
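The weight model, the normal-curve areas and the Z formula above can be checked numerically. This Python sketch uses the standard `statistics` and `math` modules; treating the slide's s as the sample standard deviation (n - 1) is an assumption, and the peak-to-peak value here is taken from the raw data (139.6 to 144.9) rather than read off the plot (139.5 to 144.9).

```python
import math
import statistics

# Daily weights (lbs) for days 1-84, copied from the "My weight" table.
WEIGHTS = [
    140, 140.1, 139.8, 140.6, 140, 139.8, 139.6, 140, 140.8, 139.7,
    140.2, 141.7, 141.9, 141.4, 142.3, 142.3, 141.9, 142.1, 142.5, 142.3,
    142.1, 142.5, 143.5, 143, 143.2, 143, 143.4, 143.5, 142.7, 143.7,
    143.9, 144, 142.5, 142.9, 142.8, 143.9, 144, 144.8, 143.9, 144.5,
    143.9, 144, 144.2, 143.8, 143.5, 143.8, 143.2, 143.5, 143.6, 143.4,
    143.9, 143.6, 144, 143.8, 143.6, 143.8, 144, 144.2, 144, 143.9,
    144, 144.2, 144.5, 144.2, 143.9, 144.2, 144.5, 144.3, 144.2, 144.9,
    144, 143.8, 144, 143.8, 144, 144.5, 143.7, 143.9, 144, 144.2,
    144, 144.4, 143.8, 144.1,
]

mean = statistics.mean(WEIGHTS)    # the "average" on the slide
s = statistics.stdev(WEIGHTS)      # sample standard deviation (divides by n - 1)
pp = max(WEIGHTS) - min(WEIGHTS)   # peak to peak from the raw data
s_from_pp = pp / 6                 # rule of thumb: pp is about 6s


def area_within(k):
    """Fraction of a normal distribution lying within +/- k standard deviations."""
    return math.erf(k / math.sqrt(2))


def z_score(x, mu, sigma):
    """Standardised normal deviate Z = (X - mu) / sigma."""
    return (x - mu) / sigma


if __name__ == "__main__":
    print(f"mean = {mean:.2f} lbs, s = {s:.2f} lbs")
    print(f"pp = {pp:.1f} lbs, pp/6 = {s_from_pp:.2f} lbs")
    for k in (1, 2, 3):
        print(f"area within +/-{k}s = {100 * area_within(k):.2f}%")
    print(f"Z for day 1 ({WEIGHTS[0]} lbs) = {z_score(WEIGHTS[0], mean, s):.2f}")
```

The computed mean and s come out at about 143.11 lbs and 1.40 lbs, in line with the slide. The pp/6 estimate is noticeably smaller, a reminder that the "single random set" assumption is only approximate for data that drift upward over the first month.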
Example: X = height, Y = body weight

Basic Linear Regression Equation
• Equation: Y' = a + bx
  – b is the gradient, slope or regression coefficient
  – a is the intercept of the line on the Y axis, or regression constant
  – Y' is a (predicted) value for the outcome
  – x is a value for the predictor (the real x value)

Correlation Coefficient
• See page 100, lower down
• The correlation coefficient r ranges from -1 to +1; its magnitude |r| ranges from 0 (no linear relationship) to 1 (a perfect linear relationship)

Finding the significance of "r"
• Simple correlation significance
  – http://www.biology.ed.ac.uk/archive/jdeacon/statistics/table6.html#Correlation coefficient
• Pearson product-moment correlation coefficient
  – http://www.experimentresources.com/pearson-product-momentcorrelation.html
• References
  – Best: http://www.biology.ed.ac.uk/archive/jdeacon/statistics/tress11.html
  – In detail: http://www.statsdirect.com/help/regression_and_correlation/rcr.htm

Inferential Statistics – Page 102
• Sample statistics are "generalized" to the entire population
• Formulate a hypothesis
• State the null hypothesis
• Test the hypothesis

Types of Errors
Conclusion \ Truth | No difference | Difference
No difference | Correct | TYPE II ERROR (β)
Difference | TYPE I ERROR (α) | Correct
• Power = 1 - β, i.e. 100% minus the probability of a type II error

Confidence interval: the range of values we can be reasonably certain includes the true value. Only 5% of 95% confidence intervals would fail to include the true value; if the 95% confidence interval for a difference does not include zero (no difference), we reject the null hypothesis at the 5% level.

Example: The Use of the Null Hypothesis
• Is the difference between two sample populations due to chance, or is it a real statistical difference?
• The null hypothesis assumes that there will be no "difference", no "change" or no "effect" of the experimental treatment.
• If treatment A is no better than treatment B, then the null hypothesis is supported.
• If there is a significant difference between A and B, then the null hypothesis is rejected.
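The regression equation Y' = a + bx and the correlation coefficient r can be illustrated with the weight data from the start of this section, taking day as the predictor x and weight as the outcome Y (the slides' own example uses height and body weight, for which no data are given). This is a minimal least-squares sketch in plain Python, not the course's prescribed method.

```python
import math

# Daily weights (lbs) for days 1-84, copied from the "My weight" table.
WEIGHTS = [
    140, 140.1, 139.8, 140.6, 140, 139.8, 139.6, 140, 140.8, 139.7,
    140.2, 141.7, 141.9, 141.4, 142.3, 142.3, 141.9, 142.1, 142.5, 142.3,
    142.1, 142.5, 143.5, 143, 143.2, 143, 143.4, 143.5, 142.7, 143.7,
    143.9, 144, 142.5, 142.9, 142.8, 143.9, 144, 144.8, 143.9, 144.5,
    143.9, 144, 144.2, 143.8, 143.5, 143.8, 143.2, 143.5, 143.6, 143.4,
    143.9, 143.6, 144, 143.8, 143.6, 143.8, 144, 144.2, 144, 143.9,
    144, 144.2, 144.5, 144.2, 143.9, 144.2, 144.5, 144.3, 144.2, 144.9,
    144, 143.8, 144, 143.8, 144, 144.5, 143.7, 143.9, 144, 144.2,
    144, 144.4, 143.8, 144.1,
]
DAYS = list(range(1, len(WEIGHTS) + 1))  # predictor x = day number


def linear_regression(x, y):
    """Least-squares fit of Y' = a + b*x; returns (a, b, r)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx                   # slope / regression coefficient
    a = y_bar - b * x_bar           # intercept / regression constant
    r = sxy / math.sqrt(sxx * syy)  # Pearson product-moment correlation
    return a, b, r


if __name__ == "__main__":
    a, b, r = linear_regression(DAYS, WEIGHTS)
    print(f"Y' = {a:.2f} + {b:.4f} x   (r = {r:.2f})")
```

The fitted slope is positive (roughly 0.05 lb per day) and r is around 0.8. Because the weights level off after about day 30, the straight line is only a rough summary of this particular relationship.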
Parametric Tests
• t-test – Page 104
[T table]

T-test
• The t-test estimates how likely it is that the means of two small samples would differ as much as they do if the null hypothesis (no difference between the means) were correct
• That is, whether the two samples are representative of a single population (supporting the null hypothesis) or of two different populations (rejecting the null hypothesis)
Use the t-test to determine whether sample populations A and B came from the same population or from different populations:
$t = \frac{\bar{x}_1 - \bar{x}_2}{s_{\bar{x}_1 - \bar{x}_2}}$
• $\bar{x}_1$ = mean of A; $\bar{x}_2$ = mean of B
• $s_{\bar{x}_1 - \bar{x}_2} = \sqrt{s_{\bar{x}_1}^2 + s_{\bar{x}_2}^2}$ = standard error of the difference between the means, where $s_{\bar{x}_1}$ and $s_{\bar{x}_2}$ are the standard errors of A and B
Example: sample A mean = 8, sample B mean = 12, standard error of the difference = 1
t = (12 - 8)/1 = 4 standard error units

Non-Parametric Tests
• Chi-squared test – Page 108
  – Test for goodness of fit
  – Test of independence

Chi Square
• Used with discrete values
• Phenotypes, choice chambers, etc.
• Not used with continuous variables (like height; use the t-test for samples of less than 30 and the z-test for samples greater than 30)
• $\chi^2 = \sum \frac{(O - E)^2}{E}$ (formula image: http://course1.winona.edu/sberg/Equation/chi-squ2.gif)
  – O = observed values
  – E = expected values

Interpreting a Chi Square
• Calculate the degrees of freedom: number of events, trials or phenotypes - 1
  – Example: 2 phenotypes - 1 = 1
• Generally use the column labeled 0.05: its critical value would be exceeded by chance alone only 5% of the time if the null hypothesis were true, so a calculated value below it means any difference between what you expected and what you observed is within accepted random chance
• A calculated value larger than this critical value means you reject your null hypothesis: there is a significant difference between observed and expected values
How to use a chi square chart: http://faculty.southwest.tn.edu/jiwilliams/probab2.gif

T-test or Chi Square? Testing the validity of the null hypothesis
• Use the t-test (also called Student's t-test) if using continuous variables from normally distributed sample populations (e.g. height)
• Use the chi square ($\chi^2$) if using discrete variables (if you are evaluating the differences between experimental data and expected or hypothetical data). Example: genetics experiments, expected distribution of organisms.

Qualitative Analysis – Pages 113-114
• Phenomenology
  – Data collected using interviews, tapes, etc.
  – Analyzed as the researcher prefers
  – Described using descriptive statistics
• Ethnography
  – Data collected using note taking, observation, etc.
  – Categorised
  – Relationships between patterns identified
• Concurrent Analysis
  – Qualitative data is transformed to numerical data
  – Qualitative value may be lost

Using Excel (Example)

Microsoft Excel
• A spreadsheet application. It features calculation, graphing tools, pivot tables and a macro programming language called VBA (Visual Basic for Applications).
• There are many versions of MS Excel. Excel XP, Excel 2003 and Excel 2007 are capable of performing a number of statistical analyses.
• Starting MS Excel: double click on the Microsoft Excel icon on the desktop, or click on Start --> Programs --> Microsoft Excel.
• Worksheet: consists of a grid of cells with numbered rows down the page and alphabetically titled columns across the page. Each cell is referenced by its coordinates. For example, A3 refers to the cell in column A and row 3; B10:B20 refers to the range of cells in column B, rows 10 through 20.

Microsoft Excel
• Opening a document: File --> Open (from an existing workbook). Change the directory or drive to look for files in other locations.
• Creating a new workbook: File --> New --> Blank Document
• Saving a file: File --> Save
• Selecting more than one cell: click on a cell (e.g. A1), then hold the Shift key and click on another (e.g. D4) to select the cells between A1 and D4; or click on a cell and drag the mouse across the desired range.
• Creating formulas:
  1. Click the cell in which you want to enter the formula.
  2. Type = (an equal sign).
  3. Click the Function button (fx).
  4. Select the formula you want and step through the on-screen instructions.

Microsoft Excel
• Entering date and time: dates are displayed as MM/DD/YYYY (US settings), but there is no need to enter them in that format. For example, Excel will recognize jan 9 or jan-9 as 1/9/2007 (the current year) and jan 9, 1999 as 1/9/1999. To enter today's date, press Ctrl and ; together. Use a or p to indicate am or pm; for example, 8:30 p is interpreted as 8:30 pm. To enter the current time, press Ctrl and : together.
• Copy and paste all cells in a sheet: Ctrl+A to select, Ctrl+C to copy and Ctrl+V to paste.
• Sorting: Data --> Sort --> Sort By …
• Descriptive statistics and other statistical methods: Tools --> Data Analysis --> [statistical method]. If Data Analysis is not available, click on Tools --> Add-Ins and then select Analysis ToolPak and Analysis ToolPak - VBA.

Histograms in Excel
1. Select Tools --> Data Analysis.
2. Choose Histogram, then enter the input data range and the bin range (the bin range is a cell range containing the upper class boundaries for each class grouping).
3. Select Chart Output and click "OK".

Microsoft Excel
• Statistical and mathematical functions: start with an '=' sign and then select the function from the function wizard (fx).
• Inserting a chart: click on the Chart Wizard (or Insert --> Chart), select a chart type, give the input data range, update the chart options, and select the output range/worksheet.
• Importing data into Excel: File --> Open --> File Type --> click on the file --> choose an option (Delimited/Fixed Width) --> choose the delimiter (Tab/Semicolon/Comma/Space/Other) --> Finish.
• Limitations: Excel uses algorithms that are vulnerable to rounding and truncation errors and may produce inaccurate results in extreme cases.
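Returning to the t-test and chi-square slides earlier in this section, both statistics are simple arithmetic once the inputs are known. The Python sketch below reproduces the t-test example (means 8 and 12, standard error of the difference 1) and computes a chi-square for hypothetical coin-toss counts (56 heads and 44 tails out of 100, expected 50:50) that are made up purely for illustration. The critical values are standard 0.05-level table entries for 1 to 3 degrees of freedom; a full analysis would look up p values in a t or chi-square table.

```python
import math

# Standard chi-square critical values at the 0.05 level, keyed by degrees of freedom.
CHI2_CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815}


def se_of_difference(se_a, se_b):
    """Standard error of the difference between two independent sample means."""
    return math.sqrt(se_a ** 2 + se_b ** 2)


def t_statistic(mean_1, mean_2, se_diff):
    """t = (difference between the means) / (standard error of that difference)."""
    return (mean_1 - mean_2) / se_diff


def chi_square(observed, expected):
    """Chi-square statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


if __name__ == "__main__":
    # Example from the t-test slide: means 12 and 8, standard error of the difference 1.
    print("t =", t_statistic(12, 8, 1))

    # Hypothetical counts: 100 coin tosses, 56 heads and 44 tails, expected 50:50.
    observed = [56, 44]
    expected = [50, 50]
    chi2 = chi_square(observed, expected)
    df = len(observed) - 1  # 2 categories - 1 = 1 degree of freedom
    critical = CHI2_CRITICAL_05[df]
    verdict = "reject" if chi2 > critical else "do not reject"
    print(f"chi2 = {chi2:.2f}, df = {df}, critical = {critical} -> {verdict} the null hypothesis")
```

With these made-up counts χ² = 1.44, which is below the 0.05 critical value of 3.841 for 1 degree of freedom, so the difference from 50:50 is within accepted random chance.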
Computing the Mean
• Sum the xi and divide by n (or by N for the population mean)
• Excel: =AVERAGE(cellrange)

Computing the Mode
• The value that occurs most often in discretized data
• Excel: =MODE(cellrange)
  – Reports the first value seen if there is a tie

Computing the Median
• The middle value in sorted data
• Excel: =MEDIAN(cellrange)

Computing the Range
• The range is the spread from the minimum to the maximum value (max - min)
• Excel: =MIN(cellrange), =MAX(cellrange), or =MAX(cellrange) - MIN(cellrange)

Computing the Standard Deviation
• The standard deviation is the square root of the variance
• Excel:
  – =STDEV(cellrange) – sample standard deviation
  – =STDEVP(cellrange) – population standard deviation
  – =VAR(cellrange) – sample variance
  – =VARP(cellrange) – population variance

Tables and Charts for Categorical Data
[Diagram: univariate categorical data is either tabulated (summary table) or graphed (bar charts, pie charts, Pareto diagram)]

The Summary Table
Summarize data by category. Example: Current Investment Portfolio (variables are categorical)

Investment Type | Amount (in thousands $) | Percentage (%)
Stocks | 46.5 | 42.27
Bonds | 32.0 | 29.09
CD | 15.5 | 14.09
Savings | 16.0 | 14.55
Total | 110.0 | 100.0

Bar and Pie Charts
• Bar charts and pie charts are often used for qualitative (category) data
• The height of a bar or the size of a pie slice shows the frequency or percentage for each category

Bar Chart Example
[Chart: Investor's Portfolio – horizontal bars for Stocks, Bonds, CD and Savings; x-axis Amount in $1000's, 0 to 50; data as in the summary table]

Pie Chart Example
[Chart: Current Investment Portfolio – Stocks 42%, Bonds 29%, Savings 15%, CD 14%; data as in the summary table]
Percentages are rounded to the nearest percent.

Pareto Diagram Example
[Chart: Current Investment Portfolio – bars show the % invested in each category (Stocks, Bonds, Savings, CD, in descending order); a line graph shows the cumulative % invested, rising to 100%]

Tabulating and Graphing Multivariate Categorical Data (continued)
• Side-by-side bar charts
[Chart: Comparing Investors – side-by-side bars for Stocks, Bonds, CD and Savings for Investors A, B and C; x-axis 0 to 60]

Side-by-Side Chart Example
• Sales by quarter for three sales territories:

Territory | 1st Qtr | 2nd Qtr | 3rd Qtr | 4th Qtr
East | 20.4 | 27.4 | 59 | 20.4
West | 30.6 | 38.6 | 34.6 | 31.6
North | 45.9 | 46.9 | 45 | 43.9

[Chart: side-by-side bars for East, West and North in each quarter; y-axis 0 to 60]

Best source for you: BMJ Statistics Notes, http://www.bmj.com/bmjseries/statistics-notes
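For readers working outside Excel, the descriptive-statistics functions and the Pareto percentages described above have close counterparts in Python's standard `statistics` module. This sketch applies them to the first ten daily weights and to the investment portfolio summary table; the mapping of each call to an Excel function is noted in comments and is an assumption about equivalent definitions, not an official correspondence.

```python
import statistics

# First ten daily weights (lbs) from the "My weight" table.
SAMPLE = [140, 140.1, 139.8, 140.6, 140, 139.8, 139.6, 140, 140.8, 139.7]

# Current Investment Portfolio summary table (amounts in thousands of $).
PORTFOLIO = {"Stocks": 46.5, "Bonds": 32.0, "CD": 15.5, "Savings": 16.0}


def describe(data):
    """Python counterparts of the Excel functions listed in the slides."""
    return {
        "mean": statistics.mean(data),            # =AVERAGE
        "mode": statistics.mode(data),            # =MODE (first mode seen on ties, Python 3.8+)
        "median": statistics.median(data),        # =MEDIAN
        "min": min(data),                         # =MIN
        "max": max(data),                         # =MAX
        "stdev": statistics.stdev(data),          # =STDEV (sample)
        "pstdev": statistics.pstdev(data),        # =STDEVP (population)
        "variance": statistics.variance(data),    # =VAR (sample)
        "pvariance": statistics.pvariance(data),  # =VARP (population)
    }


def pareto(categories):
    """Sort categories by size; return (name, % of total, cumulative %) rows."""
    total = sum(categories.values())
    rows, running = [], 0.0
    for name, amount in sorted(categories.items(), key=lambda kv: kv[1], reverse=True):
        running += amount
        rows.append((name, round(100 * amount / total, 2), round(100 * running / total, 2)))
    return rows


if __name__ == "__main__":
    for key, value in describe(SAMPLE).items():
        print(f"{key:>9}: {value:.3f}")
    for name, pct, cumulative in pareto(PORTFOLIO):
        print(f"{name:>8}: {pct:6.2f}%  cumulative {cumulative:6.2f}%")
```

The Pareto rows come out in the same order as the slide's diagram (Stocks, Bonds, Savings, CD), with percentages matching the summary table and the cumulative line reaching 100%.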