Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Using Microsoft Excel for the statistical calculations Lections №4 Main Questions Using Microsoft Excel for the mathematic calculations. Statistical calculations in the Microsoft Excel. Curve Fitting Using Excel 1.Mathematic calculations in the Microsoft Excel Structure of the Excel equation. Arguments of functions in the Excel Equation Wizard 1.1.Structure of the Excel equation Simple equation example: =(А4+В8)*С6; Composite equation example: Equation start symbol reference to the cell (relative) Function with lists of the arguments Mathematic operator 1.2. Arguments of functions Constants – textual or numbering data; Reference to the cell – address of cell (or cells) that contain data for processing. There are two types of the reference: relative – change when equation moved around table, for example: F7; absolute – do not change when equation moved around table : on to the cell, for example : $F$7; on to the table column, for example : $F7; on to the table row, for example : F$7; 1.2.1. Arrays as arguments Array (range) – address of the cells are separated by : (colon) – you must define address of the left top and right bottom cells of the array. For example: definition C4:C7 represented the array with elements C4, C5, C6, C7; Set (union) – address of the cells are separated by ; (semicolon) – you must define address of the each cells of the array. For example: definition D2:D4;D6:D8 – represented the array with elements D2, D3, D4, D6, D7, D8. 1.3. Using the Equation Wizard Run wizard – use command Insert-Function of the main menu or click on Function icons on the toolbar Step 1 – in dialog box select category of the functions (Category list) and choose function name in sub-list. Click ОК to finish; Step 2 – input arguments of the function (constant or address of the cell). Different function has different counts of the arguments ; You can input data manual or click Choose button and select input area on the Excel’s worksheet. Step 1 : You can select category and function name Using the Equation Wizard Step 2 : You can input arguments of the function 2.Statistical calculations in the Microsoft Excel Descriptive statistics. Statistical hypothesis testing. Data Analysis add-on. 2.1.Descriptive statistics Statistic - Measure of a sample characteristic. Population - Contains all members of a group. Sample - A subset of a population. Interval Data - Objects classified by type or characteristic, with logical order and equal differences between levels of data. Ordinal Data - Objects classified by type or characteristic with some logical order. Variable - A characteristic that can form different values from one observation to another. Independent Variable - A measure that can take on different values which are subject to manipulation by the researcher. Response Variable - The measure not controlled in an experiment. Commonly known as the dependent variable. 2.1.1.Descriptive statistics For interval level data, measures of central tendency and variation are common descriptive statistics. Measures of central tendency describe a series of data with a single attribute. Measures of variation describe how widely the data elements vary. Standardized scores combine both central tendency and variation into a single descriptor that is comparable across different samples with the same or different units of measurement. For nominal/ordinal data, proportions are a common method used to describe frequencies as they compare to a total. 2.1.2.Descriptive statistics 2.1.3.Descriptive statistics Mean - the arithmetic average of the scores in a sample distribution. Median - the point on a scale of measurement below which fifty percent of the scores fall. Mode - the most frequently occurring score in a distribution. Range - The difference between the highest and lowest score (high-low). Variance - The average of the squared deviations between the individual scores and the mean. The larger the variance the more variability there is among the scores. Standard deviation - The square root of variance. It provides a representation of the variation among scores that is directly comparable to the raw scores. 2.1.4.Descriptive statistics 2.1.5.Descriptive statistics Statistical parameter name Excel function name English ver. Russian ver. AVERAGE СРЗНАЧ Max MAX МАКС Min MIN МИН Variance VAR ДИСП Standart deviation STDEV СТАНДОТКЛОН Coef. of skewness SKEWNEES СКОС KURT ЭКСЦЕС Mean Coef. of kurtosis 2.2.Statistical Hypothesis Testing The Normal Distribution. Although there are numerous sampling distributions used in hypothesis testing, the normal distribution is the most common example of how data would appear if we created a frequency histogram where the x axis represents the values of scores in a distribution and the y axis represents the frequency of scores for each value. Most scores will be similar and therefore will group near the center of the distribution. Some scores will have unusual values and will be located far from the center or apex of the distribution. . 2.2.1.The Normal Distribution Properties of a normal distribution: Forms a symmetric bell-shaped curve 50% of the scores lie above and 50% below the midpoint of the distribution Curve is asymptotic to the x axis Mean, median, and mode are located at the midpoint of the x axis 2.2.Statistical Hypothesis Testing Hypothesis testing is used to establish whether the differences exhibited by random samples can be inferred to the populations from which the samples originated. Chain of reasoning for inferential statistics Sample(s) must be randomly selected Sample estimate is compared to underlying distribution of the same size sampling distribution Determine the probability that a sample estimate reflects the population parameter 2.2.1.Statistical Hypothesis Testing The four possible outcomes in hypothesis testing: Actual Population Comparison Null Hyp. True (there is no difference) Null Hyp. False (there is a difference) Rejected Null Hypothesis Type I error (alpha) Correct Decision Did not Reject Null Correct Decision Type II Error DECISION 2.2.2.Statistical Hypothesis Testing When conducting statistical tests with computer software, the exact probability of a Type I error is calculated. It is presented in several formats but is most commonly reported as "p <" or "Sig." or "Signif." or "Significance." The following table links p values with a benchmark alpha of 0.05: P < Alpha Probability of Type I Error 0.05 0.05 5% chance difference is not significant 0.10 0.05 10% chance difference is not significant 0.01 0.05 1% chance difference is not significant 0.96 0.05 96% chance difference is not significant Final Decision Statistically signif. Not statistically signif. Statistically signif. Not statistically signif. 2.2.3.Statistical Hypothesis Testing General assumptions: Population is normally distributed Random sampling Mutually exclusive comparison samples Data characteristics match statistical technique. For interval / ratio data use: t-tests, Pearson correlation, ANOVA, regression For nominal / ordinal data use: Difference of proportions, chi square and related measures of association 2.2.4.Hypothesis Testing Testing State the Hypothesis Null Hypothesis (Ho): There is no difference between ___ and ___. Alternative Hypothesis (Ha): There is a difference between __ and __. Rejection Criteria This determines how different the parameters and/or statistics must be before the null hypothesis can be rejected. This "region of rejection" is based on alpha () - the error associated with the confidence level. The point of rejection is known as the critical value. For the medical investigations use value = 0,05 (5%). Practical point of the view Statistical point of the view Comparing the control and experimental samples Comparing Two Independent Sample Means Additional conditions Normal distribution Not Normal distribution Comparing the sample data before and after experiment Comparing a Sample Mean to a constant Comparing the parameter diffusion in two samples Comparing Two Dependent Sample Means Comparing a Population Mean to a Sample Mean Comparing Two Independent Sample Variances Appropritate method Variances are equal T-test with homogeneity of Variance Variances are not equal T-test without homogeneity of Variance Without variance test T-test without variance test Variances are equal U-test (Willcocson Mann – Uitny) Without variance test Median test Normal distribution T-test for the dependent sample Not Normal distribution One sample signed test (Willcocson) Normal distribution Comparing a constant to a Sample Mean (T-test) Not Normal distribution Gupt signed test Normal distribution Computing F-ratio Not Normal distribution Zigel-Tiuky, Mozes tests 2.3.The Analysis ToolPak Performing statistical analyses on sample data is very convenient to do in Excel. It has dozens of built-in spreadsheet functions that allow us to perform all sorts of statistics calculations. The Analysis ToolPak add-in also contains several other statistical tools. To make sure you have the Analysis ToolPak add-in available in your version of Excel, select Tools from the main menu bar and see if the Data Analysis menu option appears toward the bottom of the Tools menu. If not, select Tools - Add-Ins from the main menu bar and select the Analysis ToolPak option from the list. 2.3.1.The Analysis ToolPak The Analysis ToolPak provides several tools for conducting statistical tests. These tools include: F-Test Two-Sample for Variances t-Test Paired Two-Sample for Means t-Test Two-Sample Assuming Equal Variances t-Test Two-Sample Assuming Unequal Variances z-Test Two-Sample for Means To access these tools, select Tools Data Analysis from the main menu bar to open the Data Analysis dialog box. You'll find each of the statistical test tools listed in this dialog box. MS EXCEL Add-ins dialog box The Data Analysis ToolPak Data Analysis dialog box 3. Curve Fitting Using Excel Understanding Curve Fitting. MS Excel trendline feature. 3.1. Understanding Curve Fitting Curve fitting is the process of trying to find the curve (which is represented by some model equation) that best represents the sample data, or more specifically the relationship between the independent and dependent variables in the dataset. When the results of the curve fit are to be used for making new predictions of the dependent variable, this process is known as regression analysis. 3.1. Understanding Curve Fitting The Linear trendline uses the equation: у = k • x + b, – where k and b are parameters to be determined during the curve-fitting process. The Logarithmic trendline uses the equation: у = с • ln(x) + b, – where c and b are parameters to be determined during the curve-fitting process. 3.1. Understanding Curve Fitting The Power trendline uses the equation: у = с • хb, – where c and b are parameters to be determined during the curve-fitting process. The Exponential trendline uses the equation: у = с • еb • х, – where c and b are parameters to be determined during the curve-fitting process. 3.1. Understanding Curve Fitting The Polynomial trendlines use the equation: у = b + с1 х + с2 х2 + с3 х3 + с4 х4 + с5 х5 +с6 х6 – where the c-coefficients and b are parameters of the curve fit. Excel supports polynomial fits up to sixth order. 3.2. MS Excel trendline feature The 5 listed before curve fits are easily generated using the trendline feature built into Excel's XY scatter chart. Once you've plotted your data using an XY scatter chart, you can generate a trendline that will be displayed on your chart, superimposed over your data. You can also include the resulting equation for the best-fit line on your chart. 3.2. MS Excel trendline feature To use a trendline feature in the Excel chart: Create chart, that based on your data samples (recommended use an XY scatter or linear chart type). Right-click on the data series and select Add Trendline from the pop-up menu. The Add Trendline dialog box will shown. Select the Trend/Regression type that you need. On to the Options tab select "Display equation on chart" and "Display R-squared value on chart.“ – The former will display the resulting best-fit equation on your chart – The latter will also include the R-squared value, allowing you to assess the goodness of the fit. Press OK to go back to your chart and see the resulting trendline. 3.2. MS Excel trendline feature The Add Trendline dialog box 3.2. MS Excel trendline feature The Add Trendline Options tab Various trendlines Conclusion In this lecture was described next questions: Using Microsoft Excel for the mathematic calculations. Statistical calculations in the Microsoft Excel. Curve Fitting Using Excel. Literature Electronic documentation on to the intranet server: http://miserver http://10.21.0.193