Using Microsoft Excel for Statistical Calculations
Lecture No. 4
Main Questions
• Using Microsoft Excel for mathematical calculations.
• Statistical calculations in Microsoft Excel.
• Curve fitting using Excel.
1.Mathematical calculations in Microsoft Excel
• Structure of an Excel equation (formula).
• Arguments of functions in Excel.
• Equation Wizard.
1.1.Structure of an Excel equation
Simple equation example: =(A4+B8)*C6
[Figure: composite equation example with its parts labeled: equation start symbol (=), relative reference to a cell, function with a list of arguments, mathematical operator]
1.2. Arguments of functions
Constants – text or numeric data.
Reference to a cell – the address of the cell (or cells) that
contains the data to process. There are two types of reference:
• relative – changes when the equation is copied to another cell,
for example: F7;
• absolute – does not change when the equation is copied:
   – fixed cell (column and row), for example: $F$7;
   – fixed column only, for example: $F7;
   – fixed row only, for example: F$7.
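The behavior of relative and absolute references can be sketched in Python. This is only an illustration of the rule described above, not how Excel is implemented; the function names (shift_reference, col_to_index, index_to_col) are hypothetical.

```python
import re

def col_to_index(col):
    """Convert column letters to a 1-based index: A -> 1, Z -> 26, AA -> 27."""
    n = 0
    for ch in col:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n

def index_to_col(n):
    """Convert a 1-based column index back to letters."""
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters

def shift_reference(ref, d_rows, d_cols):
    """Return the reference as it reads after the equation is copied
    d_rows down and d_cols to the right. A '$' pins the part after it."""
    m = re.fullmatch(r"(\$?)([A-Z]+)(\$?)(\d+)", ref)
    if m is None:
        raise ValueError(f"not a cell reference: {ref!r}")
    col_abs, col, row_abs, row = m.groups()
    if not col_abs:  # relative column: follows the copy
        col = index_to_col(col_to_index(col) + d_cols)
    if not row_abs:  # relative row: follows the copy
        row = str(int(row) + d_rows)
    return f"{col_abs}{col}{row_abs}{row}"

# Copy an equation two rows down and one column to the right.
for ref in ("F7", "$F$7", "$F7", "F$7"):
    print(ref, "->", shift_reference(ref, 2, 1))
```

Only the parts without a $ sign follow the copy; the pinned parts stay where they were.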
1.2.1. Arrays as arguments
Array (range) – the cell addresses are separated
by : (colon); you define the addresses of the top-left
and bottom-right cells of the array. For
example, C4:C7 represents the array
with elements C4, C5, C6, C7.
Set (union) – the ranges are separated by
; (semicolon); you list each cell or range
of the set. For example,
D2:D4;D6:D8 represents the array with
elements D2, D3, D4, D6, D7, D8.
(With regional settings whose list separator is a comma,
the same union is written D2:D4,D6:D8.)
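A minimal Python sketch of how the range and union notations expand into individual cells. The helper names (expand_range, expand_union) are hypothetical, and the semicolon separator is simply the one used in this lecture.

```python
import re

def split_cell(cell):
    """Split 'C4' into (column index, row number), e.g. (3, 4)."""
    m = re.fullmatch(r"([A-Z]+)(\d+)", cell)
    if m is None:
        raise ValueError(f"not a cell address: {cell!r}")
    col = 0
    for ch in m.group(1):
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return col, int(m.group(2))

def index_to_col(n):
    """Convert a 1-based column index to letters: 1 -> A, 27 -> AA."""
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters

def expand_range(rng):
    """Expand 'C4:C7' (or a single cell such as 'C4') row by row."""
    first, _, last = rng.partition(":")
    c1, r1 = split_cell(first)
    c2, r2 = split_cell(last or first)
    return [
        f"{index_to_col(col)}{row}"
        for row in range(min(r1, r2), max(r1, r2) + 1)
        for col in range(min(c1, c2), max(c1, c2) + 1)
    ]

def expand_union(spec, sep=";"):
    """Expand a union such as 'D2:D4;D6:D8' into one list of cells."""
    cells = []
    for part in spec.split(sep):
        cells.extend(expand_range(part.strip()))
    return cells

print(expand_range("C4:C7"))
print(expand_union("D2:D4;D6:D8"))
```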
1.3. Using the Equation Wizard
Run the wizard – use the Insert - Function command of
the main menu or click the Function icon on the
toolbar.
• Step 1 – in the dialog box, select the function
category (Category list) and choose the function
name in the sub-list. Click OK to finish this step.
• Step 2 – enter the arguments of the function
(constants or cell addresses). Different
functions take different numbers of arguments.
You can type the data manually, or click the Choose button
and select the input area on the Excel worksheet.
[Figure: Equation Wizard, Step 1: selecting the category and function name]
[Figure: Equation Wizard, Step 2: entering the arguments of the function]
2.Statistical calculations in Microsoft Excel
• Descriptive statistics.
• Statistical hypothesis testing.
• Data Analysis add-in.
2.1.Descriptive statistics








Statistic - Measure of a sample characteristic.
Population - Contains all members of a group.
Sample - A subset of a population.
Interval Data - Objects classified by type or characteristic,
with logical order and equal differences between levels of
data.
Ordinal Data - Objects classified by type or characteristic with
some logical order.
Variable - A characteristic that can take different values from
one observation to another.
Independent Variable - A measure that can take on different
values which are subject to manipulation by the researcher.
Response Variable - The measure not controlled in an
experiment. Commonly known as the dependent variable.
2.1.1.Descriptive statistics
For interval level data, measures of central tendency
and variation are common descriptive statistics.
• Measures of central tendency describe a series of
data with a single attribute.
• Measures of variation describe how widely the data
elements vary.
• Standardized scores combine both central tendency
and variation into a single descriptor that is comparable
across different samples with the same or different units
of measurement.
For nominal/ordinal data, proportions are a common
method used to describe frequencies as they compare
to a total.
2.1.2.Descriptive statistics
2.1.3.Descriptive statistics






Mean - the arithmetic average of the scores in a
sample distribution.
Median - the point on a scale of measurement below
which fifty percent of the scores fall.
Mode - the most frequently occurring score in a
distribution.
Range - The difference between the highest and
lowest score (high-low).
Variance - The average of the squared deviations
between the individual scores and the mean. The
larger the variance the more variability there is
among the scores.
Standard deviation - The square root of variance. It
provides a representation of the variation among
scores that is directly comparable to the raw scores.
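The definitions above can be checked with Python's standard statistics module. This is a small sketch on made-up scores. Note one assumption about the denominator: the "average of the squared deviations" divides by n (population variance), while sample formulas such as Excel's VAR and STDEV divide by n - 1. Both are shown.

```python
import statistics

def describe(scores):
    """Return the descriptive statistics defined above for a list of scores."""
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "mode": statistics.mode(scores),
        "range": max(scores) - min(scores),
        # Average of squared deviations (divides by n).
        "population_variance": statistics.pvariance(scores),
        # Sample variance (divides by n - 1).
        "sample_variance": statistics.variance(scores),
        # Square root of the sample variance.
        "sample_stdev": statistics.stdev(scores),
    }

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample for illustration
for name, value in describe(scores).items():
    print(f"{name}: {value}")
```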
2.1.4.Descriptive statistics
2.1.5.Descriptive statistics
Statistical parameter name   Excel function name
                             English ver.    Russian ver.
Mean                         AVERAGE         СРЗНАЧ
Max                          MAX             МАКС
Min                          MIN             МИН
Variance                     VAR             ДИСП
Standard deviation           STDEV           СТАНДОТКЛОН
Coef. of skewness            SKEW            СКОС
Coef. of kurtosis            KURT            ЭКСЦЕСС
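Most functions in this table have direct counterparts in Python (statistics.mean, max, min, statistics.variance, statistics.stdev). Skewness and kurtosis do not, so the sketch below writes out the bias-corrected sample formulas that Excel documents for SKEW and KURT. The function names excel_skew and excel_kurt are hypothetical, and the code is an illustration rather than Excel's implementation.

```python
import statistics

def excel_skew(data):
    """Sample skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s) ** 3)."""
    n = len(data)
    if n < 3:
        raise ValueError("skewness needs at least 3 values")
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1)
    total = sum(((x - mean) / s) ** 3 for x in data)
    return n / ((n - 1) * (n - 2)) * total

def excel_kurt(data):
    """Sample excess kurtosis with the small-sample correction term."""
    n = len(data)
    if n < 4:
        raise ValueError("kurtosis needs at least 4 values")
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    total = sum(((x - mean) / s) ** 4 for x in data)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * total - correction

sample = [1, 2, 3, 4, 5]  # made-up, symmetric sample
print(excel_skew(sample), excel_kurt(sample))
```

A symmetric sample gives a skewness of zero, and a negative kurtosis indicates a distribution flatter than the normal one.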
2.2.Statistical Hypothesis Testing



The Normal Distribution. Although numerous
sampling distributions are used in
hypothesis testing, the normal distribution is the
most common example of how data would appear
if we created a frequency histogram in which the x
axis represents the values of scores in a
distribution and the y axis represents the
frequency of scores for each value.
Most scores will be similar and therefore will group
near the center of the distribution.
Some scores will have unusual values and will be
located far from the center or apex of the
distribution.
2.2.1.The Normal Distribution
Properties of a normal distribution:
• Forms a symmetric bell-shaped curve
• 50% of the scores lie above and 50% below the midpoint
of the distribution
• Curve is asymptotic to the x axis
• Mean, median, and mode are located at the midpoint of
the x axis
2.2.Statistical Hypothesis Testing
Hypothesis testing is used to establish whether
the differences exhibited by random samples
can be inferred to the populations from which
the samples originated.
Chain of reasoning for inferential statistics
• Sample(s) must be randomly selected
• Sample estimate is compared to the
underlying sampling distribution for
samples of the same size
• Determine the probability that a sample
estimate reflects the population parameter
2.2.1.Statistical Hypothesis Testing

The four possible outcomes in hypothesis
testing:
                           Actual Population Comparison
DECISION                   Null Hyp. True              Null Hyp. False
                           (there is no difference)    (there is a difference)
Rejected Null Hypothesis   Type I error (alpha)        Correct Decision
Did not Reject Null        Correct Decision            Type II Error
2.2.2.Statistical Hypothesis Testing
When statistical tests are run in computer software, the p value
is calculated exactly: the probability of obtaining a difference at
least this large if the null hypothesis is true, that is, the risk of
a Type I error if the null hypothesis is rejected. It is presented
in several formats but is most commonly reported as "p <" or
"Sig." or "Signif." or "Significance." The following table links
p values with a benchmark alpha of 0.05:
p <    Alpha   Probability of Type I Error                 Final Decision
0.05   0.05    5% chance difference is not significant     Statistically signif.
0.10   0.05    10% chance difference is not significant    Not statistically signif.
0.01   0.05    1% chance difference is not significant     Statistically signif.
0.96   0.05    96% chance difference is not significant    Not statistically signif.
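The decision rule behind this table fits in a few lines of Python. This sketch assumes, as the first row of the table does, that a p value equal to alpha still counts as significant; the function name decide is hypothetical.

```python
def decide(p_value, alpha=0.05):
    """Compare a reported p value with the benchmark alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p value must lie between 0 and 1")
    if p_value <= alpha:
        return "Statistically significant"
    return "Not statistically significant"

# The four rows of the table above.
for p in (0.05, 0.10, 0.01, 0.96):
    print(p, decide(p))
```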
2.2.3.Statistical Hypothesis Testing
General assumptions:
• Population is normally distributed
• Random sampling
• Mutually exclusive comparison samples
• Data characteristics match statistical
technique.
For interval / ratio data use: t-tests, Pearson
correlation, ANOVA, regression
For nominal / ordinal data use: difference of
proportions, chi square and related
measures of association
2.2.4.Hypothesis Testing
State the Hypothesis
• Null Hypothesis (Ho): There is no difference between
___ and ___.
• Alternative Hypothesis (Ha): There is a difference
between ___ and ___.
Rejection Criteria
• This determines how different the parameters and/or
statistics must be before the null hypothesis can be
rejected. This "region of rejection" is based on alpha
(α), the error associated with the confidence level. The
point of rejection is known as the critical value.
• For medical investigations, use α = 0.05
(5%).
Practical point of view: comparing the control and experimental samples
Statistical point of view: comparing two independent sample means
Additional conditions and appropriate method:
• Normal distribution, variances are equal: t-test with homogeneity of variance
• Normal distribution, variances are not equal: t-test without homogeneity of variance
• Normal distribution, without variance test: t-test without variance test
• Not normal distribution, variances are equal: U-test (Wilcoxon-Mann-Whitney)
• Not normal distribution, without variance test: median test

Practical point of view: comparing the sample data before and after the experiment
Statistical point of view: comparing two dependent sample means
Additional conditions and appropriate method:
• Normal distribution: t-test for dependent samples
• Not normal distribution: one-sample signed test (Wilcoxon)

Practical point of view: comparing a sample mean to a constant
Statistical point of view: comparing a population mean to a sample mean
Additional conditions and appropriate method:
• Normal distribution: comparing a constant to a sample mean (t-test)
• Not normal distribution: Gupt signed test

Practical point of view: comparing the dispersion of a parameter in two samples
Statistical point of view: comparing two independent sample variances
Additional conditions and appropriate method:
• Normal distribution: computing the F-ratio
• Not normal distribution: Siegel-Tukey, Moses tests
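For the first case in this overview (two independent sample means, normal distribution), the two t-test variants differ only in how the variances are combined. The sketch below computes the test statistic and degrees of freedom with the textbook pooled-variance and Welch formulas. It is an illustration on made-up data and does not reproduce Excel's output format; the function names are hypothetical. The resulting t is then compared with the critical value for the chosen alpha.

```python
import math
import statistics

def pooled_t(a, b):
    """t-test with homogeneity of variance (equal variances assumed)."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    diff = statistics.mean(a) - statistics.mean(b)
    t = diff / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

def welch_t(a, b):
    """t-test without homogeneity of variance (unequal variances)."""
    n1, n2 = len(a), len(b)
    q1 = statistics.variance(a) / n1
    q2 = statistics.variance(b) / n2
    diff = statistics.mean(a) - statistics.mean(b)
    t = diff / math.sqrt(q1 + q2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (q1 + q2) ** 2 / (q1 ** 2 / (n1 - 1) + q2 ** 2 / (n2 - 1))
    return t, df

control = [1, 2, 3, 4, 5]        # made-up control sample
experiment = [2, 4, 6, 8, 10]    # made-up experimental sample
print(pooled_t(control, experiment))
print(welch_t(control, experiment))
```

With equal sample sizes the two statistics coincide and only the degrees of freedom differ.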
2.3.The Analysis ToolPak


Performing statistical analyses on sample data is
very convenient to do in Excel. It has dozens of
built-in spreadsheet functions that allow us to
perform all sorts of statistics calculations. The
Analysis ToolPak add-in also contains several
other statistical tools.
To make sure you have the Analysis ToolPak
add-in available in your version of Excel, select
Tools from the main menu bar and see if the Data
Analysis menu option appears toward the bottom
of the Tools menu. If not, select Tools - Add-Ins
from the main menu bar and select the Analysis
ToolPak option from the list.
2.3.1.The Analysis ToolPak
The Analysis ToolPak provides several tools for
conducting statistical tests. These tools include:
• F-Test Two-Sample for Variances
• t-Test Paired Two-Sample for Means
• t-Test Two-Sample Assuming Equal Variances
• t-Test Two-Sample Assuming Unequal Variances
• z-Test Two-Sample for Means
To access these tools, select Tools - Data Analysis
from the main menu bar to open the Data Analysis
dialog box. You'll find each of the statistical test
tools listed in this dialog box.
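To show what two of these tools calculate, here is a minimal Python sketch of the paired t statistic and the F-ratio of two sample variances, using the standard textbook formulas. The data are made up and the function names are hypothetical; the ToolPak additionally reports p values and critical values, which this sketch leaves out.

```python
import math
import statistics

def paired_t(before, after):
    """Paired two-sample t statistic on the differences after - before."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    std_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / std_error, n - 1

def f_ratio(a, b):
    """Ratio of the two sample variances with its degrees of freedom."""
    f = statistics.variance(a) / statistics.variance(b)
    return f, len(a) - 1, len(b) - 1

before = [10, 12, 9, 11]   # made-up measurements before the experiment
after = [12, 14, 10, 14]   # the same subjects after the experiment
print(paired_t(before, after))
print(f_ratio([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
```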
[Figures: the Data Analysis ToolPak: MS Excel Add-Ins dialog box; Data Analysis dialog box]
3. Curve Fitting Using Excel
• Understanding curve fitting.
• MS Excel trendline feature.
3.1. Understanding Curve Fitting
• Curve fitting is the process of trying to find
the curve (which is represented by some
model equation) that best represents the
sample data, or more specifically the
relationship between the independent and
dependent variables in the dataset.
• When the results of the curve fit are to be
used for making new predictions of the
dependent variable, this process is known as
regression analysis.

The Linear trendline uses the equation:
$y = k \cdot x + b$,
– where k and b are parameters to be
determined during the curve-fitting process.

The Logarithmic trendline uses the equation:
$y = c \cdot \ln(x) + b$,
– where c and b are parameters to be
determined during the curve-fitting process.
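A minimal least-squares sketch of how k and b (or c and b) can be determined. It assumes ordinary least squares, which the lecture does not state explicitly; the function names are hypothetical. The logarithmic model is linear in ln(x), so the same fit is reused on transformed x values.

```python
import math

def linear_fit(xs, ys):
    """Least-squares fit of y = k*x + b; returns (k, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx
    b = mean_y - k * mean_x
    return k, b

def logarithmic_fit(xs, ys):
    """Least-squares fit of y = c*ln(x) + b; returns (c, b). Needs x > 0."""
    return linear_fit([math.log(x) for x in xs], ys)

# Made-up data lying exactly on y = 2x + 1.
print(linear_fit([1, 2, 3, 4], [3, 5, 7, 9]))
# Made-up data lying exactly on y = 3*ln(x) + 1.
print(logarithmic_fit([1, math.e, math.e ** 2], [1, 4, 7]))
```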
The Power trendline uses the equation:
$y = c \cdot x^{b}$,
– where c and b are parameters to be
determined during the curve-fitting process.
The Exponential trendline uses the equation:
$y = c \cdot e^{b \cdot x}$,
– where c and b are parameters to be
determined during the curve-fitting process.
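Both models become straight lines after taking logarithms: ln(y) = ln(c) + b·ln(x) for the power curve and ln(y) = ln(c) + b·x for the exponential curve. The sketch below fits them that way. Treat the log transform as an assumption of this example, since the lecture does not describe the fitting procedure, and the function names as hypothetical. It requires y > 0 (and x > 0 for the power model).

```python
import math

def linear_fit(xs, ys):
    """Least-squares fit of y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def power_fit(xs, ys):
    """Fit y = c * x**b through ln(y) = ln(c) + b*ln(x); returns (c, b)."""
    b, ln_c = linear_fit([math.log(x) for x in xs], [math.log(y) for y in ys])
    return math.exp(ln_c), b

def exponential_fit(xs, ys):
    """Fit y = c * exp(b*x) through ln(y) = ln(c) + b*x; returns (c, b)."""
    b, ln_c = linear_fit(xs, [math.log(y) for y in ys])
    return math.exp(ln_c), b

xs = [1, 2, 3, 4]
print(power_fit(xs, [2 * x ** 1.5 for x in xs]))               # made-up power data
print(exponential_fit(xs, [3 * math.exp(0.5 * x) for x in xs]))  # made-up exponential data
```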
The Polynomial trendlines use the equation:
$y = b + c_1 x + c_2 x^2 + c_3 x^3 + c_4 x^4 + c_5 x^5 + c_6 x^6$
– where the c-coefficients and b are
parameters of the curve fit. Excel supports
polynomial fits up to sixth order.
3.2. MS Excel trendline feature

The five curve fits listed above are easily
generated using the trendline feature built into
Excel's XY scatter chart.

Once you've plotted your data using an XY
scatter chart, you can generate a trendline
that will be displayed on your chart,
superimposed over your data.

You can also include the resulting equation
for the best-fit line on your chart.
To use the trendline feature in an Excel chart:

Create a chart based on your sample data (an XY scatter
or line chart type is recommended).

Right-click the data series and select Add Trendline from
the pop-up menu. The Add Trendline dialog box is shown.

Select the Trend/Regression type that you need. On the
Options tab, select "Display equation on chart" and "Display
R-squared value on chart."
– The former displays the resulting best-fit equation on your
chart.
– The latter also includes the R-squared value, allowing
you to assess the goodness of the fit.

Press OK to go back to your chart and see the resulting
trendline.
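The R-squared value mentioned in these steps can be sketched for a linear trendline as follows. This uses the usual definition, 1 minus the ratio of the residual to the total sum of squares, on made-up data; the function names are hypothetical and the code is not taken from Excel.

```python
def linear_fit(xs, ys):
    """Least-squares fit of y = k*x + b; returns (k, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx
    return k, mean_y - k * mean_x

def r_squared(ys, predicted):
    """Share of the variation in ys that the fitted values explain."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

xs = [1, 2, 3, 4, 5]   # made-up data
ys = [2, 4, 5, 4, 5]
k, b = linear_fit(xs, ys)
fitted = [k * x + b for x in xs]
print(f"y = {k:.2f} * x + {b:.2f}, R-squared = {r_squared(ys, fitted):.2f}")
```

A value close to 1 means the trendline follows the data closely; a value close to 0 means it explains little of the variation.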
[Figure: The Add Trendline dialog box]
[Figure: The Add Trendline Options tab]
[Figure: Various trendlines]
Conclusion
This lecture covered the following questions:
• Using Microsoft Excel for
mathematical calculations.
• Statistical calculations in Microsoft
Excel.
• Curve fitting using Excel.
Literature

Electronic documentation on the
intranet server:
http://miserver
http://10.21.0.193