Download Statistics - Chandigarh University

PPT ON BUSINESS STATISTICS
CLASS-BBA
SECTION-3
SUBJECT CODE-BBT 201
Statistics
What is statistics?
• a branch of mathematics that provides
techniques to analyze whether or not your data is
significant (meaningful)
• Statistical applications are based on probability
statements
• Nothing is “proved” with statistics
• Statistics are reported
• Statistics report the probability that similar
results would occur if you repeated the
experiment
Statistics deals with numbers
• Need to know nature of numbers collected
– Continuous variables: type of numbers associated
with measuring or weighing; any value in a
continuous interval of measurement.
• Examples:
– Weight of students, height of plants, time to flowering
– Discrete variables: type of numbers that are
counted or categorical
• Examples:
– Numbers of boys, girls, insects, plants
Sample Populations: Avoiding Bias
• Individuals in a sample population
– Must be a fair representation of the entire pop.
– Therefore sample members must be randomly
selected (to avoid bias)
– Example: if you were looking at strength in
students: picking students from the football team
would NOT be random
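A minimal sketch of random selection, assuming a hypothetical roster of 100 students. The names `draw_random_sample` and `students` are illustrative only and do not come from the slides; the point is that every member has the same chance of being picked.

```python
import random

def draw_random_sample(population, n, seed=None):
    """Return n distinct members chosen uniformly at random, without replacement."""
    rng = random.Random(seed)  # a seed only makes the draw repeatable
    return rng.sample(population, n)

# Hypothetical roster (assumption): 100 students, each equally likely to be picked
students = [f"student_{i}" for i in range(1, 101)]
sample = draw_random_sample(students, 10, seed=42)
```

Picking only from the football team would correspond to passing a hand-chosen subset as `population`, which is exactly the bias the slide warns about.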
Statistical Computations (the Math)
• If you are using a sample population
– Arithmetic Mean (average)
The sum of all the scores
divided by the total number of scores.
– The mean is the balance point of the data: the
deviations above and below it sum to zero. (It is the
median, not the mean, that splits the pop in half.)
http://en.wikipedia.org/wiki/Table_of_mathematical_symbols
Mode and Median
• Mode: most frequently seen value (if no
numbers repeat then there is no mode)
• Median: the middle number
– If you have an odd number of data then the
median is the value in the middle of the set
– If you have an even number of data then the
median is the average between the two middle
values in the set.
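The three measures can be checked with Python's standard `statistics` module. This is a small sketch on made-up scores; the helper `modes_or_none` is a hypothetical name added here to show the "no repeats, no mode" case.

```python
from statistics import mean, median, multimode

scores = [2, 3, 3, 5, 7, 10]  # hypothetical data (assumption)

avg = mean(scores)          # sum of scores / number of scores = 30 / 6
mid = median(scores)        # even count: average of the two middle values, 3 and 5
modes = multimode(scores)   # list of the most frequent value(s)

def modes_or_none(data):
    """Return the most frequent values, or [] when no value repeats."""
    if len(set(data)) == len(data):  # every value occurs exactly once
        return []
    return multimode(data)
```

With these scores the mean is 5, the median is 4, and the only mode is 3.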
Variance (s²)
• Mathematically expressing the degree of
variation of scores (data) from the mean
• A large variance means that the individual
scores (data) of the sample deviate a lot from
the mean.
• A small variance indicates the scores (data)
deviate little from the mean
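Calculating the variance for a whole population
$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
Σ = sum of; X = score, value,
µ = mean, N = total number of scores or values
OR use the VAR.P function in Excel (VARP in older versions);
note that VAR returns the sample variance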
Calculating the variance for a SAMPLE population (unbiased estimate)
$s^2 = \frac{\sum (X - \bar{x})^2}{n - 1}$
Σ = sum of; X = score, value,
n − 1 = total number of scores or values, minus 1
$\bar{x}$ (often read as “x bar”) is the mean (average value of the X scores).
Note the sample variance is larger…why?
http://www.mnstate.edu/wasson/ed602calcvardevs.htm
Standard Deviation
• An important statistic that is also used to
measure variation in samples.
• s is the symbol for the sample standard deviation
• Calculated by taking the square root of the
variance
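A short sketch comparing the two variance formulas on made-up scores. The function names are hypothetical; the results are cross-checked against `pvariance` and `variance` from Python's `statistics` module.

```python
import math
from statistics import pvariance, variance

def population_variance(data):
    """Sum of squared deviations from the mean, divided by N."""
    mu = sum(data) / len(data)
    return sum((x - mu) ** 2 for x in data) / len(data)

def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    x_bar = sum(data) / len(data)
    return sum((x - x_bar) ** 2 for x in data) / (len(data) - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]         # hypothetical scores (assumption); mean is 5
pop_var = population_variance(data)      # 32 / 8
samp_var = sample_variance(data)         # 32 / 7
samp_sd = math.sqrt(samp_var)            # standard deviation = square root of variance
```

Dividing by n − 1 instead of N makes the sample variance slightly larger, which compensates for a sample tending to understate the spread of the full population.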
Time Series Analysis and Forecasting
Introduction to Time Series Analysis
• A time-series is a set of observations on a quantitative
variable collected over time.
• Examples
– Dow Jones Industrial Averages
– Historical data on sales, inventory, customer counts,
interest rates, costs, etc
• Businesses are often very interested in forecasting time
series variables.
• Often, independent variables are not available to build a
regression model of a time series variable.
• In time series analysis, we analyze the past behavior of a
variable in order to predict its future behavior.
Methods used in Forecasting
• Regression Analysis
• Time Series Analysis (TSA)
– A statistical technique that uses time-series data for explaining the past or
forecasting future events.
– The prediction is a function of time
(days, months, years, etc.)
– No causal variable; examine past behavior
of a variable and attempt to predict
future behavior
Components of TSA (Cont.)
• Cycle
– An up-and-down repetitive movement in demand.
– repeats itself over a long period of time
• Seasonal Variation
– An up-and-down repetitive movement within a trend
occurring periodically.
– Often weather related but could be daily or weekly
occurrence
• Random Variations
– Erratic movements that are not predictable because they
do not follow a pattern
Time Series Plot
[Figure: line plot of Actual Sales; y-axis: Sales (in $1,000s), $0 to $3,000; x-axis: Time Period, 1 to 21]
Components of TSA (Cont.)
• Difficult to forecast demand because...
– There are no causal variables
– The components (trend, seasonality,
cycles, and random variation) cannot
always be easily or accurately
identified
Moving Averages
$\hat{Y}_{t+1} = \frac{Y_t + Y_{t-1} + \cdots + Y_{t-k+1}}{k}$

No general method exists for determining k.

We must try out several k values to see what works best.
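A minimal sketch of a k-period moving-average forecast and of one way to "try out several k values": compare the mean squared error of one-step-ahead forecasts. The sales figures and function names are hypothetical, and mean squared error is only one possible yardstick.

```python
def moving_average_forecast(series, k):
    """Forecast the next period as the mean of the last k observations."""
    if k < 1 or k > len(series):
        raise ValueError("k must be between 1 and len(series)")
    return sum(series[-k:]) / k

def mse_for_k(series, k):
    """Mean squared error of one-step-ahead moving-average forecasts."""
    if k < 1 or k >= len(series):
        raise ValueError("k must be between 1 and len(series) - 1")
    errors = [
        (series[t] - sum(series[t - k:t]) / k) ** 2
        for t in range(k, len(series))
    ]
    return sum(errors) / len(errors)

sales = [10, 12, 14, 16, 18, 20]             # hypothetical sales history (assumption)
next_k3 = moving_average_forecast(sales, 3)  # (16 + 18 + 20) / 3
errors_by_k = {k: mse_for_k(sales, k) for k in (2, 3)}
best_k = min(errors_by_k, key=errors_by_k.get)
```

On this steadily rising series the shorter window tracks the data more closely, so k = 2 has the lower error; on noisier data a longer window may do better.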
INDEX NUMBERS
• An index number is a statistical value that
measures the change in a variable with
respect to time
• Two variables that are often considered in this
analysis are price and quantity
• With the aid of index numbers, the average
price of several articles in one year may be
compared with the average price of the same
quantity of the same articles in a number of
different years
• We will examine index numbers that are
constructed from a single item only
• Such indexes are called simple index numbers
• Current period = the period for which you
wish to find the index number
• Base period = the period with which you wish
to compare prices in the current period
• The choice of the base period should be
considered very carefully
• The notation we shall use is:
– pn = the price of an item in the current period
– po = the price of an item in the base period
• Price relative
– The price relative of an item is the ratio of the
price of the item in the current period to the price
of the same item in the base period
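A small sketch of the price relative, using the notation p_n and p_o from the slide. Expressing the ratio as a percentage (multiplying by 100) is a common convention and is an assumption here, as are the prices.

```python
def price_relative(p_n, p_o, as_percent=True):
    """Ratio of the current-period price p_n to the base-period price p_o."""
    if p_o <= 0:
        raise ValueError("base-period price must be positive")
    ratio = p_n / p_o
    return ratio * 100 if as_percent else ratio

# Hypothetical prices (assumption): an item cost 2.00 in the base period, 2.50 now
relative = price_relative(2.50, 2.00)
```

A value of 125 means the current price is 25% above the base-period price.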
• Simple aggregate index (cont…)
– Even though the simple aggregate index is easy to
calculate, it has serious disadvantages:
1. An item with a relatively large price can dominate the index
2. If prices are quoted for different quantities, the simple aggregate
index will yield a different answer
3. It does not take into account the quantity of each item sold
– Disadvantage 2 is perhaps the worst feature of
this index, since it makes it possible, to a certain
extent, to manipulate the value of the index
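Disadvantage 2 can be illustrated with a short sketch. It assumes the usual definition of the simple aggregate index (sum of current prices divided by sum of base prices, times 100), which is not spelled out in this part of the slides; the goods and prices are made up.

```python
def simple_aggregate_index(current_prices, base_prices):
    """Sum of current-period prices over sum of base-period prices, times 100."""
    return sum(current_prices) / sum(base_prices) * 100

# Hypothetical prices (assumption): milk quoted per litre, bread per loaf
base = [1.00, 2.00]
current = [1.50, 2.20]
index_per_litre = simple_aggregate_index(current, base)

# Same goods and same price changes, but milk is now quoted per 10 litres
base_10l = [10.00, 2.00]
current_10l = [15.00, 2.20]
index_per_10_litres = simple_aggregate_index(current_10l, base_10l)
```

The first index is about 123.3 and the second about 143.3, even though nothing changed except the unit in which milk is quoted.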
Weighted index numbers
• The use of a weighted index number or weighted index allows greater
importance to be attached to some items
• Information other than simply the change in price over time can then be
used, and can include such factors as quantity sold or quantity consumed
for each item
• Laspeyres index
– The Laspeyres index is also known as the average
of weighted relative prices
– In this case, the weights used are the quantities of
each item bought in the base period
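A minimal sketch of the Laspeyres index under its standard formula, 100 × Σ(p_n × q_o) / Σ(p_o × q_o), where q_o is the base-period quantity. The symbol q_o, the function name, and the data are assumptions added for illustration.

```python
def laspeyres_index(current_prices, base_prices, base_quantities):
    """Cost of the base-period basket at current prices, relative to its
    cost at base prices, times 100."""
    cost_now = sum(p * q for p, q in zip(current_prices, base_quantities))
    cost_base = sum(p * q for p, q in zip(base_prices, base_quantities))
    return cost_now / cost_base * 100

# Hypothetical two-item basket (assumption)
p_o = [1.00, 2.00]   # base-period prices
p_n = [1.50, 2.50]   # current-period prices
q_o = [10, 5]        # quantities bought in the base period

index = laspeyres_index(p_n, p_o, q_o)  # (15 + 12.5) / (10 + 10) * 100
```

Because each price is weighted by a quantity, changing the unit in which an item is quoted rescales its price and quantity together and leaves the index unchanged.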
CONSUMER PRICE INDEX
• The measure most commonly used in Australia as a general indicator of the
rate of price change for consumer goods and services is the consumer price
index
• The Indian CPI assumes the purchase of a constant ‘basket’ of goods and
services and measures price changes in that basket alone
• The description of the CPI commonly adopted by users is in terms of its
perceived uses; hence there are frequent references to the CPI as
– a measure of inflation
– a measure of changes in purchasing power, or
– a measure of changes in the cost of living
Introduction to Probability Theory
• Experiment: toss a coin twice
• Sample space: possible outcomes of an experiment
– S = {HH, HT, TH, TT}
• Event: a subset of possible outcomes
– A={HH}, B={HT, TH}
• Probability of an event: a number assigned to an event,
Pr(A)
– Axiom 1: Pr(A) ≥ 0
– Axiom 2: Pr(S) = 1
– Axiom 3: For every sequence of disjoint events $A_1, A_2, \ldots$: $\Pr\left(\bigcup_i A_i\right) = \sum_i \Pr(A_i)$
– Example: Pr(A) = n(A)/N: frequentist statistics
• Consider the experiment of tossing a coin twice
• Example I:
– A = {HT, HH}, B = {HT}
– Is event A independent of event B?
• Example II:
– A = {HT}, B = {TH}
– Is event A independent of event B?
• Disjoint ≠ Independence
• If A is independent of B, and B is independent of C, will A be
independent of C?
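The questions on this slide can be checked by enumeration. The sketch below assumes a fair coin, so the four outcomes are equally likely, and uses the standard test Pr(A ∩ B) = Pr(A) Pr(B); the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space for two tosses; assumes a fair coin (equally likely outcomes)
S = {"".join(p) for p in product("HT", repeat=2)}

def pr(event):
    """Probability of an event as (favourable outcomes) / (all outcomes)."""
    return Fraction(len(event & S), len(S))

def independent(a, b):
    """A and B are independent when Pr(A and B) == Pr(A) * Pr(B)."""
    return pr(a & b) == pr(a) * pr(b)

example_1 = independent({"HT", "HH"}, {"HT"})   # Example I
example_2 = independent({"HT"}, {"TH"})         # Example II: disjoint events

# Independence is not transitive: take C equal to A
first_heads = {"HH", "HT"}
second_heads = {"HH", "TH"}
a_b = independent(first_heads, second_heads)
b_c = independent(second_heads, first_heads)
a_c = independent(first_heads, first_heads)
```

Neither Example I nor Example II is independent. In Example II the events are disjoint, so Pr(A ∩ B) = 0 while Pr(A) Pr(B) = 1/16. The last three lines show A independent of B and B independent of C without A being independent of C.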
BAYES THEOREM
$\Pr(B \mid A) = \frac{\Pr(AB)}{\Pr(A)} = \frac{\Pr(A \mid B)\,\Pr(B)}{\Pr(A)}$
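A small worked sketch of Bayes' theorem on the two-toss experiment, assuming a fair coin. Here A = "first toss is heads" and B = {HH}; the choice of events and the function name are illustrative assumptions.

```python
from fractions import Fraction

def bayes(pr_a_given_b, pr_b, pr_a):
    """Pr(B | A) = Pr(A | B) * Pr(B) / Pr(A)."""
    if pr_a == 0:
        raise ValueError("Pr(A) must be positive")
    return pr_a_given_b * pr_b / pr_a

# A = {HH, HT} (first toss heads), B = {HH}; fair coin assumed
pr_a = Fraction(1, 2)
pr_b = Fraction(1, 4)
pr_a_given_b = Fraction(1)   # if both tosses are heads, the first certainly is

pr_b_given_a = bayes(pr_a_given_b, pr_b, pr_a)
```

The result, 1/2, matches direct counting: of the two outcomes with heads first (HH, HT), one is HH.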
RANDOM VARIABLE
• A random variable X is a numerical outcome of a
random experiment
• The distribution of a random variable is the
collection of possible outcomes along with their
probabilities:
– Discrete case: probability mass function
– Continuous case: probability density function
• The outcome of an experiment can either be
success (i.e., 1) or failure (i.e., 0).
• Pr(X=1) = p, Pr(X=0) = 1-p, or $\Pr(X = x) = p^x (1-p)^{1-x}$ for x in {0, 1}
• E[X] = p, Var(X) = p(1-p)
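For the success/failure variable above, the mean and variance follow directly from the distribution: E[X] = p and Var(X) = p(1 − p). The sketch below computes both from the definitions; the value p = 3/10 and the function name are assumptions.

```python
from fractions import Fraction

def bernoulli_moments(p):
    """Return (E[X], Var(X)) for X with Pr(X=1) = p and Pr(X=0) = 1 - p."""
    dist = {1: p, 0: 1 - p}  # outcome -> probability
    mean = sum(x * prob for x, prob in dist.items())
    var = sum((x - mean) ** 2 * prob for x, prob in dist.items())
    return mean, var

p = Fraction(3, 10)  # hypothetical success probability (assumption)
expected, var = bernoulli_moments(p)
```

Both values agree with the closed forms: the mean is 3/10 and the variance is 21/100.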
Simple Linear Regression and Correlation
Linear Regression Analysis…
• Regression analysis is used to predict the
value of one variable (the dependent
variable) on the basis of other variables (the
independent variables).
• Dependent variable: denoted Y
• Independent variables: denoted X1, X2, …,
Xk
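• If we only have ONE independent variable, the model
is $Y = \beta_0 + \beta_1 X + \varepsilon$, where $\beta_0$ is the intercept,
$\beta_1$ is the slope, and $\varepsilon$ is the random error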
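A minimal sketch of fitting a straight line with one independent variable by ordinary least squares. Least squares is the usual fitting method but is not named on the slide, so treat it as an assumption; the data and function name are made up.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for the line y = b0 + b1 * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 points")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

xs = [1, 2, 3, 4]   # hypothetical independent variable X
ys = [3, 5, 7, 9]   # hypothetical dependent variable Y
b0, b1 = fit_line(xs, ys)
predicted = b0 + b1 * 5   # predict Y for a new X value
```

These points lie exactly on a line, so the fit recovers intercept 1 and slope 2 and predicts 11 at X = 5.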
Correlation Analysis… “−1 ≤ ρ ≤ 1”
• If we are interested only in determining
whether a relationship exists, we employ
correlation analysis. Example: Student’s
height and weight.
[Figure: four scatter plots, each titled “Plot of Height vs Weight”; x-axis: Weight (100 to 260), y-axis: Height]
Correlation Analysis… “−1 ≤ ρ ≤ 1”
• If the correlation coefficient is close to +1 that
means you have a strong positive relationship.
• If the correlation coefficient is close to -1 that
means you have a strong negative
relationship.
• If the correlation coefficient is close to 0 that
means you have little or no linear relationship.
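A short sketch of the sample (Pearson) correlation coefficient, illustrating the three cases above. The data sets and function name are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient between two equally long lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 points")
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4]
r_positive = pearson_r(xs, [2, 4, 6, 8])    # points rise along a line
r_negative = pearson_r(xs, [8, 6, 4, 2])    # points fall along a line
r_zero = pearson_r(xs, [1, -1, -1, 1])      # U-shaped pattern
```

The last case gives a coefficient of 0 even though the points follow a clear U shape, which is why a value near 0 rules out only a linear relationship.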