Lecture 1: An Overview of the Course Concepts via a Matlab Example
Problem 6-127 (on p.237 of the book): The force needed to remove the cap from a medicine bottle is an important feature
of the product because requiring too much force may cause difficulty for elderly patients or patients with arthritis or
similar conditions. Table 6E.17 presents the results of testing a sample of 68 caps attached to bottles for the force (in lbs)
required for removing the cap.
Table 6E.17 Force Required to Remove Bottle Caps (n=68)
x1=[14 18 27 24 24 28 22 21 16];
x2=[17 22 16 16 18 30 16 14 15];
x3=[25 15 16 15 15 19 19 10 22];
x4=[17 15 17 20 17 20 15 17 20];
x5=[24 27 17 32 31 27 21 21 26];
x6=[31 34 32 24 16 37 36 34 20];
x7=[19 21 14 14 19 15 30 24 15];
x8=[17 17 21 34 24];
x=[x1 x2 x3 x4 x5 x6 x7 x8];
(a) Construct a stem-and-leaf diagram of the force data. [See Figure 6-4 for an example]
Q: What is the value of the diagram?
A: It shows how the data are grouped by stem (here, the tens digit) while still listing every individual value as a leaf.
Q: Why do we care?
A: Because we want to better understand the distribution of the force, F.
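Matlab has no built-in stem-and-leaf command that I am aware of, so here is a minimal sketch of one way to print such a diagram. It assumes two-digit integer data, takes the tens digit as the stem and the units digit as the leaf, and appends the leaf count for each stem. The variable names are mine, not the book's.

```matlab
% Force data from Table 6E.17 (n = 68), read across the rows.
x = [14 18 27 24 24 28 22 21 16 17 22 16 16 18 30 16 14 15 ...
     25 15 16 15 15 19 19 10 22 17 15 17 20 17 20 15 17 20 ...
     24 27 17 32 31 27 21 21 26 31 34 32 24 16 37 36 34 20 ...
     19 21 14 14 19 15 30 24 15 17 17 21 34 24];

xs    = sort(x);         % leaves should appear in increasing order
stems = floor(xs/10);    % tens digit of each observation
leaf  = mod(xs,10);      % units digit of each observation

for s = min(stems):max(stems)
    leaves = leaf(stems == s);   % all units digits belonging to this stem
    fprintf('%d | %s   (%d)\n', s, sprintf('%d', leaves), numel(leaves));
end
```

Each printed row is one stem followed by its sorted leaves, so the row lengths give the shape of the distribution at a glance.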
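Q: Can we get a similar plot by using Matlab 'hist'?
A: Approximately. Specifying the bin centers at multiples of 10 gives bins centered on 10, 20 and 30 (bin edges at 15 and 25), so they are shifted by half a stem relative to the stem-and-leaf groups 10-19, 20-29 and 30-39.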
>> bctr=10:10:30;
>> h=hist(x,bctr);
>> stem(bctr,h)
>> grid
Q: What does this tell us about F?
A: That it is more likely to be near 20 (roughly 15 to 25) than near 10 or near 30.
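Note that bins centered on 10, 20 and 30 do not coincide with the stems of part (a). As a cross-check, here is a small sketch (my own, not part of the book's solution) that counts the observations by tens digit, which is exactly how the stem-and-leaf diagram groups them.

```matlab
% Force data from Table 6E.17 (n = 68), read across the rows.
x = [14 18 27 24 24 28 22 21 16 17 22 16 16 18 30 16 14 15 ...
     25 15 16 15 15 19 19 10 22 17 15 17 20 17 20 15 17 20 ...
     24 27 17 32 31 27 21 21 26 31 34 32 24 16 37 36 34 20 ...
     19 21 14 14 19 15 30 24 15 17 17 21 34 24];

decade = floor(x/10);    % 1 -> 10s, 2 -> 20s, 3 -> 30s
counts = [sum(decade==1) sum(decade==2) sum(decade==3)];   % [33 24 11]
fprintf('10s: %d   20s: %d   30s: %d\n', counts);
fprintf('proportions: %.3f  %.3f  %.3f\n', counts/numel(x));
```

Grouped this way, the 10s hold the most observations (33 of 68), so the picture depends noticeably on where the bin edges are placed.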
(b) What are the average and the standard deviation of the force?
>> mean(x) = 21.2647
>> std(x) = 6.4220
Q: What does this tell us about F?
A: The TRUE mean is somewhere around 21. So, on the average, we would expect the required force to be ~21#.
A: The TRUE standard deviation is somewhere around 6. So, if we think of a, say, ±3σ range of force, then it would go
from ~3# to ~40#. [Not good!]
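The range quoted above uses the rounded values 21 and 6. A short sketch of the same arithmetic with the unrounded sample estimates follows. Reading the interval as "nearly all caps" quietly assumes a roughly normal F, which part (c) calls into question.

```matlab
% Force data from Table 6E.17 (n = 68), read across the rows.
x = [14 18 27 24 24 28 22 21 16 17 22 16 16 18 30 16 14 15 ...
     25 15 16 15 15 19 19 10 22 17 15 17 20 17 20 15 17 20 ...
     24 27 17 32 31 27 21 21 26 31 34 32 24 16 37 36 34 20 ...
     19 21 14 14 19 15 30 24 15 17 17 21 34 24];

muXhat  = mean(x);             % sample mean, 21.2647
stdXhat = std(x);              % sample std. dev. (n-1 normalization), 6.4220
lo = muXhat - 3*stdXhat;       % lower end of the +/- 3 sigma range, about 2.0
hi = muXhat + 3*stdXhat;       % upper end of the +/- 3 sigma range, about 40.5
fprintf('3-sigma range: %.1f to %.1f lbs\n', lo, hi);
```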
(c) Construct a normal probability plot of the data and comment on the plot.
>> normplot(x)
Q: What does this tell us?
A: That at high and especially at low values of F, the assumption that it
comes from a normal distribution is not supported.
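For anyone without the Statistics Toolbox (which supplies 'normplot'), the same idea can be sketched with base Matlab: plot the sorted data against standard normal quantiles and look for departures from a straight line. The plotting positions (k-0.5)/n are one common convention and are my assumption here. 'normplot' lays out its axes and reference line differently, so the picture will not be identical.

```matlab
% Force data from Table 6E.17 (n = 68), read across the rows.
x = [14 18 27 24 24 28 22 21 16 17 22 16 16 18 30 16 14 15 ...
     25 15 16 15 15 19 19 10 22 17 15 17 20 17 20 15 17 20 ...
     24 27 17 32 31 27 21 21 26 31 34 32 24 16 37 36 34 20 ...
     19 21 14 14 19 15 30 24 15 17 17 21 34 24];

n  = numel(x);
xs = sort(x);                        % ordered observations
p  = ((1:n) - 0.5)/n;                % plotting positions (assumed convention)
z  = sqrt(2)*erfinv(2*p - 1);        % standard normal quantiles at those positions

plot(z, xs, 'b+')                    % data: a straight line would support normality
hold on
plot(z, mean(x) + std(x)*z, 'r-')    % reference line for a normal model with the sample mean and std
hold off
grid
xlabel('Standard normal quantile')
ylabel('Sorted force (LB)')
title('Normal probability plot (base Matlab sketch)')
```

Points that bend away from the red line in the tails are the visual evidence behind the comment above.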
(d) If the upper specification on the required force is 30 lbs, what proportion of the caps do not meet this requirement? 9/68
(e) What proportion of the caps exceeds the average force plus 2 standard deviations? 2/68 (the threshold is 21.26 + 2(6.42) ≈ 34.1 lbs, and only the 36 lb and 37 lb caps exceed it)
Q: What does this tell us about F?
A: That the probability that it will exceed the 30# requirement is ~9/68.
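Both proportions can be checked directly from the data. The sketch below assumes that "does not meet the requirement" means strictly greater than 30 lbs, so the two caps measured at exactly 30 lbs count as acceptable. By this count, 9/68 is the answer to (d), while only 2 of the 68 caps exceed the mean plus 2 standard deviations asked for in (e).

```matlab
% Force data from Table 6E.17 (n = 68), read across the rows.
x = [14 18 27 24 24 28 22 21 16 17 22 16 16 18 30 16 14 15 ...
     25 15 16 15 15 19 19 10 22 17 15 17 20 17 20 15 17 20 ...
     24 27 17 32 31 27 21 21 26 31 34 32 24 16 37 36 34 20 ...
     19 21 14 14 19 15 30 24 15 17 17 21 34 24];

n       = numel(x);
nOver30 = sum(x > 30);             % part (d): caps above the 30 lb upper spec
thr     = mean(x) + 2*std(x);      % part (e): threshold, about 34.1 lbs
nOver2s = sum(x > thr);            % caps above mean + 2 std

fprintf('(d) %d/%d = %.4f\n', nOver30, n, nOver30/n);
fprintf('(e) threshold %.2f lbs: %d/%d = %.4f\n', thr, nOver2s, n, nOver2s/n);
```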
(f) Suppose the first 36 observations come from one machine and the remaining come from a second machine (read across
the rows and down). Does there seem to be a possible difference in the two machines? Construct an appropriate graphical
display of the data as part of your answer.
From (e) it is reasonable to speculate that the machines differ. All exceedances were related to machine #2. I would use
the above color display.
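One possible display (a sketch of my own, not necessarily the color display referred to above) plots the observations in order, with a different color and marker for each machine, and prints a few per-machine summaries alongside.

```matlab
% Force data from Table 6E.17 (n = 68), read across the rows.
x = [14 18 27 24 24 28 22 21 16 17 22 16 16 18 30 16 14 15 ...
     25 15 16 15 15 19 19 10 22 17 15 17 20 17 20 15 17 20 ...
     24 27 17 32 31 27 21 21 26 31 34 32 24 16 37 36 34 20 ...
     19 21 14 14 19 15 30 24 15 17 17 21 34 24];

xM1 = x(1:36);     % machine #1: first 36 observations
xM2 = x(37:68);    % machine #2: remaining 32 observations

fprintf('Machine 1: mean %.2f, std %.2f, %d caps above 30 lbs\n', mean(xM1), std(xM1), sum(xM1 > 30));
fprintf('Machine 2: mean %.2f, std %.2f, %d caps above 30 lbs\n', mean(xM2), std(xM2), sum(xM2 > 30));

plot(1:36, xM1, 'bo', 37:68, xM2, 'rs')   % blue circles = machine 1, red squares = machine 2
grid
xlabel('Observation number')
ylabel('Force (LB)')
legend('Machine 1', 'Machine 2')
title('Force by machine')
```

None of the machine #1 caps exceeds 30 lbs, whereas 9 of the 32 machine #2 caps do, which is what prompts the speculation that the machines differ.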
(g) Plot the first 36 observations in the table on a normal probability plot and the remaining observations on another
normal probability plot. Compare the results with the single normal probability plot that you constructed for all the data in
part (c).
>> xM1=x(1:36);
>> xM2=x(37:68);
>> figure(1); normplot(xM1)
>> figure(2); normplot(xM2)
The lower values of F relate more to machine #1 & the higher values relate to machine #2.
Q: But how does the comparison work re: normality?
A: In my opinion, neither machine works well for the normal model.
Comments re: This Course
The above analysis related to the random variable F was a very cursory one. It gave some general insight, but nothing that
would be conclusive enough to propose recommendations that might cost $$$. It is an analysis that could have easily been
carried out by a senior high school student taking an introductory course in descriptive statistics. Good to gain an
appreciation of how one can begin to investigate properties of a data set. But that’s about it.
We will now proceed to carry out some analyses that demonstrate how this course will proceed. The ultimate goal is to
add sufficient rigor such that the engineer in charge of the project could arrive at more valuable
conclusions/recommendations.
MATLAB CODE
%PROGRAM NAME: Problem6_127.m
%=======================================================================
%PROBLEM STATEMENT: To investigate the force needed to remove a bottle cap.
%R.V X=The act of measuring the force needed to remove a bottle cap.
%I will assume the Sample Space for X is: SX={1,2, ... , 50}[LBf]
%Define the data collection R.V.s {X(k)} k=1:68
%Assumption (A1): Each X(k) has the same probability distribution as X.
%Assumption (A2): Each X(j) is statistically independent of X(k) for j/=k.
%========================================================================
% Raw data by row:
x1=[14 18 27 24 24 28 22 21 16];
x2=[17 22 16 16 18 30 16 14 15];
x3=[25 15 16 15 15 19 19 10 22];
x4=[17 15 17 20 17 20 15 17 20];
x5=[24 27 17 32 31 27 21 21 26];
x6=[31 34 32 24 16 37 36 34 20];
x7=[19 21 14 14 19 15 30 24 15];
x8=[17 17 21 34 24];
x=[x1 x2 x3 x4 x5 x6 x7 x8]; %Data associated with [X(1), ... , X(68)].
n=length(x); %This is the size of the data array.
%========================================================================
%Task 1: Estimate the mean E(X)=muX & std. dev. sqrt{E[(X-muX)^2]}.
muXhat=mean(x)
stdXhat=std(x)
pause
%----------------------------------------
%Task 2: Investigate the distribution of X.
figure(1)
hist(x)
grid
xlabel('Force (LB)')
ylabel('Frequency')
title('Histogram of Measured Force')
%OBSERVATIONS:
%(O1): The sample distribution is heavily skewed (i.e. not symmetric)
%(O2): The histogram is ambiguous. It suggests that SX is continuous,
%      when, in fact, it is discrete.
pause
%---------------------------------------
%Task 3: Construct an unambiguous and SCALED histogram to arrive at an
%        estimate of the probability distribution.
SX=1:50; %This is the assumed SX.
figure(2)
h=hist(x,SX); %This is the array of heights associated with each value.
fXhat=h/n; %This is an estimate of the discrete distribution [Its sum=1.]
stem(SX,fXhat)
grid
xlabel('Force x (LB)')
ylabel('f_Xhat(x)')
title('Estimate of f_X(x)=Pr[X=x]')
%OBSERVATIONS:
%(O3): We now have an unambiguous description for fX(x).
%(O4): It is 'erratic' due to the fact that we used only 68 measurements
pause
%--------------------------------------
%Task 4: Address quality control.
%SUPPOSE that the event a bottle is accepted is [15<=X<=25].
%Estimate the probability of this event.
PrACCEPThat=sum(h(15:25))/n %h(k) is the count at X=k because SX starts at 1.
pause
%=======================================================================
%QUESTION: We found that PrACCEPThat=0.6912 (47 of 68). How trustworthy is this?
%          Surely, had we had more data, it would be more trustworthy.
%ANSWER 1: Go back and collect more data, & see if the # is close to ours.
%          This is logical, but can be $$$.
%-----------------------------------------
%ANSWER 2: Assume a model distribution for X, and run simulations.
%          This relies on the validity of the chosen model, but is CHEAP.
%-----------------------------------------
%Task 5: Look at a model distribution.
%***We will ASSUME a Poisson model with TRUE parameter p=muXhat***:
PX=poisspdf(SX,muXhat);
hold on
stem(SX,PX,'*r')
title('Data-based estimate of f_X(x)=Pr[X=x] (blue) & Poisson Model (red)')
%OBSERVATION: It might appear that the model is a poor choice.
pause
%-----------------------------------------
%Task 6: Compare simulations to the data-based distribution.
nsim = 1000; %Run 1000 simulations
xsim=poissrnd(muXhat,n,nsim); %This is the nxnsim simulation array.
for k=1:5 %View first 5 simulations
k
figure(3)
stem(SX,fXhat)
grid
xlabel('Force x (LB)')
ylabel('f_Xhat(x)')
Phat=hist(xsim(:,k),SX)/n;
hold on
stem(SX,Phat,'*r')
hold off
pause
end
%OBSERVATION: The simulated data distributions are 'generally' similar.
%Task 7: Compute nsim estimates of PrACCEPT and study their uncertainty.
PPhatACCEPT=zeros(1,nsim);
for k=1:nsim
pp=hist(xsim(:,k),SX)/n;
PPhatACCEPT(k)=sum(pp(15:25));
end
muPaccept=mean(PPhatACCEPT) %Estimated mean of PhatACCEPT
stdPaccept=std(PPhatACCEPT) %Estimated std of PhatACCEPT
[hp,bb]=hist(PPhatACCEPT,25);
db=bb(2)-bb(1);
fPPhatACCEPT=hp/(nsim*db); %Estimated distribution of PhatACCEPT
figure(6)
bar(bb,fPPhatACCEPT)
title('Distribution of ACCEPT probability estimate')
grid
hold on
plot(PrACCEPThat,0,'ro','LineWidth',3)
%OBSERVATIONS:
%The real data based estimate of PrACCEPT falls well within the pdf.
%The pdf for PhatACCEPT appears to be bell-shaped (i.e. normal).
%There are many gaps in the pdf. WHY?
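A hint toward the last question: each estimate of PrACCEPT is a count out of n=68 divided by 68, so it can only take the values 0, 1/68, 2/68, ..., 1. The sketch below illustrates this with base Matlab only. The acceptance probability pAssumed is a hypothetical value chosen for illustration, and the accept/reject counts are drawn directly with 'rand' rather than with 'poissrnd' as in the program above.

```matlab
n    = 68;       % sample size behind each estimate
nsim = 1000;     % number of simulated estimates
pAssumed = 0.75; % HYPOTHETICAL acceptance probability, for illustration only

% Each column is one simulated sample of n accept/reject outcomes.
PP = sum(rand(n,nsim) < pAssumed, 1)/n;    % nsim estimates, each equal to (count)/n

% Every estimate sits on the grid 0, 1/n, 2/n, ..., 1.
isGrid = all(abs(PP*n - round(PP*n)) < 1e-9);

[hp,bb] = hist(PP,25);    % same 25-bin histogram as in the program
db = bb(2) - bb(1);       % bin width
fprintf('on grid: %d, grid spacing 1/n = %.4f, bin width = %.4f, empty bins = %d\n', ...
        isGrid, 1/n, db, sum(hp == 0));
```

Whenever the bin width comes out smaller than the grid spacing 1/n, some bins contain no attainable value and must stay empty, however many simulations are run.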