Topics
Semester I
• Descriptive statistics
• Time series
Semester II
• Sampling
• Statistical inference: estimation, hypothesis testing
• Relationships, causal models

Sampling
Statistical observations
Expectations:
• Quickness
• Accuracy
• Reliability
Solutions:
• Observe each individual
• Sampling

Statistical inference
Descriptive statistics: describes the observed elements.
Statistical inference: draws inferences about the population based on the sample.
• Estimation, hypothesis testing

Estimation
Estimation: estimate the population parameter from a sample.
Types:
• Point
• Interval

Types of errors
Sampling error: due to selecting a sample instead of the entire population.
Nonsampling error: errors due to mistakes.

Issues
• Probability vs. nonprobability samples
• Sample size
• Representativity

Probability versus Nonprobability
Probability samples: each member of the population has a known non-zero probability of being selected.
• Methods include random sampling, systematic sampling, and stratified sampling.
Nonprobability samples: members are selected from the population in some nonrandom manner.
• Methods include convenience sampling, judgment sampling, quota sampling, and snowball sampling.

Random Sampling
Random sampling is the purest form of probability sampling.
• Simple random sample with replacement: each member of the population has an equal and known chance of being selected.
• Simple random sample without replacement

Stratified Sampling
Stratified sampling is a commonly used probability method that is often preferred to simple random sampling because it reduces sampling error.
A stratum is a subset of the population whose members share at least one common characteristic, such as males and females.
• Identify the relevant strata and their actual representation in the population.
• Random sampling is then used to select a sufficient number of subjects from each stratum.
• Stratified sampling is often used when one or more of the strata in the population have a low incidence relative to the other strata.
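The sampling schemes described above can be sketched in a few lines of Python. This is a minimal illustration, not a production sampler: the function names, the `stratum_of` callback, and the rounding rule for proportional allocation are assumptions made for the example and do not come from the slides.

```python
import random

def simple_random_sample(population, n, replace=False, rng=None):
    """Draw n elements; every element has the same chance of selection."""
    rng = rng or random.Random()
    if replace:
        # With replacement: the same element may be drawn more than once.
        return [rng.choice(population) for _ in range(n)]
    # Without replacement: n distinct positions of the population are drawn.
    return rng.sample(population, n)

def proportional_stratified_sample(population, stratum_of, n, rng=None):
    """Split the population into strata and sample each one in proportion to its size."""
    rng = rng or random.Random()
    strata = {}
    for element in population:
        strata.setdefault(stratum_of(element), []).append(element)
    sample = []
    for members in strata.values():
        # Proportional allocation, rounded (assumption: at least one element
        # per stratum, so the total can differ slightly from n).
        n_j = max(1, round(n * len(members) / len(population)))
        sample.extend(rng.sample(members, n_j))
    return sample

# Hypothetical population: 60 elements in stratum "F" and 40 in stratum "M".
people = [("F", i) for i in range(60)] + [("M", i) for i in range(40)]
stratified = proportional_stratified_sample(people, lambda e: e[0], 10)
print(sum(1 for e in stratified if e[0] == "F"), "of", len(stratified), "from stratum F")
```

With a 60/40 split and n = 10, the allocation is 6 and 4, so the sample mirrors the composition of the population regardless of which individuals are drawn.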
Cluster Sampling
Cluster sample: a probability sample in which each sampling unit is a collection of elements.
Effective under the following conditions:
• A good sampling frame is not available or is costly, while a frame listing clusters is easily obtained
• The cost of obtaining observations increases as the distance separating the elements increases
Examples of clusters:
• City blocks – political or geographical
• Housing units – college students
• Hospitals – illnesses
• Automobile – set of four tires

We examine
              Population   Sample
Size          N            n
Mean          $\mu$        $\bar{x}$
Std. dev.     $\sigma$     $s$ ($s^*$)
Proportion    $P$          $p$
• Distribution of variables and parameters
• Relationship between variables

Point estimation
A statistic is computed from the sample to estimate the population parameter.
Desired properties: consistency; unbiasedness, $E(\hat{\theta}) = \theta$.

Estimation of population mean
Can the sample mean be used as an estimator? Yes, if $E(\bar{x}) = \mu$.

Example
Population: 10, 11, 12, 13, 14
Mean ($\mu$): 12
Variance ($\sigma^2$): 2
Std. dev. ($\sigma$): 1.4142136
Size ($N$): 5
Sample size ($n$): 2
• Consider each possible sample of size 2.
• Describe the distribution of the sample means.
• Calculate the expected value of the sample means.
Sampling distribution: the distribution of the examined statistic over all possible samples.

What is the result?
The expected value of the sample means for a given sample size is equal to the population mean:
$E(\bar{x}) = \mu$, where $\bar{x} = \frac{\sum_i f_i x_i}{n}$

Point estimation of the population std. dev.
Uncorrected: $s = \sqrt{\frac{\sum_i f_i (x_i - \bar{x})^2}{n}}$, with $E(s^2) \neq \sigma^2$

Corrected empirical std. dev.
$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} = \sqrt{\frac{\sum_{i=1}^{n} d_i^2}{n-1}}$
With frequencies: $s = \sqrt{\frac{\sum_i f_i (x_i - \bar{x})^2}{n-1}} = \sqrt{\frac{\sum_i f_i d_i^2}{n-1}}$
Here $d_i = x_i - \bar{x}$, and $E(s^2) = \sigma^2$.

Point estimation of proportion
$p = \frac{k}{n}$, $E(p) = P$
With replacement: $s_p = \sqrt{\frac{pq}{n}}$
Without replacement: $s_p = \sqrt{\frac{pq}{n} \cdot \frac{N-n}{N-1}}$
Here $q = 1 - p$.

Standard error of the estimation
The average difference between the sample statistic and the population parameter for a given sample size.
In the case of sample means, the standard error of the estimation is the average difference between the sample means and the population mean.
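The worked example (population 10, 11, 12, 13, 14 with n = 2) is small enough to check by enumerating every possible sample. The sketch below does that with the Python standard library and compares the spread of the sample means with the usual standard-error formulas for sampling with and without replacement. The helper name `sample_means` is an assumption made for this illustration.

```python
from itertools import combinations, product
from math import sqrt
from statistics import fmean, pstdev, pvariance, variance

population = [10, 11, 12, 13, 14]
N, n = len(population), 2
mu = fmean(population)       # population mean
sigma = pstdev(population)   # population std. dev.

def sample_means(with_replacement):
    """Means of every possible sample of size n (each sample equally likely)."""
    if with_replacement:
        samples = product(population, repeat=n)   # 25 ordered samples
    else:
        samples = combinations(population, n)     # 10 unordered samples
    return [fmean(s) for s in samples]

for with_replacement in (True, False):
    means = sample_means(with_replacement)
    expected = fmean(means)      # E(x-bar)
    std_error = pstdev(means)    # std. dev. of the sample means
    formula = sigma / sqrt(n)
    if not with_replacement:
        formula *= sqrt((N - n) / (N - 1))   # finite population correction
    print(with_replacement, expected, round(std_error, 4), round(formula, 4))

# Bias of the variance estimators over all samples drawn with replacement.
wr_samples = list(product(population, repeat=n))
print(fmean([pvariance(s) for s in wr_samples]))  # uncorrected: below sigma^2
print(fmean([variance(s) for s in wr_samples]))   # corrected: equals sigma^2
```

In both schemes the expected value of the sample means is 12, the population mean. The standard error is 1 with replacement and about 0.866 without replacement, matching the formulas. The last two lines show why the divisor n - 1 is used: the uncorrected variance averages 1, the corrected one averages 2, which is the population variance.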
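Standard error of the mean $\sigma_{\bar{x}}$
Calculation
With replacement: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
Without replacement: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$

What happens if $\sigma$ is unknown?
Estimation: since $E(s^2) = \sigma^2$, $s$ is used in place of $\sigma$ and $s_{\bar{x}}$ in place of $\sigma_{\bar{x}}$.
With replacement: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ is estimated by $s_{\bar{x}} = \frac{s}{\sqrt{n}}$
Without replacement: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$ is estimated by $s_{\bar{x}} = \frac{s}{\sqrt{n}} \sqrt{1 - \frac{n}{N}}$

INTERVAL ESTIMATE OF THE POPULATION MEAN
Structure of the confidence interval
• 95% interval: out of 100 interval estimates, on average 95 contain the population mean
• First step? Point estimation: $\bar{x}$
$\bar{x} \pm \Delta$, where $\Delta$ is the maximum allowable error (error bound)

Maximum error: the largest error of the estimation at a given probability.
Standard error of the estimate: the average error of the estimation.

How can we calculate the maximum error?
1. Start from: expected value $\pm$ $k \cdot$ std. dev., where $k$ depends on the probability.
2. In the case of sample means:
$\Delta = k \cdot \sigma_{\bar{x}}$ if $\sigma$ is known
$\Delta = k \cdot s_{\bar{x}}$ if $\sigma$ is unknown

How can we calculate the value of k? (1)
It depends on the probability.
What do we know about the distribution of the sample means?

Distribution of sample means
Size of sample      Distribution
Small               Same as the population (normal if the population is normal)
Large (n > 100)     Normal distribution (central limit theorem)

About the normal distribution
$X \sim N(E(X), \sigma^2)$
Special case: $E(X) = 0$, $\sigma^2 = 1$
Transform into the standard normal distribution: $z \sim N(0, 1)$
$F(x) = \Phi(z)$
$\Phi(-z) = 1 - \Phi(z)$

If $x$ is a variable, $z$ is the standardized variable
• Mean of $z$: 0
• Std. dev. of $z$: 1
$z = \frac{X - E(X)}{\sigma} = \frac{X - \mu}{\sigma}$
$\mu - \sigma < x < \mu + \sigma \iff -1 < z < 1$
$\mu - 2\sigma < x < \mu + 2\sigma \iff -2 < z < 2$
$\mu - 3\sigma < x < \mu + 3\sigma \iff -3 < z < 3$

Apply for sample means
$z = \frac{\bar{x} - E(\bar{x})}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$
$\mu - \sigma_{\bar{x}} < \bar{x} < \mu + \sigma_{\bar{x}} \iff -1 < z < 1$
$\mu - 2\sigma_{\bar{x}} < \bar{x} < \mu + 2\sigma_{\bar{x}} \iff -2 < z < 2$
$\mu - 3\sigma_{\bar{x}} < \bar{x} < \mu + 3\sigma_{\bar{x}} \iff -3 < z < 3$

Calculation of the value of k (2)
$1 - \alpha$ is given. For example:
$k = z_{1-\alpha/2}$
$\bar{x} \pm z_{1-\alpha/2} \, \sigma_{\bar{x}}$

Calculation of the value of k (3)
We should know the std. dev. of the population ($\sigma$).
In real life we usually know nothing about it.
1. Instead of $\sigma$ we use $s$.
2. Instead of the normal distribution we use the t-distribution.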
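For the case where the population standard deviation is known, the steps above can be sketched with `statistics.NormalDist` from the Python standard library, which provides the standard normal quantile needed for k. The function name `z_interval` and the sample values are hypothetical, and the sketch assumes sampling with replacement, so no finite population correction is applied.

```python
from math import sqrt
from statistics import NormalDist, fmean

def z_interval(sample, sigma, confidence=0.95):
    """Confidence interval for the mean when sigma is known (with replacement)."""
    n = len(sample)
    alpha = 1 - confidence
    k = NormalDist().inv_cdf(1 - alpha / 2)   # k = z_{1 - alpha/2}
    std_error = sigma / sqrt(n)               # standard error of the mean
    delta = k * std_error                     # maximum error at this confidence
    x_bar = fmean(sample)                     # point estimate
    return x_bar - delta, x_bar + delta

# Hypothetical sample of four observations, sigma assumed to be 2.
low, high = z_interval([10, 12, 14, 12], sigma=2)
print(round(low, 2), round(high, 2))

# Probability that z falls within 1, 2 and 3 standard deviations of the mean.
std_normal = NormalDist()
for k in (1, 2, 3):
    print(k, round(std_normal.cdf(k) - std_normal.cdf(-k), 4))
```

Here the sample mean is 12 and the standard error is 1, so the 95% interval is roughly 12 ± 1.96. The final loop gives the probabilities attached to the ranges −1 < z < 1, −2 < z < 2 and −3 < z < 3.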
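t-distribution
$\bar{x} \pm t_{1-\alpha/2}(\nu) \, s_{\bar{x}}$, with $\nu = n - 1$
If $n \to \infty$, then $t_{1-\alpha/2}(\nu) \to z_{1-\alpha/2}$.

Summary
Examine $\bar{x}$:
• Std. dev.: $\sigma$ or $s$?
• Small or large sample? In the case of small samples, can we assume the normal distribution?
• Type of sample (EV/FAE/R: simple random without replacement / independent and identically distributed, i.e. with replacement / stratified)?

Population standard deviation known:
$\Delta = k \cdot \sigma_{\bar{x}}$
$\bar{x} \pm \Delta \;\Rightarrow\; \bar{x} \pm z_{1-\alpha/2} \, \sigma_{\bar{x}}$

Population standard deviation unknown:
$\Delta = k \cdot s_{\bar{x}}$
$\bar{x} \pm \Delta \;\Rightarrow\; \bar{x} \pm t_{1-\alpha/2}(\nu) \, s_{\bar{x}}$, with $\nu = n - 1$

Plan the sample size
In real life the maximum error is given in advance. What sample size does it require?
$\Delta = z_{1-\alpha/2} \, \sigma_{\bar{x}} = z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}}$
$n = \left( \frac{z_{1-\alpha/2} \, \sigma}{\Delta} \right)^2$

Proportional stratified sample
$\sigma$ known: $\bar{x} \pm z_{1-\alpha/2} \, \sigma_{\bar{x}}$, where $\sigma_{\bar{x}} = \frac{\sigma_B}{\sqrt{n}}$
$\sigma$ unknown: $\bar{x} \pm t_{1-\alpha/2} \, s_{\bar{x}}$, where $s_{\bar{x}} = \frac{s_B}{\sqrt{n}} = \frac{\sqrt{\sum_{j=1}^{M} n_j s_j^2}}{n}$
Here $M$ is the number of strata, $n_j$ and $s_j$ are the sample size and standard deviation in stratum $j$, and $\sigma_B$ ($s_B$) is the within-stratum standard deviation.

Estimate of proportion
$p \pm z_{1-\alpha/2} \, s_p$, if $\min(np; nq) \geq 10$
With replacement: $s_p = \sqrt{\frac{pq}{n}}$
Without replacement: $s_p = \sqrt{\frac{pq}{n} \cdot \frac{N-n}{N-1}}$

Estimate of standard deviation
$\frac{(n-1) s^2}{\chi^2_{1-\alpha/2}(\nu)} < \sigma^2 < \frac{(n-1) s^2}{\chi^2_{\alpha/2}(\nu)}$, with $\nu = n - 1$
Taking square roots of the bounds gives the interval for $\sigma$.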
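Two of the formulas above need only the standard normal quantile, so they can be sketched with the Python standard library: planning the sample size for a given maximum error, and the interval estimate of a proportion. The function names and the numbers in the usage lines are hypothetical. The t and chi-square quantiles are not available in the standard library, so those intervals are not covered here.

```python
from math import ceil, sqrt
from statistics import NormalDist

def z_quantile(confidence):
    """z_{1 - alpha/2} for the given confidence level 1 - alpha."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def required_sample_size(sigma, delta, confidence=0.95):
    """Smallest n with z * sigma / sqrt(n) <= delta (sampling with replacement)."""
    return ceil((z_quantile(confidence) * sigma / delta) ** 2)

def proportion_interval(k, n, confidence=0.95, N=None):
    """Interval for a proportion from k successes in n; pass N for sampling without replacement."""
    if min(k, n - k) < 10:
        # Condition from the slide: min(np; nq) >= 10.
        raise ValueError("normal approximation needs min(np, nq) >= 10")
    p = k / n
    q = 1 - p
    s_p = sqrt(p * q / n)
    if N is not None:
        s_p *= sqrt((N - n) / (N - 1))   # finite population correction
    delta = z_quantile(confidence) * s_p
    return p - delta, p + delta

# Hypothetical inputs: sigma = 2, maximum error 0.5, 95% confidence.
print(required_sample_size(sigma=2, delta=0.5))

# Hypothetical inputs: 40 successes out of 100, then the same with N = 1000.
print(proportion_interval(40, 100))
print(proportion_interval(40, 100, N=1000))
```

The sample size is rounded up because n must be a whole number and rounding down would exceed the allowed error. Passing `N` narrows the proportion interval, since sampling without replacement from a finite population has a smaller standard error.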