Download Lab_2 - Courseworks

Advanced Quantitative Techniques
Lab 2: Normality, Graphing Distributions, Confidence Intervals

Normal distribution

What are the characteristics of a normal distribution?
• Unimodal
• Bell shaped
• Symmetric
• Mean = Mode = Median
• Skewness = 0
• Kurtosis = 3
• 68-95-99.7 rule

If the population has a normal distribution:
• 68.3% of the data lie within 1 standard deviation of the mean
• 95.4% of the data lie within 2 standard deviations of the mean
• 99.7% of the data lie within 3 standard deviations of the mean

More about the normal distribution
• The probability of any event is the area under the density curve.
• Total area under the curve = 1 (collectively exhaustive).
• Normal distributions are an idealized description of data.
• The curve never touches the x-axis: the tails extend indefinitely, so the area over any finite range is always slightly less than 1.

Is population normally distributed?
    use calls_311.dta
    histogram POP2010, width(600) frequency normal
    sum POP2010, detail
[Figure: histogram of POP2010 with a normal curve overlaid; x-axis POP2010 (0 to 30000), y-axis Frequency (0 to 300)]

Variance vs. Standard Deviation
• Variance (σ²): the average of squared differences from the mean
    $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2$
• Standard deviation (σ): the square root of the variance
    $\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2}$

Skewness
• Skewness is a measure of symmetry. Where is the tail?
• Mean > Median: Skewness > 0 (tail to the right)
• Mean = Median: Skewness = 0 (symmetric)
• Mean < Median: Skewness < 0 (tail to the left)
• Stata reports skewness in the output of sum, detail.

Kurtosis
• Kurtosis is a measure of whether the data are peaked or flat relative to a normal distribution.
• Peaked: Kurtosis > 3
• Normal: Kurtosis = 3
• Flat: Kurtosis < 3

Example of Normal distribution
    use Lab_2_Data.16.dta
    histogram bwt, width(400) frequency normal
    sum bwt, detail

Sampling
• Population: a group that includes all the cases (individuals, objects, or groups) in which the researcher is interested.
• Sample: a relatively small subset of a population.
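The normality checks earlier in this lab (the 68-95-99.7 rule, variance, skewness, kurtosis) can also be verified by hand outside Stata. The following is a minimal Python sketch, not part of the lab itself: the function names area_within, central_moment, and describe are hypothetical, the variance uses the 1/n divisor, and skewness and kurtosis use a common moment-based definition.

```python
from math import sqrt
from statistics import NormalDist, fmean


def area_within(k):
    """Area under the standard normal curve within k standard deviations of the mean."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)


def central_moment(data, r):
    """Average of (x - mean)^r over the data, using a 1/n divisor."""
    m = fmean(data)
    return fmean([(x - m) ** r for x in data])


def describe(data):
    """Variance, standard deviation, skewness, and kurtosis from central moments."""
    m2 = central_moment(data, 2)
    m3 = central_moment(data, 3)
    m4 = central_moment(data, 4)
    return {
        "variance": m2,               # average squared difference from the mean
        "sd": sqrt(m2),               # square root of the variance
        "skewness": m3 / m2 ** 1.5,   # 0 for symmetric data
        "kurtosis": m4 / m2 ** 2,     # 3 for a normal distribution
    }


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(f"within {k} sd: {area_within(k):.4f}")
    print(describe([1, 2, 3, 4, 5]))    # symmetric toy data
    print(describe([1, 1, 1, 2, 10]))   # toy data with a long right tail
```

Statistical software often reports the variance with an n - 1 divisor instead, so small differences from this sketch are expected on real data.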
Sampling
• Random sample
• Stratified sample: divide the population into groups and draw a random sample from each group.
• Cluster sample: group the population into small clusters, draw a simple random sample of clusters, and sample everything in the selected clusters.

• Parameter: a measure used to describe a population distribution.
• Statistic: a measure used to describe a sample distribution.
• Estimation: a process whereby we select a random sample from a population and use a sample statistic to estimate a population parameter.

Inference

Inferential Statistics
• We generally don't know anything about the population distribution.
• We have a sample of data from the population.
• We assume that the mean is the most appropriate description of the population (the median is no longer used because we assume a normal distribution).
• The sample must be random and representative ("large enough").

What can we infer about the population based on a sample?
• From now on, we're estimating the population mean (μ) with the sample mean (x̄).
• We are no longer talking about individual behavior; we're talking about average behavior.

Distribution of Means
• Take a random sample over, and over, and over again (random means each data point has an equal chance of being chosen).
• You get many sample means: x̄1, x̄2, x̄3, x̄4, x̄5, ...
• Plot the sampling distribution of these means: you get a distribution of averages (not raw data points!).
• Sampling Distribution of Means: the frequency distribution (histogram) of the sample means, not of the data themselves.
[Figure: frequency distribution of all possible sample means x̄. This is not the distribution of x.]
• If we repeatedly draw large enough random samples, the distribution of the sample averages (not of the population data!) is approximately a bell curve (normal distribution).
• This is the case regardless of what the population distribution looks like.
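The claim that sample means form a bell curve even when the population does not can be checked with a small simulation. The following is a minimal Python sketch under stated assumptions, not part of the lab: the population is taken to be exponential with mean 1 and standard deviation 1 (strongly right-skewed), and the names draw_exponential, sample_means, and skewness are hypothetical.

```python
import random
from statistics import fmean, pstdev


def draw_exponential(rng):
    """One draw from an assumed skewed population (exponential, mean 1, sd 1)."""
    return rng.expovariate(1.0)


def sample_means(draw, sample_size, n_samples, seed=0):
    """Draw n_samples random samples of sample_size each and return their means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        fmean([draw(rng) for _ in range(sample_size)])
        for _ in range(n_samples)
    ]


def skewness(data):
    """Moment-based skewness: 0 for a symmetric distribution."""
    m = fmean(data)
    m2 = fmean([(x - m) ** 2 for x in data])
    m3 = fmean([(x - m) ** 3 for x in data])
    return m3 / m2 ** 1.5


if __name__ == "__main__":
    rng = random.Random(42)
    raw = [draw_exponential(rng) for _ in range(5000)]
    means = sample_means(draw_exponential, sample_size=50, n_samples=2000, seed=1)
    print("skewness of raw data:     ", round(skewness(raw), 2))
    print("skewness of sample means: ", round(skewness(means), 2))
    print("mean of sample means:     ", round(fmean(means), 3))
    print("sd of sample means:       ", round(pstdev(means), 3))
```

In this setup the sample means should center near the population mean of 1, with a spread near 1/√50, and be far less skewed than the raw draws.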
Confidence Intervals
• The goal of calculating confidence intervals is to determine how sure we are that the true population mean, μ, is approximated by the sample mean, x̄.
• Confidence level: the likelihood, expressed as a percentage or a probability, that a specified interval will contain the population parameter.
    – 95% confidence level: there is a .95 probability that a specified interval DOES contain the population mean.
    – 99% confidence level: there is 1 chance out of 100 that the interval DOES NOT contain the population mean.

STATA: ci Command
• Open Stata and calls_311.dta.
    ci means calls_per_thousand, level(90)

    . ci calls_per_thousand, level(90)

    Variable        Obs       Mean   Std. Err.   [90% Conf. Interval]
    calls_per_~d   2168   1.534331    .0335816   1.479071   1.589592

Reading the output:
• level(90): sets the confidence level to 90%, that is, a 10% significance level
• Obs: sample size (n)
• Mean: sample mean
• Std. Err.: standard error, $s/\sqrt{n}$
• [90% Conf. Interval]: lower bound and upper bound of the CI

Build a 95% CI for 311 calls per thousand people. The default level for the ci command in Stata is 95%, so no level() option is needed.

    . ci calls_per_thousand, level(90)

    Variable        Obs       Mean   Std. Err.   [90% Conf. Interval]
    calls_per_~d   2168   1.534331    .0335816   1.479071   1.589592

    . ci calls_per_thousand

    Variable        Obs       Mean   Std. Err.   [95% Conf. Interval]
    calls_per_~d   2168   1.534331    .0335816   1.468476   1.600187

    . ci calls_per_thousand, level(99)

    Variable        Obs       Mean   Std. Err.   [99% Conf. Interval]
    calls_per_~d   2168   1.534331    .0335816   1.447755   1.620908

The 90% interval is the narrowest (most precise); the 99% interval is the widest (most confident).

Build a CI for Bronx calls per 1,000 people that leaves a 10% chance of overestimation error. A 10% chance in one tail means 20% across both tails, so the level is 80%.
    ci means calls_per_thousand if county=="005", level(80)

    . ci calls_per_thousand if county=="005", level(80)

    Variable        Obs       Mean   Std. Err.   [80% Conf. Interval]
    calls_per_~d    339    1.66264    .0937592   1.542247   1.783032

Build a CI for Manhattan calls per 1,000 people that leaves a 20% chance that the population mean is not captured by the interval.
    ci means calls_per_thousand if county=="061", level(80)

    . ci calls_per_thousand if county=="061", level(80)

    Variable        Obs       Mean   Std. Err.   [80% Conf. Interval]
    calls_per_~d    288   2.018764    .1377721   1.841794   2.195733

Are they significantly different?

Confidence intervals in a Normal distribution
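The intervals reported above can be approximated by hand from the mean and standard error. The following is a minimal Python sketch, not Stata's own method: it assumes a normal (z) critical value, whereas Stata's ci uses the t distribution, so with samples of a few hundred or more the bounds agree only to about three decimal places. The names normal_ci and intervals_overlap are hypothetical.

```python
from statistics import NormalDist


def normal_ci(mean, std_err, level=95.0):
    """Two-sided confidence interval: mean +/- z * standard error.

    Assumes a normal critical value; this is an approximation to the
    t-based interval that Stata reports.
    """
    alpha = 1.0 - level / 100.0                    # total chance of missing, split over two tails
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)    # critical value for the chosen level
    return (mean - z * std_err, mean + z * std_err)


def intervals_overlap(a, b):
    """True if intervals a = (lo, hi) and b = (lo, hi) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


if __name__ == "__main__":
    # Means and standard errors copied from the Stata output in this lab.
    for level in (90, 95, 99):
        lo, hi = normal_ci(1.534331, 0.0335816, level)
        print(f"all tracts, {level}%: [{lo:.4f}, {hi:.4f}]")

    bronx = normal_ci(1.66264, 0.0937592, 80)
    manhattan = normal_ci(2.018764, 0.1377721, 80)
    print("Bronx 80%:    ", bronx)
    print("Manhattan 80%:", manhattan)
    print("overlap:", intervals_overlap(bronx, manhattan))
```

Comparing whether two intervals overlap is only a rough visual check for the "significantly different?" question; it is not a formal hypothesis test.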
