A Focus on Sampling and Sampling Methods

Menu
• Sampling Methods
• Definitions
• Measures of Centre
• Assessment Tips
• Measures of Spread
• Practice Tasks
• On Your Calculator

The example used throughout this presentation is finding the mean height of WBHS pupils.

Sampling Methods
In this presentation you will see a number of sampling methods, with their benefits and drawbacks.
• Simple Random Sample
• Cluster Sampling
• Systematic Sampling
• Stratified Sampling

Measures of Central Tendency
In this presentation you will learn how to calculate a number of measures of average or centre, as well as their benefits and drawbacks.
• Mean
• Median
• Mode

Measures of Spread
In this presentation you will learn how to find a number of measures of spread, as well as their advantages and drawbacks. You will also need to decide which measure of spread goes with which measure of centre.
• Standard Deviation
• Interquartile Range
• Range

Simple Random Sample
The simplest unbiased sample.
1. Number the entire population.
2. Generate random numbers.
3. Proceed until you have as many as you need, ignoring any repeats.

Example (Heights of WBHS students)
1. Get a copy of the School Roll.
2. Number every person.
3. Generate random numbers from 1 to the maximum you need.
4. Proceed until you have the desired sample size, ignoring repeats.

Advantages
• Cheap
• Easy to carry out
• Unbiased
Disadvantages
• May not represent strata
• Needs an entire population list

Cluster Sampling
The easiest unbiased sample.
1. Sort your data into clusters based on location.
2. Randomly choose the cluster.
3. Perform a simple random sample on the chosen cluster.

Example (Heights of WBHS students)
1. Get a copy of the School Roll.
2. Sort into clusters, e.g. year levels.
3. Randomly select the cluster.
4. Randomly generate a sample from the chosen cluster.
Take care with clusters, as Juniors are much shorter than Seniors.

Advantages
• Very cheap
• Very easy to carry out
• Unbiased
Disadvantages
• Needs an entire population list
• Can be biased if clusters strongly affect the statistics

Systematic Sampling
A relatively quick way to pick an unbiased sample.
1. List the entire population.
2. Decide on your step size (Total ÷ Sample size = n).
3. Randomly generate a starting point.
4. Step to every nth data point until you have your sample.

Example (Heights of WBHS students)
1. Get an alphabetical copy of the School Roll.
2. Step size = Total ÷ Sample size.
3. Randomly generate a starting point.
4. Starting from that point, use the step size to pick the rest of the sample.

Advantages
• Cheap
• Easy to choose sample
• Unbiased
Disadvantages
• Needs an entire population list
• If the population list is ordered then the sample can become biased

Stratified Sampling
The most reliable sampling method.
1. Sort the data into strata based on information you already know.
2. Calculate the proportions for each stratum.
3. Perform a simple random sample on each of the strata.

Example (Heights of WBHS students)
1. Get a copy of the School Roll separated into year levels.
2. Calculate the sample size for each year group (stratum).
3. Perform a simple random sample on each year group to its specific sample size.

Advantages
• Unbiased
• Completely representative of each of the strata
• Most reliable estimates
Disadvantages
• Needs entire population list
• Information about entire population needs to be known beforehand
• Time consuming

Generate a Random Number
1. Decide on the starting number (in this case 1).
2. Decide how many you need (in the case of the school, 529 students).
3. Choose your calculator: Casio FX-82, Casio Graphic, Texas.

Random Number on a Casio Graphics Calculator
1. Decide on the starting number (in this case 1).
2. Decide how many you need (in the case of the school, 529 students).
3. In Run mode, enter Intg(529 × Ran# + 1), where 529 is the population size or strata size and 1 is the starting value.
Keys
• Intg: OPTN – F6 – F4 – F5
• Ran#: OPTN – F6 – F3 – F4
On screen: Intg(529 × Ran# + 1)

Random Number on a Casio FX-82
1. Decide on the starting number (in this case 1).
2. Decide how many you need (in the case of the school, 529 students).
3. Press SHIFT Ran# × 529 + 1 =, where 529 is the population size or strata size and 1 is the starting value. (Ran# is the 2nd function of the · key.)
4. Ignore any decimal in the answer.
On screen: RAN#×529+1

Random Number on a Texas
1. Decide on the starting number (in this case 1).
2. Decide how many you need (in the case of the school, 529 students).
3. Enter RANDI(1, 529), where 1 is the starting value and 529 is the end value.
Keys: PRB, then → to reach RANDI; the 2nd function, comma and ) keys complete the entry.
On screen: RANDI(1 , 529)

Simple Random Sample
The simplest unbiased sample.
1. Number the entire population.
2. Generate random numbers.
3. Proceed until you have as many as you need, ignoring any repeats.

Example (Heights of WBHS students)
1. Get a copy of the School Roll.
2. Number every person from 1 (to 529).
3. Generate random numbers from 1 to the maximum you need (529).
4. Proceed until you have the desired sample size, ignoring repeats.

Strata Proportions
1. Divide the number of people in the stratum by the total in the population.
2. Multiply by the number of people wanted in the total sample.

Example (Heights of WBHS students)
1. 529 people on School Roll.
2. 115 Year 10s.
3. Sample size of 30.
4. So the Year 10 sample size is 115 ÷ 529 × 30 = 6.52, so take 7 Year 10 students.

Systematic Step Sizes
1. Divide the number of people in the population by the sample size.

Example (Heights of WBHS students)
1. 529 people on School Roll.
2. Sample size of 30.
So the step size is 529 ÷ 30 ≈ 17.63, so take every 17th student from the starting position.

Systematic Stepping
1. Starting at the random start point, step out until you get the desired sample size.

Example (Heights of WBHS students)
1. Random starting point 803, step size 29.
2. The 803rd student on the alphabetical list is where we start.
3. Then the 832nd student, then the 861st. The next step, 890, runs past the end of the roll, so continue from the beginning: 890 becomes the 15th student, then the 44th student, and so on.

Mean
1. Add up all of the values in the sample.
2. Divide by the sample size.
Advantages
• Easy to calculate for large samples
• Accurate and well understood
Disadvantages
• Affected by outliers

Median
1. List all the values in order.
2. Find the central value.
Advantages
• Accurate
• Not affected much by outliers
Disadvantages
• Not so widely known as an average
• Time consuming to list a large sample in order

Mode
1. List all the values.
2. Find the most common item.
Advantages
• Can calculate the mode for data that is not numeric or ordered
• Not affected much by outliers
• Very easy to calculate
Disadvantages
• Can be inaccurate for numeric data or data that can be ordered

Statistics on a Calculator
Choose your calculator: Casio FX-82, Casio Graphic, Texas.

Statistics on a Casio Graphics Calculator
1. In Stat mode.
2. In List 1 enter all data values.
3. In List 2 enter their frequencies.
4. F2 (CALC)
5. F6 (SET). The settings should read:
   1Var XList :List1
   1Var Freq  :List2
   2Var XList :List3
   2Var YList :List4
   2Var Freq  :List5
6. EXIT
7. F1 (1VAR). All statistics are listed: x̄ is the mean, xσn is the standard deviation.

Entering Data on Casio Graphics Calculator
Enter each data value in List 1, followed by EXE.
Enter the frequency of each data value in List 2, followed by EXE.
Note: If all of the frequencies are 1 then you don’t need to enter the frequencies.
In the Set menu, change 1Var Freq to 1 instead of List 2, then press EXE.

Statistics on a Casio FX-82 Calculator
1. Put your calculator into statistics mode.
   • MODE 2
2. Clear the statistics memory.
   • SHIFT MODE 1 (the screen shows Scl 1, Mode 2, All 3)
3. Enter the data carefully.
   • Type each value followed by M+, e.g. 180 M+ for 180 cm
4. Calculate desired statistics.
   • SHIFT 2, then 1 for x̄ (mean) or 2 for xσn (standard deviation)

Entering Data on Casio FX-82 Calculator
Enter each data value followed by M+. The screen shows n= 1, where ‘n’ is the number of data values that you have entered.
Note: Be very careful entering the data values, as you cannot review them later to make sure that they are correct.

Statistics on a Texas Calculator
1. Put your calculator into statistics mode.
   • 2nd function of DATA, then choose 1-VAR
2. Enter the data carefully.
   • DATA
3. Calculate desired statistics.
   • STATVAR shows n, x̄, Sx and σx; shift between statistics with the arrow keys
   • n is the number of data values, x̄ is the mean, σx is the standard deviation

Entering Data on a Texas Calculator
Press the DATA key to begin entering data.
• X1 is the first data value (e.g. X1 = 180), followed by the down arrow.
• Freq1 is that data value’s frequency, followed by the down arrow.
• X2 is next, then Freq2.
• To check data, use the up arrow.

Definitions
• Population: The entire list of those people or things that you wish to sample.
• Census: A survey of an entire population.
• Sample: A small group taken from a population.
• Parameters: Facts about an entire population gained from a census. (Notation: mean ‘μ’ or standard deviation ‘σ’)
• Statistics: Estimates of population parameters calculated from a sample. (Notation: mean ‘x̄’ or standard deviation ‘s’)
• Representative: A sample that appears to represent all elements of the population in the correct proportions.
• Bias: A sampling method that does not give every element of the population an equal chance of selection.

Standard Deviation
• This is a calculation of the average difference between the data values and the mean.
• This measure of spread applies to the mean.
Advantages
• Easy to calculate for large samples on a calculator
• Accurate
• Very useful for certain types of data
Disadvantages
• Affected by outliers
• Possibly not so well understood

Interquartile Range
1. Calculate the upper and lower quartiles.
2. Upper quartile minus lower quartile.
3. This measure of spread applies to the median.
Advantages
• Well understood
• Easy to calculate for large samples
• Unaffected by outliers

Range
1. Find the highest and lowest value.
2. Highest value minus the lowest value.
3. This measure of spread applies to all measures of centre.
Advantages
• Well understood
• Easy to calculate for large samples
Disadvantages
• Affected by outliers

Standard Deviation by Table
• Data values (x): from your sample or census.
• Mean (x̄): calculated as usual, doesn’t change.
• x – x̄: data values minus the mean.
• (x – x̄)²: square of each of the values to the left.

Data values x    Mean x̄    x – x̄    (x – x̄)²
180              165        15       225
150              165       -15       225
165              165         0         0
170              165         5        25
160              165        -5        25
Total 825                    0       500
Mean  165                            100

The final standard deviation is the square root of the mean of the squares (100), so s = 10.

Calculating Quartiles
1. List all the values in order.
2. Find the central value.
3. Discard that central value.
4. Find the central value of each of the remaining two halves.
5. These 2 numbers are the upper and lower quartiles.

Example (Heights of WBHS students)
1. Data values: 165, 170, 173, 180, 182, 183, 191, 192.
2. The central value is the middle of 180 and 182, so the median is 181.
3. Discard 181 and calculate the middle of each half.
4. 165, 170, 173, 180 // 182, 183, 191, 192: lower quartile 171.5, upper quartile 187.

Things to Consider
Is my sample representative of the population?
• Need to consider whether any strata present in the data are represented in approximately the correct proportions.
• Need to consider the presence of any apparent outliers in the sample chosen, and the effect they will have on estimates of population parameters.
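The standard deviation table and the quartile steps earlier in this section can be checked with a short Python sketch. It is only an illustration, not part of the original presentation: the function names are made up for this example, the standard deviation divides by the sample size as the table does, and the quartiles follow the "split into two halves" method described in the slides (other quartile conventions exist and can give slightly different answers).

```python
import math


def sd_by_table(values):
    """Rebuild the columns of the standard deviation table.

    Returns (mean, deviations, squares, standard_deviation), dividing by
    the number of values, as the table in the presentation does.
    """
    n = len(values)
    mean = sum(values) / n
    deviations = [x - mean for x in values]   # the x - mean column
    squares = [d * d for d in deviations]     # the (x - mean) squared column
    return mean, deviations, squares, math.sqrt(sum(squares) / n)


def middle(ordered):
    """Central value of an already ordered list (mean of the middle two if even)."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (lower quartile, median, upper quartile) by halving the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    upper_half = ordered[half + (n % 2):]     # skip the central value when n is odd
    return middle(lower_half), middle(ordered), middle(upper_half)


mean, deviations, squares, sd = sd_by_table([180, 150, 165, 170, 160])
print(mean, sd)  # 165.0 10.0

lq, med, uq = quartiles([165, 170, 173, 180, 182, 183, 191, 192])
print(lq, med, uq, uq - lq)  # 171.5 181.0 187.0 15.5
```

The last value printed is the interquartile range for the example heights (upper quartile minus lower quartile).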
Things to Consider
Is my sample representative of the population?
• Estimates are more reliable when taken from a large sample, as the effects of outliers are lessened.
• Consider the size of the s.d. A larger value of s suggests considerable variation in the data values, so taking another sample could produce quite different statistics.
• Ask yourself, “If I were to repeat this sampling process, would I get the same results?”

Things to Consider
How could I improve my sampling method?
• Need to choose a sampling method which eliminates bias, and which gives the best chance of choosing a representative sample. (Bias exists when some of the population members have a greater or lesser chance of being included in the sample.)
• Need to discuss which statistics would give the best estimates of population parameters, including the effect of outliers.

Things to Consider
Would I get the same or similar results if I repeated the same process?
• Are there outliers or extreme values that may affect the sample statistics? If so, then I probably wouldn’t get similar results.
• Is the standard deviation (or other measure of spread) large compared to the mean? If it is, then getting the same results again is unlikely.

Things to Consider
When answering questions or stating conclusions:
• Answers need to be precise and refer to actual data values present in the sample and/or population.
• Strata must be clearly defined.
• Answers cannot be vague or rote-learnt without referring specifically to the context of the assessment.
• Students must be very clear that the sample statistics are ESTIMATES of the population parameters.
• They must NOT state that the population mean is … unless they have taken a census of the whole population!
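When comparing sampling methods, or asking whether repeating the process would give the same sample, it can help to see the selection steps written out. The Python sketch below is an illustration under stated assumptions rather than the presentation's own method: it works with roll numbers 1 to 529 only (no real heights), the function names are invented for this example, and the "Other years" stratum of 414 is simply 529 minus the 115 Year 10 students quoted earlier.

```python
import random


def simple_random_sample(population_size, sample_size, rng):
    """Generate random roll numbers from 1 to population_size, ignoring repeats."""
    chosen = []
    while len(chosen) < sample_size:
        number = rng.randint(1, population_size)
        if number not in chosen:              # ignore any repeats
            chosen.append(number)
    return chosen


def systematic_sample(population_size, sample_size, rng):
    """Random starting point, then every nth roll number, wrapping past the end."""
    step = population_size // sample_size     # 529 // 30 = 17
    start = rng.randint(1, population_size)
    return [(start - 1 + i * step) % population_size + 1
            for i in range(sample_size)]


def strata_sample_sizes(strata_sizes, sample_size):
    """Stratum size divided by population size, multiplied by the sample size, rounded."""
    total = sum(strata_sizes.values())
    return {name: round(size / total * sample_size)
            for name, size in strata_sizes.items()}


rng = random.Random(2024)                     # fixed seed so a run can be repeated exactly
print(simple_random_sample(529, 30, rng))
print(systematic_sample(529, 30, rng))
print(strata_sample_sizes({"Year 10": 115, "Other years": 414}, 30))
```

Changing the seed plays the part of repeating the sampling process: the roll numbers selected will differ each time, which is why sample statistics are only estimates. Rounded stratum sizes do not always add up to the intended total, so they may need a small manual adjustment.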
Practice Tasks
Real Estate Stats

On Your Calculator
In this part of the presentation you can check exactly how to use your calculator effectively to help with statistics.
• Generating Random Numbers
• Entering Data
• Calculating Statistics

Entering Data on a Calculator
Choose your calculator: Casio FX-82, Casio Graphic, Texas.
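For readers without one of these calculators, the random number formulas given earlier, Intg(529 × Ran# + 1) on the Casio and RANDI(1, 529) on the Texas, can be imitated in Python. This is a rough sketch only: the function names are hypothetical, and it assumes Ran# behaves like Python's `random.random()`, returning a decimal from 0 up to but not including 1.

```python
import math
import random


def casio_style_random(population_size, start=1, rng=random):
    """Imitate Intg(population_size × Ran# + start).

    Assumes Ran# gives a decimal in [0, 1), like rng.random().
    Taking the integer part mirrors "ignore any decimal in the answer".
    """
    return math.floor(population_size * rng.random() + start)


def texas_style_random(start, end, rng=random):
    """Imitate RANDI(start, end): a whole number from start to end inclusive."""
    return rng.randint(start, end)


print(casio_style_random(529))      # a roll number from 1 to 529
print(texas_style_random(1, 529))   # a roll number from 1 to 529
```

For a stratum, replace 529 with the stratum size, just as the calculator instructions describe.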