Summary of Initial Class Material
8/28/06
Key concepts from Chapter 1:
Statistics is the art and science of designing studies and analyzing the data that those
studies produce. Its ultimate goal is translating data into knowledge and understanding of
the world around us. In short, statistics is the art and science of learning from data.
Design: Planning how to obtain data to answer the question of interest.
Description: Summarizing the data that are obtained.
Inference: Making decisions and predictions based on the data.
Example: In the exit poll about the Gray Davis recall, 54% of a sample of 3160 voters
said they had voted to recall Davis. Stating that the sample fraction was 54% is an
example of descriptive statistics. However, if you used that to make the prediction that
Governor Davis was going to lose the recall, that would be an example of inference.
Populations and samples:
The population is the total set of subjects in which we are interested. The sample is a
subset of that population on whom we have (or plan to have) data.
Example:
Figures released Wednesday by the National Association of Realtors show a continued
implosion in the housing market, with inventories up sharply as prices are softening.
Sales were down 11.4 percent in the past year, to a seasonally adjusted annual rate of
6.33 million, compared with 6.60 million in June.
Was this a population or a sample?
Descriptive statistics and inferential statistics:
Descriptive statistics refers to methods of summarizing the data, usually either as
numerical values (e.g. averages and percentages) or through graphs.
Inferential statistics refers to methods of making a decision or prediction about a
population, based on data from a sample of that population.
Parameters and statistics:
A parameter is some numerical summary of an entire population (e.g. the proportion of
all voters in California who voted for Governor Davis’ recall).
A statistic is a numerical summary based on a sample (e.g. the proportion of voters in an
exit poll who stated they had voted for the recall).
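To make the parameter/statistic distinction concrete, here is a minimal simulation sketch. The population below is entirely hypothetical (100,000 voters, with 55% support assumed for illustration); it is not the real California vote. The parameter is computed from every member of the population, while the statistic is computed from a random sample of 3160, the same size as the exit poll.

```python
import random

random.seed(1)  # fixed seed so the sample is reproducible

# Hypothetical population: 1 = voted for the recall, 0 = voted against.
# The 55% support rate is an illustrative assumption, not real election data.
population = [1] * 55_000 + [0] * 45_000

# Parameter: a numerical summary of the entire population.
parameter = sum(population) / len(population)

# Statistic: the same summary computed on a random sample (like an exit poll).
sample = random.sample(population, 3160)
statistic = sum(sample) / len(sample)

print(f"population proportion (parameter): {parameter:.3f}")
print(f"sample proportion (statistic):     {statistic:.3f}")
```

Re-running with a different seed changes the statistic but never the parameter: the parameter is a fixed property of the population, while the statistic varies from sample to sample.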
Probability is also critical for developing statistical inference methods. For example, if
the recall vote were actually supported by less than half the voters, what is the
probability that there would be 54% in favor among a sample of 3160 voters?
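The probability question above can be answered with the binomial distribution. Below is a minimal sketch, assuming each sampled voter independently supports the recall with probability exactly 0.5 (the boundary case of "less than half") and treating "54%" as a count of at least 1707 out of 3160. Because the tail probability grows as the true support rate grows, any support rate below one half would make this probability even smaller. A normal-approximation z-score is included for comparison.

```python
import math

n = 3160                      # exit-poll sample size
p0 = 0.5                      # assumed true support rate (boundary case)
k_min = math.ceil(0.54 * n)   # smallest count that is at least 54% of the sample

# Exact binomial tail P(X >= k_min) for X ~ Binomial(n, 0.5).
# With p0 = 0.5 every outcome k has probability C(n, k) / 2**n,
# so the sum is done in exact integer arithmetic before dividing.
tail = sum(math.comb(n, k) for k in range(k_min, n + 1)) / 2**n

# Normal approximation: how many standard errors 54% lies above 50%.
se = math.sqrt(p0 * (1 - p0) / n)
z = (0.54 - p0) / se

print(f"k_min = {k_min}")
print(f"exact tail probability = {tail:.3g}")
print(f"z-score = {z:.2f}")
```

The sample fraction of 54% sits roughly 4.5 standard errors above 50%, and the tail probability comes out far below 0.001. That is why a 54% result in a sample this large is strong evidence that a majority of all voters supported the recall.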
Types of Data (Section 2.1)
Variables – categorical, quantitative
If quantitative, may be discrete or continuous.
A frequency table is a list of all the possible values of a variable, together with the
number of cases in which each value occurs. The proportion of the observations
associated with a given category is the count of observations in that category, divided by
the total number of observations. If the proportion is multiplied by 100, it is called a
percentage. Proportions and percentages are also called relative frequencies.
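The frequency-table definitions above translate directly into a few lines of code. This is a minimal sketch using Python's `collections.Counter`; the responses are hypothetical illustrative data, and `frequency_table` is a helper name introduced here, not a standard function.

```python
from collections import Counter

def frequency_table(values):
    """Return (value, count, proportion, percentage) rows, most frequent first."""
    counts = Counter(values)             # number of cases in which each value occurs
    total = sum(counts.values())         # total number of observations
    rows = []
    for value, count in counts.most_common():
        proportion = count / total       # relative frequency
        rows.append((value, count, proportion, 100 * proportion))
    return rows

# Hypothetical responses to a categorical question (illustrative data only).
responses = ["yes", "no", "yes", "undecided", "yes", "no", "yes", "no"]

print(f"{'Value':<10}{'Count':>6}{'Proportion':>12}{'Percent':>9}")
for value, count, proportion, percentage in frequency_table(responses):
    print(f"{value:<10}{count:>6}{proportion:>12.3f}{percentage:>8.1f}%")
```

The proportions always sum to 1 and the percentages to 100. Note that `Counter` lists only values that actually occur; a possible value with zero observations would have to be added explicitly to appear in the table.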
Questions for Homework 1
1.22. In a University of Wisconsin (UW) study about alcohol abuse among students, 100
of the 40,858 members of the student body in Madison were sampled and asked to
complete a questionnaire. One question was, “On how many days in the past week did
you consume at least one alcoholic drink?”
a. Identify the population and the sample.
b. For the 40,858 students at UW, one characteristic of interest was the number of
students who would reply “zero” to this question. For the 100 students sampled, suppose
29% gave this response. Does this mean that 29% of the entire population of UW
students would make this response? Explain.
c. Is the numerical summary of 29% a sample statistic or a population parameter?
1.31. Pick up a recent issue of a national newspaper such as The New York Times or
USA Today. Identify an article that used statistical methods. Did it use descriptive
statistics, or inferential statistics, or both? Explain. (Note: as already explained in the
introductory handout, I am willing to accept local as well as national newspapers, or web-based news such as www.nytimes.com or www.cnn.com.)
2.6. Identify each of the following variables as continuous or discrete.
a. The length of time to run a marathon.
b. The number of people in line at a box office to purchase theater tickets.
c. The weight of a dog.
d. The number of people you have dated in the past month.
2.8. In the 2002 General Social Survey, respondents answered the question, “How many
children have you ever had?” Results were:
Number of Children   Count
0                    799
1                    469
2                    657
3                    481
4                    185
5                    73
6                    40
7                    22
8 or more            34
a. Is the variable, number of children, categorical or quantitative?
b. Is the variable, number of children, discrete or continuous?
c. Add proportions and percentages to this frequency table.
d. Which response is the mode?