User Study Evaluation
Human-Computer Interaction
Hypothesis
• A statement of prediction
• Describes what you expect will happen in your study
• Alternative hypothesis (H1) – your prediction, i.e. a
claim of difference in the population
• e.g. Participants will commit more errors with interface
A than with interface B
• Null hypothesis (H0) – No difference or no effect
• e.g. Participants will commit the same number of errors
with interface A and interface B, or participants will
commit more errors with interface B than with interface A
Hypothesis – one or two tailed?
• Alternative hypothesis
• One-tailed: Participants will commit more errors with
interface A than with interface B (i.e. directional)
• Two-tailed: There will be a significant difference between
the number of errors participants commit with interface A
and with interface B
• but I don’t know if there will be more or fewer (i.e.
non-directional)
• Can’t prove the alternative hypothesis, can only reject
the null hypothesis
• If your prediction was correct – reject null hypothesis
• Not rejecting null hypothesis ≠ accepting it
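The one-tailed and two-tailed cases can also be written formally. This is a sketch that assumes \mu_A and \mu_B (symbols not used in the slides) stand for the population mean number of errors with interface A and interface B:

```latex
\begin{align*}
\text{One-tailed:} \quad & H_0 : \mu_A \le \mu_B, & H_1 : \mu_A > \mu_B \\
\text{Two-tailed:} \quad & H_0 : \mu_A = \mu_B, & H_1 : \mu_A \ne \mu_B
\end{align*}
```

In both cases H0 and H1 together cover every possibility, so rejecting H0 leaves H1 as the supported claim.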
Metrics
• What you are measuring
• Some types of metrics
• Objective – facts of an event
• Time to complete task (continuous)
• Errors (discrete, i.e. distinct and separate, can be
counted)
• Subjective – a person’s opinion
• Satisfaction
Metrics
• Types of metrics
• Objective – facts of an event
• Subjective – a person’s opinion
• *Both* are important
• How to measure
• Instrumentation – record data within your system
• Questionnaires / Surveys
• Scales
• Free-response
• Let’s discuss appropriateness of each
• Let’s look at a very popular survey (SUS)
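As an illustration of turning questionnaire responses into a metric, here is a minimal sketch of the standard SUS scoring rule (10 items rated 1 to 5; odd items contribute the rating minus 1, even items contribute 5 minus the rating, and the sum is multiplied by 2.5). The function name `sus_score` and the example responses are made up for this sketch.

```python
def sus_score(responses):
    """Score one participant's 10 SUS responses (each 1-5) on the 0-100 scale."""
    if len(responses) != 10:
        raise ValueError("SUS has exactly 10 items")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")
    total = 0
    for item_number, r in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += r - 1   # odd items are positively worded
        else:
            total += 5 - r   # even items are negatively worded
    return total * 2.5


example = [4, 2, 5, 1, 4, 2, 4, 1, 5, 2]  # hypothetical responses
print(sus_score(example))  # 85.0
```

Note that the even items are reverse-scored, which is the same idea as the item reversals covered under data preparation.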
Analysis
• Most of what we do involves:
• Normally distributed results
• Independent testing
• Homogeneous population
• Recall, we are testing the hypothesis by trying to
reject the null hypothesis
Analysis
• 3 main steps for analysis
• Data Preparation: Cleaning and organizing the data for
analysis
• Checking the data for accuracy
• Transforming data (e.g. reverse coding survey data)
• Descriptive Statistics: Describing the data
• Provide simple summaries about the sample and the measures
• Simply describing what is, what the data shows
• Inferential Statistics: Testing Hypotheses and Models
• Try to infer from the sample data what the population thinks
• Make judgments of the probability that an observed difference
between groups is a dependable one or one that might have
happened by chance
Data preparation
• Checking data for accuracy
• Are the responses legible/readable?
• Are all important questions answered?
• Are the responses complete?
• Is all relevant contextual information included (e.g.,
date, time, place, researcher)?
Data preparation
• Data transformations
• Missing values
• Depending on the program, you need to designate specific
values to represent missing values, e.g. -99
• Scale totals
• Add or average across individual items
• Item reversals
• Likert scale – sometimes ratings for items need to be
reversed
• 1 (strongly disagree) – 5 (strongly agree)
• “I generally feel good about myself.”
• “Sometimes I feel like I'm not worth much as a person.”
• What does a 5 mean in each case?
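The three transformations (missing values, item reversals, scale totals) might look like this in code. This is only a sketch: the names `MISSING`, `reverse_code` and `scale_score` are hypothetical, and skipping missing items when averaging is one possible policy, not the only one.

```python
MISSING = -99  # sentinel for "no answer", following the slide's example


def reverse_code(rating, low=1, high=5):
    """Flip a rating on a low..high scale (1<->5, 2<->4, 3 stays 3)."""
    return low + high - rating


def scale_score(ratings, reversed_items):
    """Average one participant's ratings after reversing the listed items.

    ratings: one int per item; MISSING marks an unanswered item.
    reversed_items: set of 0-based indexes of negatively worded items.
    Returns None if every item is missing.
    """
    cleaned = []
    for i, r in enumerate(ratings):
        if r == MISSING:
            continue  # assumed policy: leave missing items out of the average
        cleaned.append(reverse_code(r) if i in reversed_items else r)
    return sum(cleaned) / len(cleaned) if cleaned else None


# Item 0: "I generally feel good about myself."
# Item 1: "Sometimes I feel like I'm not worth much as a person." (reversed)
print(scale_score([5, 1], reversed_items={1}))        # 5.0
print(scale_score([4, MISSING], reversed_items={1}))  # 4.0
```

After reversal, a high score means the same thing (here, high self-esteem) on every item, so the items can be added or averaged.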
Descriptive statistics
• Simple summaries of sample and measures, i.e. data
• Describing what is or what the data shows
• Central tendency – estimate of the “center” of a
distribution of values
• Mean – average across a set of values
• 15 + 15 + 18 + 25 + 33 = 106
• µ = 106/5 = 21.2
• Median – score found in middle of a set of values
• 15, 15, 18, 25, 33 (median = 18)
• Mode – most frequently occurring value
• 15, 15, 18, 25, 33 (mode = 15)
• Describe the data with a number and a graph
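The three measures of central tendency can be checked with Python's standard `statistics` module. A minimal sketch using the five values from the slide:

```python
import statistics

values = [15, 15, 18, 25, 33]

mean = statistics.mean(values)      # sum of the values divided by their count
median = statistics.median(values)  # middle value of the sorted list
mode = statistics.mode(values)      # most frequently occurring value

print(mean, median, mode)  # 21.2 18 15
```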
Inferential statistics
• Try to reach conclusions that go beyond the
immediate data – draw inferences
• e.g. want to compare the average performance of 2
groups to see if there’s a difference
• t-test: statistical test used to determine whether two
observed means are statistically different
t-test
• What does it mean to say that the averages for two
groups are statistically different?
t-test
• Variability is the noise that
may make it harder to see
the group difference
• Variance: measure of
variability around the mean
• Standard deviation:
square root of the variance
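A short sketch of how variance and standard deviation are computed, reusing the values from the descriptive statistics example. It assumes the sample variance (dividing by n - 1), since the slides do not say which version is meant.

```python
import math
import statistics

values = [15, 15, 18, 25, 33]

mean = statistics.mean(values)
# Squared distance of each value from the mean
squared_devs = [(v - mean) ** 2 for v in values]
# Sample variance: divide by n - 1 (an assumption, see above)
sample_variance = sum(squared_devs) / (len(values) - 1)
# Standard deviation: square root of the variance
sample_sd = math.sqrt(sample_variance)

print(round(sample_variance, 2), round(sample_sd, 2))  # 60.2 7.76
```

The same numbers come out of `statistics.variance(values)` and `statistics.stdev(values)`.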
t-test
• (rule of thumb) |t| > 1.96 is significant at the .05 level
(two-tailed) when the samples are large; small samples need a
larger value
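The 1.96 figure is the point that cuts off the outer 5% of a standard normal distribution (2.5% in each tail), which the t distribution approaches as samples grow. A small sketch with the standard library's `NormalDist`:

```python
from statistics import NormalDist

alpha = 0.05
# Two-tailed test: put alpha / 2 in each tail of the standard normal curve
critical_z = NormalDist().inv_cdf(1 - alpha / 2)

print(round(critical_z, 2))  # 1.96
```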
t-test
• Once computed, look up the t-value to see whether the ratio is large
enough to say that the difference between the groups is not
likely to have been a chance finding.
• To test the significance, you need to set a risk level (called
the alpha level). The accepted standard is an alpha level of .05.
• 5 times out of 100 you would find a statistically significant difference
between the means even if there was none (i.e., by "chance").
• Degrees of freedom (df). For an independent-samples t-test, the df =
sum of the persons in both groups minus 2.
• Given the alpha level and the df, look up the critical value in a t
table to determine whether the computed t-value is large enough to be
significant.
• If yes, conclude that the means for the 2 groups are different
(even given the variability) and reject the null hypothesis.
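The procedure can be sketched for two independent groups. This assumes the pooled-variance (equal variances) form of the t-test; the function name `independent_t` and the error counts are made up for illustration, and the critical value 2.306 is the standard table entry for df = 8 at alpha = .05, two-tailed.

```python
import math
import statistics


def independent_t(group_a, group_b):
    """Return (t, df) for an independent-samples t-test with pooled variance."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    df = n_a + n_b - 2  # persons in both groups minus 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    # t is the difference between means divided by its variability
    return (mean_a - mean_b) / standard_error, df


errors_a = [7, 9, 8, 10, 6]  # hypothetical error counts, interface A
errors_b = [4, 5, 6, 3, 7]   # hypothetical error counts, interface B

t, df = independent_t(errors_a, errors_b)
CRITICAL_T = 2.306  # t table: df = 8, alpha = .05, two-tailed

print(f"t = {t:.2f}, df = {df}, reject H0: {abs(t) > CRITICAL_T}")
```

With these made-up numbers the means are 8 and 5, t comes out at about 3.0 with df = 8, and the null hypothesis is rejected.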
α and p values
• α value – probability of making a Type I error
(rejecting null hypothesis when really true)
• p value – probability of finding an effect at least as
large as the one observed if the null hypothesis were
true. The lower the p value, the higher the statistical
significance (the stronger the evidence against the null
hypothesis)
Relationship between α and p values
• Once the alpha level has been set, a statistic (like t) is
computed.
• Each statistic has an associated probability value called a
p-value, or the likelihood of observing a statistic at least this
extreme by chance alone, given the sampling distribution under the
null hypothesis.
• Alpha sets the standard for how extreme the data must be
before we can reject the null hypothesis. The p-value indicates
how extreme the data are.
• Compare the p-value with alpha to determine whether the
observed data are statistically significantly different from the
null hypothesis: if p < alpha, reject the null hypothesis
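The comparison of p with alpha can be sketched as follows. To stay within the standard library, this uses a large-sample normal approximation to turn a test statistic into a two-tailed p-value; an exact t-test p-value would use the t distribution with the right df instead. The function names are hypothetical.

```python
from statistics import NormalDist


def two_tailed_p(statistic):
    """Approximate two-tailed p-value, treating the statistic as standard normal."""
    return 2 * (1 - NormalDist().cdf(abs(statistic)))


def is_significant(statistic, alpha=0.05):
    """Reject the null hypothesis when the p-value falls below alpha."""
    return two_tailed_p(statistic) < alpha


for stat in (1.0, 1.96, 2.5):
    print(stat, round(two_tailed_p(stat), 3), is_significant(stat))
```

A statistic of 1.96 gives a p-value of almost exactly .05, which ties the rule of thumb to the alpha level.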
Kinds of t-tests
Formula is slightly different for each:
• Single-sample:
• tests whether a sample mean is significantly different from
a pre-existing value (e.g. norms)
• Paired-samples:
• tests the difference between the means of 2 linked samples,
e.g. means obtained in 2 conditions by a single group of
participants
• Independent-samples:
• tests the difference between the means of 2 independent
populations
• Which test fits your situation?
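To show how the formulas differ, here is a sketch of the single-sample and paired-samples statistics. A paired test reduces to a single-sample test on the per-participant differences against 0. The function names and the completion times are made up for illustration.

```python
import math
import statistics


def one_sample_t(sample, reference_value):
    """Return (t, df) comparing a sample mean with a pre-existing value."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - reference_value) / standard_error, n - 1


def paired_t(condition_1, condition_2):
    """Return (t, df) for two measurements taken from the same participants."""
    if len(condition_1) != len(condition_2):
        raise ValueError("paired samples need one value per participant in each condition")
    differences = [a - b for a, b in zip(condition_1, condition_2)]
    return one_sample_t(differences, 0)


# Hypothetical task times (seconds) for the same 5 participants
times_a = [12, 15, 11, 14, 13]
times_b = [10, 12, 10, 11, 12]

t, df = paired_t(times_a, times_b)
print(f"t = {t:.2f}, df = {df}")  # t = 4.47, df = 4
```

Note that the df for the paired test is the number of pairs minus 1, not the number of persons in both groups minus 2.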
t and alpha values
Independent samples t-test
• Example: social presence questionnaire
• “I perceived I was in the presence of a patient in the room
with me.”
• http://www.vassarstats.net/tu.html
Correlations
Correlations – relationship between two variables
Pearson’s product-moment correlation coefficient – r
http://bdaugherty.tripod.com/KeySkills/lineGraphs.html
Correlations
Pearson’s product-moment correlation coefficient – r
http://www.socscistatistics.com/tests/pearson/Default2.aspx
http://en.wikipedia.org/wiki/Correlation_and_dependence
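Pearson's r can be computed directly from its definition: the covariation of the two variables divided by the product of their individual variation. A minimal sketch, with a hypothetical function name and made-up data:

```python
import math


def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient for paired values."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when one variable never changes")
    return co_variation / math.sqrt(ss_x * ss_y)


hours_practice = [1, 2, 3, 4, 5]   # hypothetical data
errors_made = [10, 8, 6, 4, 2]     # falls as practice rises

print(round(pearson_r(hours_practice, errors_made), 2))  # -1.0
```

r ranges from -1 (perfect negative relationship) through 0 (no linear relationship) to +1 (perfect positive relationship).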