Activity (continued): are Man U really the best?
Background
In an association football (henceforth, football) match, two teams of 11 players attempt to kick a ball into their
opponents' goal (a frame with a net covering the top, back and sides) while preventing their opponents from kicking
the same ball into theirs. Over a 90-minute match it is common for just a couple of goals to be scored (sometimes
none at all), so the outcome of a single game is very unpredictable: weak teams can beat good teams on the day. In
most competitions, teams play in a league in which points are awarded for winning or drawing games; accumulating
points over a whole season is meant to average out this game-to-game unpredictability. At the end of a season, the
team at the top of the league is usually crowned champions. We generally think that the team that finishes first must
be better than the team that finishes second, and definitely better than the team that finishes last. But is this true?
Objectives
The main objective of this activity is to consolidate what you learned in the race to beat Ruth by applying it, unguided,
to a different but related task. In particular, you will have to write a function to simulate games or leagues, and use
conditional (if/else) statements or loops to determine how many points each team earns and which team comes top.
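As a minimal sketch of the kind of logical statement meant here, the hypothetical helper `game_points()` below (not part of the original activity) returns the points earned by each side in a single game, assuming 3 points for a win, 1 for a draw and 0 for a loss.

```r
# Hypothetical helper: points earned by the home and away teams in one game.
# Assumes 3 points for a win, 1 for a draw and 0 for a loss.
game_points <- function(home_goals, away_goals) {
  if (home_goals > away_goals) {
    c(home = 3, away = 0)        # home win
  } else if (home_goals == away_goals) {
    c(home = 1, away = 1)        # draw
  } else {
    c(home = 0, away = 3)        # away win
  }
}

game_points(2, 1)   # home 3, away 0
game_points(0, 0)   # home 1, away 1
```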
Task
Write an R script that simulates a hypothetical football season. You might model this on the English Premier League,
the S League, La Liga, or the league of any other country of your choosing (but don't choose one with a complicated
structure like the Scottish Premiership). You'll need to find out the rules of the league. For instance, is it 3 points for
a win, 1 for a draw and 0 for a loss? Does each pair of teams play each other twice, once at home and once away? How
many teams are there?
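If the answers turn out to be 20 teams, each pair meeting twice, a minimal sketch of the fixture list looks like this. The value of `n_teams` is a placeholder to check against your league; the fixtures are every ordered (home, away) pair, which is the double round robin described above.

```r
# Hypothetical setting: check it against the rules of your league
n_teams <- 20

# Each pair of teams meets twice, once at each ground, so the fixtures are
# every ordered (home, away) pair with home != away
fixtures <- expand.grid(home = seq_len(n_teams), away = seq_len(n_teams))
fixtures <- fixtures[fixtures$home != fixtures$away, ]

nrow(fixtures)                                 # 380 = 20 * 19 games
sum(fixtures$home == 1 | fixtures$away == 1)   # 38 games for team 1
```

In general a league of n teams with this structure has n(n − 1) games, and each team plays 2(n − 1) of them.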
To begin with, assume that all teams have the same ability. Find out the average number of goals scored by home
teams and by away teams in your league (it is easier for a team to score goals at home, as the fans, who are
disproportionately supporters of the home side, urge them on; you should be able to find data online for the popular
leagues). You might model the number of goals scored by each team as being drawn from a Poisson distribution with
the corresponding mean. A few Poisson distributions are plotted in Box 2. We recommend you also make the simplifying
assumption that the number of goals scored by each team does not depend on the number scored by the other team.
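A minimal sketch of a single game under these assumptions follows. The means 1.6 (home) and 1.2 (away) are placeholders, not data for any real league. Because the two teams' goals are assumed independent, the probability of any scoreline is the product of two Poisson probabilities, which also gives the chances of a home win, a draw and an away win.

```r
# Placeholder means: replace with the home and away averages for your league
mu_home <- 1.6
mu_away <- 1.2

# One simulated game: each side's goals drawn independently from a Poisson
set.seed(42)
home_goals <- rpois(1, mu_home)
away_goals <- rpois(1, mu_away)

# Independence means a scoreline's probability is the product of two
# Poisson probabilities: p[i, j] = Pr(home = i - 1) * Pr(away = j - 1)
goals <- 0:15   # truncation point; more goals than this is negligibly likely
p <- outer(dpois(goals, mu_home), dpois(goals, mu_away))

p_home_win <- sum(p[lower.tri(p)])   # row > column: home scored more
p_draw     <- sum(diag(p))
p_away_win <- sum(p[upper.tri(p)])
round(c(home = p_home_win, draw = p_draw, away = p_away_win), 3)
```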
To simulate the league placements of each team, you will need to develop a script that:
• Simulates, for each pair of teams A and B, the number of goals scored by each side when A is at home and B away,
and vice versa.
• Determines the winner of each game.
• Determines the points each team takes from each game.
• Tallies the points.
• Orders the teams by points.
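The five steps above can be sketched as one function for the equal-ability case. This is a minimal sketch, not the activity's intended solution: it assumes a 20-team double round robin, 3/1/0 points and placeholder means of 1.6 and 1.2 goals, and it leaves teams that are level on points in team-number order rather than applying a real tie-breaker.

```r
# Minimal sketch of one season with equal teams. The default means are
# placeholders; the rules assumed are a double round robin with 3/1/0 points.
simulate_league <- function(n_teams = 20, mu_home = 1.6, mu_away = 1.2) {
  # Simulate: every ordered (home, away) pair plays once
  fixtures <- expand.grid(home = seq_len(n_teams), away = seq_len(n_teams))
  fixtures <- fixtures[fixtures$home != fixtures$away, ]
  home_goals <- rpois(nrow(fixtures), mu_home)
  away_goals <- rpois(nrow(fixtures), mu_away)

  # Determine the winner and the points each side takes from each game
  draw <- home_goals == away_goals
  home_pts <- 3 * (home_goals > away_goals) + draw
  away_pts <- 3 * (away_goals > home_goals) + draw

  # Tally each team's points over its home and away games
  points <- vapply(seq_len(n_teams), function(t) {
    sum(home_pts[fixtures$home == t]) + sum(away_pts[fixtures$away == t])
  }, numeric(1))

  # Order the teams by points, highest first; order() is stable, so teams
  # level on points stay in team-number order
  standings <- data.frame(team = seq_len(n_teams), points = points)
  standings[order(standings$points, decreasing = TRUE), ]
}

set.seed(1)
league <- simulate_league()
head(league)
```

Vectorising over all 380 games keeps the script short; a `for` loop over the rows of `fixtures` calling an if/else rule would give the same result and may be easier to read at first.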
Once you have simulated the final points for each team, make a histogram or other plot of the simulated number of
points and the actual number of points each team got in a recent season. Any difference between these is due to:
• The assumptions in the model not fully representing reality;
• Chance;
• The teams differing in ability.
The last of these is probably the most important and is likely responsible for most of the discrepancy.
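With equal teams, every team's season total has the same distribution, so one way to build the histogram is to simulate a single team's season many times. The sketch below assumes a 20-team double round robin (19 home and 19 away games), 3/1/0 points and placeholder means of 1.6 and 1.2; `actual_points` is a hypothetical vector you would fill in with a real season's totals.

```r
# Placeholder settings: 20-team double round robin, 3/1/0 points
mu_home <- 1.6
mu_away <- 1.2
games_each_venue <- 19   # home games per team (and the same number away)

# Points for one team over one season against equal opponents:
# first 19 entries are home games, last 19 are away games
season_points <- function() {
  goals_for     <- c(rpois(games_each_venue, mu_home), rpois(games_each_venue, mu_away))
  goals_against <- c(rpois(games_each_venue, mu_away), rpois(games_each_venue, mu_home))
  sum(3 * (goals_for > goals_against) + (goals_for == goals_against))
}

set.seed(2)
sim_points <- replicate(5000, season_points())
hist(sim_points, breaks = 30, freq = FALSE,
     main = "Simulated season points, equal teams", xlab = "Points")

# Hypothetical: overlay a recent season's real totals for your league
# actual_points <- c(...)
# hist(actual_points, freq = FALSE, add = TRUE, col = rgb(1, 0, 0, 0.4))
sd(sim_points)
```

Plotting densities (`freq = FALSE`) lets the 5000 simulated totals and a single season's 20 real totals share one axis. If the real totals are much more spread out than the simulated ones, that points towards the teams differing in ability.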
Box 2: Poisson distribution
The Poisson distribution is named after a Frenchman, not a fish. If the random variable 𝑋 has a Poisson
distribution, then its distribution is completely characterised by its mean. Call this 𝜇. We write 𝑋 ~ Po(𝜇) to indicate
that 𝑋 is Poisson with mean 𝜇. If so, 𝑋 can take any non-negative integer value, i.e. it is possible that 𝑋 = 0, or 1, or 2,
and so on. The probability that the random variable 𝑋 is equal to a specific value, say 𝑘, given its mean 𝜇, is
$$\Pr(X = k \mid \mu) = \frac{\mu^{k} e^{-\mu}}{k!}$$
where the exclamation mark denotes the factorial function (i.e. 𝑘! = 𝑘 × (𝑘 − 1) × ⋯ × 2 × 1, e.g. 3! = 3 × 2 × 1 = 6,
and by convention 0! = 1). The Poisson distribution arises when events occur independently of one another, at a
constant average rate, in space or time. Examples of data that are plausibly Poisson are the number of lung cancer
diagnoses per month, or the number of heart attacks over a year in each township in Singapore. Some examples of the
Poisson distribution are presented below:
In R you may find the following commands useful:
rpois(n,mu) # simulates n random variates from a Poisson distribution
# with mean mu
dpois(k,mu) # calculates the probability that a random variable which
# is Poisson with mean mu would be equal to k
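A short check that these commands match the formula in this box may help. The mean 1.6 is only an example value; the sketch evaluates the formula directly, compares it with `dpois()`, confirms that the average of many `rpois()` draws is close to the mean, and plots a few Poisson distributions.

```r
mu <- 1.6   # example mean
k <- 0:10

# The Box 2 formula written out directly agrees with dpois()
p_formula <- mu^k * exp(-mu) / factorial(k)
all.equal(p_formula, dpois(k, mu))   # TRUE

# The average of many rpois() draws should be close to mu
set.seed(3)
x <- rpois(10000, mu)
mean(x)

# A few Poisson distributions side by side
means <- c(0.5, 1.6, 3)
probs <- sapply(means, function(m) dpois(k, m))   # one column per mean
barplot(t(probs), beside = TRUE, names.arg = k,
        legend.text = paste("mu =", means), xlab = "k", ylab = "Pr(X = k)")
```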
Once you have done this with all teams equal, allow the teams to differ in ability. One way to do this is to allow the
typical number of goals scored to differ between teams. For example, if you previously had 1.6 goals scored per team
at home (on average), you might, at the beginning of the season, assign each team its own typical number of goals,
for instance by simulating it from a normal distribution with mean 1.6 and standard deviation 0.1, or from a uniform
distribution between 1.4 and 1.8. If you have a single measure of ability, you can then rank the teams by ability, which
allows you to compare the ranking by ability with the final league placings. How does the spread of final points change
as you change the standard deviation (for a normal distribution) or the width (for a uniform distribution)? Can you get
close to the actual data from your league?
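Here is a minimal sketch of one way to do this, under loudly stated assumptions: each team's ability is its expected home goals, drawn from a normal distribution with mean 1.6; its expected away goals are that ability scaled by a placeholder away/home ratio of 1.2/1.6; and only attacking ability varies, so opponents' defence is ignored. The helper names `league_points()` and `draw_ability()` are hypothetical.

```r
# Hypothetical model: team i's expected home goals are ability[i]; its
# expected away goals are scaled by the away/home ratio. Only attacking
# ability varies; opponents' defence is ignored (an assumption).
league_points <- function(ability, mu_home = 1.6, mu_away = 1.2) {
  n <- length(ability)
  fixtures <- expand.grid(home = seq_len(n), away = seq_len(n))
  fixtures <- fixtures[fixtures$home != fixtures$away, ]
  home_goals <- rpois(nrow(fixtures), ability[fixtures$home])
  away_goals <- rpois(nrow(fixtures), ability[fixtures$away] * mu_away / mu_home)
  draw <- home_goals == away_goals
  home_pts <- 3 * (home_goals > away_goals) + draw
  away_pts <- 3 * (away_goals > home_goals) + draw
  vapply(seq_len(n), function(t) {
    sum(home_pts[fixtures$home == t]) + sum(away_pts[fixtures$away == t])
  }, numeric(1))
}

# Abilities drawn once at the start of the season; pmax() guards against
# the (unlikely) negative values a normal distribution can produce
draw_ability <- function(n, sd) pmax(rnorm(n, mean = 1.6, sd = sd), 0.05)

set.seed(4)
ability <- draw_ability(20, sd = 0.1)
points <- league_points(ability)
cor(ability, points, method = "spearman")   # agreement between the two rankings

# Spread of final points as the standard deviation of ability grows
sd_values <- c(0, 0.1, 0.3)
spread <- sapply(sd_values, function(s) {
  mean(replicate(200, sd(league_points(draw_ability(20, s)))))
})
names(spread) <- sd_values
spread
```

The Spearman correlation compares the ranking by ability with the final ranking by points; a value near 1 means the league table largely reflects ability. Swapping `rnorm()` for `runif(n, 1.4, 1.8)` in `draw_ability()` gives the uniform version.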
Use this to assess how often the best team comes top! Did the best team come top? How often will the best team
come top: every season? One season in two? To answer this, you want to estimate the probability that the best team
finishes top of the league. You can do this by simulating many leagues, storing which team came top in each, and then
checking whether it was actually the best team according to the ability ranking.
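A minimal sketch of this estimate follows, reusing the same hypothetical attacking-ability model (ability drawn from a normal distribution with mean 1.6 and standard deviation 0.1, away scoring scaled by 1.2/1.6, 3/1/0 points). Abilities are redrawn at the start of each simulated season. Ties on points are broken at random, which is a simplification: real leagues use goal difference and other criteria.

```r
# Same hypothetical model: attacking ability only, double round robin, 3/1/0
league_points <- function(ability, mu_home = 1.6, mu_away = 1.2) {
  n <- length(ability)
  fixtures <- expand.grid(home = seq_len(n), away = seq_len(n))
  fixtures <- fixtures[fixtures$home != fixtures$away, ]
  home_goals <- rpois(nrow(fixtures), ability[fixtures$home])
  away_goals <- rpois(nrow(fixtures), ability[fixtures$away] * mu_away / mu_home)
  draw <- home_goals == away_goals
  home_pts <- 3 * (home_goals > away_goals) + draw
  away_pts <- 3 * (away_goals > home_goals) + draw
  vapply(seq_len(n), function(t) {
    sum(home_pts[fixtures$home == t]) + sum(away_pts[fixtures$away == t])
  }, numeric(1))
}

# One season: draw abilities, play the league, and check whether the team
# with the highest ability finished top (ties on points broken at random)
best_team_top <- function(n_teams = 20, sd_ability = 0.1) {
  ability <- pmax(rnorm(n_teams, mean = 1.6, sd = sd_ability), 0.05)
  points <- league_points(ability)
  leaders <- which(points == max(points))
  top <- if (length(leaders) > 1) sample(leaders, 1) else leaders
  top == which.max(ability)
}

set.seed(5)
n_sims <- 1000
top_is_best <- replicate(n_sims, best_team_top())
p_hat <- mean(top_is_best)

# Approximate 95% confidence interval for the probability
se <- sqrt(p_hat * (1 - p_hat) / n_sims)
c(estimate = p_hat, lower = p_hat - 1.96 * se, upper = p_hat + 1.96 * se)
```

The `if` guard around `sample()` matters: `sample(x, 1)` with a single number `x` would draw from `1:x` instead of returning `x`. Re-running with a larger `sd_ability` shows how the answer depends on how unequal the teams are.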