Activity (continued): are Man U really the best?

Background

In an association football (henceforth, football) match, two teams of 11 players attempt to kick a ball into their opponents’ goal (a frame with a net covering the top, back and sides) while preventing their opponents from kicking the same ball into theirs. It is common over a 90 minute match for just a couple of goals to be scored (sometimes none at all), so the outcome is very unpredictable: weak teams can beat good teams on the day. In most competitions, there is a league in which points are awarded for winning or drawing games, which filters out the unpredictability. At the end of a season of games, the team at the top of the league is usually crowned champions. We generally think that the team that finishes first must be better than the team that finishes second, and definitely better than the team that finishes last. But is this true?

Objectives

The main objective of this activity is to consolidate what you learned in the race to beat Ruth by applying it, unguided, to a different but related task. In particular, you will have to write a function to simulate games or leagues, and use logical statements or loops to determine how many points each team earns and which team comes top.

Task

Write an R script that simulates a hypothetical football season. You might model this on the English Premier League, the S League, La Liga, or the league of any other country of your choosing (but don’t choose one with a complicated structure like the Scottish Premiership). You’ll need to find out the rules of the league. For instance, is it 3 points for a win, 1 for a draw and 0 for a loss? Does each pair of teams play each other twice, once home and once away? How many teams are there? To begin with, you should assume that all teams have the same ability.
Find out the average number of goals scored at home and away in your league (it is easier for a team to score goals at home as the fans, disproportionately of the home side, urge them on; you should be able to find data online for the popular leagues). You might model the number of goals scored by each team as being drawn from a Poisson distribution with that mean. A few Poisson distributions are plotted in box 2. We recommend you also make the simplifying assumption that the number of goals scored by each team does not depend on the number scored by the other team.

To simulate the league placements of each team, you will need to develop a script that:
• Simulates the number of goals scored by team A at home against team B away, and vice versa, for each pair of teams.
• Determines the winner of each game.
• Determines the points each team takes from each game.
• Tallies the points.
• Orders the teams by points.

Once you have simulated the final points for each team, make a histogram or other plot of the simulated number of points and the actual number of points each team got in a recent season. Any difference between these is due to:
• The assumptions in the model not fully representing reality;
• Chance;
• The teams differing in ability.
The last of these is probably the most important and is responsible for most of the discrepancy.

Box 2: Poisson distribution

The Poisson distribution is named after a Frenchman, not a fish. If the random variable called $X$ has a Poisson distribution, then its distribution is characterised by its mean. Call this $\mu$. We write $X \sim \mathrm{Po}(\mu)$ to indicate that $X$ is Poisson with mean $\mu$. If so, $X$ can take any non-negative integer value, i.e. it is possible that $X = 0$, or 1, or 2, or … The probability that the random variable $X$ is equal to a specific value, say $k$, given its mean $\mu$, is

$$\Pr(X = k \mid \mu) = \frac{\mu^{k} e^{-\mu}}{k!}$$

where the exclamation mark is the factorial function (i.e. $k! = k \times (k - 1) \times \cdots \times 2 \times 1$, e.g. $3! = 3 \times 2 \times 1 = 6$, and $0! = 1$ by convention).
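As a quick check of the formula above, the following minimal sketch evaluates it directly and compares the result with R's built-in `dpois()`. The mean `mu = 1.5` and the range of `k` are illustrative values only, not figures from any league.

```r
# Evaluate the Poisson probability formula by hand and compare with dpois().
mu <- 1.5    # illustrative mean (an assumption, not league data)
k  <- 0:10   # goal counts at which to evaluate the probability

manual  <- mu^k * exp(-mu) / factorial(k)   # mu^k * e^(-mu) / k!
builtin <- dpois(k, mu)                     # R's built-in Poisson probabilities

# TRUE if the two agree up to floating-point error
agree <- isTRUE(all.equal(manual, builtin))
print(agree)

# Probabilities of 0, 1, 2 and 3 goals, rounded for display
print(round(manual[1:4], 3))
```

Summing `dpois(k, mu)` over a wide enough range of `k` should give a value very close to 1, which is a useful sanity check when you pick your own means.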
The Poisson distribution arises when independent events occur in space or time. Examples of data that are plausibly Poisson are the number of lung cancer diagnoses per month, or the number of heart attacks over a year in each township in Singapore. Some examples of the Poisson distribution are presented below:

[Figure: example Poisson distributions]

In R you may find the following commands useful:

    rpois(n, mu)  # simulates n random variates from a Poisson distribution with mean mu
    dpois(k, mu)  # calculates the probability that a random variable which is Poisson with mean mu would be equal to k

Once you have done this when all teams are the same, allow the teams to differ in ability. One way to do this is to allow the typical number of goals scored to be different for each team. For example, if before you had 1.6 goals scored per team at home (on average), you might at the beginning of the season assign to each team its own typical number of goals, say by simulating this from a normal distribution with mean 1.6 and standard deviation 0.1 (say) or from a uniform distribution from 1.4 to 1.8 (say). If you have a single measure of ability, you can then rank the teams in ability, which allows you to compare the ranking in ability versus the ranking in final league placing.

How does the spread of final points change as you change the standard deviation (for a normal distribution) or spread (for a uniform distribution)? Can you get close to the actual data from your league? Use this to assess how often the best team comes top!

Did the best team come top? How often will the best team come top? Every season? One season in two? To answer this, you want to estimate the probability the best team will be top of the league. You can do this by simulating lots of leagues, storing which team came top, and then checking whether it was actually the best team according to the ability ranking.
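The whole procedure can be sketched in R as below. This is one possible outline under stated assumptions, not a model answer: the function name `simulate_season`, the 20-team league, the away mean of 1.2, the 3/1/0 points rule, the double round-robin format and the random tie-break are all illustrative choices. A single ability offset per team is added to both the home and the away mean so that the teams can be ranked on one measure.

```r
# Simulate one double round-robin season; returns the final points per team.
# home_mu[i] / away_mu[i]: mean goals team i scores at home / away (assumed inputs).
simulate_season <- function(home_mu, away_mu) {
  n <- length(home_mu)
  points <- numeric(n)
  for (h in seq_len(n)) {        # h = home team
    for (a in seq_len(n)) {      # a = away team
      if (h == a) next
      goals_home <- rpois(1, home_mu[h])   # independent Poisson goal counts
      goals_away <- rpois(1, away_mu[a])
      if (goals_home > goals_away) {
        points[h] <- points[h] + 3         # home win
      } else if (goals_home < goals_away) {
        points[a] <- points[a] + 3         # away win
      } else {
        points[h] <- points[h] + 1         # draw: one point each
        points[a] <- points[a] + 1
      }
    }
  }
  points
}

set.seed(1)                                     # reproducible illustration
n_teams <- 20                                   # assumed league size
ability <- rnorm(n_teams, mean = 0, sd = 0.1)   # one ability offset per team
home_mu <- pmax(1.6 + ability, 0.01)            # 1.6 from the text; floor keeps means positive
away_mu <- pmax(1.2 + ability, 0.01)            # 1.2 is a placeholder away mean

best_team   <- which.max(ability)               # the truly best team
n_sims      <- 500                              # number of simulated seasons
best_on_top <- logical(n_sims)
for (s in seq_len(n_sims)) {
  pts <- simulate_season(home_mu, away_mu)
  top <- which(pts == max(pts))
  # Ties on points are broken at random here; real leagues use goal difference.
  if (length(top) > 1) top <- sample(top, 1)
  best_on_top[s] <- (top == best_team)
}

# Estimated probability that the best team finishes top
print(mean(best_on_top))
```

Setting `sd = 0` in the `rnorm()` call recovers the equal-ability case, and `order(pts, decreasing = TRUE)` gives the league table for any single simulated season. Increasing `sd` should widen the spread of final points, which is the comparison the questions above ask for.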