Optional Lecture 1

Welcome back to Introduction to International Relations. This is the first optional lecture for this module. That means that there won't be any test questions drawn directly from this material, nor will any of the activities be based on it. However, you may find future lectures easier to understand if you listen to this one. And, as broad a topic as probability is, I'm only going to cover a few things in this lecture, all of which will come up again. So keep listening.

Specifically, as you can see on the first slide, there are three things I'll go over. First, I'm going to introduce some basic terms and concepts. Second, I'm going to discuss expected values. There are actually a lot of ways to make use of that idea, but the main reason I'm going to walk you through it is that expected values play a critical role in some of the game-theoretic models you'll see later in this module. Finally, I'm going to present Bayes' Theorem, which also has a lot of applications beyond those you'll see here, which will again mostly be game-theoretic.

On the second slide, you see two important terms, which you've actually already heard me use. (At least, you will have if you've listened to the first two lectures, as you should have done.) The first is “variable”, which is an alphabetic character, Greek letter, or word that represents numeric values that differ across observations. Observations, incidentally, which I haven't bothered to define on the slide, are the basic units of a data set. Things we've observed. So, for example, if I were to ask all of you how tall you are, which I have (check your email), then each one of you (who responds) will comprise a separate observation in my data set, and height will be one of the variables. In other words, a variable is a thing that varies; it takes on different values for different people, or countries, or whatever it is we're analysing. Variables are sometimes mixed up with constants. As the name implies, the latter do not vary. They are... constant. But they are generally represented the same way, so you won't know that they're constants just by looking at the notation used. You might have to go into the data set, though in many cases a little common sense will suffice.

Turn now to the third slide. You see there an example of a data set. The very first row tells us the names of the variables (or constants). Each row after that is a unique observation. So while there are twenty-one rows in the spreadsheet, there are only twenty observations in the data set. To help clarify that, I've actually included a variable that simply records the observation number. As you can see, I've named that variable “obs” for short. Naturally, I've put it in the first column. (If you're not familiar with spreadsheets and tables, rows run across; columns run up and down.) That's not a variable we'd do anything with, but sometimes it's helpful to have identifying variables in our data sets. In the second column, we have “k”. That's common notation for a constant. How do we know it's a constant? Well, because it doesn't vary. See how that works? In all twenty of the observations, the numeric value k takes on is the same. (It doesn't matter that that value is 1. Even if it were 17.4983, or whatever, k would be a constant. What matters is that it doesn't take on different values for different observations. Which, as I said in the first lecture, means we can't use it to explain anything that does vary across observations. Which is generally what we're after.)
In the next three columns, you see proper variables. I've named two of them x (the first x1 and the second x2, cleverly enough) and one y. As I discuss in the next lecture, x is commonly used for independent variables and y for dependent variables. (More on what those are next time. For now, just note that variables are sometimes named for what they measure, and sometimes for the role they serve.) The actual values of these variables are irrelevant. I generated them randomly. I'll show you other spreadsheets in the future where the variables will actually measure things we care about. For now, I just wanted you to see what a data set looks like so that the definitions I gave on the previous slide would make more sense.

So that's it for terminology. Variables vary, constants don't. The name (or letter or symbol) used may or may not tell us whether we're dealing with a variable or a constant. There are certain conventions – k for constants, x and y for variables – but they're not always followed. What really differentiates the two is that if we go into our data set, we'll see different numerical values recorded for at least some of the observations (they don't all have to be different) for our variables, but the same exact value in every case for constants. And that means the latter are of no use to us.

On the fourth slide, I discuss expected value. Like I said earlier, this is actually useful in a lot of ways. I'll give you some examples soon. But let's tough our way through the technicalities, with all that notation, first. It's really not as bad as it looks. So we've got a random variable x. (A random variable is one whose value is determined partly by chance. That isn't necessarily true of other types of variables.) There are N different things that can occur. Each of them does so with its own probability. (I use the subscript here because there may be more than one probability we need to keep track of, depending on how big N is. If that's confusing, it'll be clearer soon.) And each has its own numerical value. (Again, if that's too abstract, just be patient.) So the expected value of x, written as capital E with x in parentheses, is equal to the sum of p-i times z-i. That funny-looking symbol that also looks a bit like an E (or the upper-case sigma, for those of you who know your Greek letters) is the summation sign. It tells us to perform the same operation again and again, starting from the first possibility and ending with the last, then adding everything up when we're done. In this case, what we're repeating is multiplying probabilities by the values associated with different outcomes. Nerds like me write it that way because it saves space. But if you find that intimidating or confusing, you can do it the long way. E of x then would be p-1 times z-1 plus p-2 times z-2, and so on, until you get to p-N times z-N.
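Written out, that definition – the compact version and the long version are the same thing – is:

$$E(x) \;=\; \sum_{i=1}^{N} p_i z_i \;=\; p_1 z_1 + p_2 z_2 + \cdots + p_N z_N.$$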
Still too abstract? Okay, turn to the next slide, where you've got a practical example. Suppose you're trying to figure out how you're going to do in this module. For whatever reason (I've just chosen numbers arbitrarily so you have something other than p's and z's to work with), you expect to earn 80 percent of the points available from the in-class activities. And you expect to get a 70 on your short essay. For some of you, that might be a bit optimistic, while for others it'll be pessimistic. The actual numbers don't matter. Feel free to hit pause and play around with different ones after I finish going through this example. It would be good practice.

So we're treating those as constants (to keep things simple). But we're going to let your performance on the take-home tests vary. Let's assume, again just by way of example, that you figure there's a 10% chance you'll do pretty well and score an average of 80 on them, a 10% chance you'll do poorly and average 65, but most likely you'll end up somewhere in the middle. Specifically, we're guessing there's a 40% chance each that you'll average 70 and that you'll average 75 on the two tests. As it says in the syllabus, the activities are worth 25%, the short essay 15%, and the two take-home tests 30% each. Which means that if we're plugging in your average test score, for simplicity, we're going to count it for 60%. (Obviously you could use individual test scores, and even give different probabilities for the outcomes if you think you'll do better on the second test. That would just add a couple more steps to the calculation.) So, if you average 65 on the tests, your grade for the module would be a 69.5. (That's 0.6 times 65, which is 39, plus 0.25 times 80, which is 20, plus 0.15 times 70, which is 10.5, added together.) We can do the same thing for each of the other possibilities, and you can see the grades that would result there on the slide. So now we calculate your expected grade. We've got four z's – that's what your grade for the module would be under each of the possibilities we discussed in terms of your performance on the tests – and four associated p's. As the previous slide tells us, we now need to multiply the z's by the p's and add that all together (not unlike how we came up with the final grades). As you can see on the last bullet point, that means you would expect to finish the module with a 74. You might do better, and you might do worse, but that would be your best guess, given the assumptions we made. Which you're welcome to change if you think they weren't generous enough. Or if you're honest enough to admit that they might have been too generous.
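If you'd rather check that with a few lines of code than by hand, here's a minimal sketch in Python, using the numbers from the slide (the variable names are mine, not anything official):

```python
# Expected module grade. Activities (80) and the essay (70) are treated as
# constants; the average take-home test score is the random variable.
outcomes = {65: 0.10, 70: 0.40, 75: 0.40, 80: 0.10}  # test average -> probability

def module_grade(test_avg):
    # Syllabus weights: tests 60% combined, activities 25%, short essay 15%.
    return 0.60 * test_avg + 0.25 * 80 + 0.15 * 70

expected = sum(p * module_grade(z) for z, p in outcomes.items())
print(round(expected, 1))  # 74.0, matching the last bullet point on the slide
```

Swap in your own scores and probabilities; the structure of the calculation stays exactly the same.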
Okay, there were a lot of moving parts there. Hopefully, you were able to follow along anyway, but in case anyone would find this helpful, I'll do a couple of really simple examples before we move on to our last topic for this lecture.

Suppose we agree to bet on a coin flip. You guess that it will come up heads. I promise that if it does, I'll pay you ten quid. But if it doesn't, I won't pay you anything. (Nor will I ask you to pay me anything. Because one of us has a real job, and I'd have to be a real jerk to take your money.) How much do you expect to make? Here, x is how much I pay you, p-i the probability of each possible outcome of the coin flip, and z-i the amount you win at each outcome. Of course, we only have two possibilities, so we don't even need to bother with a bunch of different p's. Whenever there are two possible outcomes, we only have to specify one p, because something has to happen, and that means that if the first one doesn't, the second one will. Meaning that the probability of the second outcome is one minus the probability of the first. In the case of a coin flip, assuming it's not weighted, and whoever flips it doesn't cheat somehow, the probability it comes up heads is 0.5, and the probability it comes up tails is 1 minus 0.5, which is of course 0.5 again. So we multiply 10 (what you win if it comes up heads) by 0.5 (how likely it is to do so) and add to that 0 (what you win if it comes up tails) multiplied by 0.5 (how likely that is to happen), and get 5 quid. (Because half of ten is five, and zero times anything is zero, and zero added to five is five.)

It may sound strange to say that you expect to win 5 quid, because there's precisely zero percent chance of that happening – you'll either get ten or nothing – but that's how expected value works. Which probably makes you think it's a stupid concept. It's not, though. You just have to know how to interpret it. That 5 we came up with isn't so much a prediction about what will happen as it is a benchmark that helps us figure out whether you've got a good bet on your hands.

Let's try another example. Your friend is absolutely convinced the Conservative party will win the next general election in the UK, because he or she thinks Jeremy Corbyn is too far from centre. You figure it's true that Corbyn puts the party at a disadvantage, but not as big a one as your friend is making it sound. Your friend is so confident, though, that he or she is willing to offer you fifty pounds if Labour wins, while only expecting you to pay ten if the Tories do. Should you take the bet? I'm no expert on British politics, but I'd say you probably should. The best (not easiest, but best) way to think about this is to calculate your expected value for the wager. To avoid making any strong assumptions, I'm not even going to assign actual probabilities to the outcomes. We'll just figure out what the smallest probability of a Labour victory is for which your expected value would still be positive. That requires some algebra, though. (If you need a refresher, give the next optional lecture a listen.) Let p be the probability of a Labour victory. Your expected value for this bet is p times 50 plus one minus p times negative 10 (because you'd lose money in that case). What we want to know is how big p has to be for that whole expression to be greater than 0. So that's 50p plus one minus p, in parentheses, times minus 10, set greater than 0. That simplifies to 50p minus 10 plus 10p greater than 0, or 60p greater than 10, or p greater than one-sixth.
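In symbols, that chain of steps is

$$50p + (1-p)(-10) > 0 \;\;\Longrightarrow\;\; 50p - 10 + 10p > 0 \;\;\Longrightarrow\;\; 60p > 10 \;\;\Longrightarrow\;\; p > \tfrac{1}{6}.$$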
So even if you think Labour is more likely to lose than not, you only need to think the odds of a Labour victory are better than 1 in 6 (which is a pretty low bar) for it to be a good idea to take the bet. The moral of that story is that your friend is an idiot. Actually, the point is, we use expected values to make better decisions, not as literal descriptions of what's going to happen. People use them for gambling, investing in stocks, and so on. We'll use them in this module to gain a better understanding of international relations. Really. That might sound absurd now, but I hope to change your mind after a few more lectures.

Okay, that brings us to our last topic: Bayes' Theorem. Which is named after... some dead white guy none of us care about. I'm being flippant, but I bet that's actually true, so let's get straight to the theorem. If you do actually care, look him up. You have my permission to use Wikipedia. Am I allowed to say that? Bayes' Theorem is used to define conditional probabilities, which means it tells us how likely one thing is to be true given that some other (presumably relevant) thing is known to be true. In other words, any time you receive a piece of information that could help you make a more educated guess about something else, you can (and probably should, though only super-geeks like me do) apply Bayes' Theorem. (I joke that no one actually uses it, which really isn't true, but even if it was, that wouldn't mean it's a waste of time to learn Bayes' Theorem. We have good evidence that people typically behave as if they're using it, to a certain extent. So depending on whether you're trying to get a sense of general patterns of behaviour, which is all we're after in this module, or trying to make precise predictions, which I don't advise, it can be very useful to develop theories where people are assumed to incorporate new information properly, in accordance with Bayes' Theorem. The analogy some people use here, which I find helpful, is that dogs can't do advanced mathematics, but are pretty good at catching frisbees. No one thinks that means physics equations don't accurately describe the way objects move through space, though. Similarly, I don't actually think world leaders go through the sorts of calculations envisioned by the theoretical models you'll see in this module. But that doesn't mean we can't use mathematical models to understand the world better.)

So, the theorem tells us that the probability of A being true, given that we know B is true, is equal to the probability that we'd observe B if A was true, weighted by our initial estimate of the probability of A being true, divided by a bunch of junk that amounts to the total probability of observing B.
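Written out, that statement is

$$P(A \mid B) \;=\; \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid \text{not } A)\,P(\text{not } A)},$$

where the denominator – the “bunch of junk” – is just the total probability of observing B, whether A is true or not.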
Yeah, I know, it's a bit of an eyesore. But it makes sense if you step back from all the notation and think about it informally. I want to know if something's true, and I've just gotten a clue that's no smoking gun, but might be helpful. So I ask myself, how often would that piece of evidence exist if I was right, and how often would it exist if I was wrong? And of course I need to take into account how likely I was to be right in the first place.

I'm going to take you through two examples, one in detail and one a bit more loosely. One will show you exactly how the theorem works, the other why it matters. There was this old game show in the US, Let's Make a Deal, and there's a famous puzzle based on it called the Monty Hall Problem. There are three doors. Behind one is a new car, while the others only hide goats. (Why goats? I don't know. Monty Hall was a strange dude.) The contestant picks a door. Monty opens one of the other two and reveals a goat. Note, Monty knows where the car is, and he never opens that door. That may seem obvious – the show would lose a lot of money if the host often gave away the answer by mistake – but it's worth calling attention to, because the reason this puzzle is so famous is that everyone's gut instinct about what to do is wrong, but it wouldn't be if Monty were picking doors completely at random. So, having already told you that the answer that first springs to mind for you is the wrong one, I'll now ask: should the contestant switch to the other closed door? If you're like most people, you're thinking it doesn't make a difference. There are two doors the car could be behind, so there's a fifty percent chance they win if they stick with their original guess and a fifty percent chance if they switch. But that's not true. I'm going to prove that to you using Bayes' Theorem, because I want you to see how the theorem works, and this is one way to do that without putting you to sleep. (I hope.) But if you're not convinced, there are other ways of showing that the contestant is definitely better off switching. Google the Monty Hall problem. There are a lot of websites that discuss it. A few links down, you'll find an interactive feature put together by the New York Times that lets you play the game yourself, as many times as you like. I guarantee you that if you do it a few times, you'll find that your initial guess was wrong twice as often as it was right.

Let's say the contestant picked A initially, and Monty opened B. (Not that it matters; the answer's the same no matter which door got picked.) Initially, we'd have to say the car is equally likely to be behind each of the doors. So the probability of it being behind C, the one I'm about to prove the contestant should switch their guess to, is one-third. (I use some formal notation on the slide partly to save space and partly to get you used to seeing that sort of thing, since you'll see a lot more of it as we go on.) So c, our initial estimate, or belief, is one-third. And c-prime, our updated belief (updated, that is, in response to the information revealed when Monty opened door B), is defined by Bayes' Theorem. So let's apply the formula from the previous slide. The probability that Monty would open door B if the car was in fact behind door C is 1. That right there is pretty much the key to understanding the puzzle, incidentally. So let's talk about that. Why is it 1? Because, as I said, Monty knows where the car is. He doesn't open doors at random. He opens one of the doors the contestant didn't pick, and always reveals a goat when he does so. Well, if the car was behind door C, and the contestant picked A, then there's only one door Monty can open. 100 times out of 100, under those conditions, he's going to open door B. Because he has to. He's obviously not going to show the contestant where the car is before asking if they want to switch. Might as well just give the car away and not bother with the rest.

So we've got one-third times one, which is just one-third, in the numerator. In the denominator, we have the same thing again, as well as the probability that Monty would open door B if the car was behind door A multiplied by the probability that the car would be behind door A. As I've said, initially, the car is no more likely to be behind any one door than another. So we've got one-third here too. But it's being multiplied by one-half, not one. Why is that? Because if the car was behind door A, Monty would have two doors to choose from. He'd no longer be forced to pick B, the way he would be if the car was behind C. Again, remember the basic rules. Monty always picks a door other than the one the contestant picked, and he always reveals a goat. There are two ways for him to do that if the contestant picked the right door, and he's no more likely to pick one than the other. So we multiply one-third by one-half and get one-sixth. And we add that to the one-third and get one-half overall for the denominator. And one-third over one-half is two-thirds. (Dividing by one-half is the same as multiplying by two.) That means there's about a 66.7 percent chance the car is behind door C, and only a 33.3% chance it's behind A. So the contestant should switch. They might still lose, but they are unquestionably better off switching. As I said before, that's just one way of showing that most people's intuition is wrong. And, again, the key is that Monty knows where the car is. If he opened doors purely at random, yes, there would be a 50% chance of the car being behind door A once he opened door B and revealed a goat. But the point isn't how to play Let's Make a Deal. I just wanted you to see how Bayes' Theorem is applied, because you'll see it again.
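If you'd rather simulate the game than trust the algebra (or hunt down that interactive feature), here's a minimal sketch in Python. The setup is my own paraphrase of the rules as stated above: the car is placed at random, Monty always opens a goat door the contestant didn't pick, and we count how often switching wins.

```python
import random

def switching_wins():
    car = random.randrange(3)   # doors 0, 1, 2; the car is placed at random
    pick = 0                    # contestant's initial guess (door A, say)
    # Monty opens a door the contestant didn't pick that hides a goat,
    # choosing at random when the car is behind the contestant's own door.
    monty = random.choice([d for d in range(3) if d != pick and d != car])
    # Switching means taking the one remaining closed door.
    switch_to = next(d for d in range(3) if d not in (pick, monty))
    return switch_to == car

trials = 100_000
wins = sum(switching_wins() for _ in range(trials))
print(wins / trials)  # ~0.667: switching wins about two-thirds of the time
```

If you modify it so Monty picks one of the other two doors purely at random, and you throw out the rounds where he accidentally reveals the car, switching wins only half the time – which is exactly why the “Monty knows where the car is” rule matters.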
Turn to the last slide, where I apply Bayesian reasoning to an area of actual relevance to international relations: the supposed link between Islam and terrorism. Assume, for the sake of argument, that 100% of terrorist incidents (by which I mean successful attacks and foiled attempts) involved Muslim perpetrators. That's not even close to true, incidentally, as we'll discuss later on. But let's pretend it is for now. I'm going to use Bayesian reasoning to show you that even if every single terrorist, future terrorist, and terrorist sympathiser was Muslim, that still wouldn't come close to justifying the fear of Islam that's all too common in the West. Suppose we had some way of knowing that two terrorist incidents were going to take place in the UK this year. Depending on how exactly you define terrorism (which we'll talk more about later), and what your standard is for ruling something a failed attempt, that could be seen as either a typical year or a little worse than average. Let's also put the number of Muslims in the UK at two and a half million. That's actually a bit high, as of this recording, but it's in the right ballpark. What I've said so far would imply (falsely) that the probability of any given terrorist being Muslim is 1, but (correctly) that the probability of any given Muslim being a terrorist is on the order of 0.0000008. That was six zeroes, by the way. To put that in perspective, every time you brush your teeth, you expose yourself to a risk of injury that's about thirteen times greater than the probability of any given Muslim you come across in this country attempting to carry out a terrorist attack this year.
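In case that last number seems to have come out of nowhere, it's just the two figures we assumed divided into each other (treating each of the two incidents as involving a single perpetrator):

$$\frac{2}{2{,}500{,}000} \;=\; 8 \times 10^{-7} \;=\; 0.0000008.$$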