Optional Lecture 1
Welcome back to Introduction to International Relations. This is the first optional
lecture for this module. That means that there won't be any test questions drawn
directly from this material, nor will any of the activities be based on it. However, you
may find future lectures easier to understand if you listen to this one. And, as broad a
topic as probability is, I'm only going to cover a few things in this lecture, all of which
will come up again. So keep listening.
Specifically, as you can see on the first slide, there are three things I'll go over. First,
I'm going to introduce some basic terms and concepts. Second, I'm going to discuss
expected values. There are actually a lot of ways to make use of that idea, but the
main reason I'm going to walk you through it is that expected values play a critical
role in some of the game-theoretic models you'll see later in this module. Finally, I'm
going to present Bayes' Theorem, which also has a lot of applications beyond those
you'll see here, which will again mostly be game-theoretic.
On the second slide, you see two important terms, which you've actually already
heard me use. (At least, you will have if you've listened to the first two lectures, as
you should have done.) The first is “variable”, which is an alphabetic character, Greek
letter, or word that represents numeric values that differ across observations.
Observations, incidentally, which I haven't bothered to define on the slide, are the
basic units of a data set. Things we've observed. So, for example, if I were to ask all of
you how tall you are, which I have (check your email), then each one of you (who
responds) will comprise a separate observation in my data set, and height will be one
of the variables. In other words, a variable is a thing that varies; it takes on different
values for different people, or countries, or whatever it is we're analysing.
Variables are sometimes mixed up with constants. As the name implies, the latter do
not vary. They are... constant. But they are generally represented the same way. So
you won't know that they're constants just by looking at the notation used. You might
have to go into the data set, though in many cases a little common sense will suffice.
Turn now to the third slide. You see there an example of a data set. The very first row
tells us the names of the variables (or constants). Each row after that is a unique
observation. So while there are twenty-one rows in the spreadsheet, there are only
twenty observations in the data set. To help clarify that, I've actually included a
variable that simply records the observation number. As you can see, I've named that
variable “obs” for short. Naturally, I've put it in the first column. (If you're not
familiar with spreadsheets and tables, rows run across; columns up and down.) That's
not a variable we'd do anything with, but sometimes it's helpful to have identifying
variables in our data sets.
In the second column, we have “k”. That's common notation for a constant. How do
we know it's a constant? Well, because it doesn't vary. See how that works? In all
twenty of the observations, the numeric value k takes on is the same. (It doesn't
matter that that value is 1. Even if it was 17.4983, or whatever, k would be a constant.
What matters is that it doesn't take on different values for different observations.
Which, as I said in the first lecture, means we can't use it to explain anything that does
vary across observations. Which is generally what we're after.)
In the next three columns, you see proper variables. I've named two of them x (the
first x1 and the second x2, cleverly enough) and one y. As I discuss in the next lecture,
x is commonly used for independent variables and y for dependent variables. (More on
what those are next time. For now, just note that variables are sometimes named for
what they measure, and sometimes for the role they serve.)
The actual values of these variables are irrelevant. I generated them randomly. I'll
show you other spreadsheets in the future where the variables will actually measure
things we care about. For now, I just wanted you to see what a data set looks like so
that the definitions I gave on the previous slide would make more sense.
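If you'd like to see what building a data set like that might look like in code, here's a minimal sketch in Python. To be clear, this isn't from the slides: the column names (obs, k, x1, x2, y) come from the example, but the random values will differ from the ones I generated.

    # A minimal sketch of a data set like the one on the third slide:
    # an identifier, a constant, and three randomly generated variables.
    import random

    rows = []
    for obs in range(1, 21):            # twenty observations
        rows.append({
            "obs": obs,                 # identifying variable
            "k": 1,                     # constant: same value in every observation
            "x1": random.random(),      # proper variables: values vary
            "x2": random.random(),
            "y": random.random(),
        })
    print(rows[0])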
So that's it for terminology. Variables vary, constants don't. The name (or letter or
symbol) used may or may not tell us whether we're dealing with a variable or a
constant. There are certain conventions – k for constant, x and y for variables – but
they're not always followed. What really differentiates the two is that if we go into
our data set, we'll see different numerical values recorded for at least some of the
observations (they don't all have to be different) for our variables, but the same exact
value in every case for constants. And that means the latter are of no use to us.
On the fourth slide, I discuss expected value. Like I said earlier, this is actually useful
in a lot of ways. I'll give you some examples soon. But let's tough our way through the
technicalities, with all that notation, first. It's really not as bad as it looks.
So we've got a random variable x. (A random variable is one whose value is
determined partly by chance. That isn't necessarily true of other types of variables.)
There are N different things that can occur. Each of them does so with its own
probability. (I use the subscript here because there may be more than one
probability we need to keep track of, depending on how big N is. If that's confusing,
it'll be clearer soon.) And each has its own numerical value. (Again, if that's too
abstract, just be patient.) So the expected value of x, written as capital E with x in
parentheses, is equal to the sum of p-i times z-i. That funny looking symbol that also
looks a bit like an E (or the upper-case sigma, for those of you that know your Greek
letters) is the summation sign. It tells us to perform the same operation again and
again, starting from the first possibility and ending with the last, then adding
everything up when we're done. In this case, what we're repeating is multiplying
probabilities by the values associated with different outcomes.
Nerds like me write it that way because it saves space. But if you find that
intimidating or confusing, you can do it the long way. E of x then would be p-1 times z-1
plus p-2 times z-2, and so on, until you get to p-N times z-N.
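Written out, the compact version and the long version say the same thing:

    E(x) = \sum_{i=1}^{N} p_i z_i = p_1 z_1 + p_2 z_2 + \cdots + p_N z_N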
Still too abstract? Okay, turn to the next slide, where you've got a practical example.
Suppose you're trying to figure out how you're going to do in this module. For
whatever reason (I've just chosen numbers arbitrarily so you have something other
than p's and z's to work with), you expect to earn 80 percent of the points available
from the in-class activities. And you expect to get a 70 on your short essay.
For some of you, that might be a bit optimistic, while for others it'll be pessimistic.
The actual numbers don't matter. Feel free to hit pause and play around with
different ones after I finish going through this example. It would be good practice.
So we're treating those as constants (to keep things simple). But we're going to let
your performance on the take-home tests vary. Let's assume, again just by way of
example, that you figure there's a 10% chance you'll do pretty well and score an
average of 80 on them, a 10% chance you'll do poorly and average 65, but most likely
you'll end up somewhere in the middle. Specifically, we're guessing there's a 40%
chance each that you'll average 70 or 75 on the two tests.
As it says in the syllabus, the activities are worth 25%, the short essay 15%, and the
two take-home tests 30% each. Which means that if we're plugging in your average
test score, for simplicity, we're going to count it for 60%. (Obviously you could use
individual test scores, and even give different probabilities for the outcomes if you
think you'll do better on the second test. That would just add a couple more steps to
the calculation.) So, if you average 65 on the tests, your grade for the module would be
a 69.5. (That's 0.6 times 65, which is 39, and 0.25 times 80, which is 20, and 0.15
times 70, which is 10.5, added together.) We can do the same thing for each of the other
possibilities, and you can see the grades that would result there on the slide.
So now we calculate your expected grade. We've got four z's – that's what your grade
for the module would be under each of the possibilities we discussed in terms of your
performance on the tests – and four associated p's. As the previous slide tells us, we
now need to multiply the z's by the p's and add that all together (not unlike how
we came up with the final grades). As you can see on the last bullet point, that means
you would expect to finish the module with a 74. You might do better, and you might
do worse, but that would be your best guess, given the assumptions we made. Which
you're welcome to change if you think those weren't generous enough. Or if you're
honest enough to admit that they might have been too generous.
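If you'd like to check the arithmetic, or play around with different assumptions as I suggested, here's a minimal sketch in Python using the numbers from the example:

    # Expected module grade under the example's assumptions.
    # Activities: 80 (weight 0.25); essay: 70 (weight 0.15); tests: 60% combined.
    outcomes = [(0.10, 80), (0.40, 75), (0.40, 70), (0.10, 65)]  # (p_i, test average)

    def module_grade(test_avg):
        return 0.6 * test_avg + 0.25 * 80 + 0.15 * 70

    expected = sum(p * module_grade(z) for p, z in outcomes)
    print(expected)  # 74.0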
Okay, there were a lot of moving parts there. Hopefully, you were able to follow along
anyway, but in case anyone would find this helpful, I'll do a couple really simple
examples before we move on to our last topic for this lecture.
Suppose we agree to bet on a coin flip. You guess that it will come up heads. I
promise that if it does, I'll pay you ten quid. But if it doesn't, I won't pay you anything.
(Nor will I ask you to pay me anything. Because one of us has a real job, and I'd have
to be a real jerk to take your money.) How much do you expect to make? Here, x is
how much I pay you, p-i the probability of each possible outcome of the coin flip, and
z-i the amount you win under each outcome. Of course, we only have two possibilities,
so we don't even need to bother with a bunch of different p's. Whenever there are
two possible outcomes, we only have to specify one p, because something has to
happen, and that means that if the first one doesn't, the second one will. Meaning
that the probability of the second outcome is one minus the probability of the first. In
the case of a coin flip, assuming it's not weighted, and whoever flips it doesn't cheat
somehow, the probability it comes up heads is 0.5, and the probability it comes up
tails is 1 minus 0.5, which is of course 0.5 again. So we multiply 10 (what you win if it
comes up heads) by 0.5 (how likely it is to do so) and add to that 0 (what you win if it
comes up tails) multiplied by 0.5 (how likely that is to happen), and get 5 quid.
(Because half of ten is five, and zero times anything is zero, and zero added to five is
five.) It may sound strange to say that you expect to win 5 quid, because there's
precisely zero percent chance of that happening – you'll either get ten or nothing –
but that's how expected value works. Which probably makes you think it's a stupid
concept. It's not, though. You just have to know how to interpret it. That 5 we came
up with isn't so much a prediction about what will happen as it is a benchmark that
helps us figure out whether you've got a good bet on your hands.
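For those following along in code, the coin-flip calculation is just a couple of lines:

    # Expected winnings from the coin-flip bet: two outcomes, heads and tails.
    p_heads = 0.5
    print(p_heads * 10 + (1 - p_heads) * 0)  # 5.0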
Let's try another example. Your friend is absolutely convinced the Conservative party
will win the next general election in the UK, because he or she thinks Jeremy Corbyn
is too far from centre. You figure it's true that Corbyn puts the party at a
disadvantage, but not as big as your friend is making it sound. Your friend is so
confident, though, that he or she is willing to offer you fifty pounds if Labour wins,
while only expecting you to pay ten if the Tories do. Should you take the bet?
I'm no expert on British politics, but I'd say you probably should.
The best (not easiest, but best) way to think about this is to calculate your expected
value for the wager. To avoid making any strong assumptions, I'm not even going to
assign actual probabilities to the outcomes. Just figure out what the smallest
probability of a Labour victory is for which your expected value would still be
positive. That requires some algebra, though. (If you need a refresher, give the next
optional lecture a listen.) Let p be the probability of a Labour victory. Your expected
value for this bet is p times 50 plus one minus p times negative 10 (because you'd
lose money in that case). What we want to know is how big p has to be for that whole
expression to be greater than 0. So that's 50p plus one minus p, in parentheses, times
minus 10, set greater than 0. That simplifies to 50p minus 10 plus 10p greater than
0, or 60p greater than 10, or p greater than one-sixth. So even if you think Labour is
more likely to lose than not, you only need to think the odds of a Labour victory are
better than 1 in 6 (which is a pretty low bar) for it to be a good idea to take the bet.
The moral of that story is that your friend is an idiot.
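Here's the same algebra checked numerically, in case you'd rather let a computer grind through it. The function is just the expression from the example; the algebra above already gives the exact answer of one-sixth.

    # Expected value of the bet as a function of p, the probability of a
    # Labour victory: win 50 if Labour wins, lose 10 otherwise.
    from fractions import Fraction

    def expected_value(p):
        return 50 * p - 10 * (1 - p)    # simplifies to 60p - 10

    p_break_even = Fraction(10, 60)     # 60p - 10 = 0 at p = 1/6
    print(p_break_even, expected_value(p_break_even))  # 1/6 0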
Actually, the point is, we use expected values to make better decisions, not as literal
descriptions of what's going to happen. People use them for gambling, investing in
stocks, and so on. We'll use them in this module to gain a better understanding of
international relations. Really. That might sound absurd now, but I hope to change
your mind after a few more lectures.
Okay, that brings us to our last topic: Bayes' Theorem. Which is named after...some
dead white guy none of us care about. I'm being flippant, but I bet that's actually true,
so let's get straight to the theorem. If you do actually care, look him up. You have my
permission to use Wikipedia. Am I allowed to say that?
Bayes' Theorem is used to define conditional probabilities, which means it tells us
how likely one thing is to be true given that some other (presumably relevant) thing is
known to be true. In other words, anytime you receive a piece of information that
could help you make a more educated guess about something else, you can (and
probably should, though only super-geeks like me do) apply Bayes' Theorem.
(I joke that no one actually uses it, which really isn't true, but even if it was, that
wouldn't mean it's a waste of time to learn Bayes' Theorem. We have good evidence
that people typically behave as if they're using it, to a certain extent. So depending on
whether you're trying to get a sense of general patterns of behavior, which is all
we're after in this module, or trying to make precise predictions, which I don't advise,
it can be very useful to develop theories where people are assumed to incorporate
new information properly, in accordance with Bayes' Theorem. The analogy some
people use here, which I find helpful, is that dogs can't do advanced mathematics, but
are pretty good at catching frisbees. No one thinks that means physics equations
don't accurately describe the way objects move through space, though. Similarly, I
don't actually think world leaders go through the sorts of calculations envisioned by
the theoretical models you'll see in this module. But that doesn't mean we can't use
mathematical models to understand the world better.)
So, the theorem tells us that the probability of A being true, given that we know B is
true, is equal to the probability that we'd observe B if A was true, weighted by our
initial estimate of the probability of A being true, divided by a bunch of junk that
amounts to the total probability of observing B. Yeah, I know, it's a bit of an eyesore.
But it makes sense if you step back from all the notation and think about it
informally. I want to know if something's true, and I've just gotten a clue that's no
smoking gun, but might be helpful. So I ask myself, how often would that piece of
evidence exist if I was right, and how often would it exist if I was wrong? And of
course I need to take into account how likely I was to be right in the first place.
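For reference, here's the theorem in symbols, with the denominator expanded into the "bunch of junk" I mentioned (the total probability of observing B):

    P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}
                = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid \neg A)\,P(\neg A)}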
I'm going to take you through two examples, one in detail and one a bit more loosely.
One will show you exactly how the theorem works, the other why it matters.
There was this old game show in the US, Let's Make a Deal, and there's a famous
puzzle based on it called the Monty Hall Problem. There are three doors. Behind one
is a new car, while the others only hide goats. (Why goats? I don't know. Monty Hall
was a strange dude.) The contestant picks a door. Monty opens one of the other two
and reveals a goat. Note, Monty knows where the car is, and he never opens that
door. That may seem obvious – the show would lose a lot of money if the host often
gave away the answer by mistake – but it's worth calling attention to, because the
reason this puzzle is so famous is that everyone's gut instinct about what to do is
wrong, but it wouldn't be if Monty was picking doors completely at random. So, having
already told you that the answer that first springs to mind for you is the wrong one,
I'll now ask: should the contestant switch to the other closed door?
If you're like most people, you're thinking it doesn't make a difference. There are two
doors the car could be behind, so there's a fifty percent chance they win if they stick
with their original guess and a fifty percent chance if they switch. But that's not true.
I'm going to prove that to you using Bayes' Theorem, because I want you to see how
the theorem works, and this is one way to do that without putting you to sleep. (I
hope). But if you're not convinced, there are other ways of showing that the
contestant is definitely better off switching. Google the Monty Hall problem. There are
a lot of websites that discuss it. A few links down, you'll find an interactive feature put
together by the New York Times that lets you play the game yourself, as many times
as you like. I guarantee you that if you do it a few times, you'll find that your initial
guess was wrong twice as often as it was right.
Let's say the contestant picked A initially, and Monty opened B. (Not that it matters;
the answer's the same no matter which door got picked.) Initially, we'd have to say
the car is equally likely to be behind each of the doors. So the probability of it being
behind C, the one I'm about to prove the contestant should switch their guess to, is
one-third. (I use some formal notation on the slide partly to save space and partly to
get you used to seeing that sort of thing, since you'll see a lot more of it as we go on.)
So c, our initial estimate, or belief, is one-third. And c-prime, our updated belief
(updated, that is, in response to the information revealed when Monty opened door
B) is defined by Bayes' Theorem. So let's apply the formula from the previous slide.
The probability that Monty would open door B if the car was in fact behind door C is
1. That right there is pretty much the key to understanding the puzzle, incidentally. So
let's talk about that. Why is it 1? Because, as I said, Monty knows where the car is. He
doesn't open doors at random. He opens one of the doors the contestant didn't pick,
and always reveals a goat when he does so. Well, if the car was behind door C, and the
contestant picked A, then there's only one door Monty can open. 100 times out of
100, under those conditions, he's going to open door B. Because he has to. He's
obviously not going to show the contestant where the car is before asking if they
want to switch. Might as well just give the car away and not bother with the rest.
So we've got one-third times one, which is just one third, in the numerator.
In the denominator, we have the same thing again, as well as the probability that
Monty would open door B if the car was behind door A multiplied by the probability
that the car would be behind door A. As I've said, initially, the car is no more likely to
be behind any one door than another. So we've got one-third here too. But it's being
multiplied by one-half, not one. Why is that? Because if the car was behind door A,
Monty would have two doors to choose from. He'd no longer be forced to pick B, the
way he would be if the car was behind C. Again, remember the basic rules. Monty
always picks a door other than the one the contestant picked, and he always reveals a
goat. There are two ways for him to do that if the contestant picked the right door, and
he's no more likely to pick one than the other. So we multiply one-third by one half
and get one-sixth. And we add that to the one third and get one half overall for the
denominator. And one-third over one half is two-thirds. (Dividing by one half is the
same as multiplying by two.) That means there's about a 66.7 percent chance the car
is behind door C, and only a 33.3% chance it's behind A. So the contestant should
switch. They might still lose, but they are unquestionably better off switching.
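If you want to double-check all of that, here's a minimal sketch in Python that does both things: the Bayes calculation exactly as described, and a simulation along the lines of the New York Times feature I mentioned. The door labels follow the example: the contestant picks A, and in the calculation Monty opens B.

    import random

    # Bayes' Theorem, as in the lecture: contestant picked A, Monty opened B.
    prior = 1 / 3              # P(car behind C) before any door is opened
    p_b_if_c = 1.0             # Monty must open B if the car is behind C
    p_b_if_a = 0.5             # Monty picks B or C at random if the car is behind A
    posterior = (p_b_if_c * prior) / (p_b_if_c * prior + p_b_if_a * prior)
    print(posterior)           # 0.666..., i.e. two-thirds

    # Simulation: switching wins about two-thirds of the time.
    wins, trials = 0, 100_000
    for _ in range(trials):
        car = random.choice("ABC")
        pick = "A"
        # Monty opens a door that isn't the pick and doesn't hide the car.
        monty = random.choice([d for d in "ABC" if d != pick and d != car])
        switch_to = next(d for d in "ABC" if d != pick and d != monty)
        wins += (switch_to == car)
    print(wins / trials)       # ~0.667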
As I said before, that's just one way of showing that most people's intuition is wrong.
And, again, the key is that Monty knows where the car is. If he opened doors purely at
random, yes, there would be a 50% chance of the car being behind door A once he
opened door B and revealed a goat. But the point isn't how to play Let's Make a Deal. I
just wanted you to see how Bayes' Theorem is applied, because you'll see it again.
Turn to the last slide, where I apply Bayesian reasoning to an area of actual relevance
to international relations: the supposed link between Islam and terrorism.
Assume, for the sake of argument, that 100% of terrorist incidents (by which I mean
successful attacks and foiled attempts) involved Muslim perpetrators. That's not
even close to true, incidentally, as we'll discuss later on. But let's pretend it is for now.
I'm going to use Bayesian reasoning to show you that even if every single terrorist,
future terrorist, and terrorist sympathizer was Muslim, that still wouldn't come close
to justifying the fear of Islam that's all too common in the West.
Suppose we had some way of knowing that two terrorist incidents were going to take
place in the UK this year. Depending on how exactly you define terrorism (which
we'll talk more about later), and what your standard is for ruling something a failed
attempt, that could be seen as either a typical year or a little worse than average.
Let's also put the number of Muslims in the UK at two and a half million. That's
actually a bit high, as of this recording, but it's in the right ballpark. What I've said so
far would imply (falsely) that the probability of any given terrorist being Muslim is 1,
but (correctly) that the probability of any given Muslim being a terrorist is on the order
of 0.0000008. That was six zeroes, by the way. To put that in perspective, every time
you brush your teeth, you expose yourself to a risk of injury that's about thirteen
times greater than the probability of any given Muslim you come across in this
country attempting to carry out a terrorist attack this year.
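And in case you want to check that arithmetic, with the (deliberately unrealistic) assumptions above:

    # Probability that any given Muslim in the UK is involved in a
    # terrorist incident this year, under the example's assumptions.
    incidents = 2                       # assumed incidents this year
    muslims_in_uk = 2_500_000           # deliberately high estimate
    print(incidents / muslims_in_uk)    # 8e-07, i.e. 0.0000008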