Outcomes, Counting, Countability, Measures and Probability
OPRE 7310 Lecture Notes by Metin Çakanyıldırım
Compiled at 17:34 on Friday 23rd September, 2016

1 Why Probability?

We have limited information about experiments, so we cannot know their outcomes with certainty. More information can be collected, if doing so is profitable, to reduce uncertainty. But some amount of uncertainty always remains, as information collection is costly and might even be impossible or inaccurate. So we are often bound to work with probability models.

Example: Instead of forecasting the number of probability books sold at the UTD bookstore tomorrow, suppose we ask everybody (students, faculty, staff, residents of Plano and Richardson) whether they plan to buy a book tomorrow. Surveying potential customers in this manner is always possible. But surveys are costly and inaccurate. ⋄

Before setting up probability models, we observe experiments and their outcomes in real life to get a sense of what is likely to happen. Deriving inferences from observations is the field of Statistics. These inferences about the likelihood of outcomes become the ingredients of probability models that are designed to mimic the real-life experiments. Probability models can later be used to make decisions to manage the real-life contexts.

Example: Some real-life experiments that are worthy of probability models are subatomic particle collisions, genetic breeding, weather forecasting, financial securities and queues. ⋄

2 An Event – A Collection of the Outcomes of an Experiment

The outcome of an experiment may be uncertain before the experiment happens. That is, the outcome may not be determined with (sufficient) certainty ex ante. Here the word experiment has a broad meaning that covers more than laboratory or on-site experiments: it covers any action or activity whose outcomes are of interest. This broader meaning is illustrated with the next example.
Example: As an experiment, we can consider an update of the Generally Accepted Accounting Principles (GAAP) issued by the Financial Accounting Standards Board (FASB.org). Suppose that the board is investigating an update of the reporting requirements for startups (more formally, Development Stage Entities). The board can decide to keep (K) the status quo, increase (I) the reporting requirements or decrease (D) them. Although accounting professionals can assess the likelihood of each of the outcomes K, I and D, they cannot be certain whether the board's discussion will lead to K, I or D, so the outcomes of the update experiment are uncertain. ⋄

Sufficiency of certainty depends on the intended use of the associated probability model. A room thermostat may be assumed to show the room temperature with sufficient certainty for the purpose of controlling the temperature with an air conditioner. The same thermostat may have insufficient certainty for controlling the speed of a heat-releasing chemical reaction. When the certainty is deemed insufficient, the uncertainty can be reduced, say by employing a more accurate thermostat. Or a probabilistic model can be designed to incorporate the uncertainty, say by controlling the average speed of the reaction.

Example: Outcomes of a dice rolling experiment are 1, 2, 3, 4, 5 and 6. For a fair dice, each outcome is (sufficiently) uncertain. ⋄

Each outcome of an experiment can be denoted generically by ω or indexed as ω_i for specificity. Often these outcomes are minimal outcomes that cannot be, or are not preferred to be, separated into several other outcomes. Such minimal outcomes can be called elementary outcomes. Two elementary outcomes cannot occur at once, so elementary outcomes are mutually exclusive. Then the elementary outcomes can be collected to obtain a set of outcomes Ω, which is generally called the sample space.
Example: For the experiment of updating the accounting principles, ω_1 = K, ω_2 = I, ω_3 = D and Ω = {K} ∪ {I} ∪ {D} = {K, I, D}. ⋄

Example: For the experiment of rolling a dice, ω_i = i for i = 1, ..., 6 and Ω = {1, 2, 3, 4, 5, 6}. ⋄

An event is a collection of the outcomes of an experiment. So an event is a subset of the sample space, i.e., a non-empty event A has ω ∈ A ⊆ Ω for some ω. An event can also be empty, denoted by A = ∅. Although we are not interested in impossible events in practice, the consideration of ∅ is useful for the theoretical construction of probability models.

Example: For the experiment of updating the accounting principles, the event of not increasing the reporting requirements can be denoted by {K, D}. ⋄

Example: For the experiment of rolling a dice, the event of an even outcome is {2, 4, 6} and the event of no outcome is ∅. ⋄

Example: Consider the collision of two hydrogen atoms on a plane. One of the atoms is stationary at the origin and is hit by another moving from left to right. After the collision, the atom moving from left to right can move into the 1st, 2nd, 3rd or 4th quadrant. The sample space for the movement of this atom is {1, 2, 3, 4} and the event that it bounces back (moves from right to left after the collision) is {2, 3}. ⋄

Since an event corresponds to a set in the sample space, we can apply set operations on events. In particular, for two events A and B, we can speak of their intersection, union and set difference. If the intersection of two events is empty, they are called disjoint: if A ∩ B = ∅, then A and B are disjoint.

Example: In an experiment of rolling a dice twice, we can consider the sum and the multiplication of the numbers in the first and second rolls. Let A be the event that the multiplication is odd; B be the event that the sum is odd; C be the event that both the multiplication and the sum are odd; D be the event that both the multiplication and the sum are even.
The outcomes in A have both the first and the second number odd, while the outcomes in B have one odd number and one even number. Hence no outcome can be in both event A and event B, which turn out to be disjoint: A ∩ B = ∅ = C. To be in D, an outcome must have both numbers even. In each outcome, either both numbers are odd (so the multiplication is odd and the sum is even =⇒ A); or one is odd while the other is even (so the multiplication is even and the sum is odd =⇒ B); or both numbers are even (so the multiplication and the sum are even =⇒ D). Hence, A ∪ B ∪ D = Ω. To convince yourself further, you can enumerate each outcome and see whether it falls in A, B or D by completing Table 1. ⋄

Table 1: Sample space Ω for two dice rolls is composed of pairs (i, j) for i, j = 1, ..., 6. Each cell below shows (multiplication = ij, sum = i + j) and, where filled in, the associated event; the remaining events are left for you to complete.

                                        First Roll
                   1           2           3          4          5          6
           1   (1,2), A    (2,3), B    (3,4),     (4,5),     (5,6),     (6,7),
           2   (2,3), B    (4,4), D    (6,5),     (8,6),     (10,7),    (12,8),
Second     3   (3,4), A    (6,5), B    (9,6),     (12,7),    (15,8),    (18,9),
Roll       4   (4,5), B    (8,6), D    (12,7),    (16,8),    (20,9),    (24,10),
           5   (5,6), A    (10,7), B   (15,8),    (20,9),    (25,10),   (30,11),
           6   (6,7), B    (12,8), D   (18,9),    (24,10),   (30,11),   (36,12),

3 Counting Countable Outcomes

When outcomes are countable, they can be finite or infinite.

3.1 Finite Outcomes

3.1.1 Multiplication Principle

We can manually count the outcomes of an experiment if the outcomes are finite. In the experiment of updating the accounting principles, the experiment of rolling a dice and the experiment of rolling a dice twice, the numbers of outcomes are respectively 3, 6 and 36. These numbers are found by manually counting the outcomes. Sometimes, instead of counting manually, we use the multiplication principle of counting illustrated in the next example.

Example: An online dress retailer carries 3 styles of lady dresses: Night dress, Corporate dress and Sporty dress. Each style has 20 cuts, 8 sizes and 5 different colors.
A stock keeping unit (sku) for the online retailer is defined by the dress style, cut, size and color, as these four characteristics fully describe the dress item and are used to buy dresses from the suppliers. The number of skus for this retailer is 3 × 20 × 8 × 5. ⋄

When the outcome of an experiment is defined by K characteristics that are independent of each other, we can use the multiplication principle. We start by enumerating the number of ways characteristic k can materialize and denote it by n_k. Then the number of outcomes is n_1 n_2 ... n_K. For the example of the online retailer above, n_style = 3, n_cut = 20, n_size = 8, n_color = 5 for the set of characteristics {style, cut, size, color}. Another way to denote these is to set 1 := style, 2 := cut, 3 := size, 4 := color, so K = 4 and n_1 = 3, n_2 = 20, n_3 = 8, n_4 = 5. If the characteristics are not all independent of each other, we can still use the multiplication principle with some adjustments.

Example: After a market research study, the online dress retailer decides to customize its offerings. It offers 22 cuts of Night dresses, 18 cuts of Corporate dresses and 34 cuts of Sporty dresses. Night dresses need to fit more closely, so they have 10 sizes, while Corporate and Sporty dresses have respectively 8 and 6 sizes. The number of skus becomes (22 × 10 + 18 × 8 + 34 × 6) × 5. In this case, the color is independent of the other characteristics; within each style, the cut and the size characteristics are independent. ⋄

3.1.2 Permutations

There are other ways of counting the outcomes of experiments. Counting permutations is one of them.

Example: The online retailer intends to show each dress in 5 different colors side by side on its web site so that customers can easily compare the colors and buy the one(s) they like. The five primary colors are White (W), Black (B), bLue (L), Red (R) and Yellow (Y).
The intention is that the customer picks the dress in each color and brings it into a cell in a 5 × 1 table on the screen to compare the colors. To make this process efficient, the online retailer asks the web designer to restrict customers so that they can pick each color exactly once. Some example outcomes are [W, B, L, R, Y], [B, L, R, Y, W], [L, R, Y, W, B], [R, Y, W, B, L], [Y, W, B, L, R]. The number of ways 5 colors can be put in the 5 × 1 table without repeating the colors is the number of permutations of the colors. There are 5 color choices for the first cell, 4 color choices for the second, 3 choices for the third, 2 choices for the fourth and only 1 choice for the last. Using the multiplication principle, 5 colors can have 5 × 4 × 3 × 2 × 1 permutations. ⋄

In general, n distinct objects can have n! := n × (n − 1) × · · · × 2 × 1 permutations of length n. If the permutation length is k ≤ n, then the number of such permutations is P^n_k := n × (n − 1) × · · · × (n − k + 1), which is a multiplication of exactly k terms. Said differently, P^n_k is the number of permutations of k objects out of n distinct objects. P^n_k is referred to as k-permutations-of-n. As in the online retailer's color example, sometimes objects are virtual and can be repeated (picked) as many times as necessary. This is often referred to as sampling with repetition.

Example: Despite the online retailer's specification, the web designer cannot restrict the customers to pick a color only once. For example, W can be picked for both the first and second cells in the 5 × 1 table. Then the colors are sampled by the customers with repetition and we cannot speak of permutations. Rather, we can ask for the number of ways 5 colors can be put in the table with repetition. There are 5 color choices for the first cell, 5 for the second, 5 for the third, 5 for the fourth and 5 for the last.
Once more using the multiplication principle, 5 colors can be placed in 5 cells with repetition in 5 × 5 × 5 × 5 × 5 ways. ⋄

In general, n distinct objects can be put in k boxes with repetition in n^k ways. Repetitions increase the number of ways objects can be organized: n^k > P^n_k.

3.1.3 Combinations

A key question in counting the outcomes is whether the sequence of objects in an outcome makes it different from another outcome including exactly the same objects. Consider [W, B, L, R, Y] vs. [B, L, R, Y, W]: we treated these two sequences as different above and wrote them as vectors by using square brackets. If the comparison of colors depends on the sequence of colors, the sequence matters and customers perceive [W, B, L, R, Y] as different from [B, L, R, Y, W]. If the sequence does not matter, both [W, B, L, R, Y] and [B, L, R, Y, W] have the same colors and can be mapped to the set {W, B, L, R, Y}. Such a mapping is many-to-one because many sequences boil down to the same set of objects.

Example: Suppose that the web designer has created a 3 × 1 table and can restrict customers to put at most one color in each cell. If the sequence matters and sampling is without repetition, the number of ways of placing 5 colors in 3 cells is P^5_3 = 5 × 4 × 3 = 60. Now consider the colors B, R, Y and the sequences that can be generated only from these three colors: [B, R, Y], [B, Y, R], [R, B, Y], [R, Y, B], [Y, B, R], [Y, R, B]. It is easy to see that 3 colors can make 3! = 6 sequences, so the mapping from the set of sequences to the set of items is (3!)-to-1. In other words, when we start treating different sequences with the same items as the same set, the number of outcomes based on sequences should drop by a factor of 6 to obtain the number of outcomes based on sets. If the sequence does not matter and sampling is without repetition, the number of ways of placing 5 colors in 3 cells is P^5_3 / 3! = 5 × 4 × 3 / 6 = 10.
⋄

From the above, the number of sequences of length k that can be made without repetition from n items is P^n_k. When we consider different sequences with the same items as the same set, the number of sets becomes the number of sequences divided by k!. Hence, the number of subsets including exactly k items out of n distinct items is C^n_k := P^n_k / k!. C^n_k is referred to as k-choose-from-n. Picking subsets from sets is called making combinations.

Example: In what is called a combination lock (often with 4 digits), there are several concentric dials, each with digits {0, 1, 2, ..., 9}. The lock unlocks when all dials show previously chosen digits in the correct order. These previously chosen digits and their order act like a password for the lock. For the lock, the sequence of digits matters, e.g., 1234 is different from 4321. So this sort of lock should really be called a permutation lock as opposed to a combination lock. ⋄

Example: The OM area has 20 Ph.D. and 200 Master of Science students. The faculty considers inviting 2 Ph.D. and 4 master students to a curriculum meeting. How many ways are there to choose 2 Ph.D. and 4 master students? There are C^20_2 ways to choose the Ph.D. students and C^200_4 ways to choose the master students, so the number of ways is C^20_2 C^200_4. ⋄

The number of combinations of k objects taken out of n objects is C^n_k. When we are picking k objects, we are making up two subsets - the objects picked and the objects unpicked. What happens if we are to make up r subsets out of n objects such that subset i has k_i objects and k_1 + k_2 + · · · + k_r = n? The number of ways r subsets can be made up is C^n_(k_1, k_2, ..., k_r) = n!/(k_1! k_2! ... k_r!).

Example: 11 Ph.D. students are to be assigned to 4 professors: Ganesh, Shun-Chen, Anyan and Metin, so that Ganesh and Anyan have 4 students each, Shun-Chen has 2 students and Metin has 1 student. How many assignments are possible? We are splitting the students into 4 subsets with n_G = n_A = 4, n_S = 2 and n_M = 1, while n_G + n_A + n_S + n_M = 11.
The number of ways is 11!/(4! 4! 2! 1!). ⋄

Example: How many distinct permutations can be obtained from the letters of Mississippi? Mississippi has 11 letters: 4 Is, 4 Ss, 2 Ps and 1 M, so the number of permutations is 11!/(4! 4! 2! 1!). ⋄

While discussing combinations above, we have referred to the number of subsets, whose elements cannot repeat. So the combination discussion above pertains only to sampling without repetition. If repetition is allowed in a collection, the collection is called a multiset. Each set is a multiset, so the multiset notion is a generalization of the set notion. In a multiset, the total number of elements, including repetitions, is the cardinality of the multiset, and the number of times an element appears is the multiplicity of that element.

Example: Suppose that we are to pick from the colors B, R, Y to create multisets of cardinality 3. ⟨B, R, Y⟩ is the unique such multiset whose elements do not repeat, so {B, R, Y} is also a set. As repetition is allowed in a multiset, some elements can be used twice or thrice while others are not used at all. If Y is not used, we can still construct the multisets ⟨B, R, R⟩, ⟨B, B, R⟩, ⟨B, B, B⟩ and ⟨R, R, R⟩. If Y must be used, we can construct some other multisets: ⟨B, Y, Y⟩, ⟨R, Y, Y⟩, ⟨Y, R, R⟩, ⟨Y, B, B⟩ and ⟨Y, Y, Y⟩. The multiplicity of B in ⟨B, R, R⟩ is x_B = 1, while the multiplicity of B in ⟨B, B, R⟩ is x_B = 2. ⋄

Using n distinct elements, how many multisets with cardinality k can we construct? Each multiset is uniquely identified by its multiplicities (x_1, x_2, ..., x_n). Since we set the cardinality equal to k, we need to insist on x_1 + x_2 + · · · + x_n = k. Also, each multiplicity must be a natural number (non-negative integer), i.e., x_i ∈ N for i = 1, ..., n. The number of multisets with cardinality k is the number of solutions to X := {x_1 + x_2 + · · · + x_n = k, x_i ∈ N for i = 1, ..., n}. To find the number of solutions to X, we first consider a seemingly different problem.
Suppose that we have n + k − 1 objects, each denoted by "+":

     +    +    +   ......    +          +
    1st  2nd  3rd        (n+k−2)nd  (n+k−1)st

We encircle n − 1 of these + objects to obtain exactly n segments, each made up of some or no + objects. If the (j−1)st and jth encircled +s are next to each other, then the jth segment has no +, i.e., x_j = 0. In general, x_j is the number of +s in the jth segment:

    + ... +   ⊕   + ... +   ⊕   ......   ⊕   + ... +
   |segment 1|   |segment 2|                |segment n|

where ⊕ denotes an encircled + and segment j contains x_j of the plain + objects. By using n − 1 circles, we have obtained n segments; segment j has x_j elements, and the sum of the x_j s must be (n + k − 1) − (n − 1) = k, as we start with n + k − 1 objects and encircle n − 1 of them. Hence, x_1 + x_2 + · · · + x_n = k. Each solution to X has a corresponding way of encircling +s, and vice versa. So the number of solutions to X is the number of ways we can encircle n − 1 objects out of n + k − 1 objects, which is C^{n+k−1}_{n−1} = C^{n+k−1}_k.

Example: How many multisets with cardinality k = 3 can be assembled from the n = 3 colors B, R, Y? Plugging in the numbers, the answer turns out to be C^5_3 = 10. For this small problem, we can list all of these multisets: ⟨B, R, Y⟩, ⟨B, R, R⟩, ⟨B, B, R⟩, ⟨B, B, B⟩, ⟨R, R, R⟩, ⟨B, Y, Y⟩, ⟨R, Y, Y⟩, ⟨Y, R, R⟩, ⟨Y, B, B⟩ and ⟨Y, Y, Y⟩. If we add the colors bLue and White, we have n = 5 colors to create more multisets with cardinality k = 3. The number of such multisets is C^7_3 = 35. ⋄

The next table summarizes our discussion on finite outcomes.

Table 2: Number of ways of constructing cardinality-k permutations, strings, sets or multisets from n distinct objects.

                                   Repeat?
                          No                   Yes
Sequence matters?  Yes    P^n_k permutations   n^k strings
                   No     C^n_k sets           C^{n+k−1}_k multisets

3.2 Infinite Outcomes

Infinite outcomes can be generated from an experiment that can potentially be repeated infinitely many times. Therefore, the experiment itself should be repeatable.
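As a quick sanity check, the four counts summarized in Table 2 can be verified by brute-force enumeration for small n and k. The following Python sketch (illustrative, not part of the notes) uses n = 5 colors and k = 3 cells, mirroring the color examples above:

```python
# Cross-check the Table 2 formulas against brute-force enumeration.
from itertools import combinations, combinations_with_replacement, permutations, product
from math import comb, factorial

colors = ["W", "B", "L", "R", "Y"]      # n = 5 distinct objects
n, k = len(colors), 3                   # place k = 3 of them

P = factorial(n) // factorial(n - k)    # P^n_k: sequence matters, no repetition
C = comb(n, k)                          # C^n_k: sequence ignored, no repetition
S = n ** k                              # n^k: sequence matters, with repetition
M = comb(n + k - 1, k)                  # C^{n+k-1}_k: sequence ignored, with repetition

assert P == len(list(permutations(colors, k)))
assert C == len(list(combinations(colors, k)))
assert S == len(list(product(colors, repeat=k)))
assert M == len(list(combinations_with_replacement(colors, k)))
print(P, C, S, M)  # 60 10 125 35
```

The four `itertools` generators correspond one-to-one to the four cells of Table 2, which makes the enumeration a direct check of each formula.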
Example: Throwing a coin, rolling a dice and calling a call center are experiments that can be repeated. Each time they are performed, they generate outcomes: {Head, Tail} for throwing a coin, {1, 2, 3, 4, 5, 6} for rolling a dice, {Busy, Available} for calling a call center. If we perform these experiments independently m times, the outcomes become {H, T}^m, {1, 2, 3, 4, 5, 6}^m and {B, A}^m. Here the superscript m denotes the Cartesian product applied m times, e.g., {H, T}^2 := {H, T} × {H, T}. ⋄

(Nearly) infinite outcomes need to be considered when we repeat an experiment until something (un)desirable happens. If we are waiting for heads in a coin tossing experiment and recording the outcomes, we can see arbitrarily long sequences TT...TTH. We can throw two dice simultaneously and wait until their sum turns out to be 7; then again arbitrarily long sequences of sums can be observed: 6, 8, 12, 9, 9, 4, 3, ..., 8, 10, 8, 11, 2, 7. Or a hacker can attempt to find a password of length 4 made out of the digits {0, 1, ..., 9} and keep attempting different 4-digit strings: 2012, 7634, 1803, etc. The sample space for the k-long strings (passwords) made with n objects has n^k elements. If the hacker is attempting random strings to find the password, he may have to do this infinitely many times. If he is enumerating all the strings and testing them one by one, he needs to do this only 10,000 times.

A set is countable if each of its elements can be associated with a single natural number. The sample space for the experiment of waiting for heads has infinitely many elements but it is countable. This sample space has H, TH, TTH, TTTH, and so on. Associating each outcome with its number of T's (including 0), we can see that the sample space is countable. The sample space for rolling two dice until obtaining the sum of 7 is also countable.
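The countability argument for the waiting-for-heads sample space can be made concrete: pairing each outcome with its number of T's is a one-to-one correspondence with the natural numbers. A small sketch (the function names are mine, for illustration):

```python
# Pair each outcome of "toss until the first H" with a natural number.
# The outcome with n tails is the string T...TH, and n can be recovered
# from the string, so the two maps below invert each other.

def outcome(n: int) -> str:
    """The unique outcome with n tails before the first head."""
    return "T" * n + "H"

def index(w: str) -> int:
    """Inverse map: the natural number associated with outcome w."""
    return len(w) - 1  # everything except the final H is a tail

# Since index(outcome(n)) == n for every n, the sample space
# {H, TH, TTH, ...} is in one-to-one correspondence with {0, 1, 2, ...}.
for n in range(1000):
    assert index(outcome(n)) == n
print(outcome(0), outcome(3))  # H TTTH
```

The same device works for any sample space of the form "repeat until the first success": list the outcomes by the number of failures preceding the success.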
The appendix has more on the countability of sets and shows that the rational numbers are countable while the real numbers are not.

Example: In the experiment of throwing a coin m times, let x_i = 1 if the ith throw turns out to be heads; otherwise, x_i = 0. After the mth experiment, we can compute the frequency of heads X(m) := (1/m) ∑_{i=1}^m x_i. We have 0 ≤ X(m) ≤ 1. It is easy to see that the sample space for the frequency of heads is Ω_X(m) := {0 = 0/m, 1/m, 2/m, ..., m/m = 1}. Ω_X(m) is a countable set, as any subset of the rational numbers is countable. Can we then say that the sample space for this experiment is the interval [0, 1]? Asked differently, as we increase m, does Ω_X(m) contain every element of [0, 1]? Note that [0, 1] is an interval over the real numbers, which include both rational and irrational numbers, so this interval is not countable. If the answer were yes, we could take an irrational number, say √2/2 ∈ [0, 1], and this number would have to be in Ω_X(m). But the elements of Ω_X(m) are only rational numbers. Hence, the answer is no; the sample space Ω_X(m) does not become [0, 1] for any m. ⋄

The last example shows that repeating an experiment many times does not make its sample space uncountable. On the other hand, a single experiment without any repetition can yield an uncountable sample space. Because of these, the case of an uncountable sample space deserves a separate discussion.

4 Uncountable Outcomes

Outcomes that take values over a continuum are uncountable. Formally speaking, such outcomes are in an interval of the real numbers ℜ. This brings up a philosophical question: what quantity in nature, if any, takes truly continuous values? That is, is there a quantity which must be measured in continuous amounts? Many attempts to find such a quantity turn out to be futile once we consider enough details.

Example: The amount of oxygen molecules in a room can be said to be a certain number of liters.
This number can be reported by an environmental engineer as if it is continuous, hence taking values in ℜ. But a chemist may attempt to count the number of oxygen molecules and report only a natural number from N. The amount of energy obtained from splitting a radioactive isotope can also be reported to be in ℜ by a nuclear reactor operator, while it can also be argued to be in N by a quantum physicist. You can continue this exercise and see if you can find an amount that requires continuous measurements. You can consider the number of shipments made by Amazon, the ratio of shipments made to Texas, the number of patients arriving at a hospital, the ratio of underage patients arriving at that hospital, etc. ⋄

It appears that nature has quantities that can be measured by rational numbers rather than real numbers, i.e., nature does not require continuous measures. The next question is whether we create continuous measures in the basic sciences or the social sciences. One of the social sciences that deals with the measurement and reporting of activities is accounting. Accounting does not seem to create amounts that require continuous measures.

Example: The monetary values reported by accounting systems are numbers with at most two decimal digits, so these numbers are rational. Accounting systems also compute Key Performance Indicators (KPIs) by taking ratios of two rational numbers. For example, the Return on Investment (ROI) is the annual return made by an investment divided by the amount of the investment. Since both the numerator and the denominator are rational in these KPI computations, the ratio is also rational. ⋄

We can also consider prices from the standpoint of Finance, demand from the standpoint of Marketing, and personnel characteristics from the standpoint of Organizational Behavior, and conclude that we can use only natural or rational numbers in our analyses.
However, you should also realize that many models in these disciplines are based on variables that take continuous values from an interval of real numbers. It appears that when we switch from observing what is happening to analyzing what will happen, we tend to use continuous values. The reason behind this can be speculated to be the ease of analysis. We can take the point of view that continuous values are invented by analysts for the purpose of ease of analysis.

Example: Time is one of the oldest inventions of humankind and is often considered to take continuous values. This is the reason why time is an uncountable noun in English when it refers to an amount, as in the sentence: "I spend too much time to understand the difference between countable and uncountable outcomes". It can be mathematically more convenient to build models that take time as a continuous value. ⋄

When an outcome (a variable) takes countable values, we call it a discrete variable; otherwise it is a continuous variable. Note that discrete outcomes can be finite or infinite; the distinction between discrete and continuous is based on countability. Discrete variables can approximate continuous variables fairly well. For example, every real number can be approximated by a rational number at any desired accuracy. This is known as the fact that the rational numbers are everywhere dense in the real numbers.

Example: Is Ω_X(m) := {0/m, 1/m, 2/m, ..., m/m} dense in the rational numbers in the interval [0, 1]? For a desired accuracy level ϵ and an arbitrary rational q ∈ [0, 1], we can fix m such that min_{ω ∈ Ω_X(m)} |ω − q| ≤ ϵ. As a matter of fact, any integer m ≥ 1/ϵ suffices for every rational q. Therefore, Ω_X(m) is dense in the rational numbers in the interval [0, 1]. ⋄

Since discrete and continuous variables approximate each other well, we can justify using continuous variables instead of discrete ones when the underlying quantity is in fact discrete.
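The density claim in the last example can be checked numerically: once m ≥ 1/ϵ, the grid point k/m nearest to q lies within 1/(2m) ≤ ϵ of q. A sketch (ϵ and the sample points q are my own illustrative choices):

```python
# Numerical check that the grid {0/m, 1/m, ..., m/m} comes within eps
# of any rational q in [0, 1] once m >= 1/eps. Exact arithmetic via
# Fraction avoids floating-point noise in the comparison.
from fractions import Fraction
from math import ceil

def nearest_grid_point(q: Fraction, m: int) -> Fraction:
    """The point k/m of the grid closest to q."""
    return Fraction(round(q * m), m)

eps = Fraction(1, 100)
m = ceil(1 / eps)                    # m = 100 suffices for eps = 1/100
for q in (Fraction(1, 3), Fraction(2, 7), Fraction(9, 11), Fraction(1, 1)):
    w = nearest_grid_point(q, m)
    assert abs(w - q) <= eps         # within the desired accuracy
print(m)  # 100
```

The nearest grid point is at most half a grid spacing away, so the achieved error 1/(2m) is in fact twice as good as the ϵ bound requires.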
Continuous variables can also be appealing because they are easier to communicate and fit well with the practice of defining variables over a range.

Example: Demand forecasters in practice often talk about ranges for the demand values. They say the demand is going to be between a and b, or over the range [a, b] of real numbers, although the demand is actually a natural number in this range. They also say that the Texan demand is a certain percentage of the national demand, and this percentage takes values in [a, b] for 0 ≤ a ≤ b ≤ 1, although the percentage is actually a rational number in this range. ⋄

When dealing with uncountable outcomes (continuous variables), we often come across sample spaces of the form Ω = {ω : a ≤ ω ≤ b} = [a, b]. When there are m continuous variables, we may have Ω = [a_1, b_1] × · · · × [a_m, b_m]. The same variable can be continuous over a range and discrete afterwards. Such a mixture can indicate an assumption, a need to focus on some particular observations, or the methodology used in data collection.

Example: An employee can quit a job within the first year of starting or afterwards. If the quitting happens in the first year, it is reported in terms of fractions of a year; otherwise, it is reported as multiples of a year. The historical tenure data then belong to [0, 1] ∪ {2, 3, 4, ...}. Note that the data for the first year are more accurate than those for the other years. Such increased accuracy within the first year may be required by the human resources department to accurately understand what triggers premature quitting. Hence, the employee tenure is both continuous (uncountable) over [0, 1] and discrete (countable) over {2, 3, ...}. ⋄

5 Probability Measure

Up to now, we have defined experiments, events and sample spaces. Most of probability theory is about computing the probability of an event; the probability of event A is denoted by P(A).
Viewing P as a mapping from events (subsets of Ω) to the nonnegative real numbers ℜ+, we can say that P measures the size of a set A ⊆ Ω. Although we focus on probability measures in this section, the concept of a measure is more general. A general measure µ, defined over subsets of Ω and taking values in [0, ∞), satisfies µ(∅) = 0 and the countable additivity condition that µ(∪_{i=1}^∞ A_i) = ∑_{i=1}^∞ µ(A_i) for each sequence of disjoint sets A_1, A_2, .... A probability measure is in addition required to satisfy P(Ω) = 1. This section addresses the issue of measuring a set first from a countable sample space and then from an uncountable sample space.

5.1 Countable Sample Spaces

Countable sample spaces can be finite. Then we can list P({ω}) for each ω ∈ Ω.

Example: For the experiment of tossing a fair coin, P({H}) = 1/2 and P({T}) = 1/2. For the experiment of rolling a fair dice, P({i}) = 1/6 for i = 1, ..., 6. ⋄

When no ambiguity arises, as in the above example, we can drop the curly brackets and write P(ω) as opposed to P({ω}), e.g., P(H) = 1/2. The probability of event A, or the probability measure of set A, is

    P(A) = ∑_{ω ∈ A} P(ω).

When the sample space is finite, so is A, and the sum above is a sum of finitely many terms. Then the probability of an event can be found by summing up the probabilities of the outcomes making up the event.

Example: What is the probability that the sum of the rolls of two dice is 7? The numbers on the rolls can be considered as pairs. To sum up to 7, these pairs must be (1,6), (2,5), (3,4), (4,3), (5,2), (6,1). There are six elementary outcomes summing up to 7 out of 36 elementary outcomes. Hence, P(Sum of the numbers is 7) = P((1,6), (2,5), (3,4), (4,3), (5,2), (6,1)) = 6/36. ⋄

Example: A box contains 4 balls: 2 Black and 2 White. Two balls are removed from the box without replacement. Let the sample space be the ordered pairs indicating the ball colors. Then Ω = {bb, wb, bw, ww}. We can check that P(bb) = P(ww) = (2/4)(1/3) = 1/6 and P(wb) = P(bw) = (2/4)(2/3) = 2/6.
Let us define the events A, B, C as A = {a white ball is chosen}, B = {a black ball is chosen} and C = {the two chosen balls are of different colors}. Then P(A) = 1 − P(bb) = 1 − 1/6 = 5/6, P(B) = 1 − P(ww) = 1 − 1/6 = 5/6 and P(C) = P(wb) + P(bw) = 4/6. ⋄

For an event A ⊆ Ω, P(A) can have frequentist and behavioral interpretations. P(A) can be thought of as the relative frequency of event A known from the history of observing the experiment in the past. P(A) can also be interpreted as the fair price of a bet that pays $1 if event A happens and $0 otherwise. For example, if A = {Head first, Tail afterwards} in tossing a coin twice, the fair price is $0.25.

Countable sample spaces can be infinite, but even then we always have Ω = ∪_{i=1}^∞ {ω_i} and so A = ∪_{i=1}^∞ ({ω_i} ∩ A). The countable additivity of the probability measure immediately implies

    P(A) = ∑_{i=1}^∞ P({ω_i} ∩ A) = ∑_{ω_i ∈ A} P(ω_i).

When the outcomes are finite, we can express the probability of each outcome explicitly. That is, we can write P(ω) for every ω ∈ Ω. When the outcomes are infinite but countable, we can still write expressions for each P(ω).

Example: A fair coin is tossed until H appears. The sample space has outcomes such as H, TH, TTH, TTTH, .... In general, an outcome has k = 1, 2, 3, ... tosses, the first k − 1 of them T and the last one H. Let A_n be the event of stopping in at most n tosses. A_1 = {H} and P(A_1) = P(H) = 1/2. A_2 = {H, TH} and P(A_2) = P(H) + P(TH) = 1/2 + 1/4 = 3/4. A_3 = {H, TH, TTH} and P(A_3) = P(H) + P(TH) + P(TTH) = 1/2 + 1/4 + 1/8 = 7/8.

Let B_n be the event of requiring n + 1 or more tosses until H appears. Clearly, B_0 = Ω. And B_1 = {TH, TTH, ..., T...TH, ...}, so

    P(B_1) = P(TH, TTH, ..., T...TH, ...) = ∑_{k=1}^∞ P(First k tosses are T and the (k+1)st is H)
           = ∑_{k=1}^∞ (1/2)^k (1/2) = (1/4) ∑_{k=0}^∞ (1/2)^k = (1/4)(1/(1 − 1/2)) = 1/2.

Also,

    P(B_2) = P(TTH, TTTH, ..., TT...TH, ...) = ∑_{k=2}^∞ P(First k tosses are T and the (k+1)st is H)
           = ∑_{k=2}^∞ (1/2)^k (1/2) = (1/8) ∑_{k=0}^∞ (1/2)^k = (1/8)(1/(1 − 1/2)) = 1/4.

And in general,

    P(B_n) = ∑_{k=n}^∞ P(First k tosses are T and the (k+1)st is H)
           = ∑_{k=n}^∞ (1/2)^k (1/2) = (1/2)^{n+1} ∑_{k=0}^∞ (1/2)^k = (1/2)^{n+1} (1/(1 − 1/2)) = (1/2)^n.

The probability of requiring at least n + 1 tosses until H appears is P(B_n) = (1/2)^n. Once more, we have written each P(ω) in the sums above. It is also worth noting that A_1 ⊂ A_2 ⊂ · · · ⊂ A_n and B_1 ⊃ B_2 ⊃ · · · ⊃ B_n. {A_n} is an increasing sequence of sets and its limit is lim_{n→∞} A_n = Ω, so we can set A_∞ = Ω. {B_n} is a decreasing sequence of sets and its limit is lim_{n→∞} B_n = ∅, so we can set B_∞ = ∅. You can check that A_∞ = ∪_{n=1}^∞ A_n and B_∞ = ∩_{n=1}^∞ B_n. Furthermore, A_n ∪ B_n = Ω and A_n, B_n are disjoint for each n = 1, 2, .... ⋄

5.2 Uncountable Sample Spaces

For an uncountable Ω, the tactic of writing each P(ω) runs into a difficulty. For example, we cannot list the outcomes that make up the uncountable sample space [0, 1]; we do not know where to start and where to go next in such a list. If the outcomes in [0, 1] are equally likely and we attach a positive probability to each outcome, the sum of the probabilities will be more than 1. If we attach positive probability to only countably many outcomes, then the sample space essentially becomes countable. Unless otherwise stated, [a, b] always denotes an interval of real numbers.

If we cannot attach a probability to each outcome of an uncountable sample space, what can we do? We want to attach probabilities to some subsets of Ω such that we can compute probabilities for all the other sets of interest. These subsets do not have to be elementary outcomes; they can be sets including more than one outcome. In other words, we would like to measure every set in Ω. Disappointingly, not every set is measurable with every measure.

Example: The Lebesgue measure cannot measure a specially constructed set.
When its domain is restricted to [0, 1], the Lebesgue measure, defined as µ_l(A) := b − a for an interval A = [a, b], is a probability measure. To construct the special set E, we group two real numbers x, y into the same class if their difference x − y is rational. For example, the real number √2 is grouped in the same class as 1 + √2, 2.1 + √2, −0.9 + √2, etc. The real number √3 is grouped in the same class as 1 + √3, 2.1 + √3, −0.9 + √3, etc. Membership in a class can be considered as a relation, which turns out to be reflexive, symmetric and transitive. Hence, this relation yields equivalence classes defined over the real numbers. Each equivalence class can be called E_r where r is a real number and E_r = r + {Rational Numbers} for r ∈ ℜ. Distinct equivalence classes are disjoint: for r_1 ≠ r_2, either E_{r_1} = E_{r_2} (when r_1 − r_2 is rational) or E_{r_1} ∩ E_{r_2} = ∅. The new set E is assembled by picking a single element from E_r ∩ [0, 1] for each distinct class E_r. The resulting set E is an uncountable subset of [0, 1]. Attempts to obtain µ_l(E) result in contradictions with either the countable additivity or the nonnegativity of µ_l. The details of this contradiction are outside our scope but can be found on pp. 27-28 of Cohn (2013). Attaching significant importance to this construction, Cohn provides it as Theorem 1.4.9. Gelbaum and Olmsted (2003) discuss this issue in §8.11, titled A nonmeasurable set. ⋄

5.2.1 Sigma-field

When not every set is measurable, the alternative is to measure a collection of sets. We let F denote a collection of events (subsets of Ω) and ask what properties F needs to satisfy to be useful in the probability context. Intuitively, we want to be able to consider unions and intersections of events, and assess their probabilities or assign probabilities to them. To formalize this, we want to attach a probability to A ∪ B if we have done so for A and B. So we would like to include A ∪ B in F if A, B ∈ F. Then we can assess the probability of the event where either A or B happens.
Is including the unions in the collection F sufficient? It is not, for the purpose of assessing the probability of A ∩ B, the event that both A and B happen. An indirect way to include A ∩ B in F is to require that both the union A ∪ B and the complement Ac be in F whenever A, B ∈ F. This is because A ∩ B = (Ac ∪ Bc)c ∈ F if A, B ∈ F. Requiring A ∪ B ∈ F and Ac ∈ F makes F closed under set difference operations: the difference A \ B = A ∩ Bc ∈ F and the symmetric difference A△B = (A \ B) ∪ (B \ A) ∈ F. If we stop here and require A ∪ B ∈ F, Ac ∈ F and Ω ∈ F, then the collection F is called a field. By using A ∪ B ∈ F several times in an induction argument, we also obtain that finite unions are in F: ∪_{i=1}^n A_i ∈ F if A_i ∈ F for i = 1, . . . , n. Unfortunately, this does not suffice for our purpose of considering the probability associated with infinitely many events (say, the probability of getting H on an odd toss). Hence, we require the stronger condition that countable unions are in F when constructing probability models: ∪_{i=1}^∞ A_i ∈ F if A_i ∈ F for i = 1, 2, . . . . If a field is closed under countable unions, it is called a sigma-field, denoted by σ-field. In summary, we obtain the following three conditions that define a σ-field. F is a σ-field if it
i) Includes the sample space: Ω ∈ F.
ii) Is closed under complements: Ac ∈ F if A ∈ F.
iii) Is closed under countable unions: ∪_{i=1}^∞ A_i ∈ F if A_i ∈ F for i = 1, 2, . . . .
Note that ii) and iii) imply that a σ-field is closed under countable intersections. If Ω is finite, then any field over Ω is also a σ-field.

Example: For Ω = {1, 2, 3, 4}, one of the σ-fields is F = {∅, Ω, {1, 2}, {3, 4}}. Another σ-field is F = {∅, Ω, {1, 3}, {2, 4}}. For Ω = N, F = {∅, Ω, odd natural numbers, even natural numbers} is a σ-field. ⋄

Sometimes a given collection of subsets of Ω is not a σ-field, but it can be turned into a σ-field by adding more subsets to it.
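For a finite Ω, the three defining conditions can be checked mechanically, since closure under countable unions reduces to closure under pairwise unions. The following is a small sketch (the function name is mine):

```python
# Check whether a collection F of subsets of a finite sample space
# is a sigma-field: contains Omega, closed under complement and union.
def is_sigma_field(omega, collection):
    fam = {frozenset(a) for a in collection}
    om = frozenset(omega)
    if om not in fam:
        return False
    for a in fam:
        if om - a not in fam:          # condition ii): complement
            return False
        for b in fam:
            if a | b not in fam:       # condition iii), finite case: union
                return False
    return True

omega = {1, 2, 3, 4}
print(is_sigma_field(omega, [set(), omega, {1, 2}, {3, 4}]))      # True
print(is_sigma_field(omega, [set(), omega, {1, 2}, {2}, {3, 4}])) # False: {2}c missing
```

Note that ∅ need not be listed separately as a condition: it is forced into any σ-field as the complement of Ω.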
Adding subsets to the collection may continue until all the subsets of Ω are included. The collection of all subsets of Ω is the largest σ-field over Ω.

Example: For Ω = {1, 2, 3, 4}, the collection {∅, Ω, {1, 2}, {2}, {3, 4}} is not a σ-field because it does not include {2}c or the union {2} ∪ {3, 4}. We can add these to the collection to obtain {∅, Ω, {1, 2}, {2}, {3, 4}, {1, 3, 4}, {2, 3, 4}}, which is still not a σ-field because it does not include the complement {2, 3, 4}c. We can add this to the collection to obtain {∅, Ω, {1, 2}, {2}, {3, 4}, {1, 3, 4}, {2, 3, 4}, {1}}, which is a σ-field. We say that the σ-field {∅, Ω, {1, 2}, {2}, {3, 4}, {1, 3, 4}, {2, 3, 4}, {1}} is generated by {∅, Ω, {1, 2}, {2}, {3, 4}}. ⋄

The σ-field generated by a collection C is the smallest σ-field that includes C. We write σ(C) to refer to the σ-field generated by C. By definition, σ(C) = ∩{F : F is a σ-field including C}. The examples above are based on finite sample spaces, but we can define σ-fields over uncountable sample spaces as well. One of the most used σ-fields is the Borel field R over ℜ. The Borel field R is generated by the open intervals in ℜ: R = σ({(a, b) : −∞ < a ≤ b < ∞}). So the Borel field contains all the open intervals, their countable unions as well as their complements. By using [a, b] = ∩_{n=1}^∞ (a − 1/n, b + 1/n), we can see that the closed intervals of ℜ also generate the Borel field.

Example: Each rational number q can be written as a countable intersection of open intervals: {q} = ∩_{n=1}^∞ (q − 1/n, q + 1/n). So each singleton containing a rational is in the Borel field, as is their countable union, the set of rational numbers. Since the set of rational numbers is in the Borel field, so is its complement – the set of irrational numbers. ⋄

Pairing the sample space Ω with a σ-field F defined over it, we obtain (Ω, F), which is called a measurable space. Measurable spaces are used to define measurable functions.
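For a finite Ω, the generation process illustrated above can be automated: repeatedly close the collection under complements and pairwise unions until nothing new appears. This is a sketch (the closure loop and names are my own; it terminates because Ω has only finitely many subsets):

```python
# Generate the sigma-field over a finite sample space from a collection
# of subsets by closing it under complements and unions.
def generate_sigma_field(omega, collection):
    om = frozenset(omega)
    fam = {frozenset(a) for a in collection} | {om, frozenset()}
    while True:
        new = {om - a for a in fam} | {a | b for a in fam for b in fam}
        if new <= fam:           # fixed point reached: fam is closed
            return fam
        fam |= new

omega = {1, 2, 3, 4}
fam = generate_sigma_field(omega, [{1, 2}, {2}, {3, 4}])
print(len(fam))  # 8: the sigma-field found step by step in the example above
```

Closure under complements and unions also yields intersections via De Morgan, so the fixed point is indeed a σ-field over the finite Ω.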
A function ξ : Ω → ℜ is called an F-measurable function if {ω ∈ Ω : a ≤ ξ(ω) ≤ b} ∈ F for each a, b ∈ ℜ.

Example: Over Ω = {1, 2, 3, 4}, consider the σ-field F = {∅, Ω, {1}, {2}, {1, 2}, {3, 4}, {1, 3, 4}, {2, 3, 4}}. Let us check if ξ_1(ω) = ω for ω ∈ Ω is F-measurable. Since {ω ∈ Ω : 3 ≤ ξ_1(ω) ≤ 3} = {3} ∉ F, ξ_1 is not measurable. Let us check if ξ_2(ω) = ω for ω ∈ {1, 2, 3} and ξ_2(4) = 3 is F-measurable. This time {ω ∈ Ω : 3 ≤ ξ_2(ω) ≤ 3} = {3, 4} ∈ F. Moreover, {ω ∈ Ω : 1 ≤ ξ_2(ω) ≤ 1} = {1} ∈ F, {ω ∈ Ω : 2 ≤ ξ_2(ω) ≤ 2} = {2} ∈ F and {ω ∈ Ω : 4 ≤ ξ_2(ω) ≤ 4} = ∅ ∈ F. Furthermore, {ω ∈ Ω : 1 ≤ ξ_2(ω) ≤ 2} = {1, 2} ∈ F, {ω ∈ Ω : 2 ≤ ξ_2(ω) ≤ 3} = {2, 3, 4} ∈ F and {ω ∈ Ω : 1 ≤ ξ_2(ω) ≤ 3} = {1, 2, 3, 4} ∈ F. So ξ_2 is F-measurable. ⋄

5.2.2 Probability Space: Sample Space, Sigma-field, Probability Measure

To obtain a probability space from the measurable space (Ω, F), we need to define a probability measure P such that it
i) Measures the sets in F: P : F → [0, 1], i.e., for every A ∈ F there exists a real number P(A) ∈ [0, 1].
ii) Is countably additive: P(∪_{i=1}^∞ A_i) = ∑_{i=1}^∞ P(A_i) for disjoint A_1, A_2, . . . .
iii) Assigns probability one to the sample space, which always happens: P(Ω) = 1.
These three properties can also be called the axioms of probability. It is easy to justify them when P(·) is interpreted as a frequency. Note that ii) applies only to countable collections of sets A_i; it does not necessarily apply when the collection is uncountable. A probability space is a triplet (Ω, F, P) made from the sample space Ω, the σ-field F and the probability measure P.

Example: Consider the Borel field R[1, 11] defined over the interval Ω = [1, 11] and the function f : R[1, 11] → [0, 1] defined as f([a, b]) = (b − a)/10. For each A ∈ R[1, 11], we partition it into open and closed intervals as follows: A = (∪_{i=1}^∞ [a_i^1, b_i^1]) ∪ (∪_{i=1}^∞ (a_i^2, b_i^2]) ∪ (∪_{i=1}^∞ [a_i^3, b_i^3)) ∪ (∪_{i=1}^∞ (a_i^4, b_i^4)).
Such a countable partition is possible because the Borel field involves only countable unions. Then f(A) = f((∪_{i=1}^∞ [a_i^1, b_i^1]) ∪ (∪_{i=1}^∞ (a_i^2, b_i^2]) ∪ (∪_{i=1}^∞ [a_i^3, b_i^3)) ∪ (∪_{i=1}^∞ (a_i^4, b_i^4))). Now we can check that f satisfies conditions i), ii) and iii) to be a probability measure, making ([1, 11], R[1, 11], f) a probability space. ⋄

In ℜ with the Borel field R, the length of an interval is called its Lebesgue measure. This can be extended to higher dimensions: the Lebesgue measure becomes the area of a set in ℜ² and the volume of a set in ℜ³. If we take a rock with a volume of 1 liter and break it into smaller pieces, then the total volume of the pieces in any collection is the sum of the volumes of the pieces in that collection. This sounds quite trivial, and so should its analog ii) in probability theory.

Why countable additivity as opposed to finite additivity? Let us consider the experiment of tossing a fair coin until a head shows up. We want to compute the probability that the number of tosses, say k, is an odd number. Since k is odd, we can write it as k = 2n − 1 for n = 1, 2, . . . . If k = 1, then the outcome is H, with n = 1 and probability 1/2. If k = 3, the outcome is TTH, with n = 2 and probability (1/2)³. For a generic n, the outcome, say ω_n, has 2(n − 1) T's followed by one H, with probability (1/2)^{2n−1}. Now we need to compute P(∪_{n=1}^∞ {ω_n}), which becomes ∑_{n=1}^∞ P(ω_n) by countable additivity. If we have only finite additivity rather than countable additivity, we cannot write P(∪_{n=1}^∞ {ω_n}) = ∑_{n=1}^∞ P(ω_n). Having justified this equality via countable additivity, what remains is to evaluate ∑_{n=1}^∞ P(ω_n) = ∑_{n=1}^∞ (1/2)^{2n−1} = (1/2) ∑_{n=0}^∞ (1/4)^n = (1/2)(4/3) = 2/3. As an exercise, you can also compute the probability of an even number of tosses. As this example illustrates, finite additivity can be insufficient. Why countable additivity as opposed to uncountable additivity?
The expression of uncountable additivity would be P(∪_{ω ∈ A} {ω}) = ∑_{ω ∈ A} P(ω) for an uncountable set A ⊆ Ω. To test this equality, consider the probability space (Ω = [0, 1], R[0, 1], µ_l) where R[0, 1] is the Borel field on [0, 1] and µ_l is the Lebesgue probability measure P with P([a, b]) = b − a for 0 ≤ a ≤ b ≤ 1. So P(ω) = P([ω, ω]) = 0 for 0 ≤ ω ≤ 1. For A = Ω, the left-hand side of the assumed uncountable additivity equation yields P(∪_{ω ∈ A} {ω}) = 1, while the right-hand side yields ∑_{ω ∈ A} P(ω) = ∑_{ω ∈ A} 0 = 0. Assuming uncountable additivity thus leads to the contradiction 1 = 0. Hence, uncountable additivity cannot hold in general.

6 Exercises

1. 2 cards are selected from a deck of 52 cards. a) How many ways are there to select 2 cards? b) How many ways are there to choose so that one of the cards is an ace and the other is either a king, a queen or a jack?

2. How many signals – each consisting of 9 flags hung in a line – can be made from a set of 4 white flags, 3 red flags and 2 blue flags if all flags of the same color are identical? We can think of two ways to answer this question. a) There are 9! different orderings of 9 distinct flags. Since the white flags are identical, we must divide 9! by 4!. Apply the same logic for the red and blue flags. b) There are 9 positions on a signal; 4 of these must be assigned to white, 3 to red and 2 to blue. In other words, we are making up 3 subsets, one for white, one for red and one for blue, where k_W = 4, k_R = 3 and k_B = 2 while k_W + k_R + k_B = 9.

3. Consider a set of balls, 5 of which are red and 3 of which are yellow. Assume that all of the red balls and all of the yellow balls are indistinguishable. How many ways are there to line up the balls so that no two yellow balls are next to each other?

4. 4 musicians make up a chamber orchestra to play cello, violin, flute and piano. a) If each musician can play all four instruments, how many orchestral arrangements are possible? ANSWER 4!
b) If each musician can play all four instruments except for one musician who can play only 2 instruments, how many orchestral arrangements are possible? ANSWER (2)(3!)

5. The UT Dallas WalMart Supply Chain case competition team of 8 people is to return from Arkansas to Dallas with two cars. If each of the two cars can take at most 5 people, in how many ways can the team members be distributed to these two cars?

6. How many ways are there to distribute a deck of 52 cards to 13 players so that each player has exactly 4 cards and each of these 4 cards comes from a different suit (spades, hearts, diamonds, clubs)?

7. Given the natural numbers {1, 2, . . . , n}, let π be a permutation of them: π(i) = j means that number i is in position j. Let Π be the set of all permutations. a) Suppose that n = 4 and consider the permutation 2 1 3 4; what are the associated π(1), π(2), π(3), π(4)? b) Consider n = 4. How many permutations are there with the property π(1) ≠ 1? How many permutations are there with the property π(1) = 1 and π(2) ≠ 2? c) Define the set Π_k of permutations as follows: Π_k = {π : π(i) = i for 1 ≤ i ≤ k and π(k + 1) ≠ k + 1} for 0 ≤ k ≤ n − 1 and Π_n = {π : π(i) = i for 1 ≤ i ≤ n}. Check to see if {Π_k}_{k=0}^n partitions the set Π: i) Π_k ∩ Π_m = ∅ for 0 ≤ k < m ≤ n and ii) ∪_{k=0}^n Π_k = Π. d) Use the parts above to prove

n! = ∑_{i=0}^{n−1} (i) i! + 1.

8. In how many ways can one place seven indistinguishable balls in four distinct boxes with no box left empty?

9. How many non-negative integer solutions are there for x_1 + x_2 + · · · + x_n = b for integer b ≥ 0? Express the number of non-negative integer solutions to x_1 + x_2 + · · · + x_n ≤ b in terms of n and b. This will give you an idea about the cardinality of feasible sets in integer programs.

10. How many positive integer solutions are there for x_1 + x_2 + x_3 = 4?

11.
a) How many different paths can a rook (which moves only horizontally and vertically) take from the southwest corner of a chessboard to the northeast corner without ever moving to the west or south? We are interested in paths, not in the specific moves of the rook. Thus, we can assume that the rook makes 14 moves: 7 to the east and 7 to the north. b) How many of the paths contain four or more consecutive eastward moves? ANSWER a) Each path is a permutation of 7 Es and 7 Ns. The number of paths is 14!/(7!7!). b) Let i be the starting position of the first string of 4 or more consecutive Es in any permutation of 7 Es and 7 Ns containing at least 4 consecutive Es. If i = 1, the first 4 moves are Es, and the remaining 10 moves consist of 3 Es and 7 Ns. There are 10!/(7!3!) paths with i = 1. If i = 2, . . . , 11, the (i − 1)st position must be N; otherwise, the string would start before position i. Since the (i − 1)st position is N, there remain 6 Ns and 3 Es, which can be permuted in 9!/(6!3!) ways. The total number of paths is 10!/(7!3!) + 10 · 9!/(6!3!) = 120 + 840 = 960. An incorrect approach is to treat NEEEEENNNNNNEE as two different sequences, obtained as N-(4E)-ENNNNNNEE and as NE-(4E)-NNNNNNEE. This double counting happens when you consider 4 Es as a unit and the remaining 3 Es and 7 Ns separately and conclude that the answer is 11!/(1! 3! 7!) = 1320.

12. A board (table) has M + 1 columns and N + 1 rows. A piece is located at cell (1, 1) and will move to cell (M + 1, N + 1) either by moving up 1 cell or by moving right 1 cell at a time. a) How many moves are necessary to go from (1, 1) to (M + 1, N + 1)? b) How many distinct paths exist from (1, 1) to (M + 1, N + 1)? c) How many distinct non-decreasing integer-valued functions can be defined over the domain of integers {a, a + 1, . . . , a + M} and range of integers {b, b + 1, . . . , b + N} such that the functions go through (a, b) and (a + M, b + N)?
d) How many distinct non-decreasing integer-valued functions can be defined over the domain of integers {a, a + 1, . . . , a + M} and range of integers {b, b + 1, . . . , b + N} such that the functions go through the point (a, b) and between the points (a + M, b) and (a + M, b + N)?

Appendix: Countability of Rationals and Uncountability of Reals¹

Here are two questions to consider. Are there the same number of integers as natural numbers? Are there the same number of rational numbers as natural numbers? The idea is that there can be infinite sets that do not have the same size. To make sense of that statement, we have to know what it means to say that two sets have the same size. This really goes back to our ideas of what it means to count the elements of a set. When I look out of my window at a field of sheep, I count the sheep by matching each sheep to a number: 1, 2, 3, 4, . . . . And if I count the books in the bookcase, I do the same thing: I match each book to a number: 1, 2, 3, 4, . . . . I would say there are 10 sheep (or books) if I can match each sheep (book) to a number from 1 to 10 in such a way that each number gets used and no two sheep get the same number. We say that the set of sheep in this field has the same size as the set {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, because we can match sheep to numbers. When would we say there are more than 10 sheep? If we match the sheep to the numbers 1 to 10 and still have some sheep left over, there must be more than 10 sheep. If we match the sheep to the numbers 1 to 10 and still have numbers left over, there must be fewer than 10 sheep. We use much the same idea with infinite sets. An infinite set is one that cannot be matched to any finite set. We use the natural numbers as our counting set and try to match the elements of a given set to the naturals. If we can do that, we say that the set is countable; if we cannot, we say that the set is uncountable. There is a slight ambiguity about whether a finite set is countable.
For our purposes, finite sets are countable. Here is another way to think about countability. A countable set can be listed: we write the element matched to 1 first, the element matched to 2 second, and so on. And if we are given a list, we can get a matching (the first element gets matched to 1, the second to 2, and so on). Sometimes that can be a useful way of thinking about things. Let us try to think of some examples of countable sets. The natural numbers are countable, because I can match each natural n to itself. Let us try to think of some more interesting examples! Are the integers countable? Can we write them in a list? We might write them as . . . , -3, -2, -1, 0, 1, 2, 3, . . . , but that does not count because the list does not have a first element. So we need another strategy. We need to be sure that we list everything and that we do not list anything twice. Here is one possibility: 0, 1, -1, 2, -2, 3, -3, . . . . We "start in the middle and work outwards". So yes, the integers are countable.

Are the rationals countable? Can we list the rationals? Yes, the rational numbers are countable. Let us see how to prove this. One way of thinking about our proof that the integers are countable is that we wrote them in a line (. . . , -3, -2, -1, 0, 1, 2, 3, . . . ) and then drew a path that took us through them all. You can see this by drawing it for yourself. If we could do something similar for the rationals, that would be great. But it is not quite as obvious how to write them in a line. In fact, thinking about it for a bit, it seems more natural to write them in a grid, where the rational p/q is written in the pth column and qth row:

      q=1   q=2   q=3   q=4   q=5   ...
p=1   1/1   1/2   1/3   1/4   1/5   ...
p=2   2/1   2/2   2/3   2/4   2/5   ...
p=3   3/1   3/2   3/3   3/4   3/5   ...
p=4   4/1   4/2   4/3   4/4   4/5   ...
p=5   5/1   5/2   5/3   5/4   5/5   ...
...   ...   ...   ...   ...   ...

Now, how can we plot a path through these? We can imagine working through each diagonal in turn.
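The diagonal walk can be sketched in code. This is an illustration of one possible zigzag order (the function name is mine, and the sketch already skips grid cells representing an already-listed rational, e.g. 2/2 = 1/1):

```python
from fractions import Fraction

def first_rationals(n):
    """List the first n positive rationals by walking the p/q grid
    one anti-diagonal (p + q = d) at a time, alternating direction,
    and skipping any value that has already appeared (e.g. 2/2 = 1/1)."""
    out, seen, d = [], set(), 2
    while len(out) < n:
        ps = range(1, d)   # on diagonal d, p runs over 1..d-1 and q = d - p
        for p in (reversed(ps) if d % 2 else ps):
            f = Fraction(p, d - p)
            if f not in seen:
                seen.add(f)
                out.append(f)
        d += 1
    return out[:n]

print(first_rationals(10))
# starts with 1/1, 2/1, 1/2, 1/3, 3/1, ...
```

Every positive rational p/q sits on the finite diagonal d = p + q, so the walk reaches it after finitely many steps; this is exactly why the grid walk yields a valid listing.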
So we might get something like 1/1, 2/1, 1/2, 1/3, 2/2, 3/1, 4/1, 3/2, 2/3, 1/4, 1/5, . . . . This is not quite allowed, because we have counted some rationals twice (e.g., 2/2 = 1/1). But that is easily fixed: we say that we will follow this path, simply missing out anything we have seen before. This gives us a listing of the positive rationals.

The real numbers are uncountable. This means there is no way of listing the real numbers. So our aim is to prove that it is impossible to write the real numbers in a list. How could we possibly do that? We are going to suppose that it is possible to list the real numbers. Then we will somehow derive a contradiction from that, which will mean our original supposition must have been wrong. So, we are supposing for the moment that we can list the real numbers. In that case, we can certainly list the real numbers in [0, 1]. Let us imagine that we have done this and have written them all out in order, using their decimal expansions. Slightly annoyingly, some numbers have two expansions, since 0.999 . . . = 1, but let us say we always write the finite version rather than the one ending in infinitely many 9s. So they look something like

0.a_{1,1} a_{1,2} a_{1,3} a_{1,4} a_{1,5} . . .
0.a_{2,1} a_{2,2} a_{2,3} a_{2,4} a_{2,5} . . .
0.a_{3,1} a_{3,2} a_{3,3} a_{3,4} a_{3,5} . . .
0.a_{4,1} a_{4,2} a_{4,3} a_{4,4} a_{4,5} . . .
0.a_{5,1} a_{5,2} a_{5,3} a_{5,4} a_{5,5} . . .

where a_{i,j} is the jth decimal digit of the ith real number on the list. To derive a contradiction, we are going to build another real number between 0 and 1, one that is not on our list. Since our list was supposed to contain all such real numbers, that will be a contradiction, and we will be done. So let us think about how to build another real number between 0 and 1 in such a way that we can be sure it is not on our list. Let us say this new number will be 0.b_1 b_2 b_3 b_4 b_5 . . . , where we are about to define the digits b_i.

¹ Based on posts on http://theoremoftheweek.wordpress.com
We want to make sure that our new number is not the same as the first number on our list. So let us do that by making sure they have different digits in the first decimal place. Say if a_{1,1} = 3 then b_1 = 7, and otherwise b_1 = 3. I really mean: define b_1 to be any digit apart from a_{1,1}; but because of the irritating fact that 0.999 . . . = 1, I want to make sure that we do not get a number that ends in infinitely many 9s, so we never choose b_1 to be 9. Now we want to make sure that our new number is not the same as the second number on our list. We can do this by making sure that the second digit of our new number is not the same as the second digit of the second number. So let us put b_2 = 7 if a_{2,2} = 3 and b_2 = 3 otherwise. And so on. At each stage, we make sure that our new number is not the same as the nth number on the list by making sure that b_n is not the same as a_{n,n}. And that defines our new real number, one that is definitely not on our list because we built it that way.

If we apply the above argument to try to prove that the rational numbers are uncountable, where would the argument break? Hint: Rational numbers have repeating decimals while real numbers can have non-repeating decimals.

References

D.L. Cohn. 2013. Measure Theory. 2nd edition. Birkhäuser.

B.R. Gelbaum and J.M.H. Olmsted. 2003. Counterexamples in Analysis. Dover; based on the 2nd edition published by Holden Day in 1965.