Non-Normalizable Probability Measures for Fun and Profit

Here’s a fun logic puzzle (see also here; originally found here). There’s a family resemblance to the Monty Hall problem, but the basic ideas are pretty distinct.

An eccentric benefactor holds two envelopes, and explains to you that they each contain money; one has twice as much cash as the other. You are encouraged to open one, and you find $4,000 inside. Now your benefactor — who is a bit eccentric, remember — offers you a deal: you can either keep the $4,000, or you can trade for the other envelope. Which do you choose?

If you’re a tiny bit mathematically inclined, but don’t think too hard about it, it’s easy to jump to the conclusion that you should definitely switch. After all, there seems to be a 50% chance that the other envelope contains $2,000, and a 50% chance that it contains $8,000. So your expected value from switching is the average of what you will gain — ($2,000 + $8,000)/2 = $5,000 — minus the $4,000 you lose, for a net gain of $1,000. Pretty easy choice, right?

A moment’s reflection reveals a puzzle. The logic that convinces you to switch would have worked perfectly well no matter what had been in the first envelope you opened. But that original choice was completely arbitrary — you had an equal chance to choose either of the envelopes. So how could it always be right to switch after the choice was made, even though there is no Monty Hall figure who has given you new inside information?
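
One way to see that always-switching can’t help: simulate the game with the envelope contents fixed in advance. A minimal sketch (the $4,000 stake and the trial count are illustrative choices, not part of the puzzle):

```python
import random

def average_payout(switch, x=4000, trials=100_000):
    """The envelopes hold x and 2x, fixed in advance. Pick one at
    random; if switch is True, take the other one instead."""
    total = 0
    for _ in range(trials):
        envelopes = [x, 2 * x]
        random.shuffle(envelopes)
        first, other = envelopes
        total += other if switch else first
    return total / trials
```

Both strategies average about $6,000 — the mean of $4,000 and $8,000 — so once the contents are fixed, switching gains nothing.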

Here’s where the non-normalizable measure comes in, as explained here and here. Think of it this way: imagine that we tweaked the setup by positing that one envelope had 100,000 times as much money as the other one. Then, upon opening the first one, you found $100,000 inside. Would you be tempted to switch?

I’m guessing you wouldn’t, for a simple reason: the two alternatives are that the other envelope contains $1 or $10,000,000,000, and they don’t seem equally likely. Eccentric or not, your benefactor is more likely to be risking one dollar as part of a crazy logic game than to be risking ten billion dollars. This seems like something of an extra-logical cop-out, but in fact it’s exactly the opposite; it takes the parameters of the problem very seriously.

The issue in this problem is that there can’t be a uniform probability distribution for the amounts of money in the envelopes that stretches from zero to infinity. The total probability has to be normalized to one, which means that there can’t be an equal probability (no matter how small) for all possible initial values. Like it or not, you have to pick some initial probability distribution for how much money was in the envelopes — and if that distribution is normalizable (its total probability sums to one), you can extract yourself from the original puzzle.

We can make it more concrete. In the initial formulation of the problem, where one envelope has twice as much money as the other one, imagine that your assumed probability distribution is the following: it’s equally probable that the envelope with less money has any possible amount between $1 and $10,000. You see immediately that this changes the problem: namely, if you open the first envelope and find some amount between $10,001 and $20,000, you should absolutely not switch! Whereas, if you find $10,000 or less, there is a good argument for switching. But now it’s clear that you have indeed obtained new information by opening the first envelope; you can compare what was in that envelope to the assumed probability distribution. That particular probability distribution makes the point especially clear, but any well-defined choice will lead to a clear answer to the problem.
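
That concrete prior is easy to check by simulation. A sketch, with the added assumption (purely for convenience) that the lower amount is a whole-dollar value drawn uniformly from $1 to $10,000:

```python
import random

def switch_gain(trials=200_000):
    """Assume the lower amount is a whole-dollar value, uniform on
    $1..$10,000, so the envelopes hold {L, 2L}. Open one at random
    and record the gain from switching, split by whether the
    observed amount is at most $10,000."""
    low_gain = low_n = high_gain = high_n = 0
    for _ in range(trials):
        lower = random.randint(1, 10_000)
        opened, other = random.choice([(lower, 2 * lower), (2 * lower, lower)])
        gain = other - opened
        if opened <= 10_000:
            low_gain += gain
            low_n += 1
        else:
            high_gain += gain
            high_n += 1
    return low_gain / low_n, high_gain / high_n
```

With this prior, switching after seeing $10,000 or less gains about $2,500 on average, while switching after seeing more than $10,000 loses about $7,500 on average: opening the envelope really did convey information.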


66 thoughts on “Non-Normalizable Probability Measures for Fun and Profit”

  1. The difference between theory and reality:

    If I flip a coin, it’s 50/50 on coming up Heads. If I flip it a second time, it’s still 50/50, presumably ad infinitum. But if I flip Heads 1000 times in a row, would you still consider the 1001st flip a 50/50? Not me!

    The odds of the flips being honest and truly random and yielding 1000 consecutive Heads are a lot lower than the odds of the coin being crooked and non-random.
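
This intuition can be made quantitative with Bayes’ theorem. A sketch assuming a hypothetical one-in-a-million prior that the coin is two-headed:

```python
# Hypothetical prior: one coin in a million is two-headed.
prior_crooked = 1e-6
p_heads_fair = 0.5 ** 1000     # chance a fair coin shows 1000 straight heads
p_heads_crooked = 1.0          # a two-headed coin always shows heads

# Bayes' theorem: P(crooked | 1000 heads)
posterior_crooked = (prior_crooked * p_heads_crooked) / (
    prior_crooked * p_heads_crooked + (1 - prior_crooked) * p_heads_fair
)
```

The posterior is astronomically close to 1: any non-zero prior for a crooked coin swamps the 2^(-1000) chance of honest flips.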

  2. A related problem:
    Suppose you have two cards with a random number on each. You have to turn over one card, observe the number and then judge whether it is larger than the hidden number on the second. Obviously you can win exactly half the time by always stopping with the first number, or always stopping with the second number, without even peeking.

    Claim: There exists a strategy with which you can win this game more than half the time.

    (Readers take comfort: when mathematicians first heard this claim, many of us found it implausible.)
    Solution in Ted Hill’s “Optimal Stopping” article:
    http://www.tphill.net/publications/OPTIMAL%20STOPPING%20PAPERS/AmSci2009-03Hill.pdf
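
The claim checks out numerically. The strategy discussed in Hill’s article (often credited to Thomas Cover) is to draw a random threshold and keep the first card only if it beats that threshold; a sketch with illustrative card values:

```python
import random

def play(a, b, trials=100_000):
    """Cards hold distinct numbers a and b, shown in random order.
    Keep the first card only if it exceeds a random threshold,
    otherwise take the second. Returns the observed win rate."""
    wins = 0
    for _ in range(trials):
        first, second = (a, b) if random.random() < 0.5 else (b, a)
        threshold = random.gauss(0, 10)  # any full-support distribution works
        kept = first if first > threshold else second
        wins += kept == max(a, b)
    return wins / trials
```

Whenever the threshold lands between the two numbers you win for sure; otherwise it’s a coin flip, so the overall win rate exceeds 1/2 (about 0.57 for cards 3 and 7 with this threshold distribution).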

  3. My favorite version of this puzzle is a similar argument in which two people consider a deal to swap however much money each one has in their wallet. Almost identical logic to this one suggests that both have a positive expected value from switching. The problem is the same: the lack of a consistent uniform distribution.

  4. Whenever you follow the proper rules for statistical analysis and end up with a paradox, you should say ‘The problem is poorly formulated.’ There is no single correct answer to what is wrong with the problem. The Monty Hall problem is completely different, because it can be formulated so that no paradoxes emerge.

  5. I took the view that if the opened envelope had $4000 cash, the other one could contain either:

    $2000 and a cheque for $1,000,000 (not cash you see, well you did say he was eccentric)

    or

    $8000

  6. Sean, your point about the probability distribution is well taken, but what drives me crazy in these examples are the assumptions (or maybe stipulations) that the utility of the bet is well measured by its nominal payoff ($x), and that utility increases linearly with that payoff ($4,000 is twice as good for the agent as $2,000 and half as good as $8,000). Except for trivial bets, this is almost NEVER the case with real agents. Suppose you are a poor person whose quality of life could be significantly improved by a $4,000 windfall. Then you are absolutely crazy to risk this amount in a gamble that may cut your windfall in half. Or suppose you are another poor person who needs $7,500 for a critical medical treatment. Then $4,000 won’t do it, and you must gamble for enough money. Familiar stuff, I know, but if the ultimate point here is to assess the rationality of the bets real agents make, then we cannot model their utilities with some linear nominal payoff.

  7. Strether,

    There are probability distributions (and the problem is not poorly stated), but the 2^(-n) distribution is irrelevant. They might as well flip a coin or roll a die (biased if needed) to determine whether it’s going to be {$100, $1000} or {$10, $100} in the envelopes. Draw the tree and look at the cases where you find $100.

    Yes, uncertainty is not synonymous with randomness; a coin on my table does not have a “1/2 chance” of having heads up just because you can’t see it, imho. However, there is a subjective interpretation of probability in philosophy under which you are allowed to assign probabilities to things any way you want (as long as a Dutch book can’t be made against you, which keeps it logically consistent).
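
Drawing that tree is easy to automate. A sketch of the coin-flip-between-pairs setup described above, with the amounts as given in the comment:

```python
import random

def other_given_100(trials=200_000):
    """A coin flip picks the pair {$10, $100} or {$100, $1000};
    open a random envelope from the chosen pair. Average the other
    envelope's contents over the trials where we saw $100."""
    total = n = 0
    for _ in range(trials):
        pair = random.choice([(10, 100), (100, 1000)])
        opened, other = random.sample(pair, 2)  # random order
        if opened == 100:
            total += other
            n += 1
    return total / n
```

Conditioned on seeing $100, the other envelope averages about $505 = ($10 + $1,000)/2, so with this explicit prior switching genuinely is favorable; no paradox arises, because the prior was stated up front.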

  8. This is the best explanation of the two-envelope paradox that I’ve ever heard! I never realized that the “paradox” comes from the absence of an obvious choice of prior, and the convention that priors, in probability puzzles, are rarely stated explicitly.

    Has anyone here seen the two-envelope paradox used to illustrate the necessity of choosing a prior distribution (whether or not you realize you’re doing it), and the dependence of your predictions on the choice of prior?

  9. Congratulations on a very nice explanation, but this is also a great example of why mathematical arguments are so unpersuasive to most people. ‘Should’ to a mathematician means that the expected value of a choice is greater, but personal satisfaction and expected value are not at all the same, nor is the human value of a sum of money linear in the dollar amount.

    Take the first case. If you’re a mathematics student who is four thousand dollars short of next semester’s tuition, then the four thousand in hand is life-changing. An additional four thousand, on the other hand, would merely be nice to have. Gambling and ending up with two thousand might mean that you’ll be driving a cab instead of attending school.

    Even without a life-changing condition, two psychological factors argue for the opposite conclusion. The personal joy derived from windfalls tails off in something like a logarithmic curve. Finding a $100 bill only makes you a little happier than finding a $20. Moreover, winning and losing are not symmetrical for most people. Most people are more distressed by loss than they are pleased by gain.

  10. Thank you for pointing out this awesome example (of why speculating about initial conditions for the universe is meaningless)

  11. After all, there seems to be a 50% chance that the other envelope contains $2,000, and a 50% chance that it contains $8,000.

    This is the mistake in the problem. You’re treating the amount of money in the second envelope as a random variable, when it is not. That would be the right logic if you were given an amount X and then offered, on a fair coin flip, either 2X or X/2. You take that flip.

    If you open the envelope and find X, it is true that the other envelope contains either 2X or X/2, but there is not a 50% chance of it being either one; it IS one or the other. The concept of ‘expectation value’ does not apply to a non-random value.

    If you start the analysis calling the lesser value X, there being a 50% probability of having chosen X or 2X to begin with, then the expectation value of switching or not is identical.

  12. @efp, Maybe I’m heading off-track — but isn’t that also true in the Monty Hall problem? The prize IS behind one of the two curtains that Monty didn’t open. Yet the conventional solution holds that your original odds of winning the car were 1/3, and that after Monty opens a curtain, there is a 2/3 chance or expectation that the car is behind the curtain you didn’t pick. That seems to be treating the objectively non-random location of the car as a random variable — but it also seems correct??

  13. Pingback: A Couple of Posts From Cosmic Variance « College Math Teaching

  14. Hi Sean,

    I’d like to draw your attention to two bizarre little problems. One is Penney’s Game. I read about it back in the seventies in an issue of Scientific American and rediscovered it when I created a similar problem. It’s a game where you pick a sequence of heads and tails, and for any choice you make, your opponent can find a sequence of the same length that is more likely to appear first. It is a binary probabilistic version of Rock, Paper, Scissors.
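
Penney’s game is easy to simulate. Against any length-3 pattern (a, b, c), the opponent can respond with (not-b, a, b); a sketch checking the classic matchup of THH against HHH:

```python
import random

def penney(p1, p2, games=20_000):
    """Flip a fair coin until pattern p1 or p2 appears as a suffix
    of the flip sequence; return p2's observed win rate."""
    wins = 0
    for _ in range(games):
        seq = ""
        while True:
            seq += random.choice("HT")
            if seq.endswith(p1):
                break
            if seq.endswith(p2):
                wins += 1
                break
    return wins / games
```

Here THH beats HHH about 7/8 of the time: HHH can only appear first if the very first three flips are all heads, since any earlier tail sets up THH.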

    The other involves infinite sums of distributions. That is, in comparing infinite sums of Gaussians and Cauchy-Lorentz distributions, one arrives at a strange insight. In Ewald’s trick for dealing with charge distributions, one uses a sum of identical, equally spaced Gaussians (equally spacing the centers of the Gaussians along the x-axis). Way back when I studied this I was confounded by the statement that the value of the sum (not the area under the curve) is periodic in the finite case. For years I thought that this was approximate because of edge effects, but eventually I realized that even if you take the infinite case, the sum will be truly periodic and finite. Furthermore, it works in any finite number of dimensions. The max value of the sum function converges.

    However, the same is not true for all distributions. If you replace the Gaussians with Cauchy-Lorentz distributions, the sum now diverges for two or more dimensions (and maybe even for one dimension). I should be careful here to say that this doesn’t have anything to do with conditional convergence in the sense of alternating series. The series are all positive definite. Hence, one is left with the strange observation that convergence depends entirely on the shape of the function and not the amount of stuff under the curve. I wonder if this points to a flaw in risk management strategies employed in the investment business. For instance, the natural tendency would be to spread out risk in the market. This phenomenon would be more like distributing according to a Lorentzian (a long-tailed distribution) rather than a Gaussian, so if you model such risk with Gaussians you will significantly underestimate the actual risk. If other investors make the same mistake, the underestimates compound.

  15. @Strether, you’re not off-track; it is subtle. I had to think a while about how to answer. If I say there are two envelopes with amounts X and 2X in them, my sample space is {X, 2X}. If I pick one with a coin flip, my expectation value is 3X/2. This is the amount my average will converge to if I repeat the process many times (of course, it is impossible to get 3X/2 on any single experiment). Suppose I pick one, and call the amount in it Y. Then either Y = X or Y = 2X. But my sample space is still {X, 2X}, because if Y = X then the other envelope MUST have 2X, and if Y = 2X then the other envelope MUST have X.

    It is correct that the other contains an amount 2Y or Y/2, and it’s even correct to say those possibilities are equally likely, but it is incorrect to use those probabilities to compute an expectation value, as the scenarios {Y, 2Y} and {Y, Y/2} represent separate sample spaces. If the amount in the envelopes is fixed at the beginning (at 3X), it is only one of these (Y = X or Y = 2X), and expectation values must be computed from one or the other (with different values of Y!), both of which will come to 3X/2.

    If the amount in the envelopes is not fixed at the beginning, then it’s the coin-flip game and you will win by switching. The ‘actuality’ of what’s in the envelopes (or behind the curtains) is established at the beginning of the game, when the sample space is defined. The trick to the paradox is fooling you into thinking the sample space is {2Y, Y/2}, which is only true if it is a stochastic process.
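
That distinction — fixed envelopes versus a genuine coin flip that doubles or halves — shows up clearly in simulation. A sketch with illustrative amounts:

```python
import random

def fixed_game_gain(x=100, trials=100_000):
    """Envelopes fixed at x and 2x in advance; average gain from
    always switching after opening a random one."""
    g = 0
    for _ in range(trials):
        opened, other = random.choice([(x, 2 * x), (2 * x, x)])
        g += other - opened
    return g / trials

def coinflip_game_gain(y=100, trials=100_000):
    """You hold y; a fresh coin flip makes the other amount 2y or
    y/2. Average gain from always switching."""
    g = 0
    for _ in range(trials):
        other = 2 * y if random.random() < 0.5 else y / 2
        g += other - y
    return g / trials
```

The fixed-envelope gain hovers around zero, while the stochastic double-or-halve game really does pay about +y/4 for switching, exactly as described.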

  16. I thought of a more concise way to put it: in order to compute an expectation value, you need to know what the sample space is. In this case, you need to know the total amount in the envelopes.

    If you don’t know this, you can’t (correctly) compute an expectation value after opening one envelope, any more than you could before opening either envelope.

    If you do know this, and you open one envelope, you know what is in the other.

  17. @efp. Thanks! So it sounds like — pretty much as I was saying upthread 🙂 — the error isn’t so much in treating the unknown as a “variable,” the error is in thinking you can deduce something about the “variability” when you don’t know the sample space. As I’d put it, we’re in the land of uncertainty, not probability.

  18. skepticistical rootoftisast

    Myself, my ‘ol daddy used to say “a bird in the hand worth two in the bush”. I’d keep the first envelope if it had sufficient money to be useful to me (right now that’s anything over $4.99).

  19. @ 44 (Brendon): No, they’re not the same in real life. Real life is quantum mechanical (throughout). For details on the difference between Bayesian formulations of probability and quantum mechanical uncertainty, please see Feynman’s 1982 paper “Simulating Physics with Computers”.

    As for the discussion here, I actually agree with you. The reason they are disagreeing with you is a matter of perspective (they don’t see what you are saying), which is often the case in such discussions on probability.

  20. @ 45: People will think you are making the silliest comment on the thread so far, but you are actually 100% correct. I would keep what I know for sure that I have. Winning in games of chance depends on large ensembles of the same experiment being repeated. Given only one chance, I would do the same as you.

  21. Hmmm …..

    Reviewing the bidding in this thread, we have one group of obviously expert people saying that no matter what you do with the envelopes, you’re implicitly using a probability distribution. (But unless I misread them, this group seems to admit that a distribution is probably useless in a given case.) We have another group which — excluding myself — seems similarly well-informed about the subject, saying that, unless boundary constraints on the size of the prize begin to kick in, it will always be logically wrong to try to assign or use an initial probability distribution.

    Is this always the way the discussion of this paradox plays out? Are we hitting on some deep philosophical divide? Or is there some authoritative source that discusses this paradox (or a similar one) and adjudicates between these analyses?

  22. This discussion reminds me of the time the Monty Hall problem was popular. In talking to members of the math department at my college at the time (I was actually part of that department, but through a historical accident, as I’m no mathematician), not a single person could correctly solve the problem (and it’s quite a good liberal arts college and a well-respected department). After reading the solution, EVERY SINGLE MEMBER declared that he had known it all along and that the solution was, of course, trivial.

  23. Sorry to burst the Bayesian prior-izing bubble, but this puzzle only demonstrates the shortcomings in our commonplace interpretation of the expected value. For a single round of the envelope game, using statistical parameters to justify a strategy is ill-conceived. The realization of the expected profit only comes after many successive trials. You can only argue about being better off on average when you have the opportunity to play repeatedly — that is, the opportunity to actually realize the average. This game is the dual of the martingale doubling-up strategy, essentially reversing the roles of the house and the gambler.
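
The martingale comparison can be made concrete. A sketch of the doubling-up strategy against a finite bankroll (the $1,023 bankroll is an illustrative choice that makes ten straight losses a bust):

```python
import random

def martingale_outcome(bankroll=1023, base_bet=1):
    """Double the bet after each loss on a fair coin; stop on the
    first win or when the next bet can no longer be covered."""
    bet, lost = base_bet, 0
    while bet <= bankroll - lost:
        if random.random() < 0.5:   # win: recover all losses plus base_bet
            return bet - lost
        lost += bet
        bet *= 2
    return -lost                    # busted: ten losses in a row

results = [martingale_outcome() for _ in range(200_000)]
win_rate = sum(r > 0 for r in results) / len(results)
average = sum(results) / len(results)
```

Nearly every round ends in a small win, yet the long-run average is zero: the rare bust cancels everything, which is exactly the gap between a single round and the realized average.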
