Quantum Mechanics and Decision Theory

Several different things (all pleasant and work-related, no disasters) have been keeping me from being a good blogger as of late. Last week, for example, we hosted a visit by Andy Albrecht from UC Davis. Andy is one of the pioneers of inflation, and these days has been thinking about the foundations of cosmology, which brings you smack up against other foundational issues in fields like statistical mechanics and quantum mechanics. We spent a lot of time talking about the nature of probability in QM, sparked in part by a somewhat-recent paper by our erstwhile guest blogger Don Page.

But that’s not what I want to talk about right now. Rather, our conversations nudged me into investigating some work that I have long known about but never really looked into: David Deutsch’s argument that probability in quantum mechanics doesn’t arise as part of a separate ad hoc assumption, but can be justified using decision theory. (Which led me to this weekend’s provocative quote.) Deutsch’s work (and subsequent refinements by another former guest blogger, David Wallace) is known to everyone who thinks about the foundations of quantum mechanics, but for some reason I had never sat down and read his paper. Now I have, and I think the basic idea is simple enough to put in a blog post — at least, a blog post aimed at people who are already familiar with the basics of quantum mechanics. (I don’t have the energy in me for a true popularization at the moment.) I’m going to try to get to the essence of the argument rather than being completely careful, so please see the original paper for the details.

The origin of probability in QM is obviously a crucial issue, but becomes even more pressing for those of us who are swayed by the Everett or Many-Worlds Interpretation. The MWI holds that we have a Hilbert space, and a wave function, and a rule (Schrödinger’s equation) for how the wave function evolves with time, and that’s it. No extra assumptions about “measurements” are allowed. Your measuring device is a quantum object that is described by the wave function, as are you, and all you ever do is obey the Schrödinger equation. If MWI is to have some chance of being right, we must be able to derive the Born Rule — the statement that the probability of obtaining a certain result from a quantum measurement is the square of the amplitude — from the underlying dynamics, not just postulate it.

Deutsch doesn’t actually spend time talking about decoherence or specific interpretations of QM. He takes for granted that when we have some observable X with some eigenstates |x_i>, and we have a system described by a state

|\psi\rangle = a |x_1\rangle + b |x_2\rangle ,

then a measurement of X is going to return either x1 or x2. But we don’t know which, and at this stage of the game we certainly don’t know that the probability of x1 is |a|^2 or the probability of x2 is |b|^2; that’s what we’d like to prove.

In fact let’s just focus on a simple special case, where

a = b = \frac{1}{\sqrt{2}} .

If we can prove that in this case, the probability of either outcome is 50%, we’ve done the hard part of the work — showing how probabilistic conclusions can arise at all from non-probabilistic assumptions. Then there’s a bit of mathematical lifting one must do to generalize to other possible amplitudes, but that part is conceptually straightforward. Deutsch refers to this crucial step as deriving “tends to from does,” in a mischievous parallel with attempts to derive ought from is. (Except I think in this case one has a chance of succeeding.)

The technique used will be decision theory, which is a way of formalizing how we make rational choices. In decision theory we think of everything we do as a “game,” and playing a game results in a “value” or “payoff” or “utility” — what we expect to gain by playing the game. If we have the choice between two different (mutually exclusive) actions, we always choose the one with higher value; if the values are equal, we are indifferent. We are also indifferent if we are given the choice between playing two games with values V1 and V2 or a single game with value V3 = V1 + V2; that is, games can be broken into sub-games, and the values just add. Note that these properties make “value” something more subtle than “money.” To a non-wealthy person, the value of two million dollars is not equal to twice the value of one million dollars. The first million is more valuable, because the second million has a smaller marginal value than the first — the lifestyle change that it brings about is much less. But in the world of abstract “value points” this is taken into consideration, and our value is strictly linear; the value of an individual dollar will therefore depend on how many dollars we already have.
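For instance, with a logarithmic utility function (a standard toy model for diminishing marginal utility; the choice of log and the dollar figures are purely illustrative, not part of the decision-theory axioms), the second million is demonstrably worth less than the first:

```python
import math

def utility(wealth):
    """Toy concave utility of money: marginal value falls as wealth grows."""
    return math.log(wealth)

base = 10_000  # starting wealth in dollars (arbitrary)
first_million = utility(base + 1_000_000) - utility(base)
second_million = utility(base + 2_000_000) - utility(base + 1_000_000)
print(first_million > second_million)  # True
```

In the abstract "value points" of decision theory, by contrast, value is strictly linear by construction, and it is the dollar-value of a marginal dollar that adjusts.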

There are various axioms assumed by decision theory, but for the purposes of this blog post I’ll treat them as largely intuitive. Let’s imagine that the game we’re playing takes the form of a quantum measurement, and we have a quantum operator X whose eigenvalues are equal to the value we obtain by measuring them. That is, the value of an eigenstate |x> of X is given by

V[|x\rangle] = x .

The tricky thing we would like to prove amounts to the statement that the value of a superposition is given by the Born Rule probabilities. That is, for our one simple case of interest, we want to show that

V\left[\frac{1}{\sqrt{2}}(|x_1\rangle + |x_2\rangle)\right] = \frac{1}{2}(x_1 + x_2) . \qquad\qquad(1)

After that it would just be a matter of grinding. If we can prove this result, maximizing our value in the game of quantum mechanics is precisely the same as maximizing our expected value in a probabilistic world governed by the Born Rule.

To get there we need two simple propositions that can be justified within the framework of decision theory. The first is:

Given a game with a certain set of possible payoffs, the value of playing a game with precisely minus that set of payoffs is minus the value of the original game.

Note that payoffs need not be positive! This principle explains what it’s like to play a two-person zero-sum game. Whatever one person wins, the other loses. In that case, the values of the game to the two participants are equal in magnitude and opposite in sign. In our quantum-mechanics language, we have:

V\left[\frac{1}{\sqrt{2}}(|-x_1\rangle + |-x_2\rangle)\right] = - V\left[\frac{1}{\sqrt{2}}(|x_1\rangle + |x_2\rangle)\right] .  \qquad\qquad (2)

Keep that in mind. Here’s the other principle we need:

If we take a game and increase every possible payoff by a fixed amount k, the value is equivalent to playing the original game, then receiving value k.

If I want to change the value of playing a game by k, it doesn’t matter whether I simply add k to each possible outcome, or just let you play the game and then give you k. I don’t think we can argue with that. In our quantum notation we would have

V\left[\frac{1}{\sqrt{2}}(|x_1+k\rangle + |x_2+k\rangle)\right] =  V\left[\frac{1}{\sqrt{2}}(|x_1\rangle + |x_2\rangle)\right] +k .  \qquad\qquad (3)

Okay, if we buy that, from now on it’s simple algebra. Let’s consider the specific choice

k = -x_1 - x_2

and plug this into (3). We get

V\left[\frac{1}{\sqrt{2}}(|-x_2\rangle + |-x_1\rangle)\right] =  V\left[\frac{1}{\sqrt{2}}(|x_1\rangle + |x_2\rangle)\right] -x_1 - x_2.

You can probably see where this is going (if you’ve managed to make it this far). Use our other rule (2) to make this

-2 V\left[\frac{1}{\sqrt{2}}(|x_1\rangle + |x_2\rangle)\right] = -x_1 - x_2  ,

which simplifies straightaway to

V\left[\frac{1}{\sqrt{2}}(|x_1\rangle + |x_2\rangle)\right] = \frac{1}{2}(x_1 + x_2) ,

which is our sought-after result (1).
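Since the step from (2) and (3) to (1) is pure algebra, it can be checked symbolically. Here is a quick sketch in Python with sympy (a check for this post, not anything from Deutsch's paper): represent the unknown value of the equal superposition by a symbol v, impose the two principles with k = -(x1 + x2), and solve.

```python
import sympy as sp

x1, x2, v = sp.symbols('x1 x2 v')  # v stands for V[(|x1> + |x2>)/sqrt(2)]

# Principle (2): negating every payoff negates the value,
#   V[(|-x1> + |-x2>)/sqrt(2)] = -v.
lhs = -v

# Principle (3) with k = -(x1 + x2): shifting every payoff by k adds k
# to the value, and the shifted payoffs {x1 + k, x2 + k} = {-x2, -x1}
# are exactly the negated payoffs above.
k = -(x1 + x2)
rhs = v + k

# The same game valued two ways, so the expressions must agree:
solution = sp.solve(sp.Eq(lhs, rhs), v)[0]
print(solution)  # x1/2 + x2/2
```

The solver returns (x1 + x2)/2, the Born-rule expected value of equation (1).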

Now, notice this result by itself doesn’t contain the word “probability.” It’s simply a fairly formal manipulation, taking advantage of the additivity of values in decision theory and the linearity of quantum mechanics. But Deutsch argues — and on this I think he’s correct — that this result implies we should act as if the Born Rule is true if we are rational decision-makers. We’ve shown that the value of a game described by an equal quantum superposition of states |x1> and |x2> is equal to the value of a game where we have a 50% chance of gaining value x1 and a 50% chance of gaining x2. (In other words, if we acted as if the Born Rule were not true, someone else could make money off us by challenging us to such games, and that would be bad.) As someone who is sympathetic to pragmatism, I think that “we should always act as if A is true” is the same as “A is true.” So the Born Rule emerges from the MWI plus some seemingly-innocent axioms of decision theory.

While I certainly haven’t followed the considerable literature that has grown up around this proposal over the years, I’ll confess that it smells basically right to me. If anyone knows of any strong objections to the idea, I’d love to hear them. But reading about it has added a teensy bit to my confidence that the MWI is on the right track.


63 Responses to Quantum Mechanics and Decision Theory

  1. anon. says:

    This is cute, but one aspect of it is bothering me. Believing in QM and understanding decoherence gets you to the point that Hamiltonian evolution in the presence of an environment gives you states that have some “weight,” measured by the Hilbert space measure, clustered around apparent classical outcomes. The inner product, which measures this “weight,” is an intrinsic part of QM, I think. I see the problem of deriving the Born Rule as being the problem of showing that if you repeat an experiment a number of times, the frequencies approach those corresponding to counting these states by the Hilbert space weight. In other words, the inner product isn’t just a mathematical device that hangs around, it plays a key role in determining observable outcomes. So: where’s the inner product on Hilbert space hiding in the argument you outlined above? It might be hiding in some assumption about how the x states are normalized, but can it be made explicit in a way that shows that this is really addressing the right question?

  2. Sean Carroll says:

    The step from the equation just before “You can probably see where this is going” to the equation just after makes implicit use of the inner product. (Update: oops, not true, see #6 and #7 below.) Note that we switched the order of |x_1> and |x_2> in the sum, which wouldn’t have been possible if they didn’t have equal amplitudes.

  3. MPS17 says:

    Thanks for the post. Zurek has some ideas on this too. Although I haven’t read the paper, I heard the talk and they seemed more in line with the ways we physicists like to approach problems.


    UPDATE: I think this links to the original literature. I haven’t thought carefully about this so please excuse if this discusses a differently nuanced issue:


  4. Sean Carroll says:

    I think it is the same kind of issue, and Zurek’s papers are extremely interesting. Instead of talking about decision theory, he talks about symmetries. He claims that, once we allow for the existence of an environment, there is a new symmetry (“envariance”) that applies to states like (1), so that the probabilities of getting x_1 and x_2 must be equal. From there the same reasoning applies.

    There is some critique along the lines of “Zurek shows that if it’s appropriate to think of quantum mechanics in terms of probabilities at all, then those probabilities should obey the Born Rule, but he doesn’t actually demonstrate the need for probabilities.” It’s not clear to me that this couldn’t also be applied to Deutsch’s argument. But this is philosophical terrain, and I think the underlying thrusts of Deutsch and Zurek are actually quite similar, although using quite different vocabularies.

  5. will says:

    the 1/sqrt(2) does not seem justified, and as that is the crux of the discussion, this argument does not convince me.

    You might as well replace 1/sqrt(2) with a variable ‘m’ for example throughout all the equations, and your final conclusion would be just as “correct”.

    With 1/sqrt(2) removed, the whole argument becomes a tautology… interesting no doubt, but proving nothing except that the author is well versed in basic algebra.

  6. Matt Leifer says:

    Sean, that is not using the inner product. It is simply using the vector space structure. You can’t assume that the inner product has any a priori relevance within this approach, because that is what you are trying to derive; i.e., the only reason you pay attention to things like inner products and unitarity within conventional quantum mechanics is that you are trying to avoid negative probabilities, but you have no reason for connecting those two things until you have first derived the Born rule.

    I too like this argument, although I have my own version of it that makes use of Gleason’s theorem which I prefer, since it tells you that you should structure your probability assignments according to traces of operators against some density operator, even if you don’t know what the “wavefunction of the universe” is.

    There are legitimate issues surrounding the interpretation of probability in this approach, i.e. should one also be trying to derive a limiting frequency. Many of these issues are not specific to QM, since people differ on whether this is required even in the classical case. However, whether or not you think frequencies are required, it must be admitted that getting the decision theoretic interpretation right is even more important. After all, if I could derive a relative frequency, but was not able to derive the fact that I should use probabilities to inform my decisions then that would be a complete disaster. What use is it if I can derive that a fair quantum coin should have limiting 50/50 relative frequencies, but not that I should consider a bet on heads at stake $1 that pays $2 to be fair?

    There are also issues surrounding the very meaning of terms like “probability” and “utility” in this approach, since we are assuming that all outcomes actually occur. The two concepts get mushed together into something like a “caring weight” which measures how much we should care about each of our successors at the end of a quantum experiment. If you think about that for a minute it leads to moral issues, e.g. why should I care less about a successor who lives in a branch that happens to have a small amplitude. In the analogous classical case we can say it is because there is a very small chance that such a successor will exist, but quantum mechanically they definitely will exist. Thus, one can question whether it is moral to accept a scenario in which you get a large sum of money on a large amplitude branch, but die a horrible painful death in another branch, even with an amplitude that is epsilon above zero. In light of the Deutsch-Wallace argument, this indicates one of two things, either:

    – The usual intuitions about decision theory break down in a many-worlds scenario.
    – They do not break down, but we would always use extremal utilities, which makes it vacuous.

    By an extremal utility, I mean one that is infinity or -infinity on some outcomes, e.g. dying a painful death. The principle of maximum expected utility is useless in such cases.

    I have a lot more to say on this subject, but not the energy to go into it right now. I do have a paper on the backburner at the moment that deals with these issues.

  7. Sean Carroll says:

    Matt– You’re right, I was being very sloppy. That’s just the vector-space structure. The role of the inner product is essentially what you’re trying to derive, as you say. Thanks for the other comments. As you say, most of the additional issues refer to the nature of probability (or the definition of “value”), not really specifically to quantum mechanics.

    will– The argument certainly isn’t a tautology. Of course you could replace the 1/sqrt(2) by any number, as long as the coefficients of both terms are the same (that’s what was used in the argument just referenced). But that’s what you want! If that number were something else, you would have a non-normalized wave function. But you would still want to have equal probabilities for two branches with equal weights.

  8. Peli Grietzer says:

    This fantastic paper by Adrian Kent has some great arguments about why the ‘but what does speaking about probabilities even mean’ issue for MW is sharply unlike any similar issue that arises for one-world theories: http://arxiv.org/abs/0905.0624

  9. CU Phil says:

    There is quite a bit of criticism of the decision-theoretic proposal (most vociferously from David Albert and Adrian Kent) as well as several papers advocating the approach in this volume:


    The review gives a nice summary of the debate. Also, Bob Wald reviewed the above volume in Classical and Quantum Gravity:


    and also gives an insightful review.

  10. Michael Bacon says:


    I don’t think that Kent’s argument succeeds in proving the failure of the Everett program. However, assuming that his argument does succeed, Kent goes on to say that such Everettarian failure “adds to the likelihood that the fundamental problem is not our inability to interpret quantum theory correctly but rather a limitation of quantum theory itself.” Perhaps, but at least for now, my money remains on quantum theory.

  11. Matt McIrvin says:

    @will: The requirement that state vectors have norm 1 is already a requirement of quantum mechanics separate from any interpretation of amplitudes as probabilities. Given that, the factor of 1/sqrt(2) (up to some arbitrary complex phase) is necessary if the two terms have equal coefficients.

    Once you make any move in the direction of a probabilistic interpretation, the Born rule falls out as the only one that makes mathematical sense; there are many ways of demonstrating this. But that first step is a doozy, and I always have the sneaking suspicion that arguments like this one have somehow smuggled their conclusion in as part of an assumption that only seems less controversial.

  12. Matt McIrvin says:

    …my own favorite handwaving quasi-derivation of the Born rule was a probably-not-original stochastic argument that I thought up on a long walk along the Charles River many years ago.

    Consider the Feynman path integral for a particle that travels from point A to point B. Now suppose that you put a screen between point A and point B that randomly tweaks the particle’s wavefunction phase to a different value at each point (maybe coarse-grain it a little to make the math tractable: divide it into tiny “pixels” that each have a different random phase factor).

    Now consider the amplitude that the particle goes from point A to point B traveling through some coarser-grained but still small bundle of pixels. The amplitudes for each pixel will add like a random walk, yielding an overall amplitude that increases as the square root of the number of pixels. Which is exactly what you’d get by interpreting the square of the amplitude as a probability.
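    This scaling is straightforward to check numerically. The sketch below (numpy; the pixel and trial counts are arbitrary choices, not anything from the argument above) sums unit amplitudes with uniformly random phases and looks at the mean squared magnitude of the total:

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_sq_amplitude(n_pixels, n_trials=2000):
    """Average |sum|^2 of n_pixels unit amplitudes with random phases."""
    phases = rng.uniform(0.0, 2.0 * np.pi, size=(n_trials, n_pixels))
    totals = np.exp(1j * phases).sum(axis=1)  # a random walk in the complex plane
    return float(np.mean(np.abs(totals) ** 2))

# The squared magnitude grows linearly with the number of pixels,
# i.e. the amplitude itself grows like sqrt(n_pixels).
for n in (10, 100, 1000):
    print(n, mean_sq_amplitude(n))
```

    The mean of |sum|^2 comes out close to the number of pixels, which is exactly the sqrt(n) growth of the amplitude described above.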

  13. Moshe says:

    I’m puzzled about something really basic: you are trying to argue for an expression that is quadratic in the coefficients a,b of your wavefunction (something that encodes in it interference, the essential mystery of QM). Instead you are deriving an expression which is linear in these coefficients (as pointed out, you have only used the linear structure of the Hilbert space, not the inner product). The derivation seems to use in an essential way the equality of both coefficients a=b, and of course that is precisely the only case where quadratic and linear expressions have the same consequences. But, what happens in the generic case? For example, what happens if a,b only differ by a phase? that should still lead to the same final expression. It seems to me that if you put a=-b and repeat your derivation, you’d find the same minus sign in the RHS of (1), instead of the result predicted by the Born rule.

  14. Jess Riedel says:

    Sean: Like Peli Grietzer, I highly recommend Kent’s criticism of the decision-theory approach. To add to what Peli said, I think Kent conclusively shows that the axioms of decision theory in the many-worlds context are not nearly as obvious as they first appear, to the point that they become much less attractive than approaches which rest on Gleason’s theorem like Matt Leifer suggests.

    Of course, this is all truly philosophy; the game here is to try to reduce the axioms of quantum mechanics to their most beautiful (and, usually, simple) form. Sometimes, this improvement is so dramatic that I think everyone should agree that the new axioms are superior [such as my advisor Zurek’s work–which I am constantly advertising–showing that the mysterious declaration that observables be Hermitian operators can be traced back to the linearity of evolution and the need for amplification (http://arxiv.org/abs/quant-ph/0703160)]. But sometimes, it’s just a matter of taste.

    Also, I’d like to clarify Michael Bacon’s comment. Kent’s paper strongly concentrates on attacking the decision-theoretic basis of Born’s rule, and only addresses the attractiveness of quantum theory in general as an aside. In particular, by the “Everett program”, Kent means the claim that quantum theory need not be supplemented by an ad-hoc assumption for extracting probabilities. I believe Kent is open to the idea that quantum theory need not be modified *if* a sufficiently attractive assumption can be found which allows the extraction of unambiguous probabilities (e.g. if the “set-selection problem” in the consistent histories framework could be solved, which he has written about). But yes, Kent does take the extreme difficulty of finding a non-ad-hoc assumption as weak evidence that quantum theory is fundamentally wrong.

  15. Michael Bacon says:


    You obviously are closer to this than I am, and you may well be right that all Kent really thinks is that the extreme difficulty of finding non-ad-hoc assumptions is “weak” evidence that quantum theory is fundamentally wrong. However, that’s not what the language I quoted says. At least here, he’s clearly saying that there is a “likelihood” that quantum theory is wrong — i.e., more likely than not. And, that his work merely adds to that “likelihood”. Nevertheless, perhaps I’m making too much of the particular words he chose to describe his view. By the way, I love the picture of you in your natural environment on your web page. 😉

  16. Sean Carroll says:

    Moshe– I encourage you to put a minus sign in front of the x_2 term and go through the math. 🙂

    Obviously there is work to be done generalizing to other amplitudes, but that’s done in the paper; I don’t think there’s much controversy about that part.

  17. Anonymous Coward says:

    I’d be interested in how you view the relation to classical thermodynamics.

    There, likewise, a probability distribution “falls out of the sky”. There is some justification in things like the Sinai-Boltzmann Conjecture, stating that the standard (Liouville-phase-space-) measure is the only sensible one (uniquely ergodic, for the toy problem of hard-ball billiards)… IF you assume that the god who has chosen the initial conditions of the world has done so with an absolutely continuous probability distribution (SRB-measure). If you admit “pathological” probability measures, the entire argument collapses unto itself.

    I always viewed, maybe naively, the Born rule as a similar thing. People conjecture and hope to prove at some point that the Born rule follows if we make the pretty basic (and mind-bogglingly subtle!) assumption that the initial conditions of our universe have been picked compatibly with some infinite-dimensional generalization of Lebesgue measure.

    [sorry for the theistic metaphor… personifying some aspects of nature helps me think more clearly]

  18. Sean Carroll says:

    I think it’s certainly a good question. People like Albrecht and Deutsch believe that the only way to justify any classical probability distribution is ultimately in terms of the Born Rule. I wouldn’t necessarily think it’s a failure if the answer is “that’s the most natural measure there is,” but I’m hopeful that some better picture of the connection between QM and classical stat mech (plus perhaps some initial-conditions input from cosmology) will explain why the Liouville measure is the “right” one.

  19. Moshe says:

    I see where I was confused: you are using a linear structure in the space of eigenvalues, not for the coefficients, so the value for a=-b is not determined by the above considerations. I should probably take a look at the paper sometime, sounds mysterious how one can get anything quadratic from what you wrote so far.

  20. Ben says:

    Hi Sean, I remember a great lecture by Nima Arkani-Hamed at T.A.S.I. 2007, http://physicslearning2.colorado.edu/tasi/hamed_02/SupportingFiles/video/video.wmv , where he points out that the Born Rule can be derived from the operator postulate, i.e. that physical measurement outcomes can be identified with the eigenvalues of a corresponding Hermitian operator.

    The argument is as follows: Construct the tensor-product state of N identically prepared copies of a|x1> + b|x2>. This could be expanded out using binomial coefficients. There is a Hermitian operator N1 which counts how many copies are in the state |x1>. Then if we take N1/N in the limit N to infinity, we obtain a Hermitian operator whose eigenvalue is |a|^2, i.e. it is the probability operator.

    So we get the Born Rule for free!
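    For finite N this can be made quantitative: using the binomial expansion just described, the mean of N1/N comes out to |a|^2 and its spread shrinks as N grows. A sketch (note that it weights the sectors by the Hilbert-space norm, which, as discussed above, is arguably part of what one is trying to justify in the first place):

```python
import numpy as np
from math import comb

def frequency_moments(a, b, n_copies):
    """Mean and variance of N1/n_copies on n_copies copies of a|x1> + b|x2>.

    The sector with k copies in |x1> carries squared amplitude
    C(n, k) * |a|^(2k) * |b|^(2(n-k)).
    """
    n = n_copies
    weights = np.array([comb(n, k) * abs(a) ** (2 * k) * abs(b) ** (2 * (n - k))
                        for k in range(n + 1)])
    freqs = np.arange(n + 1) / n
    mean = float(np.sum(weights * freqs))
    var = float(np.sum(weights * freqs ** 2) - mean ** 2)
    return mean, var

a, b = 0.6, 0.8  # |a|^2 = 0.36, |b|^2 = 0.64
for n in (10, 100, 1000):
    print(n, *frequency_moments(a, b, n))
# the mean stays at |a|^2 = 0.36 while the variance falls off like 1/n
```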

  21. You can’t get fundamental probability out without putting fundamental probability in, the Everett approach is just untenable and even quite ridiculous imho compared to just accepting that fundamental randomness exists – then the Born rule emerges as a kind of thermodynamic property of the Schrödinger evolution – the Bohmian guys have even demonstrated this (based on their wrong ontological model)

    Also, as I keep trying to tell everyone, the past universe does not exist, you have to look at the (discrete) flow of the Schrödinger evolution exp(hL).U(t) – U(t) to describe what we observe, and in this case we get 3D space as period-3 points in the Hilbert Space.

  22. Colin says:

    The math is incorrect in your equation (3) (and also in Deutsch’s original paper). You only add 1/rt2 K to each outcome of the game on the left side of the equation, whereas you add an entire K to the right side of the equation. In reality, where you have 1/rt2(x1+x2) standing in place for the entire system Psi, you can do one of two things to manipulate the equation: value psi as a game with only one outcome, and add a single k to each side (trivial)….Or you can keep 1/rt2(x1) and 1/rt2(x2) separate, and add an entire K to each…but still only 1 k on the right.

    EDIT…I noticed that this argument is a little skewed as you are adding K to each eigenstate…so it’s not the simple math; but the premise is still correct…what has been added to each outcome on the left is not what has been added to the entire game on the right. If I started with V(Psi>) instead of V(1/rt2x1> +1/rt2x2>) (which are identical by assumption), I would add K to get V(psi>+k).



  25. Alan Cooper says:

    The reference to state vectors of form |x+k> seems to be as eigenvectors for the operator X+k rather than for X, so I am not clear that it makes sense to say |x1-(x1+x2)>=|-x2>
    (In fact, making the operator explicit, we would seem to have
    |x1-(x1+x2) for X-(x1+x2)>=|x1 for X> not |-x2 for X>)

    And in any case the argument seems to be showing that if there were an expectation function with the expected properties then it would have to satisfy the Born rule. But that is not the same as saying that such a function should actually have a probabilistic interpretation. (Actually I guess this is the same complaint as what you alluded to in the second para of your comment #4 but I do think it’s a serious one.)