The Bayesian Second Law of Thermodynamics

Entropy increases. Closed systems become increasingly disordered over time. So says the Second Law of Thermodynamics, one of my favorite notions in all of physics.

At least, entropy usually increases. If we define entropy by first defining “macrostates” — collections of individual states of the system that are macroscopically indistinguishable from each other — and then taking the logarithm of the number of microstates per macrostate, as portrayed in this blog’s header image, then we don’t expect entropy to always increase. According to Boltzmann, the increase of entropy is just really, really probable, since higher-entropy macrostates are much, much bigger than lower-entropy ones. But if we wait long enough — really long, much longer than the age of the universe — a macroscopic system will spontaneously fluctuate into a lower-entropy state. Cream and coffee will unmix, eggs will unbreak, maybe whole universes will come into being. But because the timescales are so long, this is just a matter of intellectual curiosity, not experimental science.
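A minimal sketch of Boltzmann's counting, using a toy system that is not in the original post: N coins, where a macrostate is "how many heads" and a microstate is the exact sequence of heads and tails. The names `N`, `boltzmann_entropy`, and `macrostate_probability` are hypothetical, and the sketch assumes every microstate is equally likely.

```python
from math import comb, log

N = 100  # number of coins in the toy system (an arbitrary choice)

def boltzmann_entropy(heads: int, n: int = N) -> float:
    """Log of the number of microstates (exact sequences) with `heads` heads."""
    return log(comb(n, heads))

def macrostate_probability(heads: int, n: int = N) -> float:
    """Chance of landing in this macrostate if all 2**n microstates are equally likely."""
    return comb(n, heads) / 2**n

# Low-entropy macrostates are not forbidden, just enormously outnumbered.
for heads in (0, 10, 25, 50):
    print(heads, round(boltzmann_entropy(heads), 2), macrostate_probability(heads))
```

Raising `N` makes the low-entropy macrostates rarer still, which is the sense in which the increase of entropy is "really, really probable" rather than certain.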

That’s what I was taught, anyway. But since I left grad school, physicists (and chemists, and biologists) have become increasingly interested in ultra-tiny systems, with only a few moving parts. Nanomachines, or the molecular components inside living cells. In systems like that, the occasional downward fluctuation in entropy is not only possible, it’s going to happen relatively frequently — with crucial consequences for how the real world works.

Accordingly, the last fifteen years or so has seen something of a revolution in non-equilibrium statistical mechanics — the study of statistical systems far from their happy resting states. Two of the most important results are the Crooks Fluctuation Theorem (by Gavin Crooks), which relates the probability of a process forward in time to the probability of its time-reverse, and the Jarzynski Equality (by Christopher Jarzynski), which relates the change in free energy between two states to the average amount of work done on a journey between them. (Professional statistical mechanics are so used to dealing with inequalities that when they finally do have an honest equation, they call it an “equality.”) There is a sense in which these relations underlie the good old Second Law; the Jarzynski equality can be derived from the Crooks Fluctuation Theorem, and the Second Law can be derived from the Jarzynski Equality. (Though the three relations were discovered in reverse chronological order from how they are used to derive each other.)
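For reference, the standard textbook forms of these relations can be sketched as follows. The symbols are assumptions not defined in the text above: β = 1/(k_B T) is the inverse temperature of the heat bath, W is the work done on the system, ΔF is the free-energy difference between the two states, and P_F and P_R are the work distributions for the forward and time-reversed processes.

```latex
\begin{align}
\frac{P_F(W)}{P_R(-W)} &= e^{\beta (W - \Delta F)}
  && \text{(Crooks Fluctuation Theorem)} \\
\left\langle e^{-\beta W} \right\rangle &= e^{-\beta \Delta F}
  && \text{(Jarzynski Equality)} \\
\langle W \rangle &\geq \Delta F
  && \text{(Second Law)}
\end{align}
```

Multiplying the first line by P_R(-W) e^{-βW} and integrating over W gives the second. Applying Jensen's inequality, ⟨e^{-βW}⟩ ≥ e^{-β⟨W⟩}, to the second gives the third.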

Still, there is a mystery lurking in how we think about entropy and the Second Law — a puzzle that, like many such puzzles, I never really thought about until we came up with a solution. Boltzmann’s definition of entropy (logarithm of number of microstates in a macrostate) is very conceptually clear, and good enough to be engraved on his tombstone. But it’s not the only definition of entropy, and it’s not even the one that people use most often.

Rather than referring to macrostates, we can think of entropy as characterizing something more subjective: our knowledge of the state of the system. That is, we might not know the exact position x and momentum p of every atom that makes up a fluid, but we might have some probability distribution ρ(x,p) that tells us the likelihood the system is in any particular state (to the best of our knowledge). Then the entropy associated with that distribution is given by a different, though equally famous, formula:

S = - \int \rho \log \rho.

That is, we take the probability distribution ρ, multiply it by its own logarithm, and integrate the result over all the possible states of the system, to get (minus) the entropy. A formula like this was introduced by Boltzmann himself, but these days is often associated with Josiah Willard Gibbs, unless you are into information theory, where it’s credited to Claude Shannon. Don’t worry if the symbols are totally opaque; the point is that low entropy means we know a lot about the specific state a system is in, and high entropy means we don’t know much at all.
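A small sketch of the formula for a system with a finite number of states, where the integral becomes a sum. The three example distributions and the function name `gibbs_entropy` are made up for illustration.

```python
from math import log

def gibbs_entropy(p):
    """S = -sum_x p(x) log p(x) over discrete states; terms with p(x) = 0 contribute 0."""
    return sum(-pi * log(pi) for pi in p if pi > 0)

n_states = 8
certain = [1.0] + [0.0] * (n_states - 1)   # we know the state exactly
peaked = [0.93] + [0.01] * (n_states - 1)  # we are fairly sure of the state
uniform = [1.0 / n_states] * n_states      # we know nothing

for name, p in (("certain", certain), ("peaked", peaked), ("uniform", uniform)):
    print(name, gibbs_entropy(p))
```

The entropy is zero when the state is known exactly and reaches its maximum, log(n_states), when every state is equally likely.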

In appropriate circumstances, the Boltzmann and Gibbs formulations of entropy and the Second Law are closely related to each other. But there’s a crucial difference: in a perfectly isolated system, the Boltzmann entropy tends to increase, but the Gibbs entropy stays exactly constant. In an open system — allowed to interact with the environment — the Gibbs entropy will go up, but it will only go up. It will never fluctuate down. (Entropy can decrease through heat loss, if you put your system in a refrigerator or something, but you know what I mean.) The Gibbs entropy is about our knowledge of the system, and as the system is randomly buffeted by its environment we know less and less about its specific state. So what, from the Gibbs point of view, can we possibly mean by “entropy rarely, but occasionally, will fluctuate downward”?
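The contrast between isolated and open systems can be sketched with a toy model. The assumptions here are not from the post: six discrete states on a ring, isolated evolution modeled as a deterministic reversible shuffle of states, and the environment modeled as symmetric random hopping with no net heat flow. The names `permute` and `buffet` are hypothetical.

```python
from math import log

def gibbs_entropy(p):
    """S = -sum_x p(x) log p(x); terms with p(x) = 0 contribute 0."""
    return sum(-pi * log(pi) for pi in p if pi > 0)

def permute(p, shift=1):
    """Isolated system: each state moves deterministically to a unique new state."""
    n = len(p)
    return [p[(i - shift) % n] for i in range(n)]

def buffet(p, hop=0.2):
    """Open system: hop one site left or right on the ring with probability `hop` each."""
    n = len(p)
    return [(1 - 2 * hop) * p[i] + hop * p[(i - 1) % n] + hop * p[(i + 1) % n]
            for i in range(n)]

p0 = [0.7, 0.2, 0.1, 0.0, 0.0, 0.0]

isolated, open_system = [p0], [p0]
for _ in range(5):
    isolated.append(permute(isolated[-1]))
    open_system.append(buffet(open_system[-1]))

S_isolated = [gibbs_entropy(p) for p in isolated]
S_open = [gibbs_entropy(p) for p in open_system]
print(S_isolated)  # stays constant (up to floating-point rounding)
print(S_open)      # never decreases from one step to the next
```

In this model nothing ever looks at the system, so there is no mechanism by which the second list could go down.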

I won’t hold you in suspense. Since the Gibbs/Shannon entropy is a feature of our knowledge of the system, the way it can fluctuate downward is for us to look at the system and notice that it is in a relatively unlikely state — thereby gaining knowledge.

But this operation of “looking at the system” doesn’t have a ready implementation in how we usually formulate statistical mechanics. Until now! My collaborators Tony Bartolotta, Stefan Leichenauer, Jason Pollack, and I have written a paper formulating statistical mechanics with explicit knowledge updating via measurement outcomes. (Some extra figures, animations, and codes are available at this web page.)


The Bayesian Second Law of Thermodynamics
Anthony Bartolotta, Sean M. Carroll, Stefan Leichenauer, and Jason Pollack

We derive a generalization of the Second Law of Thermodynamics that uses Bayesian updates to explicitly incorporate the effects of a measurement of a system at some point in its evolution. By allowing an experimenter’s knowledge to be updated by the measurement process, this formulation resolves a tension between the fact that the entropy of a statistical system can sometimes fluctuate downward and the information-theoretic idea that knowledge of a stochastically-evolving system degrades over time. The Bayesian Second Law can be written as ΔH(ρ_m, ρ) + ⟨Q⟩_{F|m} ≥ 0, where ΔH(ρ_m, ρ) is the change in the cross entropy between the original phase-space probability distribution ρ and the measurement-updated distribution ρ_m, and ⟨Q⟩_{F|m} is the expectation value of a generalized heat flow out of the system. We also derive refined versions of the Second Law that bound the entropy increase from below by a non-negative number, as well as Bayesian versions of the Jarzynski equality. We demonstrate the formalism using simple analytical and numerical examples.

The crucial word “Bayesian” here refers to Bayes’s Theorem, a central result in probability theory. Bayes’s theorem tells us how to update the probability we assign to any given idea, after we’ve received relevant new information. In the case of statistical mechanics, we start with some probability distribution for the system, then let it evolve (by being influenced by the outside world, or simply by interacting with a heat bath). Then we make some measurement — but a realistic measurement, which tells us something about the system but not everything. So we can use Bayes’s Theorem to update our knowledge and get a new probability distribution.

So far, all perfectly standard. We go a bit farther by also updating the initial distribution that we started with — our knowledge of the measurement outcome influences what we think we know about the system at the beginning of the experiment. Then we derive the Bayesian Second Law of Thermodynamics, which relates the original (un-updated) distribution at initial and final times to the updated distribution at initial and final times.
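The two updates described above can be sketched on a toy system. Everything specific here is an assumption for illustration rather than something taken from the paper: six states on a ring, a one-step random hop as the evolution, and a detector that reports "the system is in states 3 to 5" and is right 90% of the time.

```python
def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

n, hop = 6, 0.2

# T[x][y]: probability that a system starting in state x ends in state y.
T = [[0.0] * n for _ in range(n)]
for x in range(n):
    T[x][x] = 1 - 2 * hop
    T[x][(x + 1) % n] = hop
    T[x][(x - 1) % n] = hop

rho_0 = [0.7, 0.2, 0.1, 0.0, 0.0, 0.0]  # what we believe at the start

# Let the system evolve: rho_tau(y) = sum_x rho_0(x) T(x -> y).
rho_tau = [sum(rho_0[x] * T[x][y] for x in range(n)) for y in range(n)]

# Imperfect measurement: likelihood[y] = P(outcome m | final state y).
likelihood = [0.1, 0.1, 0.1, 0.9, 0.9, 0.9]
p_m = sum(rho_tau[y] * likelihood[y] for y in range(n))  # P(m)

# Bayes's Theorem applied to the final state.
rho_tau_m = normalize([rho_tau[y] * likelihood[y] for y in range(n)])

# Bayes's Theorem applied to the initial state: weight each starting point
# by how likely it was to lead to the observed outcome.
initial_weights = [rho_0[x] * sum(T[x][y] * likelihood[y] for y in range(n))
                   for x in range(n)]
rho_0_m = normalize(initial_weights)

print("final, before and after looking:  ", rho_tau, rho_tau_m)
print("initial, before and after looking:", rho_0, rho_0_m)
```

Both updates are normalized by the same number, P(m), because they are two marginals of the same updated joint distribution over starting and ending states.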

That relationship makes use of the cross entropy between two distributions, which you actually don’t see that often in information theory. Think of how much you would expect to learn by being told the specific state of a system, when all you originally knew was some probability distribution. If that distribution were sharply peaked around some value, you don’t expect to learn very much — you basically already know what state the system is in. But if it’s spread out, you expect to learn a bit more. Indeed, we can think of the Gibbs/Shannon entropy S(ρ) as “the average amount we expect to learn by being told the exact state of the system, given that it is described by a probability distribution ρ.”

By contrast, the cross-entropy H(ρ, ω) is a function of two distributions: the “assumed” distribution ω, and the “true” distribution ρ. Now we’re imagining that there are two sources of uncertainty: one because the actual distribution has a nonzero entropy, and another because we’re not even using the right distribution! The cross entropy between those two distributions is “the average amount we expect to learn by being told the exact state of the system, given that we think it is described by a probability distribution ω but it is actually described by a probability distribution ρ.” And the Bayesian Second Law (BSL) tells us that this lack of knowledge — the amount we would learn on average by being told the exact state of the system, given that we were using the un-updated distribution — is always larger at the end of the experiment than at the beginning (up to corrections because the system may be emitting heat). So the BSL gives us a nice information-theoretic way of incorporating the act of “looking at the system” into the formalism of statistical mechanics.
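A numerical sketch of the cross entropy and of the inequality, under simplifying assumptions that are not from the paper: six states on a ring, three steps of symmetric random hopping standing in for a system that exchanges no net heat (so the heat term is taken to be zero), and a detector that is right 90% of the time. The cross entropy used here is H(ρ, ω) = -Σ ρ log ω, which splits into the ordinary entropy S(ρ) plus the relative entropy (KL divergence) of ρ from ω.

```python
from math import log

def cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) log q(x): p is the 'true' distribution, q the assumed one."""
    return sum(-pi * log(qi) for pi, qi in zip(p, q) if pi > 0)

def entropy(p):
    return cross_entropy(p, p)

def relative_entropy(p, q):
    """D(p || q) = sum_x p(x) log(p(x) / q(x))."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

n, hop, steps = 6, 0.2, 3

# One step of symmetric hopping on a ring: step[x][y] == step[y][x].
step = [[0.0] * n for _ in range(n)]
for x in range(n):
    step[x][x] = 1 - 2 * hop
    step[x][(x + 1) % n] = hop
    step[x][(x - 1) % n] = hop

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

T = step
for _ in range(steps - 1):
    T = matmul(T, step)  # transition probabilities for the whole experiment

rho_0 = [0.6, 0.2, 0.1, 0.05, 0.03, 0.02]   # un-updated initial distribution
likelihood = [0.1, 0.1, 0.1, 0.9, 0.9, 0.9]  # P(outcome m | final state)

rho_tau = [sum(rho_0[x] * T[x][y] for x in range(n)) for y in range(n)]
rho_tau_m = normalize([rho_tau[y] * likelihood[y] for y in range(n)])
rho_0_m = normalize([rho_0[x] * sum(T[x][y] * likelihood[y] for y in range(n))
                     for x in range(n)])

H_initial = cross_entropy(rho_0_m, rho_0)    # updated vs. un-updated, at the start
H_final = cross_entropy(rho_tau_m, rho_tau)  # updated vs. un-updated, at the end
delta_H = H_final - H_initial

print("cross entropy at start:", H_initial)
print("cross entropy at end:  ", H_final)
print("change:", delta_H)  # expected to be non-negative in this heat-free toy model
```

The first argument of `cross_entropy` is the updated ("true") distribution and the second is the un-updated one we would otherwise have been using, matching the order in H(ρ, ω) above.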

I’m very happy with how this paper turned out, and as usual my hard-working collaborators deserve most of the credit. Of course, none of us actually does statistical mechanics for a living — we’re all particle/field theorists who have wandered off the reservation. What inspired our wandering was actually this article by Natalie Wolchover in Quanta magazine, about work by Jeremy England at MIT. I had read the Quanta article, and Stefan had seen a discussion of it on Reddit, so we got to talking about it at lunch. We thought there was more we could do along these lines, and here we are.

It will be interesting to see what we can do with the BSL, now that we have it. As mentioned, occasional fluctuations downward in entropy happen all the time in small systems, and are especially important in biophysics, perhaps even for the origin of life. While we have phrased the BSL in terms of a measurement carried out by an observer, it’s certainly not necessary to have an actual person there doing the observing. All of our equations hold perfectly well if we simply ask “what happens, given that the system ends up in a certain kind of probability distribution.” That final conditioning might be “a bacteria has replicated,” or “an RNA molecule has assembled itself.” It’s an exciting connection between fundamental principles of physics and the messy reality of our fluctuating world.


49 Responses to The Bayesian Second Law of Thermodynamics

  1. Ben Goren says:

    “That’s what I was taught, anyway. But since I left grad school, physicists (and chemists, and biologists) have become increasingly interested in ultra-tiny systems, with only a few moving parts. Nanomachines, or the molecular components inside living cells. In systems like that, the occasional downward fluctuation in entropy is not only possible, it’s going to happen relatively frequently — with crucial consequences for how the real world works.”

    Sean, would it be fair to suggest that that’s another way of stating that cellular life and, subsequently, Evolution are to-be-expected inevitable results of thermodynamics in Earth-like systems, just as (from other branches of physics) you’d expect things to fall down or oil to float on water or polished metal to be shiny?

    b&

  2. Bill Jefferys says:

    Interesting and cute idea. Thanks for the link to the paper, which I will read with interest.

  3. Sean Carroll says:

    Ben– I’m not nearly confident enough in my (or anybody’s) understanding of the origin of life to use words like “inevitable.” We still don’t understand what actually happened, so it would be premature to claim that whatever happened was inevitable. (Downward fluctuations in entropy are inevitable, but I wouldn’t claim that the origin of life is.)

  4. Ben Goren says:

    Hmm…so, if I’m understanding your qualification right, though there will inevitably be downward fluctuations of entropy, you don’t know if the particular downward fluctuation of entropy that is life is required to be in the set of the downward fluctuations of entropy that do happen.

    Is that a fair restatement?

    If so…would you at least be comfortable with the statement that abiogenesis is in the same general class as the expected downward fluctuations of entropy? Or, you’d expect things to fall down, but you wouldn’t have any way of expecting that one of the things that would fall down would be a particular apple on Newton’s head?

    b&

  5. Bill Jefferys says:

    Ben, and Sean,

    Ilya Prigogine was a colleague at UT (one semester per year) and I once discussed the origin of life problem with him. His most important comment was that as long as the universe was far from equilibrium, as it is since we have hot stars and a 2.7K sink, this will drive phenomena such as evolution.

    So I don’t think that you have to rely on slight decreases in entropy to explain the origin of life. The process of sending energy from a high temperature source through planets and then dumping it to a low-temperature heat sink is sufficient to explain this phenomenon, at least from the point of view of thermodynamics.

    It won’t explain the exact mechanism by which life arose, but it does explain why there is no thermodynamic barrier to this happening.

  6. Ben Goren says:

    Thanks, Bill. Any chance you know of any work (preferably accessible to the general public) that goes into any further detail?

    b&

  7. Sean Carroll says:

    Ben and Bill– We just don’t know. It’s okay not to know! It would be wrong to say that abiogenesis is in the same general class as downward fluctuations of entropy, and it’s equally wrong to say that abiogenesis doesn’t even involve downward fluctuations in entropy. I can imagine either one of those being true — but we don’t know.

    More specifically, Bill, I think Prigogine’s comment is just wrong. Plenty of non-equilibrium processes go forward without life, or even complex structures, ever forming. On the other hand, sometimes they do form. This is a good field in which to remain open-minded until we understand things better.

  8. Daniel says:

    Actually, cross entropy H(p,q) is just relative entropy of q from p, plus the entropy of p; and relative entropy (a.k.a. KL divergence) shows up all over the place in information theory.

  9. Bill Jefferys says:

    Hi Ben, I had this discussion casually as Ilya and I were walking across campus, and I don’t have a ready source that specifically addresses this issue.

    I can only suggest looking at the talk.origins website, which is pretty well organized; you may find something there.

    http://www.talkorigins.org

    Bill

  10. Bill Jefferys says:

    Sean, I don’t think Ilya was wrong, I think that he was just pointing out that in a far-from-equilibrium situation such as we are in and as the universe has been in for a long time, lots of stuff can happen that can’t happen near equilibrium. He wasn’t arguing anything like that life was inevitable or evolution can’t happen or anything like that. Clouds can happen. Weather can happen. Earthquakes can happen. Icebergs can happen. None of these entail that life will happen, but they happen (and Ilya’s point was that life could happen without violating physical law) because of our far-from-equilibrium situation.

  11. Sean Carroll says:

    Bill — Oh, sure, in that case I totally agree. Departures from equilibrium are necessary, but not sufficient, conditions for the existence of something like life. And certainly no violations of physical laws are required. My point was that we don’t know whether downward fluctuations in entropy (which are perfectly compatible with the laws of physics) play a central role in the process or not.

  12. Ben Goren says:

    Sean, I think part of what I’m trying to do is put things in the context of scale. Again, your example was, “Nanomachines, or the molecular components inside living cells. In systems like that, the occasional downward fluctuation in entropy is not only possible, it’s going to happen relatively frequently — with crucial consequences for how the real world works.”

    As I understand it, though we’re missing a great many details, there’s lots of confidence in the general outline that something developed into RNA in a favorable location and that Darwin largely took over at that point — assuming the RNA world hypothesis, but even those who don’t sign off on that have similar theories of biochemical origins.

    RNA is at the physical scale at which these fluctuations you’re describing happen, right?

    I’m sure an entire RNA molecule spontaneously assembling from free atomic components would be too far beyond the fluctuations in question…but what I think I’m missing is the sorts of fluctuations you are describing. Certainly more than garden-variety chemistry or quantum uncertainty; certainly less than Zaphod Beeblebrox spontaneously manifesting with an extra-hot cup of tea in his hands.

    Can you fill in that gap with an example of the typical sort of occurrence you’re describing?

    Thanks,

    b&

  13. Bill Jefferys says:

    Sean, sure, no argument there.

  14. Brad Jackson says:

    OK, so taking in your conversation and applying it to an experiment to calculate the probability of lower entropy in a given macrostate, the answer would depend on how you define the initial state and its parameters. So, as a test, I cleaned my two-year-old granddaughter’s bedroom. I took pictures and then, after her nap, put her in the room and shut the door. I told her to play and that I would be back in a few minutes with her lunch. Now, based on previous observations of her play patterns, what toys were in the room, and where they were placed in relation to her observed interests, I calculated what the room should look like at 5-minute intervals up to a total of 15 minutes. Using the law of conservation of energy and thermodynamics, I diagrammed the room using probabilistic statistics and opened the door to compare. Considering that the cause of the higher rate of entropy was my granddaughter, I thought I had a pretty good idea what to expect. I was wrong. Wrong, wrong, wrong. Factoring in a two-year-old’s need to play is one thing. When you factor in the need to play plus being left alone and told to play in a nice neat room, well, entropy takes on a new meaning.

  15. David Rutten says:

    Probably somewhat tangential to this, but in case you hadn’t seen it yet this is the only sketch I know based on entropy and time.

    (in case the link doesn’t work: https://www.youtube.com/watch?v=a0N9g9u1T98 )

  16. Torbjörn Larsson says:

    I am usually educated and/or amused by Sean’s texts, but this one has me bemused both from what I have been taught in physics and astrobiology.

    Modern cells are tightly constrained systems. In fact their biochemistry has evolved to cope with and take advantage of the high concentration of non-water molecules in the cytosol. And while I am not aware that it has been observed in cells yet, such systems do not seem to obey this generic claim:

    “Closed systems become increasingly disordered over time.”

    Feynman’s roomy bottom [sic!] has the luxury that some such systems are constrained to increase order as entropy goes up. Or at least, that is what experiments seem to find.

    “But crowded tightly, the particles began forming crystal structures like atoms do — even though they couldn’t make bonds. These ordered crystals had to be the high-entropy arrangements, too.

    Glotzer explains that this isn’t really disorder creating order — entropy needs its image updated. Instead, she describes it as a measure of possibilities. … In this case, ordered arrangements produce the most possibilities, the most options. It’s counterintuitive, to be sure,” Glotzer said.” [ http://www.sciencedaily.com/releases/2012/07/120726142200.htm ]

    “Nanomachines, or the molecular components inside living cells. In systems like that, the occasional downward fluctuation in entropy is not only possible, it’s going to happen relatively frequently — with crucial consequences … occasional fluctuations downward in entropy happen all the time in small systems, and are especially important in biophysics, perhaps even for the origin of life. … That final conditioning might be ‘a bacteria has replicated,’ or ‘an RNA molecule has assembled itself.’”

    It is true that cellular machinery often uses Brownian ratchets, say for translation and transcription, as opposed to generic equilibrium or disequilibrium processes, say in anabolism and catabolism. But just as it is unlikely the first cells had evolved tight packing, it is unlikely they had evolved complex ratchets.

    In fact, one of the two dominant theories for life emergence – I think the dominant one after this year’s Astrobiology conference, where it passed into normal science by having a session devoted to testing – relies on generic disequilibrium processes. (The fuel cell theory for life.) And it has already been shown to have a built-in RNA strand replication reactor with the necessary increasing strand lengths – the first known such system. [Here is the old DNA toy system, the RNA variant was presented @ Astrobiology 2015 but isn’t published yet I think: http://www.biosystems.physik.uni-muenchen.de/paperpdfs/mast_reptrap.pdf ; the strand lengthening result was published last year.]

    The fuel cell theory also plays nice with England’s thermodynamics of replicators, where RNA is the only polymer known so far that fulfills the thermodynamic constraint for self-replication.

    So while it can’t be excluded – as noted in the comments – that Brownian ratcheting could have been a mechanism in emergence, it is unlikely to have been a major one.

    The last year has seen astrobiology first claim that life is inevitable on early habitable planets and best compliant with our own universal phylogeny [e.g. Russell et al “The Drive to Life on Wet and Icy Worlds”; lots of homologies], then having the 5 remaining roadblocks pulverized [e.g. strand replication with lengthening, but also non-enzymatic metabolic pathways, replication among pools of racemic nucleotides, et cetera]. And as noted, it looks like 2015 saw it translated to normal science with a testable, and well tested, theory.

    As Sean once said on the now robust basis of physics, I wish these facts were more generally known.

    Oh, and: Hi Ben! Not satisfied with giving theists hell, I see. You can’t stop curiosity, it has nine lives…

  17. Robert cattle says:

    I always like our RI lectures and clear words.
    This page, however long it is, could probably start:

    There is a probability that……… (and end shortly afterwards?) then go into that fine page of reasoning and hyperlinks?

    don’t you think?

  18. Robert DeWolf says:

    I like the way that the life/entropy discussion (particularly with Torbjörn Larsson) has moved to the testable, though the work seems to be at the “complex end”, that is, with RNA, metabolic pathways, actual chemicals in solutions, etc. Does anyone know if there are experiments designed which look at the simple or “bottom” end of evolution of complexity, where entropy, microstates and the knowledge of the system can be measured – or indeed, not measured?

    We know or assume that life is essentially an engine that operates in a non-equilibrium environment, using energy sources such as geothermal vents or sunlight (emitted at the Sun’s roughly 6000 K surface temperature), doing “stuff” (crucially, including replicating), and releasing degraded energy. Could clever experiments or perhaps computer simulations be contrived which have 1) a modelling of the use of energy to drive configuration (“buildup of structure” in http://www.biosystems.physik.uni-muenchen.de/paperpdfs/mast_reptrap.pdf) and 2) a finite but initially small possibility of configurations that can replicate? Clearly this is not something easy to design. It would have to be simple enough to map statistical mechanics quantities to the experiment, but complex or contrived enough to achieve 1) and 2).

    As a thought experiment, it seems to me that the probability/time to get to a state that can replicate might be calculable, but it is not clear to me what happens to the math when that state actually does replicate. (Perhaps we just need to deploy some calculus?) The connection point with this discussion would be “Does the emergence of life have anything to do with downward fluctuations in entropy?”. And my initial response is: Only if the formation of that replicating configuration is counted as a fluctuation. It seems to me that that is not such an interesting aspect. The BIG point is that the capability to replicate of that configuration would add something very new to the state distribution.

    Come to think of it, have the variants of the Game of Life ever been studied from a statistical mechanical point of view?

  19. Paul Lurquin says:

    Eric Chaisson has a lot to say about free energy flow (and far from equilibrium conditions) and appearance of structure (including life).

  20. Antonio (AKA "Un físico") says:

    I have seen books on biological systems that deal with entropy production (rather than with entropy itself). Maybe your Bayesian approach needs to focus on this entropy production parameter.
    In order to extend the BSL to quantum systems, the off-diagonal terms are the key. I don’t know how you could resolve this issue.

  21. Robert Kern says:

    I look forward to reading the paper in detail soon. I just wanted to make a couple of quick comments about the plots. You may want to try fixing the dynamic range of the colormaps so that they do not change from timestep to timestep. Otherwise, it’s hard to see some of the changes that are happening.

    You may also want to use a non-spectral colormap. The viridis colormap, which is now the default in newer versions of matplotlib, is very good for this purpose. My colorblind eyes will thank you very much. I’d be happy to submit a patch for these when I can get around to it.

  22. Vinod Sehgal says:

    Sean Carroll

    In the main body of the article, you have stated that in ultra-tiny systems having few parts, like the molecular systems in living cells, downward fluctuations in entropy are not only possible but happen quite frequently. While responding in the comments, you have stated
    “Ben– I’m not nearly confident enough in my (or anybody’s) understanding of the origin of life to use words like “inevitable.” We still don’t understand what actually happened, so it would be premature to claim that whatever happened was inevitable. (Downward fluctuations in entropy are inevitable, but I wouldn’t claim that the origin of life is.)”

    Here you have stated that downward fluctuations in entropy are INEVITABLE.

    Does it mean that, for such systems, an arrow of time in the backward direction is also INEVITABLE?

    Then the definition of ultra-tiny or large is quite an arbitrary one. From our standpoint, a molecular system may be tiny. But from a galactic perspective a solar or planetary system may also be tiny. Does it mean that, for larger systems also, backward flow of time is INEVITABLE?

  23. James Goetz says:

    Hmm, How could this help cosmology? In the case of a nonzero Hamiltonian, then there is a past infinite number of Planck times. In the case of the A-theory of time, then a nonzero Hamiltonian is logically impossible because that would imply a past infinite passage of Planck times, but the logical impossibility might not stop everybody from assuming that. In that case, an infinite number of low entropy pockets have already developed and dissipated, which makes a purely naturalistic cosmology look inevitable. Alternatively, a past infinite number of Planck times in the case of B-theory might by logically plausible with the caveat that all appearance of tensed cause and effect is an illusion. At least the caveat of the illusion for all tense helps to make the observable universe look inevitable, but I also find this dissatisfying. In my case, I hold to a zero Hamiltonian. Peace, Jim

  24. Tom Brown says:

    Sean, I’m neither a scientist nor an economist, but I like reading about both subjects on blogs so I stay in a state of continual confusion.

    How does that relate to your subject here? I’m not sure it does, but the phrase “non-equilibrium statistical mechanics” and your mention of Claude Shannon and information theory made me think of one of my favorite macro economic blogs, written by a physicist who works as an engineer. He’s developed a framework, and it’s based on a concept called the “information transfer model.” This is the physics paper he points to as inspiration:

    If someone were to put a gun to your head and force you to look at macro economics (presumably in as scientific a way as possible), what do you think you might do? What would be step 1? Tell them to go ahead and pull the trigger? Lol.

  25. James Goetz says:

    Dang, a typo in one sentence: “Alternatively, a past infinite number of Planck times in the case of B-theory might [be] logically plausible with the caveat that all appearance of tensed cause and effect is an illusion.”