Probability Theory & Statistical Mechanics: Min. Variance Estimation & Carnot's Principle, Summaries of Physics

Two main topics: minimum variance estimation and Carnot's principle. The first part explains the concept of the best estimate for a variable, which is the estimate with the smallest variance. It derives the formula for the best estimate using the principle of least squares. The second part discusses Carnot's principle, which deals with the efficiency of heat engines. It explains the significance of the theorem and its relation to the concept of entropy.

ENTROPIC INFERENCE AND THE
FOUNDATIONS OF PHYSICS
ARIEL CATICHA
Department of Physics, University at Albany–SUNY


Contents

  • Foreword
  • Preface
  • 1 Inductive Inference and Physics
    • 1.1 Probability
    • 1.2 Designing a framework for inductive inference
    • 1.3 Entropic Physics
  • 2 Probability
    • 2.1 The design of probability theory
      • 2.1.1 Rational beliefs?
      • 2.1.2 Quantifying rational belief
    • 2.2 The sum rule
      • 2.2.1 The associativity constraint
      • 2.2.2 The general solution and its regraduation
      • 2.2.3 The general sum rule
      • 2.2.4 Cox’s proof
    • 2.3 The product rule
      • 2.3.1 From four arguments down to two
      • 2.3.2 The distributivity constraint
    • 2.4 Some remarks on the sum and product rules
      • 2.4.1 On meaning, ignorance and randomness
      • 2.4.2 Independent and mutually exclusive events
      • 2.4.3 Marginalization
    • 2.5 The expected value
    • 2.6 The binomial distribution
    • 2.7 Probability vs. frequency: the law of large numbers
    • 2.8 The Gaussian distribution
      • 2.8.1 The de Moivre-Laplace theorem
      • 2.8.2 The Central Limit Theorem
    • 2.9 Updating probabilities: Bayes’ rule
      • 2.9.1 Formulating the problem
      • 2.9.2 Minimal updating: Bayes’ rule
      • 2.9.3 Multiple experiments, sequential updating
      • 2.9.4 Remarks on priors
    • 2.10 Hypothesis testing and confirmation
    • 2.11 Examples from data analysis
      • 2.11.1 Parameter estimation
      • 2.11.2 Curve fitting
      • 2.11.3 Model selection
      • 2.11.4 Maximum Likelihood
  • 3 Entropy I: The Evolution of Carnot’s Principle
    • 3.1 Carnot: reversible engines
    • 3.2 Kelvin: temperature
    • 3.3 Clausius: entropy
    • 3.4 Maxwell: probability
    • 3.5 Gibbs: beyond heat
    • 3.6 Boltzmann: entropy and probability
    • 3.7 Some remarks
  • 4 Entropy II: Measuring Information
    • 4.1 Shannon’s information measure
    • 4.2 Relative entropy
    • 4.3 Joint entropy, additivity, and subadditivity
    • 4.4 Conditional entropy and mutual information
    • 4.5 Continuous distributions
    • 4.6 Experimental design
    • 4.7 Communication Theory
    • 4.8 Assigning probabilities: MaxEnt
    • 4.9 Canonical distributions
    • 4.10 On constraints and relevant information
    • 4.11 Avoiding pitfalls – I
      • 4.11.1 MaxEnt cannot fix flawed information
      • 4.11.2 MaxEnt cannot supply missing information
      • 4.11.3 Sample averages are not expected values
  • 5 Statistical Mechanics
    • 5.1 Liouville’s theorem
    • 5.2 Derivation of Equal a Priori Probabilities
    • 5.3 The relevant constraints
    • 5.4 The canonical formalism
    • 5.5 Equilibrium with a heat bath of finite size
    • 5.6 The Second Law of Thermodynamics
    • 5.7 The thermodynamic limit
    • 5.8 Interpretation of the Second Law: Reproducibility
    • 5.9 Remarks on irreversibility
    • 5.10 Entropies, descriptions and the Gibbs paradox
  • 6 Entropy III: Updating Probabilities
    • 6.1 What is information?
    • 6.2 The design of entropic inference
      • 6.2.1 General criteria
      • 6.2.2 Entropy as a tool for updating probabilities
      • 6.2.3 Specific design criteria
      • 6.2.4 The ME method
    • 6.3 The proofs
    • 6.4 An alternative independence criterion: consistency
    • 6.5 Random remarks
      • 6.5.1 On priors
      • 6.5.2 Comments on other axiomatizations
    • 6.6 Bayes’ rule as a special case of ME
    • 6.7 Commuting and non-commuting constraints
    • 6.8 Conclusion
  • 7 Information Geometry
    • 7.1 Examples of statistical manifolds
    • 7.2 Vectors in curved spaces
    • 7.3 Distance and volume in curved spaces
    • 7.4 Derivations of the information metric
      • 7.4.1 Derivation from distinguishability
      • 7.4.2 Derivation from a Euclidean metric
      • 7.4.3 Derivation from asymptotic inference
      • 7.4.4 Derivation from relative entropy
    • 7.5 Uniqueness of the information metric
    • 7.6 The metric for some common distributions
  • 8 Entropy IV: Entropic Inference
    • 8.1 Deviations from maximum entropy
    • 8.2 The ME method
    • 8.3 An application to fluctuations
    • 8.4 Avoiding pitfalls – II
      • 8.4.1 The three-sided die
      • 8.4.2 Understanding ignorance
  • 9 Entropic Dynamics: Time and Quantum Theory
    • 9.1 The statistical model
    • 9.2 Entropic dynamics
    • 9.3 Entropic time
      • 9.3.1 Time as a sequence of instants
    • 9.4 Duration: a convenient time scale
      • 9.4.1 The directionality of entropic time
    • 9.5 Accumulating changes
      • 9.5.1 Derivation of the Fokker-Planck equation
      • 9.5.2 The current and osmotic velocities
    • 9.6 Non-dissipative diffusion
      • 9.6.1 Manifold dynamics
      • 9.6.2 Classical limits
      • 9.6.3 The Schrödinger equation
    • 9.7 A quantum equivalence principle
    • 9.8 Entropic time vs. physical time
    • 9.9 Dynamics in an external electromagnetic field
      • 9.9.1 An additional constraint
      • 9.9.2 Entropic dynamics
      • 9.9.3 Gauge invariance
    • 9.10 Is ED a hidden-variable model?
    • 9.11 Summary and Conclusions
  • 10 Topics in Quantum Theory
    • 10.1 The quantum measurement problem
    • 10.2 Observables other than position
    • 10.3 Amplification
    • 10.4 But isn’t the measuring device a quantum system too?
    • 10.5 Momentum in Entropic Dynamics
      • 10.5.1 Expected values
      • 10.5.2 Uncertainty relations
      • 10.5.3 Discussion
      • 10.5.4 An aside: the hybrid μ = 0 theory
    • 10.6 Conclusions
  • References

Preface

N. Caticha, whose views on these matters have profoundly influenced my own, but I have also learned much from discussions with many colleagues and friends: D. Bartolomeo, C. Cafaro, V. Dose, K. Earle, R. Fischer, A. Garrett, A. Giffin, P. Goggans, A. Golan, M. I. Gomez, P. Goyal, M. Grendar, D. T. Johnson, K. Knuth, S. Nawaz, R. Preuss, T. Seidenfeld, J. Skilling, R. Spekkens, and C.-Y. Tseng. I would also like to thank all the students who over the years have taken my course on Information Physics; their questions and doubts have very often helped clear my own questions and doubts. I would also like to express my special gratitude to Julio Stern for his continued encouragement to get my lectures published and to J. Stern, C. A. de Bragança Pereira, A. Polpo, M. Lauretto and M. A. Diniz, organizers of EBEB 2012, for undertaking their publication.

Albany, February 2012.

Chapter 1

Inductive Inference and Physics

The process of drawing conclusions from available information is called inference. When the available information is sufficient to make unequivocal, unique assessments of truth we speak of making deductions: on the basis of a certain piece of information we deduce that a certain proposition is true. The method of reasoning leading to deductive inferences is called logic. Situations where the available information is insufficient to reach such certainty lie outside the realm of logic. In these cases we speak of doing inductive inference, and the methods deployed are those of probability theory and entropic inference.

1.1 Probability

The question of the meaning and interpretation of the concept of probability has long been controversial. Needless to say the interpretations offered by various schools are at least partially successful or else they would already have been discarded. But the different interpretations are not equivalent. They lead people to ask different questions and to pursue their research in different directions. Some questions may become essential and urgent under one interpretation while totally irrelevant under another. And perhaps even more important: under different interpretations equations can be used differently and this can lead to different predictions.

The frequency interpretation

Historically the frequentist interpretation has been the most popular: the probability of a random event is given by the relative number of occurrences of the event in a sufficiently large number of identical and independent trials. The appeal of this interpretation is that it seems to provide an empirical method to estimate probabilities by counting over the ensemble of trials. The magnitude


fact that is described by referring to Bayesian probabilities as being subjective. This term is somewhat misleading because there are (at least) two views on this matter: one is the so-called subjective Bayesian or personalistic view (see, e.g., [Savage 1972; Howson and Urbach 1993; Jeffrey 2004]), and the other is the objective Bayesian view (see, e.g., [Jeffreys 1939; Cox 1946; Jaynes 1985, 2003; Lucas 1970]). For an excellent elementary introduction with a philosophical perspective see [Hacking 2001]. According to the subjective view, two reasonable individuals faced with the same evidence, the same information, can legitimately differ in their confidence in the truth of a proposition and may therefore assign different probabilities. Subjective Bayesians accept that an individual can change his or her beliefs merely on the basis of introspection, reasoning, or even revelation. At the other end of the Bayesian spectrum, the objective Bayesian view considers the theory of probability as an extension of logic. It is said then that a probability measures a degree of rational belief. It is assumed that the objective Bayesian has thought so long and hard about how probabilities are assigned that no further reasoning will induce a revision of beliefs except when confronted with new information. In an ideal situation two different individuals will, on the basis of the same information, assign the same probabilities.

Subjective or objective?

Whether Bayesian probabilities are subjective or objective is still a matter of dispute. Our position is that they lie somewhere in between. Probabilities will always retain a “subjective” element because translating information into probabilities involves judgments and different people will inevitably judge differently. On the other hand, it is a presupposition of thought itself that some beliefs are better than others; otherwise why go through the trouble of thinking? And they are “objectively” better in that they provide better guidance about how to cope with the world. The adoption of better beliefs has real consequences. Similarly, not all probability assignments are equally useful and it is plausible that what makes some assignments better than others is that they represent or reflect some objective feature of the world. One might even say that what makes them better is that they provide a better guide to the “truth”. It is the conviction that posterior probabilities are somehow objectively better than prior probabilities that provides the justification for going through the trouble of gathering information and using it to update our beliefs.

We shall find that while the subjective element in probabilities can never be completely eliminated, the rules for processing information, that is, the rules for updating probabilities, are themselves quite objective. This means that the new information can be objectively processed and incorporated into our posterior probabilities. Thus, it is quite possible to continuously suppress the subjective elements while enhancing the objective elements as we process more and more information.

Thus, probabilities can be characterized by both subjective and objective elements and, ultimately, it is their objectivity that makes probabilities useful. There is much to be gained by rejecting the sharp subjective/objective dichotomy and replacing it with a continuous spectrum of intermediate possibilities.^1

1.2 Designing a framework for inductive inference

A common hope in both science and philosophy has been to find a secure foundation for knowledge on which to build science, mathematics, and philosophy. So far the search has not been successful and everything indicates that such an indubitable foundation is nowhere to be found. Accordingly, we adopt a pragmatic attitude: there are ideas about which we can have greater or lesser confidence, and from these we can infer the plausibility of others; but there is nothing about which we can have full certainty and complete knowledge.

Inductive inference in its Bayesian/entropic form is a framework designed for the purpose of coping with the world in a rational way in situations where the information available is incomplete. The framework must solve two related problems. First, it must allow for convenient representations of states of partial knowledge; this is handled through the introduction of probabilities. Second, it must allow us to update from one state of knowledge to another when new information becomes available; this is handled through the introduction of relative entropy as the tool for updating. The theory of probability cannot be separate from a theory for updating probabilities.

The framework for inference will be constructed by a process of eliminative induction. The objective is to design the appropriate tools, which, in our case, means designing the theory of probability and entropy. The different ways in which probabilities and entropies are defined and handled will lead to different inference schemes and one can imagine a vast variety of possibilities. To select one we must first have a clear idea of the function that those tools are supposed to perform, that is, we must specify design criteria or design specifications that the desired inference framework must obey.
Finally, in the eliminative part of the process one proceeds to systematically rule out all those inference schemes that fail to comply with the design criteria — that is, that fail to perform as desired. There is no implication that an inference framework designed in this way is in any way “true”, or that it succeeds because it achieves some special intimate agreement with reality. Instead, the claim is pragmatic: the method succeeds to the extent that the inference framework works as designed and its performance will be deemed satisfactory as long as it leads to scientific models that are empirically adequate. Whatever design criteria are chosen, they are meant to be only provisional — just like everything else in science, there is no reason to consider them immune from further change and improvement.

(^1) This position bears a resemblance to the rejection of the fact/value dichotomy advocated in [Putnam 1991, 2003].


the impact on statistical mechanics itself, the obvious question is: Are there other examples? The answer is yes.

Our goal in chapter 5 is to provide an explicit discussion of statistical mechanics as an example of entropic inference; the chapter is devoted to discussing and clarifying the foundations of thermodynamics and statistical mechanics. The development is carried out largely within the context of Jaynes’ MaxEnt formalism and we show how several central topics such as the equal probability postulate, the second law of thermodynamics, irreversibility, reproducibility, and the Gibbs paradox can be considerably clarified when viewed from the information/inference perspective.

In chapters 9 and 10 we explore new territory. These chapters are devoted to deriving quantum theory as an example of entropic inference. The challenge is that the theory involves dynamics and time in a fundamental way. It is significant that the full framework of entropic inference derived in chapters 6 and 8 is needed here; the old entropic methods developed by Shannon and Jaynes are no longer sufficient.

The payoff is considerable. A vast fraction of the quantum formalism is derived and the entropic approach offers new insights into many topics that are central to quantum theory: the interpretation of the wave function, the wave-particle duality, the quantum measurement problem, the introduction and interpretation of observables other than position, including momentum, the corresponding uncertainty relations, and, most important, it leads to a theory of entropic time. The overall conclusion is that the laws of quantum mechanics are not laws of nature; they are rules for processing information about nature.

Chapter 2

Probability

Our goal is to establish the theory of probability as the general theory for reasoning on the basis of incomplete information. This requires us to tackle two different problems. The first problem is to figure out how to achieve a quantitative description of a state of partial knowledge. Once this is settled we address the second problem of how to update from one state of knowledge to another when new information becomes available.

Throughout we will assume that the subject matter – the set of propositions the truth of which we want to assess – has been clearly specified. This question of what it is that we are actually talking about is much less trivial than it might appear at first sight.^1 Nevertheless, it will not be discussed further.

The first problem, that of describing or characterizing a state of partial knowledge, requires that we quantify the degree to which we believe each proposition in the set is true. The most basic feature of these beliefs is that they form an interconnected web that must be internally consistent. The idea is that in general the strengths of one’s beliefs in some propositions are constrained by one’s beliefs in other propositions; beliefs are not independent of each other. For example, the belief in the truth of a certain statement a is strongly constrained by the belief in the truth of its negation, not-a: the more I believe in one, the less I believe in the other.

The second problem, that of updating from one consistent web of beliefs to another when new information becomes available, will be addressed for the special case that the information is in the form of data. The basic updating strategy reflects the conviction that what we learned in the past is valuable, that the web of beliefs should only be revised to the extent required by the data. We will see that this principle of minimal updating leads to the uniquely natural rule that is widely known as Bayes’ rule.
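As a concrete illustration of updating on data, the following minimal Python sketch applies Bayes' rule in its standard form (posterior proportional to prior times likelihood) to a toy problem. The rule itself is only named here, not yet derived; the hypotheses, the numbers, and the function name `bayes_update` are hypothetical choices made for illustration.

```python
from fractions import Fraction

def bayes_update(prior, likelihood):
    """Return the posterior p(h|d), proportional to p(h) * p(d|h)."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalized.values())  # p(d), the normalization constant
    return {h: w / evidence for h, w in unnormalized.items()}

# Hypothetical web of beliefs: is a coin fair or biased toward heads?
prior = {"fair": Fraction(1, 2), "biased": Fraction(1, 2)}

# Probability of the datum "heads" under each hypothesis (made-up values).
likelihood_heads = {"fair": Fraction(1, 2), "biased": Fraction(3, 4)}

posterior = bayes_update(prior, likelihood_heads)
for hypothesis, p in posterior.items():
    print(hypothesis, p)
```

Observing heads shifts belief toward the biased hypothesis while the posterior still sums to one: the more one believes in one hypothesis, the less one believes in the other.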
(More general kinds of information can also be processed using the minimal updating principle but they require a more sophisticated tool, namely, relative entropy. This topic will be extensively

(^1) Consider the example of quantum mechanics: Are we talking about particles, or about experimental setups, or both? Are we talking about position variables, or about momenta, or both? Or neither? Is it the position of the particles or the position of the detectors?

2.1 The design of probability theory

The inference framework must be based on assumptions that have wide appeal and universal applicability.

Whatever guidelines we pick they must be of general applicability — otherwise they fail when most needed, namely, when not much is known about a problem. Different rational agents can reason about different topics, or about the same subject but on the basis of different information, and therefore they could hold different beliefs, but they must agree to follow the same rules. What we seek here are not the specific rules of inference that will apply to this or that specific instance; what we seek is to identify some few features that all instances of rational inference might have in common. The second criterion is that

The inference framework must not be self-refuting.

It may not be easy to identify criteria of rationality that are sufficiently general and precise. Perhaps we can settle for the more manageable goal of avoiding irrationality in those glaring cases where it is easily recognizable. And this is the approach we take: rather than providing a precise criterion of rationality to be carefully followed, we design a framework with the more modest goal of avoiding some forms of irrationality that are perhaps sufficiently obvious to command general agreement. The basic desire is that the web of rational beliefs must avoid inconsistencies. If a quantity can be inferred in two different ways the two ways must agree. As we shall see this requirement turns out to be extremely restrictive. Finally,

The inference framework must be useful in practice: it must allow quantitative analysis.

Otherwise, why bother? Whatever specific design criteria are chosen, one thing must be clear: they are justified on purely pragmatic grounds and therefore they are meant to be only provisional. Rationality itself is not immune to change and improvement. Given some criteria of rationality we proceed to construct models of the world, or better, models that will help us deal with the world — predict, control, and explain the facts. The process of improving these models — better models are those that lead to more accurate predictions, more accurate control, and more lucid and encompassing explanations of more facts, not just the old facts but also of new and hopefully even unexpected facts — may eventually suggest improvements to the rationality criteria themselves. Better rationality leads to better models which leads to better rationality and so on. The method of science is not independent from the contents of science.

2.1.2 Quantifying rational belief

In order to be useful we require an inference framework that allows quantitative reasoning. The first obvious question concerns the type of quantity that will represent the intensity of beliefs. Discrete categorical variables are not adequate for a theory of general applicability; we need a much more refined scheme. Do we believe proposition a more or less than proposition b? Are we even justified in comparing propositions a and b? The problem with propositions is not that they cannot be compared but rather that the comparison can be carried out in too many different ways. We can classify propositions according to the degree we believe they are true, their plausibility; or according to the degree that we desire them to be true, their utility; or according to the degree that they happen to bear on a particular issue at hand, their relevance. We can even compare propositions with respect to the minimal number of bits that are required to state them, their description length.

The detailed nature of our relations to propositions is too complex to be captured by a single real number. What we claim is that a single real number is sufficient to measure one specific feature, the sheer intensity of rational belief. This should not be too controversial because it amounts to a tautology: an “intensity” is precisely the type of quantity that admits no more qualifications than that of being more intense or less intense; it is captured by a single real number. However, some preconception about our subject is unavoidable; we need some rough notion that a belief is not the same thing as a desire. But how can we know that we have captured pure belief and not belief contaminated with some hidden desire or something else? Strictly we can’t. We hope that our mathematical description captures a sufficiently purified notion of rational belief, and we can claim success only to the extent that the formalism proves to be useful.

The inference framework will capture two intuitions about rational beliefs. First, we take it to be a defining feature of the intensity of rational beliefs that if a is more believable than b, and b more than c, then a is more believable than c. Such transitive rankings can be implemented using real numbers, and so we are again led to claim that

Degrees of rational belief (or, as we shall later call them, probabilities) are represented by real numbers.
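The link between real numbers and transitive rankings can be checked in a small Python sketch. The propositions and the numerical degrees of belief are hypothetical; the point is only that any ranking induced by comparing real numbers is automatically transitive.

```python
from itertools import permutations

# Hypothetical degrees of belief assigned to three propositions.
belief = {"a": 0.9, "b": 0.6, "c": 0.2}

def more_believable(x, y):
    """True when proposition x is believed more strongly than y."""
    return belief[x] > belief[y]

def is_transitive(propositions):
    """Check: x over y and y over z always implies x over z."""
    return all(
        more_believable(x, z)
        for x, y, z in permutations(propositions, 3)
        if more_believable(x, y) and more_believable(y, z)
    )

print(is_transitive(belief))
```

Because the ordering of the real numbers is itself transitive, the check succeeds for any assignment of values, not just the one shown.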

Before we proceed further we need to establish some notation. The following choice is standard.

Notation

For every proposition a there exists its negation not-a, which will be denoted ã. If a is true, then ã is false and vice versa. Given any two propositions a and b the conjunction “a and b” is denoted ab or a ∧ b. The conjunction is true if and only if both a and b are true. Given a and b the disjunction “a or b” is denoted by a ∨ b or (less often) by a + b. The disjunction is true when either a or b or both are true; it is false when both a and b are false. Typically we want to quantify the degree of belief in a ∨ b and in ab in the context of some background information expressed in terms of some proposition