A Savage-Like Axiomatization for Nonstandard Expected Utility

Since Leonard Savage’s epoch-making “Foundations of Statistics” [Savage, 1972], Subjective Expected Utility Theory has been the presumptive model for decision-making. Savage provided an act-based axiomatization of standard expected utility theory. In this article, we provide a Savage-like axiomatization of nonstandard expected utility theory. It corresponds to a weakening of Savage’s 6th axiom.

In the last twenty years, there has been an explosion of research in decision theory. Various decision-making procedures have been proposed as descriptive or normative alternatives to expected utility theory, such as info-gap theory [Ben-Haim, 2006], Choquet expected utility [Asano and Kojima, 2015, Chateauneuf et al., 2003], and qualitative binary possibilistic utility [Giang and Shenoy, 2005, Weng, 2006]. Another approach is to refine expected utility theory by allowing utility functions and probability measures to contain infinitesimal or infinite elements. This refinement permits coherent modeling of Pascal’s Wager [Herzberg, 2011], and allows uncountably many pairwise disjoint events to have nonzero probability, circumventing some of the troubles of standard expected utility theory [Hammond, 1999]. It has also been defended on game-theoretic grounds [Hammond, 1994].

One of the most popular methods for modeling infinitesimals is Abraham Robinson’s nonstandard analysis, which enriches the field of real numbers with infinitely small and infinitely large numbers in a way that sensibly preserves the field’s underlying structure [Goldblatt, 1998, Robinson, 1996]. This extension of the reals is termed the hyperreal numbers and denoted by ∗R.

We axiomatize a decision-theoretic model for agents who generally agree with Savage’s postulates but are unwilling to gamble extremely valuable goods against relatively unimportant ones at even the slimmest of odds. Our theorem clarifies the postulates a person would need to accept in order to embrace nonstandard expected utility; this highlights the gap between Bayesians and decision theorists in the school of Ellsberg [Ellsberg, 2016].

A variety of perspectives stress the advantages of working with nonstandard probability theory rather than its more granular standard counterpart. It is natural to ask what advantages the nonstandard approach has for other decision theories. Many of these theories have been axiomatized in a Savage-like manner [Sarin and Wakker, 1992, Weng, 2013], and these axiomatizations may be adapted to a nonstandard setting by the transfer principle, theorem 3.7. We leave this question for future researchers to investigate.

We begin our paper with two motivating examples. We proceed with a brief overview of relevant constructions and theorems from nonstandard analysis. We restate Savage’s Theorem for reference, and close by stating and proving our own theorem.

Suppose a friend of yours offers you the following bet. You will roll a fair six-sided die. If the die lands on one of its edges, you must pay your friend $1000. Otherwise, nothing happens. You clearly should reject this bet, because it is impossible for playing to make you better off than refusing. You are at risk of losing $1000, and there is no possible reward for taking the bet. On the other hand, the probability of a fair six-sided die landing on one of its edges is 0, and so an adherent of expected utility theory would be indifferent about taking this bet. Expected utility theory has been hailed as a normative theory for decision-making [Bernoulli, 1954, Fishburn, 1970, Jallais et al., 2008, Neumann and Morgenstern, 1953, Savage, 1972], so this is a serious quandary.

This paradox occurs because in standard probability theory, an event may be completely possible and yet have probability 0. Under such circumstances, even monumental rewards or penalties will be ignored in an expected utility calculation. In order to resolve this difficulty, it is natural to refine expected utility theory so that it can correctly interact with events of negligible probability, or with outcomes of superlative significance. Once we formalize such a refinement, we may ask what postulates govern this refined expected utility theory. Our theorem 5.1 answers this question for nonstandard expected utility theory.
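As a minimal worked sketch of such a refinement, suppose (these are illustrative assumptions, not part of the formal development) that utility is linear in money, u(x) = x, and that the event E that the die lands on an edge is assigned a positive infinitesimal probability ε in place of 0. The two calculations then read:

```latex
% Standard model: P(E) = 0, so taking the bet and refusing it tie.
\begin{align*}
\mathrm{EU}_{\mathrm{std}}(\text{bet})
  &= 0 \cdot u(-1000) + 1 \cdot u(0) = 0 = \mathrm{EU}(\text{refuse}).
\end{align*}

% Nonstandard model (assumption): {}^{*}P(E) = \epsilon, with \epsilon > 0 infinitesimal.
\begin{align*}
\mathrm{EU}_{\mathrm{nonstd}}(\text{bet})
  &= \epsilon \cdot u(-1000) + (1 - \epsilon) \cdot u(0)
   = -1000\,\epsilon < 0 = \mathrm{EU}(\text{refuse}).
\end{align*}
```

Under these assumptions the bet is strictly dispreferred, even though −1000ε is smaller in absolute value than every positive real number.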

Suppose now that you are an economist considering whether it is normatively correct to obey Savage’s seven postulates for rational decision-making (see theorem 4.5). You agree that if you had time to consider every conceivable option, you would have weakly ordered preferences (S1), and your preference between two decisions should depend only on those cases where their outcomes differ (S2). You agree that your preferences between outcomes should be state-independent (S3), and that whether you prefer to bet on one event rather than another should not depend on exactly what prize you would earn (S4). You don’t mind excluding trivial decision-making (S5), and you agree that if you prefer every possible outcome of one decision to another one considered holistically, you should prefer the former decision to the latter (S7).

However, you are uncomfortable agreeing that you would be willing to bet your life against a penny, at any odds. In general, you are unwilling, when faced with two decisions, to finitely partition your state space so that what happens on any single cell of the partition will not reverse your preferences (S6). But you would be willing to risk catastrophe to claim a small reward if the odds in your favor were literally infinite. You wonder what decision-theoretic systems could correctly model your preferences. This curiosity might be academic, it might be economic, or it might be personal. In any case, theorem 5.1 answers this question as well.

We survey some elementary constructions and theorems from nonstandard analysis. No proofs are provided; our exposition is taken directly from chapters 2 through 4 of [Goldblatt, 1998], so curious readers are encouraged to study further there.

Definition 3.1. A nonprincipal ultrafilter on N is a set U ⊆ 2^N such that

1. For all A, B ∈ U, we have A ∩ B ∈ U (filter)

2. If A ∈ U and A ⊆ B ⊆ N, then B ∈ U (filter)

3. ∅ ∉ U (ultrafilter)

4. For all A ⊆ N, we have A ∈ U or N − A ∈ U (ultrafilter)

5. For all a ∈ N, there exists A ∈ U such that a ∉ A (nonprincipal)

The existence of nonprincipal ultrafilters is guaranteed by Zorn’s lemma.

Observation 3.2. Every nonprincipal ultrafilter U is cofinite: if A ⊆ N has finite cardinality, then N − A ∈ U.

Definition 3.3. Fix U a nonprincipal ultrafilter. For S an arbitrary set, let ∗S = S^N/∼, where (s_1, s_2, ...) ∼ (t_1, t_2, ...) if {n ∈ N : s_n = t_n} ∈ U.

Observation 3.4. The relation ∼ is an equivalence relation on S^N. We identify S with its image in ∗S under the diagonal embedding s ↦ (s, s, s, ...).

For ∗s, ∗t ∈ ∗S, let (s_1, s_2, ...) be a representative of ∗s and (t_1, t_2, ...) be a representative of ∗t. We extend a relation R on S to ∗R on ∗S by saying ∗s ∗R ∗t if {n ∈ N : s_n R t_n} ∈ U. Similarly, we extend f : S → T to ∗f : ∗S → ∗T by letting ∗f(∗s) ∈ ∗T be the equivalence class containing (f(s_1), f(s_2), ...). Similar extensions can be made of binary operations, Cartesian products, fibered products, and other set-theoretic constructions. All are well-defined.

Example 3.5. ∗R = R^N/∼ is a totally ordered field which contains infinitesimal elements. In other words, we can find ε ∈ ∗R such that 0 < ε < 1/n for every natural number n. Note that 1/ε > n for every natural number n, so ∗R contains infinite elements as well.
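One concrete representative makes this explicit. The particular sequence below is chosen only for illustration; any sequence of positive reals converging to 0 would serve equally well.

```latex
% Candidate infinitesimal: the equivalence class of (1, 1/2, 1/3, ...).
\[
\epsilon = \left[\left(1, \tfrac{1}{2}, \tfrac{1}{3}, \ldots\right)\right] \in {}^{*}\mathbb{R}.
\]

% Positivity: every coordinate is positive, and N itself belongs to U
% (take A = \emptyset in conditions 3 and 4 of Definition 3.1).
\[
\{m \in \mathbb{N} : 0 < \tfrac{1}{m}\} = \mathbb{N} \in U
\quad\Longrightarrow\quad 0 < \epsilon.
\]

% Fix n in N. The coordinates lying below 1/n form a cofinite set,
% which belongs to U by Observation 3.2.
\[
\{m \in \mathbb{N} : \tfrac{1}{m} < \tfrac{1}{n}\} = \{m \in \mathbb{N} : m > n\} \in U
\quad\Longrightarrow\quad \epsilon < \tfrac{1}{n}.
\]
```

Since n was arbitrary, this ε lies strictly between 0 and every 1/n.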


Note that a function may be hyperbounded without being bounded.


Proof. Let (n1, n2, . . .) be a representative for ∗n. The set


is at most countable, then |A/∼| ≤ |A| ≤ |N|. On the other hand, let ∗n ∈ ∗N be the element containing (1, 2, ...). For all k ∈ N, we see that {m ∈ N : k ≥ m} is finite, so ∗n ≥ k for all k ∈ N. Then N ⊆ {∗k ≤ ∗n} and thus |{∗k ≤ ∗n}| = |N|. □

One of the most important theorems in nonstandard analysis is the transfer principle, which asserts that S and its nonstandard extension ∗S have essentially the same structure.

Theorem 3.7 (Transfer Principle). Let R be a relational structure, and L_R the mathematical language of R, comprising the relation and function symbols of R together with logical connectives, existential quantifiers, and parentheses. A defined L_R-sentence φ is true if and only if ∗φ is true.

Roughly what this means is that any statement φ about a mathematical object S is true if and only if its nonstandard analogue ∗φ is true. Here ∗φ is obtained from φ by extending the objects, relations, and functions in φ to ∗S.
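A standard illustration, which also foreshadows the role of hypernatural indices later in the paper, is the Archimedean property. The following is a sketch in informal first-order notation, written with a universal quantifier for readability.

```latex
% Sentence phi about R: every real number is exceeded by some natural number.
\[
\varphi : \quad \forall x \in \mathbb{R} \;\; \exists n \in \mathbb{N} \;\; (x < n).
\]

% Its transfer *phi: every hyperreal is exceeded by some hypernatural number.
\[
{}^{*}\varphi : \quad \forall x \in {}^{*}\mathbb{R} \;\; \exists n \in {}^{*}\mathbb{N} \;\; (x < n).
\]
```

Both sentences are true, but ∗φ does not assert that every hyperreal lies below a standard natural number: when x is infinite, the witness n is itself an infinite element of ∗N.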

As the Transfer Principle suggests, it is possible to develop nonstandard analysis on purely model-theoretic grounds, without any reference to ultrafilters. However, we will not pursue this train of thought here (but see [Robinson, 1996]).

Leonard Savage proved that any decision-maker who complied with seven plausible rationality axioms behaved as if he followed expected utility theory. We reproduce his axioms and theorem here. Readers interested in a more thorough treatment are directed to [Fishburn, 1970] and [Savage, 1972].

First, we provide some definitions. The state space S is the collection of all possible states of the world. We assume the collection S is exhaustive and mutually exclusive, so the world is identified with exactly one element of S. We take X to be the space of possible outcomes, or results of the actor’s choice. Let D = X^S, the set of functions from S to X, denote the space of conceivable decisions that the actor could make. In practice, an actor will normally only be able to choose from a small subset of D, but Savage considered it reasonable to assume a normatively rational actor would have the capacity to compare any two hypothetical decisions. This is not a restrictive assumption, as we may enrich the decision space with additional options without forcing any changes to the actor’s original preferences. We also assume the actor’s menu of choices does not alter the state of the world s ∈ S, and neither does the particular decision d ∈ D he makes.

The actor has preferences between possible decisions, described by the binary relation “≻.” We read f ≻ g as “f is preferred to g.” From this primitive binary relation, we may define f ≺ g to hold if g ≻ f, f ∼ g if f ⊁ g and g ⊁ f, and f ⪰ g if f ≻ g or f ∼ g. For x, y ∈ X, we also say x ≻ y if the constant decision returning x is preferred to the constant decision returning y. At the outset, we make no assumptions about the structure of this preference relation.

We employ the following definitions.


Definition 4.1. For f, g ∈ D and A ⊆ S, we define fAg by fAg(s) = f(s) if s ∈ A, and fAg(s) = g(s) if s ∉ A.


Suppose that you prefer outcome x to outcome y, and you believe that event A is more likely than event B. Obtaining x if event A occurs and y otherwise is clearly preferable to obtaining x if event B occurs and y otherwise.

Definition 4.2. For A, B ⊆ S, we write A ≻_L B if whenever x ≻ y, we have xAy ≻ xBy. We read A ≻_L B as “A is more likely than B.”
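To see the definition at work, consider a hypothetical agent who already maximizes expected utility. The uniform measure P on a six-element state space and the utility values u(x) = 1, u(y) = 0 are assumptions made only for this illustration. As in the paragraph before the definition, xAy is read as the decision that yields x if A occurs and y otherwise.

```latex
% Illustrative setup (assumed): a fair die, with A more probable than B.
\[
S = \{1, \ldots, 6\}, \qquad A = \{1, 2, 3, 4\}, \qquad B = \{5, 6\}.
\]

% Expected utilities of the two bets under the uniform measure P.
\begin{align*}
\mathrm{EU}(xAy) &= P(A)\,u(x) + \bigl(1 - P(A)\bigr)\,u(y) = \tfrac{4}{6} = \tfrac{2}{3}, \\
\mathrm{EU}(xBy) &= P(B)\,u(x) + \bigl(1 - P(B)\bigr)\,u(y) = \tfrac{2}{6} = \tfrac{1}{3}.
\end{align*}

% General form of the difference for any u and P.
\[
\mathrm{EU}(xAy) - \mathrm{EU}(xBy) = \bigl(P(A) - P(B)\bigr)\bigl(u(x) - u(y)\bigr).
\]
```

For such an agent the difference is positive whenever u(x) > u(y) exactly when P(A) > P(B), so in this illustration A ≻_L B holds and agrees with the ordering of the probabilities.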

Now suppose you are certain event A occurs. Then when choosing between decisions, what would happen if A did not occur is irrelevant.

Definition 4.3. For A ⊆ S, we write (f ≻ g)_A if for all h ∈ D, we have fAh ≻ gAh. We read (f ≻ g)_A as “f is preferred to g given A.”

If an event A is impossible, then what would happen if it occurred is always irrelevant.


Then there is a unique finitely additive probability measure P : 2^S → [0, 1] such that for all



Proof. Theorem 14.1 of [Fishburn, 1970]. □

Theorem 5.1. With all the notation of Savage’s Theorem (4.5), let ≻ satisfy axioms S1, S2, S3, S4, S5, S7, and the following modification of S6:

S6′. For any x ∈ X, if f ≻ g then there exists a countable partition {E_k}_{k=1}^∞ of S such that for k ∈ N we have fE_kx ≻ g, and f ≻ gE_kx.

Then there is a unique countably additive nonstandard probability measure ∗P : 2^S → ∗[0, 1] such that for all A, B ⊆ S, we have


Proof. Apply the transfer principle to Savage’s Theorem. For axioms S1-S5 and S7, this relabels S as ∗S and X as ∗X. Axiom S6 becomes “For any x ∈ X, if f ≻ g, there exists ∗n ∈ ∗N and a partition {E_{∗k}}_{∗k≤∗n} of S such that for all ∗k ∈ ∗N with ∗k ≤ ∗n we have fE_{∗k}x ≻ g, and f ≻ gE_{∗k}x.” But for any given ∗n ∈ ∗N, there are at most countably many ∗k ≤ ∗n. Refining our partition {E_{∗k}}_{∗k} if necessary, we may take {E_{∗k}}_{∗k≤∗n} to be a countable partition. Then there exists a bijection φ : {∗k ∈ ∗N : ∗k ≤ ∗n} → N, so we may relabel {E_{∗k}}_{∗k≤∗n} as {E_{φ(∗k)}}_{∗k≤∗n} = {E_k}_{k∈N}.

Observe also that as S and X have no assumed structure, neither do ∗S or ∗X. Then we can write S for ∗S and X for ∗X. Applying the transfer principle to the assertion that for all n ∈ N and {A_k}_{k=1}^n pairwise disjoint measurable sets in S, we have P(⋃_{k=1}^n A_k) = Σ_{k=1}^n P(A_k), we see that for {A_k}_{k=1}^∞ disjoint measurable sets in S, we have ∗P(⋃_{k=1}^∞ A_k) = Σ_{k=1}^∞ ∗P(A_k). Thus ∗P is countably additive. □

Corollary 5.2. Suppose ≻ on D satisfies conditions S1-S5, S6′, S7 above. Then we may take ∗u to be bounded.


positive affine transformation of ∗u which is bounded. □

