Thursday, 27 June 2013

Is Space the Form of External Intuition?

First, a bit of cognitive psychology. Then, how physics views the question whether space is Euclidean and 3-dimensional. Finally, I argue that modern psychology and physics together imply a conclusion that contradicts the central assumption of Kant's Transcendental Idealism -- namely, Kant's claim that space is the form of external intuition.

Space -- the regions that physical things like rocks and chairs move around in; the arena of spacetime regions and spacetime points, to which co-ordinates may be assigned, and on which physical fields, tensor fields, etc., are defined -- is mentally represented in the human cognitive system. Presumably this works much as it does in the other primates, and indeed in any creature with even a simple visual system (bees and flies, say). Now we mentally represent space (in perception, I mean) as 3-dimensional and Euclidean: only three co-ordinate axes (and no more) can be placed mutually perpendicular. Presumably the perceptual system in humans, primates, bees, etc., is innate; and presumably this is why we tend to treat its output as a priori. Kant uses the phrase "form of external intuition" to refer to this mental representation of space (or, if you prefer, to its organizing pattern). So, using Kant's terminology, the form of external intuition is 3-dimensional Euclidean geometry, $\mathbb{E}^3$.

It is very unclear how this works, of course. But let us take this as something outputted by modern cognitive psychology:
(1) The form of external intuition is $\mathbb{E}^3$. (Psychology)
Next, turning to space -- i.e., the regions that physical things like rocks and chairs move around in; the arena of spacetime regions and spacetime points, to which co-ordinates may be assigned, and on which physical fields, tensor fields, etc., are defined. This is not a definition. On the contrary! For space (or spacetime points, events, etc.) is assumed as a primitive in modern physics (classical mechanics, electromagnetism, special relativity, general relativity, quantum theory; and the various quantum gravity programmes, such as superstring theory, canonical quantization, supergravity, loop quantum gravity, causal set theory, etc.).

In a physical theory, we have $M$ (i.e., space) assumed as primitive. Then there are various functions and whatnot on $M$ (world-lines, fields, fibres, and all sorts of weird stuff). In fact, space and time have, since Einstein and Minkowski, been fused together, and usually $M$ has the structure of a pseudo-Riemannian manifold, with a metric $g_{\mu \nu}$ which tells you how "far apart" neighbouring points are; but it needn't have this structure. Physicists are very unsure what properties space has. For example, space might be a manifold of some kind -- perhaps a compactified 10-dimensional manifold, an idea that goes back to Kaluza-Klein theories. Space might be something finite and/or discrete (as in causal set theory). Or perhaps something quite different.

However, physicists agree that any theory of space must recover the 3-dimensional Euclidean space $\mathbb{E}^3$ as an approximation (this is the Correspondence Principle, and is why, e.g., Einstein aimed to get Newton's Laws as approximations from his field equations). But an approximation is just that. So space, whatever it is, is approximately $\mathbb{E}^3$ "at a certain scale": that is what we "observe" at the medium scale. But this does not imply that space is $\mathbb{E}^3$. In fact, on every modern theory, it isn't.

Let us therefore take this as something outputted by modern physics:
(2) Space is not $\mathbb{E}^3$. (Physics)
Taken together, these two statements (1) and (2) imply:
(3) Space is not the form of external intuition. (Physics, Psychology)
Move now to Kant and his argument for Transcendental Idealism (TI), which is Kant's claim that,
Time and space, and all objects of a possible experience, cannot exist out of and apart from the mind.
(See Kant, Critique of Pure Reason, Transcendental Logic, Second Division: Transcendental Dialectic, BOOK II: The Dialectical Inferences Of Pure Reason, Chapter II: THE ANTINOMY OF PURE REASON, SECTION VI. Transcendental Idealism as the Key to the Solution of Pure Cosmological Dialectic)

Kant's argument for (TI) is based on his assumption that space is the form of external intuition. More exactly, his argument for (TI) uses the assumptions:
(A1) Space is the form of external intuition.
(A2) Space (and time) is necessary for the representation of objects (of a possible experience).
(A3) External intuition is a property of the mind.
So, in particular, Kant's argument for (TI) is based on (A1).

But the problem is that Kant's assumption (A1) contradicts (3).

Master in Logic and Philosophy of Science at the LMU Munich

http://www.mcmp.philosophie.uni-muenchen.de/students/ma/index.html

The Munich Center for Mathematical Philosophy at Ludwig-Maximilians-Universität (LMU) Munich is now accepting applications for the MA program in Logic and Philosophy of Science. The official language of the program is English. Masters students are trained in all areas of philosophy, but the distinctive feature of the program is that many students also investigate philosophical issues at the foundations of mathematics, computer science, statistics, and the empirical sciences (including physics, linguistics, neuroscience, and more).

LMU has a long-standing tradition in logic and philosophy of science. Recently, this tradition has been cemented by the foundation of the Munich Center for Mathematical Philosophy (MCMP). The MCMP is led by the internationally-renowned philosophers Hannes Leitgeb and Stephan Hartmann. Overall, the Faculty of Philosophy, Philosophy of Science and Study of Religion at LMU is one of the largest in the German-speaking world.

More Information is available here: http://www.mcmp.philosophie.uni-muenchen.de/index.html

- Fees and financial support: There are no tuition fees, but each student must pay a one-time admission fee of 111 Euros. Some students will receive a tax-free monthly stipend of 1000 Euros for one year; scholarships are awarded on the basis of academic record, and extension for a second year is subject to positive review after the first year. If you are interested in applying for an entry grant, please say so in your cover letter.

- Application: We will accept applications until July 15th, 2013, but we strongly encourage early applications. Applications will be considered as they are received. Please submit your application electronically to Mcmp.Masters@lrz.uni-muenchen.de with "Application MA in Logic and Philosophy of Science" as the subject.

Your application package should include:
(1) A cover letter explaining why you would like to join the program. If you wish to apply for an entry grant, please say so in the letter.
(2) A CV (including the list of courses that you have taken as an undergraduate).
(3) A copy of your undergraduate final grade certificate or transcript.
(4) A seminar paper or published article that you have written, the subject of which can vary.
(5) Two letters of reference, which should also address your abilities in and knowledge of the areas that are covered by our MA program. These letters can also be sent directly by the referees to Mcmp.Masters@lrz.uni-muenchen.de. 
(6) Proof of English language proficiency (TOEFL, IELTS or equivalent), in case your first language is not English and you do not have a degree from an English-speaking university.
If necessary, applicants may also be interviewed. If you have questions about the application process, please contact the program coordinators at Mcmp.Masters@lrz.uni-muenchen.de.
In addition, please apply simultaneously to the International Affairs Office at LMU, which checks the general requirements for masters studies at LMU. At http://www.en.uni-muenchen.de/students/degree/downloads2/index.html, you can download all application documents for Master degrees. There you will also find information about which documents to submit to the International Affairs Office, how to submit them, and whom you can contact in case of questions about this other part of the application process.

Wednesday, 26 June 2013

The Hole Argument Against Worlds with Domains

More or less every view about the nature of possible worlds that I'm familiar with develops a theory which assigns to each world $w$ a domain $D_w$ of objects which "exist" at $w$. I want to argue that every such view faces a Hole Argument analogous to the one that appears in debates about space and time.

Let $\mathsf{P}_1$ and $\mathsf{P}_2$ be two properties. For our purposes, it doesn't matter what they are. There might be 97 properties, along with 34 relations. But that would complicate the discussion needlessly. For definiteness and vividness, let the properties be $\mathsf{Red}$ and $\mathsf{Green}$. (Again, you might want to consider some fancy quantum properties, but nothing hinges on this.)

Let us suppose, with standard metaphysics of worlds, that we consider a world $w$ with a domain,
$D_w = \{a,b,c\}$
such that,
$\mathsf{Red}(a) \wedge  \neg \mathsf{Red}(b) \wedge  \neg \mathsf{Red}(c)$
$\mathsf{Green}(a) \wedge  \mathsf{Green}(b) \wedge  \neg \mathsf{Green}(c)$
It follows that the extensions of $\mathsf{Red}$ and $\mathsf{Green}$ are:
$\mathsf{Red}^{w} = \{a\}$
$\mathsf{Green}^{w} = \{a,b\}$
Let us now permute the domain under the bijection
$\pi: D_w \to D_w$
given by:
$\pi(a) = b$
$\pi(b) = c$
$\pi(c) = a$
And let us apply this permutation $\pi$ to the extensions above, obtaining:
$\pi[\mathsf{Red}^{w}] = \{b\}$
$\pi[\mathsf{Green}^{w}] = \{b,c\}$
Now, we still have the same domain $D_w$, but we now have distinct extensions.
Question: is the result a new distinct world? Or ...?
On the standard view, the result is a new distinct world $w^{\prime} = w^{\pi}$ with the same domain $D_w$ but such that,
$\mathsf{Red}^{w^{\prime}} = \{b\}$
$\mathsf{Green}^{w^{\prime}} = \{b,c\}$
But the problem is that this violates anti-haecceitism. For $w$ and $w^{\prime}$ are isomorphic (under $\pi$, by construction), and yet extensionally distinct. Anti-haecceitism (like Leibniz Equivalence in spacetime physics) says that $w$ and $w^{\prime}$ should be one and the same world.
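For concreteness, here is the permutation carried out in a small Python sketch (the encoding of the objects as strings, and the function name, are purely illustrative choices of mine):

```python
# A toy rendering of the permuted world w^pi.

D_w = {"a", "b", "c"}                       # domain of w
Red_w = {"a"}                               # extension of Red at w
Green_w = {"a", "b"}                        # extension of Green at w

pi = {"a": "b", "b": "c", "c": "a"}         # the bijection pi : D_w -> D_w

def push_forward(perm, extension):
    """Image of an extension under a permutation of the domain."""
    return {perm[x] for x in extension}

Red_w_prime = push_forward(pi, Red_w)       # {"b"}
Green_w_prime = push_forward(pi, Green_w)   # {"b", "c"}

# Same domain, isomorphic pattern, extensionally distinct worlds:
assert Red_w_prime != Red_w and Green_w_prime != Green_w
```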

Suppose one accepts anti-haecceitism. Then there seem to me to be only two ways out of this problem:
(i) Keep the domain-based notion of worlds, and deny that the world $w^{\prime}$ exists.
(ii) Throw away the domain, and insist that we have two descriptions of the same world.
The first seems to me to be more or less Lewis's view. But I think the second view is much more attractive, and it fits very naturally with physics. On the view I prefer, this world $w$ is one at which there are exactly 3 concreta $x_1, x_2, x_3$ and such that
$\mathsf{Red}(x_1) \wedge  \neg \mathsf{Red}(x_2) \wedge  \neg \mathsf{Red}(x_3)$
$\mathsf{Green}(x_1) \wedge  \mathsf{Green}(x_2) \wedge  \neg \mathsf{Green}(x_3)$
But there is no special domain for $w$. And the attempt to label the "objects" is really a form of skolemization. One can skolemize the description of the world, if one wants; and one can take the constants as the domain elements. (Alternatively one can consider a model $\mathcal{A}$ of the world $w$.) But this is no longer the world $w$, but a representation of the world $w$.

Epistemic Interpretation of Ultra-Finitism

In the previous post on ultra-finitism, I tried out four different interpretations of it (by "numbers" I mean elements of $\mathbb{N}$; or of $\omega$, if you prefer):
  1. The numbers "run out", at some finite point.
  2. Numbers are physical objects.
  3. There are no numbers (i.e., nominalism).
  4. There is a number such that it is not concretely realized.
All of these are ontological views, saying how the numbers are, or how they are related to other things. But the first, I think, simply rests on changing the meaning of "number" from "element of $\mathbb{N}$" to "element of some finite $A \subseteq \mathbb{N}$"; and it is easy to define modifications of the axioms of arithmetic (allowing the successor, addition and multiplication functions to be "partial"), and models of such systems with finite domain $A \subseteq \mathbb{N}$. The second seems to me to be a confusion: you can't purchase the number 57 at Tesco's or put $(\omega, <)$ on an optical bench. The third is fine, and for all I know true -- but it is simply nominalism (and has a huge research literature devoted to it).

The last is the most plausible ontological view, and merely says that beyond a certain level, no larger numbers are concretely realized. But note that this view is perfectly compatible with the existence of $\mathbb{N}$, and in fact with the existence of much, much more. It may well be true, a physical fact, that few (out of the transfinitely many) abstracta are concretely realized, because of the finiteness of the physical world. But this is not a particularly sceptical view. In fact, it is rather like Plato's view.

Still, these doctrines (1)-(4) make no mention of epistemic matters: proof, evidence, justification, etc. One can think of ultra-finitism as an epistemological view, rather similar to positivism's view of how the world is known (i.e., by direct contact with sense experience). We can introduce an epistemic element to this by the following epistemological doctrine, which is a form of constructivism:
Token Cognizability (TC)
A justification for asserting the existence of a number $n$ is a token construction of $n$.
Ordinary finitists (who accept the numbers as a potential infinity, along with the computational operations on them), and constructivists more liberally, accept the modal notion of a possible construction. For example, a formula $\phi$ counts as provable if it could be proved (even if, for practical reasons, it cannot be proved in practice). So one can replace the modality used to define finitism with a much stronger one, meaning roughly "can in practice", and thereby obtain ultra-finitism.

So that's the idea. In more detail: large numbers are frequently denoted by (usually arithmetic) function terms $t$. Examples of function terms are:
$sssss\underline{0}$
$5 \times (29 + 2)$
$2^{1000}$
$2^{2^{2^{2^{2}}}}$
But a function term $t$ is a (syntactically) complex expression, and does not "directly" tell you which number it denotes -- that is, what its value $val(t)$ is. In formalized systems of arithmetic, such as $Q$, $I \Sigma_1$, $PA$, etc., there is a canonical means of referring to numbers. These are the canonical numerals:
$\underline{0}$
$s\underline{0}$
$ss\underline{0}$
$sss\underline{0}$
and so on
(Notations vary. Sometimes people, including me, use a prime notation for successors, e.g., $\underline{0}^{\prime \prime \prime}$, instead.)

The map $n \mapsto \underline{n}$ can be defined by primitive recursion:
$\underline{0} := \underline{0}$
$\underline{n+1} : = s\underline{n}$
If $F$ is a system of formalized arithmetic, then usually, for any function term $t$, the system proves an equation $t = \underline{n}$, where $n = val(t)$. So, one may then say:
A construction of $n$ is a canonical numeral for $n$.
A reduction of $t$ is a proof of an equation $t = \underline{n}$, for some $n$. 
Finally, numerals, constructions, reductions and proofs, thus defined, are abstract objects. A numeral is a finite sequence. A construction is a finite sequence. And a finite sequence on $A$ is a function $\sigma: I \to A$, where the index set $I \in \omega$ is a finite initial segment of the ordinals. We can then write:
$\sigma = (a_0, a_1, \dots) = (a_i \mid i \in I)$.
Now the canonical numeral for the number $n$ has size, or length, roughly equal to $n$ itself. And a reduction for a term $t$ will have size at least as large as its value $val(t)$.
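To make the size point vivid, here is a small Python sketch (the string encoding of numerals as "s...s0" is my own illustrative stand-in for the abstract numeral):

```python
def numeral(n):
    """The canonical numeral for n: the symbol 0 prefixed by n s's."""
    return "s" * n + "0"

assert numeral(3) == "sss0"

# The term 5 * (29 + 2) reduces to a numeral of quite feasible size:
print(len(numeral(5 * (29 + 2))))    # 156 symbols

# But the canonical numeral for 2**1000 would have 2**1000 + 1 symbols.
# We can compute how many decimal digits that length has, though no
# token of the numeral itself could ever be produced:
print(len(str(2**1000 + 1)))         # 302 digits
```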

But despite being abstract entities, these numerals, constructions and reductions sometimes have physical, or concrete, tokens. The tokens are entities produced by cognition, and intended to be tokens of the relevant numerals or proofs. So, we can define:
A token construction of $n$ is a token of a canonical numeral for $n$.
A token reduction of $t$ is a token of a proof of an equation $t = \underline{n}$, for some $n$. 
Then, we can note that the actual world satisfies the following condition:
(FIN) There are terms for which there is no (actual) token reduction.
(To be more exact, this is so unless we allow rather strange entities to count as "tokens", where these tokens are unintended. E.g., random patterns in the sand, or peculiar tiny regions of spacetime, shaped like very long sequences of "S"s, that no one has ever seen.)

It then follows from this, together with Token Cognizability, that:
There are terms for which there is no justification for asserting the existence of their value.
This explains the view of many ultra-finitists that we should be sceptical of very large (finite) numbers. So, ultra-finitism is now an epistemological view, based primarily on the two assumptions:
Token Cognizability (TC): A justification for asserting the existence of a number $n$ is a token construction of $n$.
Finiteness (FIN): There are terms for which there is no (actual) token reduction.
A response to this is to request a justification for the assumptions required here for this argument to go through: in particular, the epistemological doctrine of Token Cognizability. Why should it be true that a reason for asserting the existence of a number must involve a token construction of it?

Why can one not have indirect means for asserting the existence of the numbers? We have indirect means for asserting the existence of electrons and quasars and long-dead dinosaurs. Why can we not have indirect means for asserting the existence of (all) the numbers, and the completed set $\mathbb{N}$ of them all?

Contact Epistemology: Causal vs Representational

What I'll call "Contact Epistemology" is the epistemological view which imposes a necessary constraint on an agent's having knowledge (of some putative kind of entity), something along the lines of this:
(CE) An agent knows about $F$s only if the agent's mind is in contact with the (or some) $F$s.
Many philosophers have insisted on something like this, and required that the "contact" between the agent's mind and $F$s should be causal contact: this kind of constraint is widely assumed amongst naturalistic epistemologists. But it is what leads to so many ...
Access Problems in Epistemology
(Mathematics) If we are not causally connected to abstract entities, how can we know about them?
(Morality) If we are not causally connected to moral properties & states of affairs, how can we know about them?
(Modality) If we are not causally connected to merely possible worlds, how can we know about them?
But it seems to me that there are reasonable alternatives to this. For one might instead simply require that the agent's mind be in representational contact, rather than causal contact. For example, for my cognitive system to know about:
  • the future, 
  • or the distant past,
  • or not directly (e.g., visually) observable physical entities,
  • or moral properties, 
  • or abstract entities, 
  • or merely possible states of affairs, 
it might not be necessary that my cognitive system be causally connected to them (the future, distant past, unobservables, or moral properties, or abstract entities, or merely possible states of affairs), but it might be necessary that my mind represents these.

If so, contact epistemology may be disambiguated into at least two very different views about knowledge:
Causal Contact Epistemology
An agent knows about $F$s only if the agent's mind is in causal contact with the (or some) $F$s.
and:
Representational Contact Epistemology
An agent knows about $F$s only if the agent's mind is in representational contact with the (or some) $F$s. 

Tuesday, 25 June 2013

Kant's Argument for Transcendental Idealism

Kant's Transcendental Idealism is the metaphysical view which he states as follows:
... all objects of a possible experience ... have no self-subsistent existence apart from human thought.
Time and space, with all phenomena therein ... cannot exist out of and apart from the mind.
I formulate this as:
Time and space, and all objects of a possible experience, cannot exist out of and apart from the mind.
How does Kant arrive at this conclusion, that space, and time, and rocks, and trees, and quasars and so on, "cannot exist out of and apart from the mind"?

Here I quote from my electronic copy of Kant, Critique of Pure Reason (1781/1787) (tr., J. M. D. Meiklejohn). (If you find Kant's organization of CPR very confusing and Byzantine, don't worry: so do I. Here it is online.) I highlight in bold what seem to be the central claims:
Transcendental Logic, Second Division: Transcendental Dialectic
BOOK II: The Dialectical Inferences Of Pure Reason
Chapter II: THE ANTINOMY OF PURE REASON
SECTION VI. Transcendental Idealism as the Key to the Solution of Pure Cosmological Dialectic.
In the transcendental aesthetic we proved that everything intuited in space and time, all objects of a possible experience, are nothing but phenomena, that is, mere representations; and that these, as presented to us -- as extended bodies, or as series of changes -- have no self-subsistent existence apart from human thought. This doctrine I call Transcendental Idealism. ...
Transcendental idealism allows that the objects of external intuition -- as intuited in space, and all changes in time -- as represented by the internal sense, are real. For, as space is the form of that intuition which we call external, and, without objects in space, no empirical representation could be given us, we can and ought to regard extended bodies in it as real. The case is the same with representations in time. But time and space, with all phenomena therein, are not in themselves things. They are nothing but representations and cannot exist out of and apart from the mind.
So, if I follow Kant correctly, here is Kant's argument for (TI):
Assumption 1: Space is the form of external intuition.
Assumption 2: Space (and time) is necessary for the representation of objects (of a possible experience).
Assumption 3: External intuition is a property of the mind.

Claim (TI):
Space, and all objects of a possible experience, cannot exist out of and apart from the mind.
[I ignore the time part of the claim, as it simply seems to make the argument more complicated without adding anything new.]

Proof: By Assumption 1, space is the form of external intuition. But, by Assumption 3, external intuition, being a property of mind, cannot exist independently of human thought. Thus space cannot exist out of and apart from the mind. By Assumption 2, space (and time) is necessary for the representation of objects. So, objects of a possible experience cannot exist independently of external intuition, and, a fortiori, cannot exist out of and apart from the mind. QED.

This argument is not a precise, valid, formal argument. Still, it seems as close to being informally valid as one might reasonably request of a philosophical argument.

[UPDATE: 1 July. I've separated out the two important quotes at the start, and formulated (TI) based on both.]

Monday, 24 June 2013

Concrete Realization and Ultra-Finitism

Suppose $w$ is a possible world at which there are exactly $15$ concreta instantiating some causal connectedness relation. Let $I = \{1, \dots, 15\}$, let $\mathsf{Conc}$ be the property of being concrete, and let $\mathsf{Causconn}$ be the relation of being causally connected.

Then, the categorical diagram description of the world $w$ looks like this:
$\exists x_1, \dots, x_{15}$
$[\bigwedge \{x_i \neq x_j \mid i \neq j \in I\} \wedge $
$\forall z(\mathsf{Conc}(z) \to \bigvee \{z = x_i \mid i \in I\}) \wedge $
$\bigwedge \{\mathsf{Conc}(x_i) \mid i \in I\} \wedge$
$\neg \mathsf{Causconn}(x_1,x_1) \wedge \mathsf{Causconn}(x_1,x_2) \wedge \dots]$.
(The final clause says how the causal connectedness relation $\mathsf{Causconn}$ is instantiated in $w$; the details don't matter.)
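For the curious, here is a little Python sketch that spells such a description out, with $n = 3$ rather than $15$ to keep the output readable (the ASCII rendering of the quantifiers and connectives is, of course, mere notation):

```python
from itertools import combinations, product

def diagram_description(n, causconn):
    """The categorical diagram description of a world with exactly n
    concreta; causconn is the set of (1-based) pairs (i, j) such that
    Causconn(x_i, x_j) holds."""
    I = list(range(1, n + 1))
    xs = [f"x{i}" for i in I]
    # pairwise distinctness of the concreta
    clauses = [f"{xs[i-1]} != {xs[j-1]}" for i, j in combinations(I, 2)]
    # "nothing else is concrete"
    clauses.append("Az(Conc(z) -> " + " v ".join(f"z = {x}" for x in xs) + ")")
    # each x_i is concrete
    clauses += [f"Conc({x})" for x in xs]
    # how Causconn is instantiated
    for i, j in product(I, I):
        atom = f"Causconn(x{i},x{j})"
        clauses.append(atom if (i, j) in causconn else "~" + atom)
    return "E" + " E".join(xs) + " [" + " & ".join(clauses) + "]"

print(diagram_description(3, {(1, 2)}))
```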

Let $\kappa$ be a cardinal. Define:
$\mathsf{Real}_C(\kappa)$ := $\exists X(\forall x(x \in X \to \mathsf{Conc}(x)) \wedge \kappa = |X|)$
This means:
"$\kappa$ is concretely realized by some set of concreta".
Then,
$w \models \mathsf{Real}_C(\kappa)$ if and only if $0 \leq \kappa \leq 15$.
That is, every number up to $15$ is concretely realized (in $w$).

The reason for setting things up like this (modally) is to try and define Ultra-Finitism as charitably as possible, so that it doesn't sound bonkers.

One interpretation of ultra-finitism is that, somehow, the natural numbers "run out" at some point. I can't make much sense of it. If successor, $s : \mathbb{N} \to \mathbb{N}$, is a total injective function with $0$ outside its range, there must be infinitely many successors of $0$. Perhaps the idea is that successor is "really" non-total? One can easily define such models (e.g., "arithmetic with a top", as discussed in work on Bounded Arithmetic: see Hájek & Pudlák, 1993, Metamathematics of First-Order Arithmetic, Ch. IV, Sec. 2, and this 2002 paper by Neil Thapen). But really, that is not what we mean by the natural numbers.
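Here is a minimal sketch of such a model (the cutoff $N$ is arbitrary; this only illustrates the kind of model gestured at above, not the formal systems of Hájek & Pudlák or Thapen):

```python
N = 15
domain = set(range(N + 1))          # {0, 1, ..., N}

def succ(n):
    """Partial successor function: undefined (None) at the top."""
    return n + 1 if n < N else None

assert succ(0) == 1
assert succ(N) is None              # the numbers "run out" at N
```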

A second interpretation is that ultra-finitism is the view that numbers are physical objects. And some ultra-finitists talk as if they do hold these strange beliefs. But surely everyone knows that numbers aren't things that move around, have mass, etc.! Numeral tokens yes. But numbers?

A third interpretation of ultra-finitism is simply that it is the view that there are no numbers (i.e., nominalism), but that there are numeral tokens (which are physical things). But this is not really a finitist position (for mathematics) at all, except in a trivial sense (setting the number of numbers at 0). It's nominalism, and nominalism has a very large research literature devoted to it. If ultra-finitists are nominalists, then the solution to their troubles is easy: go to a library and study the literature!

A fourth interpretation -- and, it seems to me, probably the only conceptually stable one -- is that ultra-finitism is the view that only strictly finitely many numbers are realized concretely. This is by no means crazy. If that is the correct view, then ultra-finitism (at size $n \in \omega$) -- denoted $UF_n$ -- can be defined as follows:
$UF_n := $ for all $w$, $w \not\models \mathsf{Real}_C(n)$
which means: no world $w$ concretely realizes the number $n$.

Friedman on Material Set Theory (ZFC)

Here I repost a message from Harvey Friedman from the FOM mailing list, from 1997. (FOM is freely accessible, so I hope I don't violate any republication conventions here! The original message from Friedman is online here.)

It is connected to one of the criteria I mentioned before, namely Interpretability Strength.
I didn't intend to write this one, but the topic came to mind in reading the postings of those who are, in some way, unsatisfied with the usual set theoretic foundations for mathematics, and strive for a kind of "structuralist" viewpoint - particularly, Pratt and Barwise.
The usual set theoretic foundations is very powerful, coherent, concise, successful, explanatory, impressive, and totally dominating at this time. Taken as a whole, with the major supporting classical developments, it is certainly one of the few greatest achievements of the human mind of all time.
However, it also does not come close to doing everything one might demand of a foundation for mathematics. At the present time, there is no full blown proposal for scrapping it and replacing it with anything substantially different that isn't far more trouble than it's worth. Present cures are far far far worse than any perceived disease.
Now this does not mean that the usual set theoretic foundations might not give way to a better foundations, or might not be altered in some very significant and permanent way. In fact, I can tell you that I work on this from time to time. It's just that people should recognize what's involved in doing such an overhaul, and not fool themselves into either
i) embracing something that is either essentially the same as the usual set theoretic foundations of mathematics; or ii) embracing something that doesn't even minimally accomplish what is so successfully accomplished by the usual set theoretic foundations of mathematics.
Now before I remind everybody of some of the most vital features of the usual set theoretic foundations for mathematics, let me state a great, great, great, theorem in the foundations of mathematics:
THEOREM. Sets under membership form the simplest foundationally complete system.
There is one trouble with this result: I don't know how to properly formulate it. In particular, I don't know how to properly formulate "foundational completeness" or "simplest."
Making sense of this "Theorem" and closely related matters are typical major issues in genuine foundations of mathematics. Now before coming back to this, let me summarize the greatest of the usual set theoretic foundations of mathematics.
First of all, set theory is unabashedly materialistic - a perhaps nonstandard word I use to describe the opposite of structuralistic. The viewpoint is that the empty set of set theory has a unique unequivocal meaning independently of context. There is the empty set, and that's that. It doesn't need any context. There is no talk of identifying distinct empty sets because they form the same function.
This materialistic concept of set seems to be very congenial to almost everybody for a while. Thus {emptyset} also has a unique unequivocal meaning independently of context. In fact, one can construct the so called hereditarily finite (HF) sets by the following process:
i) $\varnothing \in HF$; ii) if $x,y \in HF$ then $x \cup \{y\} \in HF$.
This has a clarity and congeniality for most people, without invoking any structuralist ideas.
Now I can already hear the following remark: see, you have used an inductive construction that has not only not been formalized in set theoretic terms yet, but is not even best formulated in set theoretic terms.
Yes, this is true. And yes, there is an idea of inductive construction - at least for the natural numbers - which is not directly faithfully conceived of in purely set theoretic terms. However, look at the costs of scrapping the set theoretic approach in favor of "inductive construction." Can this really be done? I have certainly thought about this, but without success. It is certainly an attractive idea, and we explicitly formulate this:
FOUNDATIONAL ISSUE. Is there an alternative adequate foundation for mathematics that is based on "inductive construction?" In particular, one wants to capture set theory viewed as an inductive construction. If not, one wants to construct a significant portion of set theory as an inductive construction.
Now, instead of scrapping the set theoretic approach in favor of "inductive construction," what about incorporating both? Yes, this can be done in various ways. However, so what? This is only really interesting if one can isolate a small handful of additional ideas that one wishes to directly faithfully incorporate into the prospective foundation for mathematics. Better yet - prove some sort of completeness of this handful.
However, consider the situation in mathematics that was one of the major precipitating factors that made people realize the urgency of foundations. Namely, people were creating all kinds of mathematical concepts - groups, rings, fields, integers, rationals, reals, complexes, division rings, functions of a real and complex variable, series, etcetera. There was no unifying principle as to what is or is not a legitimate construction. Mathematicians do not want to go down that road again, and are comforted by the fact that this matter has been resolved by set theory - even if it does not provide for a directly faithful formalization of the way they actually visualize and think. In summary, there is a danger of the cure being far far far worse than the disease.
Now, coming back to set theory and HF. Obviously, it is congenial and natural to most people to form the set HF. And then there is the natural idea of subset of HF. Then for each natural number n, one can form the $n$-th power set of HF; let's write this as $V_{\omega + n}$.
Let us give the name $V_{\omega + \omega}$ for the universe of all sets that are members of some $V_{\omega + n}$. There are a number of beautiful axioms one immediately writes down about this universe. A small number of them allow for the derivation of lots of others. This is a very coherent and workable system of objects, under epsilon, for a foundation of a very very large portion of mathematical practice. Now I have been very concerned with the following for nearly 30 years:
FOUNDATIONAL ISSUE. What interesting mathematics is missing if one uses $V_{\omega + \omega}$ (with the obvious associated axioms)? Obviously, one does not mean simply that $V_{\omega + \omega}$ itself is missing, since $V_{\omega + \omega}$ is meant to provide ontological overkill. Instead, one means that what mathematical information of an ordinary mathematical character cannot be derived in such a foundation?
Ex: Let E be a subset of the unit square in the plane, which is symmetric about the line $y = x$. Then E contains or is disjoint from the graph of a Borel measurable function.
This cannot be proved in such a Foundation, but can be proved in a somewhat more encompassing foundation. This result is a typical achievement in foundations of mathematics.
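As a small aside on Friedman's inductive construction of HF above, here is a Python sketch of clauses i) and ii), closing $\{\varnothing\}$ under $x, y \mapsto x \cup \{y\}$ for finitely many rounds (the cutoff is arbitrary, since HF itself is infinite):

```python
def hf_approx(rounds):
    """Approximate HF by iterating Friedman's two clauses."""
    hf = {frozenset()}                               # i)  the empty set is in HF
    for _ in range(rounds):
        hf |= {x | {y} for x in hf for y in hf}      # ii) x U {y} is in HF
    return hf

for r in range(4):
    print(r, len(hf_approx(r)))    # 1, 2, 4, 12 sets after 0..3 rounds
```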

Ten Axioms That Shook the World

(No prizes for guessing the political joke in the title!)

The central claim made on behalf of HoTT concerns its comparison with ZFC set theory as an alternative foundation for mathematics. I'm interested in this comparison. A few days ago, I posted a quick suggestion of five adequacy criteria for comparing foundational systems for mathematics. These are:
  • Austerity
  • Non-Circularity
  • Justification
  • Interpretability Strength
  • Structural Invariance
The axiom system ZFC scores highly on these, except the final one, Structural Invariance. How HoTT does is not clear to me at the moment, but I'm really interested to learn more. (There have been some useful comments on this by François Dorais below the post.)

But I'm not so interested in the whole "Set Theory vs. Category Theory" shouting match! (Cf.,  Einstein's comment on the row between Hilbert and Brouwer, calling it the Frog-Mouse battle.) I'd like to get an idea about how these foundational approaches compare, and fit with, say,
  • mathematical practice
  • philosophical questions (i.e., ontology & epistemology)
  • mathematics education
  • application of mathematics in science.
And it would be nice to invoke, if possible, mathematical notions for the comparison. For example, consistency/interpretability strength.

Let me explain why $ZFC$ does so well. Here is a simplified (but hopefully not misleading) formulation of the "Ten Axioms That Shook the World" of the title:
    Zermelo-Fraenkel Set Theory (with Choice, and Foundation)
  • If $\forall z(z \in x \leftrightarrow z \in y)$, then $x = y$.
  • $\varnothing$ is a set.
  • If $x, y$ are sets, then $\{x,y\}$ is a set.
  • If $x$ is a set, then $\bigcup x$ is a set.
  • If $x$ is a set, then $\mathcal{P}(x)$ is a set.
  • If $x$ is a set and $A$ is any class, then $A \cap x$ is a set.
  • There is an inductive set.
  • If $x$ is a set of non-empty sets, then there is a choice function $f$ on $x$.
  • If $x$ is a non-empty set, then there is an element $y \in x$ disjoint from $x$.
  • If $x$ is a set and $F : x \to y$ is a function onto $y$, then $y$ is a set.
(As aficionados know, this is a kind of "second-order" formulation, resembling $NBG$, which is conservative over $ZFC$. But it makes the content more intuitive.)
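In the official first-order formulation, the Separation and Replacement clauses above become axiom schemes, with one instance for each formula $\phi$. A standard rendering (not a quotation from any particular text):

```latex
% Separation: one instance for each formula phi(z, w) in which y is not free
\forall w \, \forall x \, \exists y \, \forall z \,
  \big( z \in y \leftrightarrow (z \in x \wedge \phi(z, w)) \big)

% Replacement: one instance for each formula phi(z, v, w) defining a class function
\forall w \, \forall x \,
  \big( \forall z \in x \, \exists! v \, \phi(z, v, w)
        \rightarrow \exists y \, \forall z \in x \, \exists v \in y \, \phi(z, v, w) \big)
```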

This foundational theory contains 10 axioms (well, strictly speaking, 8, with 2 axiom schemes) and is conceptually Austere. It has $\in$ and $=$ as its basic concepts, and is formulated in classical first-order logic. There is no "typing" at all. There is a single variable sort, and statements of the form:
$x \in y$
$x = y$
make sense. Consequently, there is no "Julius Caesar Problem", the issue that Frege once raised about abstraction principles. For example,
$\{\varnothing\} \neq V_{\omega}$
is a meaningful (and indeed provable) statement.

This foundational theory is conceptually Non-Circular. We don't cheat by putting numbers, integers, rationals, reals, cardinals, ordinals, etc., in by hand. We define these structures and prove their existence (well, the existence of an "implementation", typically non-unique: I come back to this below).

This foundational theory has an intuitive epistemic Justification. One can picture, perhaps even "grasp", the cumulative hierarchy as a growing transfinite tower of "ranks", starting with $\varnothing$ and iterating the application of $\mathcal{P}$. Of course, this might not convince you that there is such an intended universe of sets.

This foundational theory has very high Interpretability Strength. It interprets pretty much everything any mathematician has ever written down. And for the stuff it doesn't, it can be made to, either by going to an impredicative second-order theory or by adding large cardinals.

However, $ZFC$'s weakness concerns its "implementation" of all the various mathematical objects, thingies, patterns, structures, and beasties that mathematicians know and love. The pairs, relations, functions, numbers, groups, spaces, and so on. One can interpret, or model, pairs, relations, functions, numbers, etc., in $ZFC$. But the modelling is always non-unique. There are lots of extensionally different implementations, rather like gauge choices or co-ordinate systems in physics. And it offends the mathematician's Platonic sensibilities to have lots of different "implementations" of the natural numbers, when one would really like to identify one single system: the natural numbers.

In short, $ZFC$ violates Structural Invariance. People have begun to say (the term derives from remarks by Harvey Friedman) that $ZFC$ is a "material set theory", which is contrasted with "Structural Set Theory", which is what we have in the category version of set theory, Lawvere's Elementary Theory of the Category of Sets (ETCS: relatedly, see this post "Rethinking Set Theory", by Tom Leinster), and also, I assume, in HoTT too.

To be continued!

Sunday, 23 June 2013

Strange Beliefs about Abstract Objects

Occasionally, the strange belief that mathematical objects are physical objects is advocated. I am truly baffled when I hear such beliefs. Here are some questions:
Is the number $0$ a physical object?
Is the number $2^{2^{2^{2^{2^{2^{2^{2}}}}}}}$ a physical object?
Is the wellorder $(\omega, <)$ a physical object?
Is the topological space $\mathbb{R}^4$ a physical object?
Is the Lie group $SU(3)$ a physical object?
Is the rank $V_{\omega + 57}$ a physical object?
For example, what is the mass of $(\omega, <)$? Can you find it somewhere, perhaps at Tesco's?

Much Badiou About Nothing

Alain Badiou is a French intellectual, long-time Maoist, and author of a 1988 book L'Être et l'Événement (translated as Being and Event). He is a Professor at the European Graduate School where his webpage biography says:
Trained as a mathematician, Alain Badiou is one of the most original French philosophers today. Influenced by Plato, Georg Wilhelm Friedrich Hegel, Jacques Lacan and Gilles Deleuze, he is an outspoken critic of both the analytic as well as the postmodern schools of thoughts. His philosophy seeks to expose and make sense of the potential of radical innovation (revolution, invention, transfiguration) in every situation.
I read Being and Event a couple of years ago (it can be found as a pdf), having recalled the mention of Badiou by Alan Sokal and Jean Bricmont in their 1998 book Intellectual Impostures.

Anyway, I'd forgotten about him until recently, when I had occasion to watch a YouTube video, "Infinity and Set Theory: How to Begin with the Void", of Badiou giving a 2011 talk concerning set theory and "the void" (a transcript is here).

Here is a snapshot I took of the talk (at 53:05). What's wrong with it? Don't cheat! (Hint below)

--------------------
Hint:
$\varnothing$
It doesn't even seem to be a typo or a spelling mistake, as he keeps repeating it.

(I make plenty of spelling mistakes - a friend on facebook keeps correcting my rubbish English ...)

Saturday, 22 June 2013

Adequacy Conditions on a Foundation

I'm interested in the recent claims about Homotopy Type Theory, HoTT, being a foundation of mathematics. For example, Frege, Cantor, Russell and Zermelo did provide foundations for (parts of) mathematics. But what did they do? What is a foundation?

First, here's a proposed list of five adequacy conditions for a claimed foundation $F$ for mathematics:
(Austerity) $F$ is conceptually austere.
(Non-Circularity) $F$ is conceptually non-circular.
(Justification) $F$ has an intuitive epistemic justification.
(Interpretability Strength) $F$ has high interpretability strength.
(Structural Invariance) $F$ should characterize certain structures only "up to isomorphism".
This is a rough proposal, and these may not be necessary and/or jointly sufficient. And there are matters of degree here too.

Usually, mathematicians have taken $ZFC$ (or something similar, perhaps weaker, or perhaps stronger, like $MK$) as their favourite foundation and teach it to first-year maths students throughout the world (Boolean operations, pairs, relations, functions, the natural numbers, ordinals, sequences, etc., etc). Here are the reasons for this:
  • $ZFC$ is conceptually austere: its sole primitive concepts are $\in$ and $=$.
  • $ZFC$ is conceptually non-circular: one does not assume the axioms of arithmetic, analysis, etc. One proves the axioms of arithmetic, analysis, etc.
  • $ZFC$ has some kind of intuitive epistemic justification: one can have an intuitive picture of the cumulative hierarchy $V = \bigcup_{\alpha \in ON} V_{\alpha}$.
  • $ZFC$ has high interpretability strength: $ZFC$ interprets pretty much everything. If you want more, move to $MK$ or add large cardinals.
In short, $ZFC$ is conceptually and epistemically "unified".

However, $ZFC$ violates the Structural Invariance criterion: One can certainly reduce, e.g., arithmetic, analysis, etc., to $ZFC$ successfully (to $Z$ in fact). But these reductions are "implementation-dependent". If one is inclined towards structuralism, then this implementation-dependence is an unhappy state of affairs. (However, having said that, I think that one can remedy this defect by adding abstraction principles for particular kinds of mathematical object, as sui generis entities.)

Does HoTT satisfy these criteria? First, it aims to satisfy Structural Invariance: this is the point of the Univalence Axiom. And it seems to me that it also satisfies Interpretability Strength, as the new book shows in detail. But it also seem to me that it fails the criteria of Austerity, Non-Circularity and Justification.

Leibniz Abstraction, Structure Identity and Univalence

In mathematics, one gets used to thinking of a particular mathematical object - say a group, or a graph, or a partial order, or a field, or a topological space - in abstract terms. That is, given the model $(X, R_1, \dots)$, one "forgets" the specific nature of the carrier set $X$, and focuses only on those properties of the mathematical model/structure which are invariant under isomorphism. This means that one is, in a sense, thinking of some "entity" which all isomorphic copies "have in common".

So, for example, if our models $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$ are isomorphic graphs, then one imagines that there is some other thingy -- call it $\hat{G}_1$ -- such that $\hat{G}_1 = \hat{G}_2$. Obviously, the carrier sets $V_1$ and $V_2$ can be distinct. But this is "abstracted away". If our models are isomorphic partial orders $\mathbb{P}_1 = (X_1, \preceq_1)$ and $\mathbb{P}_2 = (X_2, \preceq_2)$, then we imagine that there are corresponding "abstract structures" $\hat{\mathbb{P}}_1$ and $\hat{\mathbb{P}}_2$ such that $\hat{\mathbb{P}}_1 = \hat{\mathbb{P}}_2$.

More generally, we imagine we have some map,
$\mathcal{A} \mapsto \hat{\mathcal{A}}$
taking us from graphs, orderings, groups, linear spaces, topological spaces, etc., etc., to abstract graphs, orderings, groups, linear spaces, etc., etc. such that
$\hat{\mathcal{A}}_1 = \hat{\mathcal{A}}_2$ iff $\mathcal{A}_1 \cong \mathcal{A}_2$.
holds. I call this principle Leibniz Abstraction. But, unfortunately, no one knows what these abstract "thingies" $\hat{\mathcal{A}}$ are. Whatever they are, they are not models with a carrier set: the carrier set has been abstracted away, somehow.

Vladimir Voevodsky's Univalence Axiom, as I understand it, is an attempt to make this very intuitive idea mathematically precise. There are many formulations, including in the book on Homotopy Type Theory, of course. Here is a formulation from Ulrik Buchholtz, from a 2013 talk on "Univalent Foundations and the Structure Identity Principle":
Univalence Axiom (Voevodsky)
For types $A$ and $B$, the identity type $Id_U(A, B)$ is equivalent to the type $Eq(A,B)$ of (homotopy) equivalences between $A$ and $B$.
(I think that the "Structure Identity Principle" amounts to what I call "Leibniz Abstraction" above; certainly, this is what has been discussed a lot by philosophers of mathematics for decades. Particularly structuralist philosophers of mathematics, like Shapiro, Hellman and Resnik.)

Now I'm fairly sure that this overlaps very closely with the following: one can define "thingies" $\hat{\Phi}_{\mathcal{A}}$ such that Leibniz Abstraction holds, for set-sized mathematical models $\mathcal{A},\mathcal{B}$ as follows:
Leibniz Abstraction
$\hat{\Phi}_{\mathcal{A}} = \hat{\Phi}_{\mathcal{B}}$ iff $\mathcal{A} \cong \mathcal{B}$.
These entities are propositional diagrams.

The rough idea is this: let $\mathcal{A} = (A, R_1, \dots)$ be some mathematical object, with a set-sized carrier set/domain $A$ (and special or distinguished relations $R_i$). $\mathcal{A}$ can be a graph, a group, etc.

One can define a purely logical formula $\Phi_{\mathcal{A}}(\vec{X})$, in a language $\mathcal{L}(\mathcal{A})$, possibly infinitary, with free (second-order) variables amongst $\vec{X}$, such that categoricity holds:
$\mathcal{B} \models \Phi_{\mathcal{A}}(\vec{X})$ iff $\mathcal{B} \cong \mathcal{A}$. 
Thus the formula $\Phi_{\mathcal{A}}(\vec{X})$ defines the isomorphism type (groupoid) of $\mathcal{A}$, and is called the diagram formula for $\mathcal{A}$, by analogy with the notion of a diagram in model theory. The diagram of a model $\mathcal{A}$ is a kind of "picture" of all the basic relationships between elements of the model $\mathcal{A}$, obtained by introducing a constant $c_a$ to name each $a \in A$. But the diagram formula $\Phi_{\mathcal{A}}(\vec{X})$ itself does not contain any constants labelling domain elements. Rather, these have been existentially quantified away.

Then we let $\hat{\Phi}_{\mathcal{A}}$ be the propositional function expressed by the diagram formula $\Phi_{\mathcal{A}}(\vec{X})$. Then Leibniz Abstraction follows. So far as I can tell, this also implements the intuitive idea that Univalence is intended to implement: namely, the Structure Identity Principle.
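To illustrate, in a crude and finite way, how such abstraction behaves, here is a Python sketch for models with a single binary relation. The "abstract structure" is implemented as the lexicographically least relabelling of the model onto $\{0, \dots, n-1\}$, so that two models receive the same value iff they are isomorphic. (This is only a brute-force invariant for tiny models; the propositional diagram $\hat{\Phi}_{\mathcal{A}}$ is of course not a tuple of numbers.)

```python
from itertools import permutations

def abstract_structure(domain, edges):
    """A stand-in for the map A -> Â: the lexicographically least
    relabelling of (domain, edges) onto {0, ..., n-1}."""
    elems = sorted(domain)
    best = None
    for perm in permutations(range(len(elems))):
        relabel = dict(zip(elems, perm))
        image = tuple(sorted((relabel[a], relabel[b]) for a, b in edges))
        if best is None or image < best:
            best = image
    return (len(elems), best)

G1 = ({"u", "v", "w"}, {("u", "v"), ("v", "w")})   # a directed path
G2 = ({1, 2, 3}, {(2, 1), (3, 2)})                 # isomorphic to G1
G3 = ({1, 2, 3}, {(1, 2), (1, 3)})                 # not isomorphic to G1

assert abstract_structure(*G1) == abstract_structure(*G2)
assert abstract_structure(*G1) != abstract_structure(*G3)
```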

Friday, 21 June 2013

Questions about Homotopy Type Theory

Catarina today posted a link to the newly published, very interesting (joint) work on Homotopy Type Theory, initiated largely by my MCMP colleague Steve Awodey and Vladimir Voevodsky a short while ago, and developed with a group of other researchers during a year-long seminar at Princeton. So, let $\mathsf{HoTT}$ be Homotopy Type Theory.

The philosophical claim, which seems a pretty strong one, is that $\mathsf{HoTT}$ is some radically new "foundation for mathematics". Because I've been wondering about this for about a year in relation to $\mathsf{HoTT}$, and have just been teaching Frege's foundations of arithmetic at Oxford, I've got two quick questions.

First, is the following true?
$\mathsf{HoTT} \vdash 0 \neq 1$
Second, if it is true, what is the proof?

UPDATE (22 June): Ulrik Buchholtz in the comments has kindly answered both questions for me. $\mathsf{HoTT}$ extends Martin-Löf's type theory with the Univalence Axiom. Because Martin-Löf's type theory contains its own theory of arithmetic (essentially equivalent to Peano arithmetic), it proves $0 \neq 1$, and the proof carries over to $\mathsf{HoTT}$.

$\mathsf{HoTT}$ still seems to me a bit unsatisfying as a foundation, both conceptually and epistemologically, because (at least if I understand it right) the axioms/rules for arithmetic are assumed, rather than proved -- whereas Frege, in the Grundlagen, proves them, using second-order comprehension and Hume's Principle.

Fregean Proof that 0 is not 1

This term there has been a course at Oxford, for first year maths & philosophy students, called "Frege's Foundations of Arithmetic", taught by my colleague Jeff Russell.

A fairly detailed exposition of Frege's technical work is given here, "Frege's Logic, Theorem and Foundations of Arithmetic", by Ed Zalta. Much of the reconstruction of Frege's work on arithmetic only began in the 1980s. It appeared in work by Crispin Wright, George Boolos and Richard G. Heck (in particular, see Richard's recent book, "Frege's Theorem").

Over the term, I've occasionally written down some notes for my own students, and, related to that, here is a modernized version of Frege's proof that $0 \neq 1$. Suppose we assume there are collections and relations, and we make this more exact using second-order comprehension axioms:
$\mathsf{C}_1: \exists A \forall x(Ax \leftrightarrow \phi(x))$.
$\mathsf{C}_2: \exists R \forall x,y(Rxy \leftrightarrow \phi(x,y))$
$\dots$
$\mathsf{C}_n: \exists R \forall x_1, \dots, x_n(Rx_1\dots x_n \leftrightarrow \phi(x_1,\dots, x_n))$.
$\dots$
Suppose we assume that, for each collection $A$, there is a cardinal $c(A)$, and that these are governed by Hume's Principle:
$\mathsf{HP}: c(A) = c(B)$ iff $A \sim B$,
where "$A \sim B$" means that there is a bijection between $A$ and $B$ (i.e., they are equinumerous).

The formal system containing these axioms is called Frege Arithmetic ($\mathsf{FA}$).
Now we define:
$\varnothing x := (x \neq x)$.
$0 := c(\varnothing)$
$\{\varnothing\}x := (x = 0)$.
$1 : = c(\{\varnothing\})$.
Now we have:
$\mathsf{FA} \vdash 0 \neq 1$
Proof: Reasoning in $\mathsf{FA}$, suppose $0 = 1$. So, by $(\mathsf{HP})$, $\varnothing \sim \{\varnothing\}$. So there is a bijection $f: \varnothing \to \{\varnothing\}$. Since $0$ falls under $\{\varnothing\}$ and $f$ is onto, there must be some $x$ falling under $\varnothing$, i.e., some $x$ with $x \neq x$; but nothing satisfies $x \neq x$. Contradiction. Hence, $0 \neq 1$.
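As a sanity check on the finite combinatorics of the last step, here is a toy Python rendering (extensions as frozensets; this is obviously not a formalization of $\mathsf{FA}$, and the labels are mine):

```python
def equinumerous(A, B):
    """A ~ B for finite sets: any pairing-off of equally many elements
    is a bijection, so equinumerosity amounts to equality of size."""
    return len(A) == len(B)

empty = frozenset()             # the extension of x != x
zero = "c(empty)"               # the cardinal 0, as a mere label
singleton = frozenset({zero})   # the extension of x = 0

# By HP, c(empty) = c(singleton) would require empty ~ singleton:
assert not equinumerous(empty, singleton)   # hence 0 != 1
```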

Homotopy Type Theory: the book is out!

The long-awaited book on Homotopy Type Theory, a creation of a group of mathematicians working under the collective name of the Univalent Foundations Program, is now out! (See here for a snapshot of the contents.) And there are many reasons to rejoice:
  • I’ve been following with much interest the development of the program, which presents itself as an alternative to set theory for the foundations of mathematics. I don’t expect to understand more than 5% of the mathematical content of the book, but I hope to at least be able to understand the core ideas and have a feel for how they are put in practice.
  • It is a fantastic illustration of the power of collective, distributed work in mathematics, which is by and large a discipline still based (at least in theory, if not in practice) on the ‘lone thinker’ model. The book was written in only six months, which would be unthinkable for a single author or even for a small number of co-authors (according to A. Bauer, two dozen mathematicians were involved in the writing process).
  • It is entirely and completely open access. What's more, through the github platform used also in the writing process (which I am myself not familiar with), readers can actively contribute to the content of the book, propose modifications, improvements, etc. (Again, my source is Bauer's post.)

So this new book may rock not only the foundations of mathematics as such; it may also represent a revolution in how mathematics is practiced, a move towards a more collaborative model.

UPDATE: Eric Schliesser has a post at NewAPPS on the book, where he quotes this wonderful passage by Steve Awodey:
But for all that, what is perhaps most remarkable about the book is what is not in it: formalized mathematics. One of the novel things about HoTT is that it is not just formal logical foundations of mathematics in principle: it really is code that runs in a computer proof assistant. This book is an exercise in “unformalization” of results that were, for the most part, first developed in a formalized setting. (Many of the files are available on GitHub and through the Code section of this site.) At the risk of sounding a bit grandiose, this represents something of a “new paradigm” for mathematics: fully formal proofs that can be run on the computer to ensure their absolute rigor, accompanied by informal exposition that can focus more on the intuition and examples. Our goal in this Book has been to provide a first, extended example of this new style of doing mathematics, by developing the latter, informal aspect to the point where — hopefully — others can see how it works and join us in pushing it forward.
The philosophical implications of this new approach are (potentially) phenomenal; I hope to be able to say more on the interplay between formalization/unformalization and intuitions in the near future, once I have had the time to look at the book more closely.

Thursday, 20 June 2013

Williams on Accuracy, Reflection, and Conditionalization

In the previous post, I presented a couple of arguments for the Principal Principle.

The first was based on the following fact: for many plausible measures of distance between credence functions, if a credence function $c$ violates the Principal Principle, then there is a credence function $c'$ that satisfies it such that $c'$ is closer to each of the credence functions that match the possible objective chances than $c$ is. That is, if $ch_w$ is the chance function at world $w$, then for all worlds $w$, $c'$ is closer to $ch_w$ than $c$ is. Thus, if you think, as Alan Hájek does, that credences aim to match the objective chances, then it seems that you should obey the Principal Principle.

The second argument was based on the following fact: for many plausible measures of distance between credence functions, if a credence function $c$ violates the Principal Principle, then there is a credence function $c'$ that satisfies it such that each possible objective chance function expects $c'$ to be more accurate than it expects $c$ to be (where accuracy is proximity to the omniscient credence function). That is, for all worlds $w$, $ch_w$ expects $c'$ to be closer to the omniscient credences than it expects $c$ to be. Thus, if you think that the objective chances should guide decisions when they speak univocally in favour of one action over another, then it seems that you should obey the Principal Principle.

Thus, suppose you know that the chance of a coin landing heads is between 0.6 and 0.8 inclusive, and suppose that your credence in its landing heads is 0.5. Then, according to the first of these arguments, you are irrational because there is another credence (for instance, 0.6) that is closer to matching the chances than your credence is, regardless of what the chances are. And according to the second of these arguments, you are irrational because there is another credence (for instance, 0.6) that each possible chance function expects to be more accurate than it expects your credence to be.

In this post, we will consider two arguments for van Fraassen's Generalized Reflection Principle and Conditionalization (which follows from Generalized Reflection) (van Fraassen, 1999).  They are related to the two arguments for the Principal Principle considered above.  The first is a beautiful new argument due to Robbie Williams (Williams, ms).  The second is my take on Robbie's argument.  Throughout, I'll assume that all credence functions are probability functions.

Williams' argument for GRP and Conditionalization


We consider an agent with a credence function $c$ and an updating rule $\mathbf{R}$.  $\mathbf{R}$ takes a partition $\mathcal{E}$ and an element $E$ of that partition and returns a credence function $c_{\mathbf{R}(\mathcal{E}, E)}$.  We think of $c_{\mathbf{R}(\mathcal{E}, E)}$ as the credence function that the updating rule would mandate were the agent to receive evidence $E$ from partition $\mathcal{E}$.  And we demand that $c_{\mathbf{R}(\mathcal{E}, E)}(E) = 1$.  That is, updating in the light of evidence $E$ ought to make an agent certain of $E$.

For instance, if I am about to perform an experiment, I will typically know the partition from which my evidence will come:  perhaps I know that my measuring instrument will read 1, 2, or 3.  Then I know that my evidence will come from the partition $\mathcal{E} = \{1, 2, 3\}$.  An updating rule takes a partition and an element of the partition and tells you what your new credence function should be if you learn that element of the partition.
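To fix ideas, here is a minimal sketch of this setup in Python.  It assumes a finite set of worlds and represents credence functions as dictionaries from worlds to reals; the particular rule used to illustrate the signature is conditionalization, which we meet officially below.  All the names here are my own illustrative choices, not Williams'.

```python
# Worlds and a partition of them: each cell is the event that the
# instrument reads 1, 2 or 3, respectively.
worlds = ['w1', 'w2', 'w3']
partition = [{'w1'}, {'w2'}, {'w3'}]

def conditionalize(c, E):
    """One example of an updating rule: condition the prior c on cell E."""
    cE = sum(c[w] for w in E)                  # prior credence in E
    return {w: (c[w] / cE if w in E else 0.0) for w in c}

prior = {'w1': 0.2, 'w2': 0.3, 'w3': 0.5}

# The one constraint placed on updating rules above: the posterior
# mandated on receiving E must be certain of E.
for E in partition:
    posterior = conditionalize(prior, E)
    assert abs(sum(posterior[w] for w in E) - 1.0) < 1e-9
```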

Williams asks us to consider the following situation.  Suppose $D$ is a measure of distance between credence functions:  for the purpose of his argument, $D$ could be Squared Euclidean Distance, or the Kullback-Leibler divergence, or any other Bregman divergence.  And suppose that $\mathcal{E}$ is a partition.  Now suppose that there is some credence function $c'$ that is closer to each $c_{\mathbf{R}(\mathcal{E}, E)}$ (for $E$ in $\mathcal{E}$) than the agent's credence function $c$ is.  This, Williams claims, would make the agent irrational.  That is, for Williams, it is irrational to have a credence function and an updating rule such that, for some partition, the credence function is further than it needs to be from the various posterior credence functions that the updating rule would recommend in the light of the various elements of the partition.  In symbols:

Future Credence Dominance  Suppose I have credence function $c$ and I endorse updating rule $\mathbf{R}$.  Suppose $\mathcal{E}$ is a partition.  And suppose there is $c'$ such that
\[
D(c_{\mathbf{R}(\mathcal{E}, E)}, c') < D(c_{\mathbf{R}(\mathcal{E}, E)}, c)
\]
for all $E$ in $\mathcal{E}$.  Then I am irrational.

Thus, suppose I am about to perform an experiment, and I thereby know that I will learn an element of the partition $\{1, 2, 3\}$.  Suppose further that my updating rule tells me to adopt a credence of 0.1 if I learn 1, 0.2 if I learn 2, and 0.3 if I learn 3.  But suppose that I currently have credence 0.5.  Then, according to Future Credence Dominance, I am irrational because there is another credence (for instance, 0.3) that is closer to each of my possible future credences than my current credence is.
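The arithmetic here is easy to check; a quick sketch, again with squared Euclidean distance (one of the Bregman divergences the argument allows):

```python
# Possible future credences mandated by the rule, one per cell of {1, 2, 3}.
future = [0.1, 0.2, 0.3]
current, alternative = 0.5, 0.3

# 0.3 is closer to each possible future credence than 0.5 is, so Future
# Credence Dominance convicts the pair (current credence, rule) of
# irrationality.
assert all((f - alternative) ** 2 < (f - current) ** 2 for f in future)
```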

What epistemic norm follows from Future Credence Dominance along with the claim that $D$ must be a Bregman divergence?  The answer is:  van Fraassen's Generalized Reflection Principle.

Generalized Reflection  Suppose I have credence function $c$ and I endorse updating rule $\mathbf{R}$.  Then I am irrational unless
\[
c(X) = \sum_{E \in \mathcal{E}} c(E) c_{\mathbf{R}(\mathcal{E}, E)}(X)
\]
for each partition $\mathcal{E}$ and each proposition $X$.

That is, my current credence in a proposition ought to be my expected future credence in it.  Notice that this is a norm that applies to credence function-updating rule pairs.  That this follows from Future Credence Dominance is a consequence of the following two Lemmas:

Lemma 1  Suppose $\mathbf{R}$ is an updating rule.  Let $\mathbf{R}(\mathcal{E}) = \{c_{\mathbf{R}(\mathcal{E}, E)} : E \in \mathcal{E}\}$.  Then
  1. If $c \not \in \mathbf{R}(\mathcal{E})^+$, then there is $c' \in \mathbf{R}(\mathcal{E})^+$ such that, for all $E$ in $\mathcal{E}$,\[D(c_{\mathbf{R}(\mathcal{E}, E)}, c') < D(c_{\mathbf{R}(\mathcal{E}, E)}, c)\]
  2. If $c \in \mathbf{R}(\mathcal{E})^+$, then there is no $c' \neq c$ such that, for all $E$ in $\mathcal{E}$, \[ D(c_{\mathbf{R}(\mathcal{E}, E)}, c') \leq D(c_{\mathbf{R}(\mathcal{E}, E)}, c) \]
Proof. This is a special case of the theorem to which we appealed in the accuracy-based argument for Probabilism and the accuracy-based argument for the Principal Principle.  Suppose $\mathcal{X}$ is a set of credence functions; then, for any credence function $c$ that lies outside the convex hull $\mathcal{X}^+$ of $\mathcal{X}$, there is a credence function $c'$ that lies inside $\mathcal{X}^+$ such that $c'$ is closer to each member of $\mathcal{X}$ than $c$ is.
$\Box$

Lemma 2  (van Fraassen, 1999) $c \in  \mathbf{R}(\mathcal{E})^+$ iff $c$ satisfies Generalized Reflection.

Proof.  From right to left is straightforward: if $c$ satisfies Generalized Reflection, then $c$ is a convex combination of the $c_{\mathbf{R}(\mathcal{E}, E)}$ with weights $\lambda_E = c(E)$, and so $c \in \mathbf{R}(\mathcal{E})^+$.  For left to right, suppose $c \in \mathbf{R}(\mathcal{E})^+$.  That is, for each $E$ in $\mathcal{E}$, there is $\lambda_E \geq 0$ such that $\sum_{E \in \mathcal{E}} \lambda_E = 1$ and
\[
c(X) = \sum_{E \in \mathcal{E}} \lambda_E c_{\mathbf{R}(\mathcal{E}, E)}(X)
\]
for all $X$.  Thus, in particular, if $E'$ is in $\mathcal{E}$, then
\[
c(E') = \sum_{E \in \mathcal{E}} \lambda_E c_{\mathbf{R}(\mathcal{E}, E)}(E')
\]
But, by stipulation,
\[
 c_{\mathbf{R}(\mathcal{E}, E)}(E') = \left \{ \begin{array}{ll}
1 & \mbox{ if } E' = E \\
0 & \mbox{ if } E' \neq E
\end{array}
\right.
\]
So $\lambda_{E'} = c(E')$, as required.
$\Box$

Thus, we have the following argument for Generalized Reflection:
  1. The distance between credence functions ought to be measured by a Bregman divergence $D$.
  2. Future Credence Dominance
  3. Lemmas 1 and 2
  4. Therefore, Generalized Reflection
And this is simultaneously an argument for Conditionalization.  After all, as van Fraassen pointed out, Generalized Reflection is equivalent to Conditionalization.  Recall, Conditionalization says:

Conditionalization  Suppose I have credence function $c$ and I endorse updating rule $\mathbf{R}$.  Then I am irrational unless
\[
c_{\mathbf{R}(\mathcal{E}, E)}(X) = c(X | E)
\]
for each partition $\mathcal{E}$, each $E \in \mathcal{E}$ with $c(E) > 0$, and each proposition $X$.

Theorem 1 (van Fraassen, 1999)  Generalized Reflection iff Conditionalization

Proof. Suppose $c$, $\mathbf{R}$ satisfy Generalized Reflection. Then
\begin{eqnarray*}
c(X | E') & = & \frac{c(XE')}{c(E')} \\
& = & \frac{\sum_{E \in \mathcal{E}} c(E)c_{\mathbf{R}(\mathcal{E}, E)}(XE')}{\sum_{E \in \mathcal{E}} c(E)c_{\mathbf{R}(\mathcal{E}, E)}(E')} \\
& = & \frac{c(E')c_{\mathbf{R}(\mathcal{E}, E')}(XE')}{c(E')c_{\mathbf{R}(\mathcal{E}, E')}(E')} \\
& = & c_{\mathbf{R}(\mathcal{E}, E')}(X)
\end{eqnarray*}
where the second identity applies Generalized Reflection (to $XE'$ and to $E'$), the third uses the fact that $c_{\mathbf{R}(\mathcal{E}, E)}(XE') = c_{\mathbf{R}(\mathcal{E}, E)}(E') = 0$ whenever $E \neq E'$, and the last uses $c_{\mathbf{R}(\mathcal{E}, E')}(E') = 1$, which gives $c_{\mathbf{R}(\mathcal{E}, E')}(XE') = c_{\mathbf{R}(\mathcal{E}, E')}(X)$.  Now suppose $c$, $\mathbf{R}$ satisfy Conditionalization.  Then
\begin{eqnarray*}
c(X) & = & \sum_{E \in \mathcal{E}} c(E) c(X|E) \\
& = & \sum_{E \in \mathcal{E}} c(E) c_{\mathbf{R}(\mathcal{E}, E)}(X)
\end{eqnarray*}
$\Box$

Thus, Future Credence Dominance gives us not only Generalized Reflection, but also Conditionalization, since the two norms are equivalent.
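For concreteness, here is a small numerical check of the right-to-left direction of Theorem 1, in the same illustrative Python style as above: when the posteriors come from conditionalization, the prior is the $c(E)$-weighted average of the posteriors, exactly as Generalized Reflection demands.

```python
# Finite worlds, credence functions as dicts, a two-cell partition.
prior = {'w1': 0.2, 'w2': 0.3, 'w3': 0.5}
partition = [{'w1', 'w2'}, {'w3'}]

def conditionalize(c, E):
    cE = sum(c[w] for w in E)
    return {w: (c[w] / cE if w in E else 0.0) for w in c}

def prob(c, A):
    """The credence c assigns to a proposition A (a set of worlds)."""
    return sum(c[w] for w in A)

X = {'w1', 'w3'}   # an arbitrary proposition

# Generalized Reflection: c(X) = sum over E of c(E) * c_R(E,E)(X).
expected_future = sum(prob(prior, E) * prob(conditionalize(prior, E), X)
                      for E in partition)
assert abs(prob(prior, X) - expected_future) < 1e-9
```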

But one might wonder how compelling Future Credence Dominance is.  Why should we care about getting close to our future credences?  Here's a suggestion together with some worries about it.

One might care about proximity to one's future credences because one cares about proximity to the omniscient credences and one believes that those future credences will be close to the omniscient credences.  That is, one wishes to be accurate, and one believes that one's future credences are accurate.  There seem to be two ways of making precise the accuracy that we attribute to our future credences:
  1. On the first, we say that my future credences are guaranteed to be closer to the omniscient credences than my current credences are.  That is, we assume that, for each partition $\mathcal{E}$ and each $E \in \mathcal{E}$, \[ D(v_w, c_{\mathbf{R}(\mathcal{E}, E)}) < D(v_w, c)\]for all $w$ in $E$.   For any $D$, there are updating rules that have this property.  Now, we might try to justify Future Credence Dominance as follows:  Suppose $c'$ is closer to each $c_{\mathbf{R}(\mathcal{E}, E)}$ than $c$ is; then, since each $c_{\mathbf{R}(\mathcal{E}, E)}$ is closer to $v_w$ than $c$ is, for $w$ in $E$, it will be the case that $c'$ is closer to $v_w$ than $c$ is, for each world $w$.  But unfortunately, this argument isn't valid.  The conclusion doesn't follow (a numerical counterexample is sketched after this list).
  2. On the second, we say that my future credences are expected to be closer to the omniscient credences than my current credences are.  Expected by whom?  By me.  Again, for any $D$, there are updating rules that have this property.  In fact, by Greaves and Wallace's result from a couple of posts ago, updating by conditionalization always has this property.  But now we seem to be very close to the Greaves and Wallace argument.  If we value proximity to our future credences because we expect them to be more accurate than we expect our current credences to be, surely we'll value most the updating rule that we expect to give the most accurate future credences.  As Greaves and Wallace's argument shows, that rule is always conditionalization.  So we have no need of a further argument.
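Here, for what it's worth, is a numerical counterexample of my own construction showing why the argument in the first item fails, with $D$ as squared Euclidean distance: the guaranteed-accuracy premise holds, some $c'$ is closer to each posterior than $c$ is, and yet $c'$ is farther from the omniscient credences at one world than $c$ is.

```python
# Four worlds, partition {{w1, w2}, {w3, w4}}; credence functions are
# tuples of credences in (w1, w2, w3, w4).

def D(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def v(i):
    """Omniscient credences at world i (0-indexed)."""
    return tuple(1.0 if j == i else 0.0 for j in range(4))

p1 = (0.5, 0.5, 0.0, 0.0)      # posterior on learning E1 = {w1, w2}
p2 = (0.0, 0.0, 0.5, 0.5)      # posterior on learning E2 = {w3, w4}
c  = (0.4, 0.1, 0.4, 0.1)      # current credence function
c2 = (0.25, 0.25, 0.25, 0.25)  # the would-be dominating c'

# Premise: each posterior is closer to v_w than c is, for each w in its cell.
assert all(D(v(i), p1) < D(v(i), c) for i in (0, 1))
assert all(D(v(i), p2) < D(v(i), c) for i in (2, 3))

# c2 is closer to each posterior than c is ...
assert D(p1, c2) < D(p1, c) and D(p2, c2) < D(p2, c)

# ... and yet c2 is *farther* from the omniscient credences at w1 than c is.
assert D(v(0), c2) > D(v(0), c)
```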
That's my concern about Future Credence Dominance.  In the next section, I consider a different argument for Conditionalization that goes through Generalized Reflection.
 

Another argument for GRP and Conditionalization

 
Recall the arguments for the Principal Principle considered in the previous post:  on the first, we showed that, if an agent values proximity to the objective chances, then she should satisfy the Principal Principle; on the second, we showed that, if she values proximity to the omniscient credences, but takes the objective chances to guide her actions whenever they speak univocally, she should satisfy the Principal Principle.  The argument for Generalized Reflection, and therefore for Conditionalization, had a similar structure to the first argument for the Principal Principle:  if an agent values proximity to her future credences, she ought to satisfy Generalized Reflection (and therefore Conditionalization).  In this section, I'll give an argument that has a similar structure to the second argument for the Principal Principle:  I'll point out that, if an agent values proximity to the omniscient credences (that is, she values accuracy), but takes her future credences to guide her actions whenever they speak univocally, she should satisfy Generalized Reflection (and therefore Conditionalization).
 
Here's the norm we'll use in place of Future Credence Dominance:
 
Future Credence Expected Dominance   Suppose I have credence function $c$ and I endorse updating rule $\mathbf{R}$.  Suppose $\mathcal{E}$ is a partition.  And suppose there is $c'$ such that each $c_{\mathbf{R}(\mathcal{E}, E)}$ expects $c'$ to be more accurate than it expects $c$ to be:  that is,
\[
\sum_w c_{\mathbf{R}(\mathcal{E}, E)}(w) D(v_w, c') < \sum_w c_{\mathbf{R}(\mathcal{E}, E)}(w) D(v_w, c)
\]
for all $E$ in $\mathcal{E}$.  Then I am irrational.
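To make the expected-accuracy comparison concrete, here is a sketch (mine, with the Brier score, i.e. squared Euclidean distance, as $D$) of how a single posterior's expectation is computed and compared:

```python
# Three worlds; credence functions as dicts; v(w) is the omniscient
# credence function at w.
worlds = ['w1', 'w2', 'w3']

def v(w):
    return {u: (1.0 if u == w else 0.0) for u in worlds}

def D(p, q):
    return sum((p[u] - q[u]) ** 2 for u in worlds)

def expected_D(posterior, candidate):
    """The posterior's expectation of the candidate's inaccuracy:
    the sum over w of posterior(w) * D(v_w, candidate)."""
    return sum(posterior[w] * D(v(w), candidate) for w in worlds)

posterior = {'w1': 0.5, 'w2': 0.5, 'w3': 0.0}
c  = {'w1': 0.1, 'w2': 0.1, 'w3': 0.8}
c2 = {'w1': 0.4, 'w2': 0.4, 'w3': 0.2}

# This posterior expects c2 to be more accurate than c; if *every*
# posterior mandated by the rule agreed, the norm would convict c.
assert expected_D(posterior, c2) < expected_D(posterior, c)
```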

Now, in the previous post, I mentioned that we have the following result:

Lemma 3  If $\mathcal{X}$ is a set of probability functions and $c$ lies outside $\mathcal{X}^+$, then there is $c'$ that lies inside $\mathcal{X}^+$ such that every probability function in $\mathcal{X}$ expects $c'$ to be more accurate than it expects $c$ to be.

This, together with Future Credence Expected Dominance, gives us that one's credence function ought to lie in the convex hull of one's possible future credences.  And, as we saw above, if $c$ lies in the convex hull of the possible future credences, then $c$ satisfies Generalized Reflection (Lemma 2), and so the possible future credences must be obtained by Conditionalization (Theorem 1).  Thus, we have the following argument:
  1. The distance between credence functions ought to be measured by a Bregman divergence $D$.
  2. Future Credence Expected Dominance
  3. Lemmas 2 and 3
  4. Therefore, Generalized Reflection (and therefore, Conditionalization)
Is Future Credence Expected Dominance plausible?  I think so.  Of course, one's future credences will rarely speak univocally in favour of one option over another.  But, when they do, one should take their advice.
 
How does this argument compare to the Greaves and Wallace argument?  Greaves and Wallace argue for Conditionalization by pointing out that it is the updating rule that looks best from the point of view of our current credence function; the present argument proceeds by pointing out that if we plan to update other than by Conditionalization, then our current credence function doesn't look optimal from the point of view of each of the future credence functions mandated by the updating rule.  Thus, the present argument (and also Williams' argument) avoids a common objection to the Greaves and Wallace argument:  the objection says that we shouldn't judge an updating rule by the lights of a credence function that the updating rule will lead us to replace.  Here, instead, we judge our current credence function by the lights of the future credence functions with which the updating rule will replace it.  Jason Konek has been looking at other ways in which we might justify updating rules by considering the point of view of the future credence functions to which the updating rule will give rise.  I'll post on that later.

References