## Wednesday, 22 May 2013

This is a quick comment on the issue about Mochizuki's claimed proof of the abc conjecture that Catarina wrote about a couple of days ago. (I don't know much about this number theory stuff.)

Are proofs cognitive entities? Is every proof cognized? Known? Knowable?
Every proof P of a mathematical claim is cognizable by some one (or more) agent.
This claim is analogous to certain verificationist claims more generally (e.g., every truth is knowable). I believe that this claim is mistaken, or, at least, not justified. For all I know, Mochizuki has found a proof, but unfortunately it is simply not yet cognized by anyone else. This is a bit annoying, of course. So far as I, or anyone else, can tell, he has not done anything mathematically wrong. If something is "wrong" here, it belongs to social epistemology.

Maybe related to this is the following result, a consequence of Church's Theorem (on the undecidability of predicate logic), which is an indication of how complicated proofs can be even of theorems of predicate logic:
Let $L$ be a first-order language with a binary predicate $R$. Let $|\phi|$ be the number of symbols in $\phi$. There is no recursive function $f : \mathbb{N} \to \mathbb{N}$ such that, for all $\phi \in L$ with $|\phi| = n$: if $\phi$ has a proof in predicate logic, then $\phi$ has a proof $P^{\ast}$ such that $|P^{\ast}| \leq f(n)$.
[Proof: Suppose that there is such a function $f$. Let $M_f$ be a TM that computes $f$. Now suppose we are given a formula $\phi \in L$. We have the query:
$\vdash \phi$?
Compute $n = |\phi|$ and compute $f(n)$ using $M_f$. Predicate logic proofs can be recursively enumerated in increasing size. Run through the predicate logic proofs in such an enumeration. If a proof $P$ of $\phi$ is reached with $|P| \leq f(n)$, then we have that $\vdash \phi$. If a proof of $\phi$ is NOT reached at this point, then we can conclude, by the defining property of $f$, that $\nvdash \phi$. This is a decision procedure for logical validity in $L$, contradicting Church's Theorem.]

This means that proofs of valid theorems of logic can get larger and larger with no recursively specifiable bound. Consequently, there is no reason whatsoever to suppose that every proof can be cognized, or recognized, by some finite agent.

George Boolos (1987, "A Curious Inference") has given a very nice example of a first-order theorem $\phi_{Bool}$ of logic whose shortest proof in first-order predicate logic is astronomically vast. The underlying idea is that the formula "encodes" the Ackermann function $A$, and a predicate logic proof requires a step-by-step computation of length about equal to the value of this function ($A(4,4)$, if I recall right). He notes, however, that $\phi_{Bool}$ has a short proof in second-order predicate logic. (This is an example of "speed-up".) I wrote a short paper about this issue in Analysis several years ago (2005, "Some More Curious Inferences").

Cognitive Reductionism about languages is the following (empirical) claim:
Every language L is spoken/cognized by some one or more speakers.
That is, the claim that languages can be reduced to cognitive states of some one or more speakers. However, I think that cognitive reductionism is deeply mistaken. There are languages which are not spoken, or cognized.

So, on my view, statements of the form:
Agent A cognizes language L
Agents A and B cognize the same ("shared") language L.
are contingent empirical claims. The agent A might not have cognized L. Whether agents A and B cognize a "shared" language is an empirical question.

It seems clear that, as a matter of empirical observation, agents never cognize the same language (though this is contingent, of course). There are lexical, phonological, semantic, pragmatic, etc., differences. And this phenomenon---heterogeneity in speech communities---requires explanation.

## Tuesday, 21 May 2013

### Cognizing a Language

I see metasemantics as having two major components (cf. David Lewis 1970, "General Semantics"). One component studies languages, what their properties are, how they're individuated, etc. The other component studies how languages are "cognized".

On the first issue, for the metasemantics I prefer, languages are finely-individuated mixed mathematicalia, whose intrinsic syntactic, phonological, semantic, pragmatic, orthographic properties are essential. The corresponding individuation condition is:
$L_1 = L_2$ if and only if they have the same syntax, semantics, etc, etc.
(If this seems somehow too obvious to need saying, or perhaps silly, then what do you suggest? The main alternative, at least the main one I can think of, would somehow introduce a speech community in the very individuation of languages. But I think this is wrong.)

Languages then do not undergo change, either temporal or modal. Rather, we have various sequences of distinct languages. This theory of language individuation is, more or less, Lewis's, as sketched near the start of "Languages and Language" (1975). It seems to have been endorsed by Scott Soames and Saul Kripke too.

[The issue is quite complicated because languages are usually mixed mathematicalia, grounding out somehow in concrete/physical "tokens" (e.g., tokens of certain phonemes or tokens of the letter "A", etc.); and, consequently, one needs some sort of account of the individuation criteria for mixed mathematicalia: e.g., the set of US Presidents, or the magnetic field $\mathbf{B}$, a function from spacetime to a certain linear space isomorphic to $\mathbb{R}^3$.]

Once we have got some sort of account of what languages are, and how they're individuated, we next need to provide some sort of account of how they are spoken, or as I like to say, "cognized" by agents.

I believe the most basic notions required in a workable account of this are roughly of this kind:
Agent A assigns meaning M to string $\sigma$
I call these "cognizing" relations. (One can of course add bells and whistles: various parameters, indices, contexts, and time and world parameters. But I want to keep it simple.) So, I cognize my idiolect $L_{JK}$ by my mind assigning meanings to various strings (my mind also assigns some kind of syntactic structure, and pragmatic meaning functions, but I am ignoring that for the moment).

My mind's assigning meanings is not, I think, for the most part "conscious"; and is not, in many cases, something I can articulate. Somehow,
• I acquire meanings,
• I copy/borrow meanings, and
• I bestow meanings.
Admittedly, I don't have a good theory of this. For example, I use the string "Kripke" to mean Saul Kripke. But I do not have a good theory of how I acquired this meaning assignment (it must have involved meaning copying/borrowing, as Kripke himself has argued at length). I use the string "finite ordinal number" to refer to elements of $\omega$, and again I simply don't know how I acquired this meaning assignment, except to say, flat-footedly, that I learnt set theory ...

Even so, I'm pretty sure that these features of language cognition---the basic meaning assignment relations---are what need to be clarified for the (more difficult) "cognizing" side of metasemantics.

[Though this is not forced, I'm somewhat sceptical about the notion of "shared" languages. The languages that agents or individuals speak/cognize, are, first and foremost, idiolects. But this is a very big question, involving complicated questions about normativity, "meaning gaps" and some of the topics that arise in debates about semantic internalism and externalism. Admittedly, there's a huge overlap amongst idiolects in language communities. But there is also heterogeneity as well. And this must be accounted for.]

## Sunday, 19 May 2013

### Joyce's argument for Probabilism

In January, the Department of Philosophy at the University of Bristol launched an ERC-funded four-year research project on Epistemic Utility Theory: Foundations and Applications.  The main researchers will be: Richard Pettigrew, Jason Konek, Ben Levinstein, Pavel Janda (PhD student), and Chris Burr (PhD student).  The website is here.

I thought it would be good to write a few blog posts explaining what I take epistemic utility theory to be, and describing the work that has been done in the area so far.  So, over the next few weeks, that's exactly what I'll do here at M-Phi.  I'll try for one post per week.

The guiding idea behind epistemic utility theory is this:  Over the last decade or so, epistemologists have been increasingly interested in epistemic value.  That is, they have been interested in identifying the features of a doxastic or credal state that make it good qua cognitive state (rather than good qua guide to action).  For instance, we might say that having true beliefs is more valuable than having false beliefs, or that having higher credences in true propositions is better; or we might say that a belief or a credence has greater value the greater its evidential support.  Epistemic utility theory begins by asking a further question:  How can we quantify and measure epistemic value?  Having answered that question it asks another:  What epistemic norms can be justified by appealing to this measure of epistemic value?

#### Joyce's framework

The original argument in this area is due to Jim Joyce in his paper 'A Non-Pragmatic Vindication of Probabilism' (1998) Philosophy of Science 65(4):575-603.  In this blog post, I'll describe the framework in which Joyce's argument takes place; I'll state the norm he wishes to justify; and I'll present his argument for it in the way I find most plausible.

Represent an agent's cognitive state at a given time by her credence function at that time:  this is the function $c$ that takes each proposition about which she has an opinion and returns the real number that measures her credence in that proposition.  By convention, we represent minimal credence by 0 and maximal credence by 1.  Thus, $c$ is defined on the set $\mathcal{F}$ of propositions about which the agent has an opinion; and it takes values in $[0, 1]$.  If $X$ is in $\mathcal{F}$, then $c(X)$ is our agent's degree of belief or credence in $X$.  Throughout, we assume that $\mathcal{F}$ is finite.  With this framework in hand, we can state the norm of Probabilism:

Probabilism At any time in an agent's credal life, it ought to be the case that her credence function $c$ at that time is a probability function over $\mathcal{F}$ (or, if $\mathcal{F}$ is not an algebra, $c$ can be extended to a probability function over the smallest algebra that contains $\mathcal{F}$).

#### Joyce's argument

How do we establish this norm?  Jim Joyce offers the following argument:  It is often said that the aim of full belief is truth.  One way to make this precise is to say that the ideal doxastic state is that in which one believes every true proposition about which one has an opinion, and one disbelieves every false proposition about which one has an opinion.  That is, the ideal doxastic state is the omniscient doxastic state (relative to the set of propositions about which one has an opinion).  We might then measure how good an agent's doxastic state is by its proximity to this omniscient state.

Joyce's argument, as I will present it, is based on an analogous claim about credences.  We say that the ideal credal state is that in which our agent assigns credence 1 to each true proposition in $\mathcal{F}$ and credence 0 to each false proposition in $\mathcal{F}$. By analogy with the doxastic case, we might call this the omniscient credal state (relative to the set of propositions about which she has an opinion). Let $\mathcal{W}$ be the set of possible worlds relative to $\mathcal{F}$:  that is, the set of consistent assignments of truth values to the propositions in $\mathcal{F}$.  Now, let $w$ be a world in $\mathcal{W}$.  Then let $v_w$ be the omniscient credal state at $w$: that is, $v_w(X) = 0$ if $X$ is false; $v_w(X) = 1$ if $X$ is true.

We then measure how good an agent's credal state is by its proximity to the omniscient state.  Following Joyce, we call this the accuracy of the credal state.  To do this, we need a measure of distance between credence functions.  Many different measures will do the job, but here I will focus on the most popular, namely, Squared Euclidean Distance.  Suppose $c$ and $c'$ are two credence functions.  Then define the Squared Euclidean Distance between them as follows:
$Q(c, c') := \sum_{X \in \mathcal{F}} (c(X) - c'(X))^2$
Thus, given a possible world $w$ in $\mathcal{W}$, the cognitive badness or disvalue of the credence function $c$ at $w$ is given by its inaccuracy; that is, the distance between $c$ and $v_w$, namely, $Q(c, v_w)$.  We call this the Brier score of $c$ at $w$, and we write it $B(c, w)$.  So the cognitive value of $c$ at $w$ is the negative of the Brier score of $c$ at $w$; that is, it is $-B(c, w)$.  Thus, $B$ is a measure of inaccuracy; $-B$ is a measure of accuracy.
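These definitions are easy to render in code. Here is a minimal Python sketch (the dict representation and the function names are mine, not Joyce's):

```python
def squared_euclidean(c, c_prime):
    """Q(c, c') = sum over propositions X of (c(X) - c'(X))^2.
    Credence functions are dicts from proposition labels to [0, 1]."""
    return sum((c[X] - c_prime[X]) ** 2 for X in c)

def omniscient(w):
    """v_w: credence 1 in each truth, 0 in each falsehood, where
    the world w maps each proposition to a truth value."""
    return {X: 1.0 if truth else 0.0 for X, truth in w.items()}

def brier(c, w):
    """B(c, w) = Q(c, v_w): the inaccuracy of c at world w."""
    return squared_euclidean(c, omniscient(w))

# An agent with credence 0.6 in X and 0.2 in its negation:
c = {"X": 0.6, "not-X": 0.2}
w = {"X": True, "not-X": False}   # the world where X is true
print(round(brier(c, w), 10))  # 0.2  (= (0.6-1)^2 + (0.2-0)^2)
```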

With this measure of cognitive value in hand, Joyce argues for Probabilism by appealing to a standard norm of traditional decision theory:

Dominance Suppose $\mathcal{O}$ is a set of options, $\mathcal{W}$ is a set of possible worlds, and $U$ is a measure of the value of the options in $\mathcal{O}$ at the worlds in $\mathcal{W}$.  Suppose $o, o'$ in $\mathcal{O}$.  Then we say that
• $o$ strongly $U$-dominates $o'$ if $U(o', w) < U(o, w)$ for all worlds $w$ in $\mathcal{W}$
• $o$ weakly $U$-dominates $o'$ if $U(o', w) \leq U(o, w)$ for all worlds $w$ in $\mathcal{W}$ and $U(o', w) < U(o, w)$ for at least one world $w$ in $\mathcal{W}$.
Now suppose $o, o'$ in $\mathcal{O}$ and
1. $o$ strongly $U$-dominates $o'$;
2. There is no $o''$ in $\mathcal{O}$ that weakly $U$-dominates $o$.
Then $o'$ is irrational.

Of course, in standard decision theory, the options are practical actions between which we wish to choose.  For instance, they might be the various environmental policies that a government could pursue; or they might be the medical treatments that a doctor may recommend.  But there is no reason why Dominance or any other decision-theoretic norm can only determine the irrationality of such options.  They can equally be used to establish the irrationality of accepting a particular scientific theory or, as we will see, the irrationality of particular credal states.  When they are put to use in the latter way, the options are the possible credal states an agent might adopt; the worlds are, as above, the consistent assignments of truth values to the propositions in $\mathcal{F}$; and the measure of value is $-B$, the negative of the Brier score.  Granted that, which credal states does Dominance rule out?  As the following theorem shows, it is precisely those that violate Probabilism.

Theorem 1
1. If $c$ is not a probability function, then there is a credence function $c^*$ that strongly Brier dominates $c$.
2. If $c$ is a probability function, then there is no credence function $c^*$ that weakly Brier dominates $c$.
This, then, is Joyce's argument for Probabilism:
1. The cognitive value of a credence function is given by its proximity to the ideal credence function:  the ideal credence function at world $w$ is $v_w$; and distance is measured by the Squared Euclidean Distance.  Thus, the cognitive value of a credence function at a world is given by the negative of its Brier score at that world.  (In fact, as we will see next week, Joyce weakens this premise and thus strengthens the argument.)
2. Dominance
3. Theorem 1
4. Therefore, Probabilism
Thus, according to Joyce, what is wrong with an agent who violates Probabilism is that there is a credence function that is more accurate than hers regardless of how the world turns out.

#### Joyce's argument in action

Let's finish off by seeing the argument in action.  Suppose our agent has an opinion about only two propositions $A$ and $B$.  And suppose that $A$ entails $B$.  For such an agent, the only demand that Probabilism makes is

No Drop If $A$ entails $B$, an agent ought to have a credence function $c$ such that $c(A) \leq c(B)$.

Now, if $\mathcal{F} = \{A, B\}$, then $\mathcal{W} = \{w_1, w_2, w_3\}$, where $A$ and $B$ are both true at $w_1$, $A$ is false and $B$ is true at $w_2$, and $A$ and $B$ are both false at $w_3$.  Also, we can represent a credence function over these propositions as a point on the Euclidean plane.  So we can represent our agent's credence function $c$ as such a point, along with the omniscient credence functions at the three different possible worlds.  We do so on the diagram below.  On this diagram, the blue shaded area includes all and only the credence functions that satisfy Probabilism.  As we can see, if a credence function lies outside that area, there is a credence function that lies inside it that is closer to each omniscient credence function; but this never happens if the credence function is inside the area to begin with.  This is the content of Theorem 1 in this situation.
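A minimal numerical check of the No Drop case can be run in a few lines of Python. (The representation, and the choice of the dominating function $c^*$ as the projection of $c$ onto the diagonal, are mine; $c^*$ is one convenient witness, not the only one.)

```python
# Worlds for F = {A, B}, with A entailing B, as pairs of truth
# values (A, B); the combination "A true, B false" is impossible.
worlds = [(1, 1), (0, 1), (0, 0)]

def brier(c, w):
    """Brier score: squared Euclidean distance from the credence
    function c = (c(A), c(B)) to the omniscient function v_w."""
    return (c[0] - w[0]) ** 2 + (c[1] - w[1]) ** 2

def strongly_dominates(c_star, c):
    """c_star is strictly more accurate than c at every world."""
    return all(brier(c_star, w) < brier(c, w) for w in worlds)

# A credence function violating No Drop: c(A) = 0.8 > 0.5 = c(B).
c = (0.8, 0.5)
# Its projection onto the diagonal c(A) = c(B), which satisfies
# No Drop, and hence Probabilism, in this two-proposition case.
c_star = ((0.8 + 0.5) / 2, (0.8 + 0.5) / 2)  # (0.65, 0.65)

for w in worlds:
    print(w, round(brier(c, w), 4), round(brier(c_star, w), 4))
# (1, 1) 0.29 0.245
# (0, 1) 0.89 0.545
# (0, 0) 0.89 0.845
print(strongly_dominates(c_star, c))  # True
```

Whichever world is actual, $c^*$ is strictly more accurate than $c$, just as Theorem 1 predicts for a violation of Probabilism.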

### Der logische Aufbau der Welt

The title of Rudolf Carnap's 1928 book, Der logische Aufbau der Welt, is normally translated as "The Logical Structure of the World", although apparently a more accurate rendition would be "The Logical Construction of the World".

In working on how to make sense of the claim/significance of Leibniz Equivalence from spacetime theories (roughly: isomorphic spacetime models represent the same possible worlds), I've been trying to work out a version of a propositional view of possible worlds. This has some similarities with Carnap's theory of "state descriptions" (and with Wittgenstein's "picture theory" which influenced Carnap).

The propositional diagram account of possible worlds can be put like this:
$w$ is a possible world if and only if
$w = \hat{\Phi}_{\mathcal{A}}[\vec{R}]$,
where $\mathcal{A}$ is a model, and $\vec{R}$ is a sequence of relations-in-intension.
Here, given a model $\mathcal{A}$ (say, $(A, \vec{S})$ with domain $A$), $\Phi_{\mathcal{A}}(\vec{X})$ is a formula of pure second-order logic (perhaps infinitary: it has cardinality $\max(\omega, |A|^{+})$). $\Phi_{\mathcal{A}}(\vec{X})$ defines the isomorphism type of $\mathcal{A}$. The variables $\vec{X}$ are free second-order variables. I call $\Phi_{\mathcal{A}}(\vec{X})$ the diagram formula for the model $\mathcal{A}$. It corresponds very closely to what model theorists call the elementary diagram of a model. And then $\hat{\Phi}_{\mathcal{A}}$ is the corresponding (second-order) propositional function, and $\hat{\Phi}_{\mathcal{A}}[\vec{R}]$ is then the result of "saturating" $\hat{\Phi}_{\mathcal{A}}$ with the relations $\vec{R}$.
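For a concrete finite illustration (a toy example of my own, where the diagram formula needs no infinitary resources): take the model $\mathcal{A} = (\{a, b\}, S)$ with $S = \{(a, b)\}$. Its diagram formula, with the free second-order variable $X$ in place of $S$, can be written:

$\Phi_{\mathcal{A}}(X) := \exists x \exists y \, (x \neq y \wedge \forall z (z = x \vee z = y) \wedge X(x,y) \wedge \neg X(x,x) \wedge \neg X(y,x) \wedge \neg X(y,y))$

Any structure satisfying $\Phi_{\mathcal{A}}(X)$ under some assignment to $X$ is isomorphic to $\mathcal{A}$; saturating $\hat{\Phi}_{\mathcal{A}}$ with a relation-in-intension $R$ then yields the world $w = \hat{\Phi}_{\mathcal{A}}[R]$.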

On this "Propositional Diagram" conception of possible worlds, a world $w$ has the form
$w = \hat{\Phi}_{\mathcal{A}}[\vec{R}]$
One might now think of $\hat{\Phi}_{\mathcal{A}}$ as the abstract structure of the world $w$, and think of the sequence $\vec{R}$ as expressing the intensional content of $w$.

This is a kind of form/content distinction. Another way of putting this is to try and define the "representation" relation that holds between models and worlds. Let $\mathcal{A}$ be a model. Let $w$ be a world. Let $\vec{R}$ be a sequence of relations-in-intension with signature matching $\mathcal{A}$. Then:
$\mathcal{A}$ represents $w$ relative to $\vec{R}$ iff $w = \hat{\Phi}_{\mathcal{A}}[\vec{R}]$.

## Tuesday, 14 May 2013

### What's wrong with Mochizuki's 'proof' of the ABC conjecture?

(Cross-posted at NewAPPS)

A few days ago Eric had a post about an insightful text that has been making the rounds on the internet, which narrates the story of a mathematical ‘proof’ that is for now sitting somewhere in a limbo between the world of proofs and the world of non-proofs. The ‘proof’ in question purports to establish the famous ABC conjecture, one of the (thus far) main open questions in number theory. (Luckily, a while back Dennis posted an extremely helpful and precise exposition of the ABC conjecture, so I need not rehearse the details here.) It has been proposed by the Japanese mathematician Shinichi Mochizuki, who is widely regarded as an extremely talented mathematician. This is important, as crackpot ‘proofs’ are proposed on a daily basis, but in many cases nobody bothers to check them; a modicum of credibility is required to get your peers to spend time checking your purported proof. (Whether this is fair or not is beside the point; it is a sociological fact about the practice of mathematics.) Now, Mochizuki most certainly does not lack credibility, but his ‘proof’ was made public quite a few months ago, and yet so far there is no verdict as to whether it is indeed a proof of the ABC conjecture or not. How could this be?

As it turns out, Mochizuki has been working pretty much on his own for the last 10 years, developing new concepts and techniques by mixing-and-matching elements from different areas of mathematics. The result is that he created his own private mathematical world, so to speak, which no one else seems able (or willing) to venture into for now. So effectively, as it stands his ‘proof’ is not communicable, and thus cannot be surveyed by his peers.

Let us assume for a moment that the ‘proof’ is indeed correct in that every inferential step in the lengthy exposition is indeed necessarily truth-preserving, i.e. no counterexample can be found for any of the steps. In a quasi-metaphysical sense, the ‘proof’ is indeed a proof, which is a success term (a faulty proof is not a proof at all). However, in the sense that in fact matters for mathematicians, Mochizuki’s ‘proof’ is not (yet) a proof because it has not been able to convince the mathematical community of its correctness; for now, it remains impenetrable. To top it off, Mochizuki is a reclusive man who so far has made no efforts to reach out to his peers and explain the basic outline of the argument.

What does this all mean, from a philosophical point of view? Now, as some readers may recall, I am currently working on a dialogical conception of deductive proofs (see here and here). I submit that the dialogical perspective offers a fruitful vantage point to understand what is going on with the ‘Mochizuki affair’, as I will argue in the remainder of the post. (There are also interesting connections with the debate on computer-assisted proofs and the issue of surveyability, and also with Kenny Easwaran’s notion of the ‘transferability’ of mathematical proofs, but for reasons of space I will leave them aside.)

Let me review some of the details of this dialogical conception of proofs. On this conception, a proof is understood as a semi-adversarial dialogue between two fictitious characters, proponent and opponent. The dialogue starts when both participants agree to grant certain statements, the premises; proponent then puts forward further statements, which she claims follow necessarily from what opponent has granted so far. Opponent’s job is to make sure that each inferential step indeed follows of necessity, and if it does not, to offer a counterexample to that particular step. The basic idea is that the concept of necessary truth-preservation is best understood in terms of the adversarial component of such dialogues: it is strategically in proponent’s interest to put forward only inferential steps that are indefeasible, i.e. which cannot be defeated by a countermove even from an ideal, omniscient opponent. In this way, a valid deductive proof corresponds to a winning strategy for proponent.

Now, when I started working on these ideas, my main focus was on the adversarial component of the game, and on how opponent would be compelled to grant proponent’s statements by the force of necessary truth-preservation. But as I started to present this material to numerous audiences, it became increasingly clear to me that adversariality was not the whole story. For starters, from a purely strategic, adversarial point of view, the best strategy for proponent would be to go directly from premises to the final conclusion of the proof; opponent would not be able to offer a counterexample and thus would be defeated. In other words, proponent has much to gain from large, obscure (but truth-preserving) inferential leaps. But this is simply not how mathematical proofs work; besides the requirement of necessary truth-preservation, proponent is also expected to put forward individually perspicuous inferential steps. Opponent would not only not be able to offer counterexamples, but he would also become persuaded of the cogency of the proof; the proof would thus have fulfilled an explanatory function. Opponent would thus be able to see not only that the conclusion follows from the premises, but also why the conclusion follows from the premises. To capture this general idea, in addition to the move of offering a counterexample, opponent also has available to him an inquisitive move: ‘why does this follow?’ It is a request for proponent to be more perspicuous in her argumentation.

This is why I now qualify the dialogue between proponent and opponent as semi-adversarial: besides adversariality, there is also a strong component of cooperation between proponent and opponent. They must of course agree on the premises and on the basic rules of the game, but more importantly, proponent’s goal is not only to force opponent to grant the conclusion by whatever means, but also to show to opponent why the conclusion follows from the premises. Thus understood, a proof has a crucial didactic component.

One way to conceptualize this interplay between adversariality and cooperation from a historical point of view is to view the emergence of the deductive method with Aristotle in the two Analytics as a somewhat strange marriage between the adversarial model of dialogical interaction of the Sophists – dialectic – with the didactic, Socratic method of helping interlocutors to find the truth by themselves by means of questions (as illustrated in Plato’s dialogues). This historical hypothesis requires further scrutiny, and is currently one of the topics of investigation of my Roots of Deduction project, in cooperation with the other members of the project.

Going back to Mochizuki, it is now easy to see why he is not being a good player in the game of deduction. He is not fulfilling his task as proponent to make his proof accessible and compelling to the numerous ‘opponents’ of the mathematical community; in other words, he is failing miserably on the cooperative dimension. As a result, no one is able or willing to play the game of deduction against and with him, i.e. to be his opponent. Now, a crucial feature of a mathematical proof is that it takes (at least) two to tango: a proponent must find an opponent willing to survey the purported proof so that it counts as a proof. (Naturally, this is not an infallible process: there are many cases in the history of mathematics of purported ‘proofs’ which had been surveyed and approved by members of the community, but which were later found to contain mistakes.)

Mochizuki’s tango is for now impossible to dance to/with, and as long as no one is willing to be his opponent, his ‘proof’ is properly speaking not a proof. It is to be hoped that this situation will change at some point, given the importance of the ABC conjecture for number theory. However, this will only happen if Mochizuki becomes a more cooperative proponent, or else if enough opponents are found who are willing and able to engage in this dialogue with him.

## Sunday, 12 May 2013

### Science Versus Nominalism

The Indispensability Argument, developed by W.V. Quine and Hilary Putnam, and famously rebutted by Hartry Field, is fairly simple:
(1) Nominalism states that there aren't strings, formulas, numbers, sets, sequences, functions, groups, etc.
(2) Science (e.g., physics, linguistics) states that there are strings, formulas, numbers, sets, sequences, functions, groups, etc.
Therefore, nominalism is inconsistent with science.
This is not some containable inconsistency, say to do with idealization, or frictionless planes, etc. Nominalism states that there are no physical quantities. Nominalism states that Peano arithmetic doesn't exist. Nominalism states that an SU(3) gauge theory like QCD is false because there are no groups. Etc. But science states that there are quantities; that Peano arithmetic does exist, and is not finitely axiomatizable; that the gluons are associated with an SU(3) gauge symmetry; etc.

The premises (1) and (2) are justified as follows. The first (1) is how nominalists describe their view. And (2) is justified by looking at a physics textbook.

## Friday, 10 May 2013

### Leibniz Equivalence (slides)

Here are some slides for a talk on "Leibniz Equivalence" which includes some topics I've written some previous M-Phi posts about (Leibniz abstraction; the notion of abstract structure; possible worlds; the abstract/concrete distinction as modal).

The main things here are the accounts of:
(i) abstract structure: given a model $\mathcal{A}$, its abstract structure is a certain kind of second-order propositional function, $\hat{\Phi}_{\mathcal{A}}$;
(ii) possible worlds: entities $w$ such that
$w = \hat{\Phi}_{\mathcal{A}}[\vec{R}]$
where $\vec{R}$ is a sequence of relations.

## Tuesday, 23 April 2013

### For Mathematicians

Here's a nice shortish article (talk) by Mark Balaguer aiming to explain the basic ideas of philosophy of mathematics to mathematicians:
A Guide for the Perplexed: What Mathematicians Need to Know to Understand Philosophers of Mathematics
From the introductory paragraph:
My hope is to make clear for mathematicians what philosophers of mathematics are really up to and, also, to eliminate some confusions.

## Monday, 22 April 2013

### Fermat, set theory, and arithmetic (guest post by Colin McLarty)

This is a guest post by Colin McLarty, Truman P. Handy Professor of Intellectual Philosophy and professor of Mathematics at Case Western Reserve University. It is a follow-up to a short post I wrote last month on his exciting current work on the foundations of mathematics. In this post, Colin explains to us what the whole project is about in just 1000 words.

--------------

Some philosophers suspect mathematicians don't care about foundations but only care about what works. But that elides the problem mathematicians constantly face: what will work? And it can promote the misapprehension that modern mathematics abandons intuition in favor of technicalities. Mathematics works by making rigor serve intuition.  Mathematicians use tools that help them see how to do what they want---without breaking down even in what I will call "deliberate, utterly reliable gaps".

By that I mean points in an argument where a mathematician cites a substantial, hard to prove result where the citing mathematician may or may not have once gone through the whole proof of that result but certainly is not calling the whole proof to mind in citing it.  The citing mathematician relies on that earlier result not only to be proved correctly, but to be stated in full precision so it can be applied concisely out of context without fear of error. Major proofs today have many deliberate utterly reliable gaps, as do their citations in turn.

These themes converged in the on-line row over whether Wiles’s proof of Fermat's Last Theorem (FLT) uses Grothendieck universes. Universes are controversial in some circles since they are sets large enough to model Zermelo Fraenkel set theory (ZF) and so, by Gödel's incompleteness theorem, ZF cannot prove they exist.

The term "universe" is not in Wiles's paper. Neither are proofs of most theorems he uses.  He gives citations which cite others in turn.  The citations often lead to the works where Grothendieck and colleagues established the modern methods of number theory (and about half of today’s category theory) using universes. As he depended on those proofs so he depended on universes.

One way out is never taken. Grothendieck knew that everything he does with universes in practice he could also do by discarding some larger scale structures and treating others as mere ways of speaking rather than actual entities. Number theorists often say something like this would put their work on a ZF foundation. But they give no precise statement. And really doing it would distract from arithmetic by offering un-insightful set theoretic complications for no serious foundational benefit. ZF itself is remote from arithmetic.

It is no surprise theoretically that a statement about numbers could be proved by high level set theory.   Gödel showed things like this have to happen sometimes, since any increase in the consistency strength of a foundation makes new number theoretic statements provable.  Consistency itself can be expressed by number theoretic statements. But it is surprising in fact that FLT should be proved this way. We do not expect to see the Gödel phenomenon in such simple statements.  I am working to lessen the surprise in the case of FLT and other recent number theory by bringing the proofs closer to arithmetic. I have formalized the whole Grothendieck toolkit in finite order arithmetic. That is the strongest theory that is commonly called "arithmetic".  From that point of view it is the simple theory of types including an axiom of infinity. From another viewpoint it is the weakest theory that is commonly called "set theory". It is set theory using only numbers and sets of numbers and sets of sets of numbers, all built from numbers in some finite number of levels by bounded comprehension.

The version in my article "A finite order arithmetic foundation for cohomology" looks like Grothendieck’s to anyone but a professional logician. You can just replace a few foundational passages in the Grothendieck work by this foundation. It proves less than Grothendieck’s universes in principle.  But all the general theorems actually in the Grothendieck corpus follow verbatim as Grothendieck and his colleagues proved them.

On the other hand, this foundation is still much stronger than PA. It uses every finite level above PA, though only finite levels.

My current focus is to formalize the central Grothendieck tools at the logical strength of second or third order arithmetic. On one hand this will formalize the insight of practitioners who say their work with these tools really only uses "very small sets". And on the other hand it will bring the foundation within striking distance of methods of reverse mathematics, a well-developed discipline exploring the exact logical strength of mathematical results expressible in second order arithmetic. My article "Zariski cohomology in second order arithmetic" gives some progress on this front.

One goal is to take current methods of number theory, which textbooks and reference works justify by various combinations of Grothendieck universes and hand waving, and justify them rigorously in pretty much their current form in low order arithmetic. Essential to this goal is that most proofs do not get longer and their appearance is not much changed. The other goal is to show that the great number theoretic results proved by these tools can be proved in Peano Arithmetic. It would be great to find proofs in PA without changing the existing proofs very much. But that may not be possible. At any rate it is not intrinsic to the second goal. Showing these theorems can be proved in PA is likely to require serious advances in number theory. I can try to clear up the logical side.

As a philosophical goal I want to show how Grothendieck and many mathematicians since him have cared enough to either develop rigorous foundations for these tools or else to protest foundations they do not like—and others draw on these foundations without needing to highlight them. Grothendieck has been clear that the size of sets is not important to him but the conceptual unity of his toolkit is. I have shown that unity can be preserved without anything like the size of his original universes. I regard Grothendieck as developing the unity of intuition and rigor, in terms very like the post "Terry Tao on rigor in mathematics". I hope others will too.

### Further thoughts on Priest's Inclosure Schema

After publishing my post on Priest’s Inclosure Schema (IS) a few days ago, I’ve had a number of interesting exchanges on the content of the post, including with Priest himself. So here are a few additional thoughts, in case anyone is interested.

Regarding the charge of extensional inadequacy (over- and undergeneration), I think it had been made sufficiently clear by others before me that the fact that the Curry paradox does not fit into IS is a big blow if IS claims to be a formal explanans for the informal concept of paradoxes of self-reference. However, while Priest’s original claim seemed to pertain to paradoxes of self-reference specifically, he seems to have changed the intended scope of IS a bit, and now tends to talk about ‘Inclosure paradoxes’. I don’t think there is anything wrong with this ‘change of heart’, but it does have consequences for how we should conceive the role of IS in debates on paradoxes. To make sense of this development, let me turn to a distinction introduced by S. Shapiro (in the words of L. Horsten in his SEP entry on philosophy of mathematics):
Shapiro draws a useful distinction between algebraic and non-algebraic mathematical theories (Shapiro 1997). Roughly, non-algebraic theories are theories which appear at first sight to be about a unique model: the intended model of the theory. We have seen examples of such theories: arithmetic, mathematical analysis… Algebraic theories, in contrast, do not carry a prima facie claim to be about a unique model. Examples are group theory, topology, graph theory, ….
By analogy, I would submit that IS was first introduced as a ‘non-algebraic theory’, intended to capture one very precise class of arguments, namely paradoxes of self-reference. But as things moved along, it became clear to Priest and others that IS in fact determines a different but possibly equally interesting class of arguments, which he refers to as Inclosure paradoxes. From this point of view, IS is now an ‘algebraic theory’: rather than starting with a given target-phenomenon and trying to formulate a formal account which would capture all of (and only) this phenomenon, IS is now a freestanding formal account, and it is a non-trivial question as to which class(es) of entities it accurately describes. (In non-algebraic theories, you start with the phenomenon and look for the theory; in algebraic theories, you start with the theory and look for the phenomenon.)

From this angle, it becomes a noteworthy observation, rather than an extensional failure, that Curry does not fit into IS, while the sorites paradoxes and some reductio arguments do fit into IS, thus unveiling some (surprising) structural similarities. In other words, if IS is intended as an ‘algebraic theory’, then the charges of over- and undergeneration do not get off the ground.

But it seems to me that this would represent a significant departure from how IS was originally presented in Priest’s 1994 paper, namely as a formal explanans for the class of self-referential paradoxes. I would suggest that proponents of IS could give us a clearer account of how exactly they see the role of IS in research on paradoxes (in particular, as a non-algebraic or as an algebraic theory, in Shapiro's sense). Priest has already been moving in this direction, for example when he claims that Inclosure paradoxes are those that have to do with contradiction and with the limits of thought as such. However, it is not yet clear to me why Curry does not concern the limits of thought as such (apart from the fact that it is not captured by IS…), so I look forward to the continuation of this debate.

### It's Complicated

[This is a post using Newman-style reasoning to argue for the existence of natural properties and relations.]

Consider a claim like:
(1) The mind-independent world is complicated
One might deny that there is a mind-independent world (Idealism) or one might accept that there is, while insisting that it is "unknowable", while adding that what is known is mentally constituted (Kantian Transcendental Idealism). Here, in asserting the latter, one does not merely mean that representations are mentally constituted, for this is a truism that no one denies. One means that what knowledge is about is also mentally constituted (e.g., that physical objects are representations; that space and time are representations). Idealism is not the truism that our thoughts and representations are somehow in, or connected with, our minds; it is the much stronger metaphysical claim that everything (Idealism) or almost everything (Kant) is mind-dependent.

Assuming that we're not Idealists, what might this statement (1) mean? It might mean:
(2) The cardinality of the mind-independent things is quite large (e.g., $>10^{50}$).
If this is what (1) means, then the complexity of the world is solely its cardinality. Therefore, a sound and complete description of the mind-independent world consists in a statement of the form:
(3) The cardinality of the mind-independent world $= \kappa$,
where $\kappa$ is some cardinal number. It should strike anyone as surprising that the ultimate goal of physics, chemistry, biology, etc., is simply to identify this number $\kappa$. (Cf. the punchline of Douglas Adams's joke "42".) So, I take it that this is not what the statement (1) means.

So, perhaps (1) means,
(4) There are mind-independent properties and relations amongst the mind-independent things and their relations (e.g., scientific laws) are complicated.
Here "complexity" may mean something like the structural complexity of the truth set for a language containing predicates for these properties and relations. For example, the truth set for full arithmetic is more complicated than the truth set for arithmetic with just addition. For the latter is a recursive set, while the former is not recursive -- and in fact not even arithmetically definable. There are other ways of measuring complexity, notably Kolmogorov complexity, for finite strings, and various notions of computational complexity. Perhaps, if the world is finite, "complexity" might involve the Kolmogorov complexity of the simplest program that answers soundly all questions about the world.
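To make the compression idea a little more concrete: Kolmogorov complexity itself is uncomputable, but the length of a compressed encoding gives a crude upper bound on it. A minimal sketch in Python (the strings and the use of zlib are my illustration, not part of the discussion above):

```python
import random
import zlib

def compressed_len(s: str) -> int:
    """Length of the zlib-compressed string: a crude, compressor-dependent
    upper bound on the Kolmogorov complexity of s."""
    return len(zlib.compress(s.encode("ascii"), 9))

# A highly regular string: generated by a very short program.
regular = "01" * 500

# A pseudo-random string of the same length: no obvious short description.
random.seed(0)
noisy = "".join(random.choice("01") for _ in range(1000))

# The regular string compresses far better than the noisy one,
# reflecting the difference in descriptive complexity.
print(compressed_len(regular), compressed_len(noisy))
```

Of course, a real compressor only approximates the idea; the point is just that "answering all questions about the world with a short program" is a complexity notion of a quite different kind from cardinality.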

However, independently of how one understands the concept of "complexity", one has to be careful. Suppose that by "property" or "relation" one means just any set of things, or any set of ordered pairs of things. These are properties in a very broad sense. It then follows, by Newman-style reasoning, that (4) is reducible to (3). For any structure (or classification, if you like) $\mathcal{A}$ can be imposed on some collection $C$ of things so long as there are enough of them.

To illustrate: consider a finite set $X = \{1, \dots, n\}$ of numbers, and partition it any way you like. Let the partition be $(Y_i \mid i \in I)$, where $I$ is the index set. I.e., the sets $Y_i$ are non-empty and disjoint, and $X = \bigcup_i Y_i$. Now, suppose that we have a collection $C$ of $n$ things, or physical objects, or what have you. Then it is easy to define a partition $(C_i \mid i \in I)$ of these things which is isomorphic to $(Y_i \mid i \in I)$. For since $C$ and $X$ have the same cardinality, let $f : C \to X$ be a bijection (this function enumerates the elements of $C$). Then, for each $i \in I$, define $C_i$ by:
$c \in C_i$ iff $f(c) \in Y_i$.
By construction, this gives us an isomorphism. So, if we have a partition of $n$ natural numbers (the "mathematical model") and collection $C$ of physical things of size $n$, we can partition $C$ isomorphically to the original partition. If there are no independent constraints built into $C$ itself beyond cardinality, we can impose any structure $\mathcal{A}$ we like onto $C$, modulo $C$ having cardinality at least as large as that of $\mathcal{A}$.
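The construction can be sketched in a few lines of code (the objects, the partition and the bijection are of course arbitrary illustrations):

```python
# Transferring a partition through a bijection, as in the argument above.
X = {1, 2, 3, 4, 5, 6}
partition_X = [{1, 2}, {3, 4, 5}, {6}]     # the "mathematical model"

C = {"a", "b", "c", "d", "e", "f"}         # six "physical things"
f = dict(zip(sorted(C), sorted(X)))        # an arbitrary bijection f : C -> X

# Define C_i = {c in C : f(c) in Y_i}; by construction this partition
# of C is isomorphic to the original partition of X.
partition_C = [{c for c in C if f[c] in Y} for Y in partition_X]
print(partition_C)  # blocks of sizes 2, 3 and 1, mirroring partition_X
```

Nothing about the members of $C$ beyond their cardinality was used, which is precisely the Newman point.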

Consequently, if the reasonable sounding (4) is not to trivialize down to (3), the quantifier in "there are ... properties" must range over a special subset of the set of all properties in the broader sense. In principle, this might be any special subset. But, usually, what is intended is what metaphysicians call "natural properties". This is because what "selects" that subset as special is not the mind, but Nature. If one intends it to mean "there is a mind-dependent subset of properties ...", then one is back to Idealism; and this is almost certainly not what (1) is taken to mean by anyone.

So, if this reasoning is right, the most reasonable interpretation of "the mind-independent world is complicated" is:
(5) There are mind-independent natural properties and relations amongst the mind-independent things and their relations (e.g., scientific laws) are complicated.
And this is much more in keeping with scientific inquiry. However, note that (5) implies the existence of mind-independent natural properties and relations.

So, if there is a mind-independent world (Idealism is incorrect) and the mind-independent world is complicated, then either this mind-independent complexity consists merely in its cardinality, or it consists in the complexity of the laws and relations amongst natural properties and relations. In particular, if Idealism is incorrect but there are no natural properties or relations, then the complexity of the mind-independent world consists solely in its cardinality.

(I'm inclined to think that this latter position is, more or less, Kant's metaphysical view.)

## Sunday, 21 April 2013

### The Probability of a Carnap Sentence

In the simplest, "logical empiricist"-style, framework for the formalization of scientific theories, we have a 1-sorted language $L_{O,T}$, where the vocabulary has been partitioned into O-predicates and T-predicates (it's easy to include constants and function symbols if one wishes; but it's simpler to omit them). And scientific theories are formulated in $L_{O,T}$. The language obtained by deleting the T-predicates can be denoted $L_{O}$ and is called the observational sublanguage of $L_{O,T}$.

Suppose that $\Theta(\vec{O}, \vec{T})$ is a single axiom for a finitely axiomatized theory in $L_{O,T}$, where $\vec{O}$ is a sequence of O-predicates and $\vec{T}$ is a sequence of T-predicates. Then the Ramsey sentence of $\Theta$ is defined by:
$\Re(\Theta) := \exists \vec{X} \Theta(\vec{O}, \vec{X})$,
where $\vec{X} = (X_1, \dots)$ is a sequence of second-order variables matching the arities of the predicates $T_1, \dots$ in $\vec{T}$. So, the theoretical predicates have been replaced by second-order variables, and existentially quantified.
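To fix ideas, here is a toy illustration (my own example, not from the text): a theory with one O-predicate $O$ and one T-predicate $T$, saying that every $T$ is $O$ and that something is $T$:

```latex
% Toy theory and its Ramsey sentence:
\Theta(O, T) \;:=\; \forall x\, (T(x) \to O(x)) \;\wedge\; \exists x\, T(x)

% Replace T by a second-order variable X of the same arity
% and quantify it existentially:
\Re(\Theta) \;:=\; \exists X\, \big( \forall x\, (X(x) \to O(x)) \;\wedge\; \exists x\, X(x) \big)
```

In this toy case $\Re(\Theta)$ is equivalent to $\exists x\, O(x)$ (take $X$ to be the extension of $O$ itself), which is exactly the observational content of $\Theta$.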

Nothing has been said about the meanings of the O-predicates and T-predicates. In principle, one could simply assume some $L_{O,T}$-interpretation $\mathcal{I}$, and let $(L_{O,T}, \mathcal{I})$ be the corresponding fully interpreted language. However, the logical empiricists---the first group of thinkers aiming to apply the newly emerging methods of mathematical logic to the formalization of scientific theories---did not adopt this approach. Instead, largely because of their empiricist metasemantics, they assumed only an $L_{O}$-interpretation $\mathcal{I}^{\circ}$, and consequently $(L_{O,T}, \mathcal{I}^{\circ})$ is then a partially interpreted language.

Because the language is partially interpreted, for each O-predicate $O_i$, there is now a meaning, $(O_i)^{\mathcal{I}^{\circ}}$. How then do the T-predicates get their meanings? Certainly not by explicit definition in terms of O-predicates! In a sense, the new underlying idea is that the meanings of T-terms are not pinned down uniquely and independently of theory, but rather implicitly defined by theories themselves. The basic way of implementing this view of meaning is to consider the Carnap sentence of the theory $\Theta$, i.e.,
$\Re(\Theta) \to \Theta$
and to insist that this sentence is analytic --- true in virtue of meaning.

As Hannes Leitgeb has pointed out in the talk I mentioned in the post "The Probability of a Ramsey Sentence" yesterday, it now seems reasonable to assign probability 1 to the Carnap sentence. After all, if $\phi$ is analytically true, surely its probability should be 1, whether probability is understood subjectively or not. So, we assume that we have some probability function $Pr(.)$ defined over $L_{O,T}$-sentences.

What can we say about connections between the probabilities of theories and their ramsifications? Well, as explained in the previous post, if the Carnap sentence has probability 1, i.e.,
$Pr(\Re(\Theta) \to \Theta) = 1$
then we can show that,
$Pr(\Re(\Theta)) = Pr(\Theta)$.
On the other hand, suppose that the Carnap sentence has probability slightly lower than 1. E.g., suppose that,
$Pr(\Re(\Theta) \to \Theta) = 1 - \epsilon$
for some small parameter $\epsilon$. In this case, it follows that
$Pr(\Re(\Theta)) = Pr(\Theta) + \epsilon$.
Proof: By the Lemma in the previous post,
$Pr(\Theta) + Pr(\Theta \to \Re(\Theta)) = Pr(\Re(\Theta)) + Pr(\Re(\Theta) \to \Theta)$.
But $Pr(\Theta \to \Re(\Theta)) = 1$, because $\Theta \vdash \Re(\Theta)$ (assuming second-order logic). So,
$Pr(\Theta) + 1 = Pr(\Re(\Theta)) + 1 - \epsilon$.
So, $Pr(\Re(\Theta)) = Pr(\Theta) + \epsilon$, as required. QED.

So, if the Carnap sentence for a theory has a probability lower than 1 by some amount, then the Ramsey sentence for the theory has a higher probability than the theory does, by that same amount. This makes sense intuitively, because the Ramsey sentence is, in a number of senses, weaker than the theory itself (unless it happens to be inconsistent, of course).

## Saturday, 20 April 2013

### The Probability of a Ramsey Sentence

This post is inspired by a recent very interesting talk, "Theoretical Terms and Induction", by Hannes Leitgeb at a conference on "Theoretical Terms" in Munich a couple of weeks ago (April 3-5th, 2013).

Hannes's talk is a response to the debate about whether a Ramsey sentence for a theory $\Theta$ can account for the inductive systematization of evidence given by $\Theta$ itself. This debate goes back to earlier works by Carl Hempel and Israel Scheffler (The Anatomy of Inquiry, 1963) and, in particular, a 1968 Journal of Philosophy paper, "Reflections on the Ramsey Method" by Scheffler. The debate has recently been revived in an interesting 2012 Synthese paper, "Ramsification and Inductive Inference", by Panu Raatikainen. The conclusion of this argument is that ramsification of a theory $\Theta$ damages the inductive systematization that the theory $\Theta$ provides. I recommend interested readers consult Panu's 2012 paper on this.

On Hannes's approach, one assigns a probability to a Ramsey sentence $\Re(\Theta)$, on the assumption that the corresponding Carnap sentence
$\Re(\Theta) \to \Theta$
has probability 1. Since Carnap himself insisted that the Carnap sentence of a theory is analytic, it seems reasonable, on his perspective, to assign it probability 1. On this Carnapian assumption, it can then be shown that the probability of a theory and its Ramsey sentence are the same. (Hannes's discussion also related these probabilistic conclusions to the notion of logical probability, counting models of a theory, over a finite domain.)

To explain what's going on, note first that it's well-known that $\Theta$ and $\Re(\Theta)$ are deductively equivalent with respect to the observation language $L_O$. That is, for any $\phi \in L_O$, we have,
$\Theta \vdash \phi$ if and only if $\Re(\Theta) \vdash \phi.$
But suppose the Carnap sentence has probability 1. Then we can show that $\Theta$ and $\Re(\Theta)$ are probabilistically equivalent.

First, we give a lemma in probability theory:
Lemma:
$Pr(A) + Pr(A \to B) = Pr(B) + Pr(B \to A)$.
Proof. Reasoning using probability axioms,
$Pr(A \to B) = Pr(\neg A \vee B)$
= $Pr(\neg A) + Pr(B) - Pr(\neg A \wedge B)$
= $1 - Pr(A) + Pr(B) - Pr(\neg (B \to A))$
= $1 + Pr(B) - Pr(A) - 1 + Pr(B \to A)$
= $Pr(B) - Pr(A) + Pr(B \to A)$.
So:
$Pr(A) + Pr(A \to B) = Pr(B) + Pr(B \to A)$.
QED.
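Since the Lemma is just finite probability arithmetic, it can also be checked mechanically on a small probability space; a sketch in Python, where the four-point space carries arbitrary illustrative weights (any weights summing to 1 would do):

```python
from fractions import Fraction

# Four truth-combinations of A and B, with arbitrary weights summing to 1.
points  = [(True, True), (True, False), (False, True), (False, False)]
weights = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]

def pr(pred):
    """Probability of the event picked out by pred(a, b)."""
    return sum(w for w, (a, b) in zip(weights, points) if pred(a, b))

P_A      = pr(lambda a, b: a)
P_B      = pr(lambda a, b: b)
P_A_to_B = pr(lambda a, b: (not a) or b)   # material conditional A -> B
P_B_to_A = pr(lambda a, b: (not b) or a)   # material conditional B -> A

# The Lemma: Pr(A) + Pr(A -> B) = Pr(B) + Pr(B -> A)
assert P_A + P_A_to_B == P_B + P_B_to_A
print(P_A + P_A_to_B, P_B + P_B_to_A)  # -> 11/10 11/10
```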

Next, let $\Theta$ be a theory and let $\Re(\Theta)$ be its Ramsey sentence. Note that
$\Theta \vdash \Re(\Theta)$.
(That is, one can deduce $\Re(\Theta)$ from $\Theta$ in a system of second-order logic, using comprehension.)

It follows that,
$Pr(\Theta \to \Re(\Theta)) = 1$.
Suppose that the Carnap sentence, $\Re(\Theta) \to \Theta$, has probability 1. That is,
$Pr(\Re(\Theta) \to \Theta) = 1$.
Then the Lemma above gives:
$Pr(\Re(\Theta)) = Pr(\Theta)$.
So, given a theory $\Theta$, the probability of its Ramsey sentence equals the probability of the theory itself, on the assumption that its Carnap sentence has probability 1.

[UPDATE 20 April: I have made a few changes and modified the Lemma used to a slightly stronger one.]

### Two objections to Priest's Inclosure Schema

(My student Rein van der Laan will be defending his Bachelors thesis on Priest’s Inclosure Schema this week. It was in the process of supervising him that I developed my current ideas on the topic, which means that the content of this post is basically joint work with Rein.)

In a number of papers (such as this 1994 paper) and in his book Beyond the Limits of Thought (BtLoT), Graham Priest defends the claim that all paradoxes of self-reference can be adequately captured by the Inclosure Schema IS, which he formulates in the following way:

(1) Ω = {y : φ(y)} exists and ψ(Ω)               Existence
(2) if x ⊆ Ω and ψ(x), then (a) δ(x) ∉ x         Transcendence
                            (b) δ(x) ∈ Ω         Closure

The different paradoxes of self-reference would be generated by different instantiations of the schematic letters of the schema (for details, consult BtLoT).
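For orientation, the instantiation for Russell's paradox runs roughly as follows (my rough reconstruction; see BtLoT for the official version):

```latex
% Russell's paradox as an inclosure (rough reconstruction):
%   \varphi(y):  y is a set        (so \Omega = V, the totality of sets)
%   \psi(x):     x = x             (trivially satisfied)
%   \delta(x) = \{\, y \in x : y \notin y \,\}
%
% Transcendence: \delta(x) \notin x, since \delta(x) \in x would give
%   \delta(x) \in \delta(x) \leftrightarrow \delta(x) \notin \delta(x).
% Closure: \delta(x) \in \Omega, since \delta(x) is a set.
%
% At the limit x = \Omega, the two clauses together yield the contradiction
%   \delta(\Omega) \notin \Omega \;\wedge\; \delta(\Omega) \in \Omega.
```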

There have been quite a few articles discussing IS in the meantime (among others: Abad, Grattan-Guinness, Badici, and responses by Priest and Weber), where a number of interesting objections have been raised against the idea that IS successfully describes all paradoxes of self-reference (the Liar, Russell’s paradox etc.). Here I discuss two (not necessarily novel) objections that I think are quite problematic for Priest’s general project with IS -- in particular, that of arguing for the Principle of Uniform Solution: similar paradoxes must receive similar solutions. (Unsurprisingly, he goes on to claim that only dialetheism is able to offer a uniform solution to all these paradoxes.)

The over/undergeneration objection. One plausible way to understand what Priest is up to with IS is that it is intended as a formal explanans for the informal notion of ‘paradoxes of self-reference’. If this is correct, then it is legitimate to raise the question of whether IS gets the extension of the informal concept right; it may overgenerate (arguments which we do not want to count as self-referential paradoxes would fit into the schema) and/or undergenerate (it may fail to capture arguments which we do want to count as self-referential paradoxes).

As it turns out, IS seems both to over- and undergenerate. It overgenerates in that a number of reductio arguments which are not paradoxical, properly speaking, seem to conform to IS (as pointed out e.g. by Abad). One example would be Cantor’s diagonal argument for the uncountability of the real numbers (see this earlier blog post of mine for a presentation of the argument). And it undergenerates in that Curry’s paradox, which obviously (?) should count as a self-referential paradox, cannot be accounted for by means of IS. Priest is well aware of this limitation, but retorts:
[Curry] paradoxes belong to a quite different family. [They] do not involve negation and, a fortiori, contradiction. They therefore have nothing to do with contradictions at the limits of thought. (BtLoT, 169)
This seems odd, as the original claim seemed to be that IS was meant to describe paradoxes of self-reference in general, not only those involving a negation. (To be fair, Curry is the hardest of all paradoxes; as Graham himself says, Curry is hard on everyone…) At any rate, if IS both over- and undergenerates as a formal explanans of paradoxes of self-reference (which is at least what the original 1994 paper seems to claim it should be), this is not good news for Priest’s general project. (He may, of course, say that Curry falls out of the scope of IS and thus of the Principle of Uniform Solution, but the overgeneration charge still stands.)

The form/matter objection. A useful and frequently cited definition of paradoxes is the one offered by Sainsbury (2009, 1): what characterizes a paradox is “an apparently unacceptable conclusion derived by apparently acceptable reasoning from apparently acceptable premises.” This means that one crucial component of a paradox is the degree of belief an agent attributes to the premises and the cogency of the reasoning, and the degree of disbelief she attributes to the conclusion. The ‘apparently’ clause does not need to entail a relativistic conception of paradoxes, but it does mean that a paradox has a perspectival component. Galileo’s paradox was paradoxical for Galileo and many others, but not for Cantor, who did not see the conclusion of the reasoning as unacceptable.

Now, there is an old but by now largely forgotten conception of the form and matter of an argument according to which the matter of the argument is the ‘quality’ of its premises. In this vein, a materially defective argument is one where the premises are false, while a formally defective argument is one where the reasoning is not valid. (See this article of mine for the historical background of this conception.) On the basis of this idea, we could say that the matter of an argument corresponds to one’s degree of belief/disbelief in the premises/conclusion (again, perspectival), and the form corresponds to the structure of the argument. This would entail that paradoxes come in degrees, as a function of the agent's degrees of (dis)belief in the premises, reasoning and conclusion.

With this distinction in mind, we can see why IS fails to capture the extension of the concept of paradoxes of self-reference: it captures only the form of such arguments, but is silent concerning their matter (understood as the degrees of (dis)belief in premises and conclusion). The paradoxical nature of a paradox, however, is crucially determined by the degrees of (dis)belief in the premises and conclusion (as made clear in Sainsbury’s quote). [UPDATE: this sentence has been misunderstood by many people. Notice that I am here using an unconventional understanding of the matter of an argument (introduced in the previous paragraph), not the more familiar schematic notion of form vs. matter.] This is why IS cannot differentiate between a truly paradoxical argument and a reductio argument, intended to establish the falsity of one of the premises rather than being truly paradoxical.

[UPDATE: In BtLoT Priest introduces the restriction that different instantiations of IS must yield true premises for an argument to count as an instantiation of IS, and thus to be an inclosure paradox. This is why the Barber then does not count as an inclosure paradox. This restriction seems to me to be too strong, as often what is under discussion when a paradox emerges is whether the apparently acceptable premises are indeed as acceptable as they seem.]

So ultimately, the conclusion seems to be that IS fails to deliver what Priest wants it to deliver. Nevertheless, I firmly believe that the formulation of IS has been one of the most interesting and important developments in research on paradoxes of the last decades. It forces us to think about paradoxes with a much-needed higher level of generality, and thus leads to a new, deeper understanding of the phenomenon – even if the conclusion must be that IS cannot be the whole story after all.

UPDATE: Some further thoughts on the Inclosure Schema here.

### Theoretical Terms in Mathematical Physics

Semantics is the theory of meanings of expressions, normally with a particular, fixed language in mind. In semantics, one might be interested in:
• the meaning of "the" (in English)
• the meaning of "and" (in English)
• the meaning of adverbs (in English)
• etc.
For example, the meaning of "the" in English for expressions of the form "the $F$" (definite descriptions) was given a famous analysis by Bertrand Russell in his article "On Denoting" (1905). This analysis is contextual (i.e., "the $F$" is not explicitly defined). Following Russell, the statement
The current Prime Minister of the UK studied PPE
is analysed as,
There is exactly one current Prime Minister of the UK and this person studied PPE.
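Schematically, Russell's contextual analysis of "the $F$ is $G$" can be written out as follows (using the iota notation for "the $F$"):

```latex
% Russell's analysis of "the F is G":
G(\iota x\, F(x)) \;:=\; \exists x\, \big( F(x) \;\wedge\; \forall y\, (F(y) \to y = x) \;\wedge\; G(x) \big)
```

In the example above, $F$ is "is a current Prime Minister of the UK" and $G$ is "studied PPE"; the description "the $F$" disappears under analysis, which is why the definition is contextual rather than explicit.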
Metasemantics is the metatheory of semantics. In metasemantics one is interested in questions like:
• what are languages, in general?
• what is the status of claims about the semantic properties of languages?
• how are languages acquired, spoken, implemented, cognized, grasped, etc., by minds?
One particularly pressing part of metasemantics concerns the semantics of theoretical expressions in science. A reasonable metasemantics for science aims to explain how meanings of theoretical expressions in, e.g., the language(s) of mathematical physics are grasped and assigned to linguistic strings. For example, how do we grasp the meanings of expressions in a passage like this:
Passage 1.
Consider a massive uncharged scalar field $\phi$ propagating on flat Minkowski spacetime $M$ with a potential $V(\phi)$. The field $\phi$ satisfies the Klein-Gordon equation,
$(\square + m^2) \phi + \frac{\partial V}{\partial \phi} = 0$.
Next let us consider the behaviour of this field when we consider a small graviton field $h_{\mu \nu}$ coupled to the energy tensor $T_{\mu \nu}$ of $\phi$.
(I've made this passage up, but it's the sort of thing one reads in a mathematical physics textbook or a paper.)

There are two main problems with the language of modern mathematical physics. The first is to understand how mathematical expressions obtain meaning. And the second is to understand how theoretical expressions like "massive uncharged scalar field", "potential", etc., obtain physical meanings.

We can sort of "undo" the physical content of Passage 1 as follows.
Passage 2.
Consider a scalar function $\phi$ on a differentiable manifold $M$ diffeomorphic to $\mathbb{R}^4$, with metric $g_{\mu \nu} = diag\{1,-1,-1,-1\}$. Let $\phi$ satisfy the equation,
$(\square + m^2) \phi + \frac{\partial V}{\partial \phi} = 0$,
for some $m \in \mathbb{R}^+$ and some function $V(\phi)$. Next let us consider the behaviour of this field when we consider a small symmetric (0,2) tensor $h_{\mu \nu}$ on $M$ coupled to the tensor $T_{\mu \nu}$ defined as follows ....
In Passage 2, we use notions like "manifold", "diffeomorphic", "$\mathbb{R}^4$", "scalar function", "metric", "$\mathbb{R}^+$", "symmetric (0,2) tensor". All of these can be defined in pure mathematics (and, in fact, reduced to the language of $ZF$ set theory, although this would be a bit nutty). For example,
A manifold $M$ is a topological space such that ...
In Passage 1, however, we are imagining a possible physical world, whose underlying physical spacetime is rather like our actual spacetime would be if it were flat (i.e., Minkowski), along with certain physical fields with certain properties (i.e., a spin zero scalar field with mass $m$).

It strikes me as highly implausible to suppose that the notions from mathematical physics in Passage 1 can be somehow reduced to "logical constructions from sense data" as Bertrand Russell and Rudolf Carnap had hoped. But even so, it remains very unclear how human cognition can mentally represent how a hypothetical massive scalar field would behave under these circumstances. We can. It's just not clear how we can.

A metasemantic theory which accounts for the semantics of Passage 1 almost certainly will involve "heavy-duty" notions from Lewisian metaphysics: in particular, modality and "natural properties", etc.

### Theoretical Terms: Defining E and B

It is sometimes claimed by philosophers of science that the meanings of theoretical terms are implicitly fixed by the total theory $\Theta$ (i.e., theoretical laws/equations plus correspondence rules) in which these terms appear. This is then the basis for the philosophical claim that a Carnap sentence,
$\Re(\Theta) \to \Theta$,
is analytic -- i.e., true in virtue of meaning. In a sense, on this view, theoretical terms are (second-order) Skolem constants (or, equivalently, Hilbertian $\epsilon$-terms).

This claim about the semantics of theoretical terms is, however, inconsistent with standard practice in physics. There, one usually adopts far more local definitions of theoretical terms.

For example, in electromagnetism it is the Lorentz force law, not Maxwell's equations, that does the definitional work. So, the following formulation by Professor James Sparks at the Mathematical Institute in Oxford corresponds fairly closely to the definitions that I learnt as a physics undergraduate a long, long time ago (at the Other Place):
The force on a point charge q at rest in an electric field $\mathbf{E}$ is simply
$\mathbf{F} = q \mathbf{E}$.
We used this to define $\mathbf{E}$ in fact. (Sparks, Lecture notes on "Electromagnetism", p. 12)
Notice that Maxwell's equations are not mentioned. The meaning of
"the electric field at point $\mathbf{r}$"
is not implicitly defined in terms of Maxwell's equations. Rather it is explicitly defined using the notion of a force on a charged test particle.
When the charge is moving the force law is more complicated. From experiments one finds that if $q$ at position $\mathbf{r}$ is moving with velocity $\mathbf{u} = d\mathbf{r}/dt$ it experiences a force
$\mathbf{F} = q \mathbf{E}(\mathbf{r}) + q \mathbf{u} \wedge \mathbf{B}(\mathbf{r})$. $\text{ }$ (2.8)
Here $\mathbf{B} = \mathbf{B}(\mathbf{r})$ is a vector field, called the magnetic field, and we may similarly regard the Lorentz force $\mathbf{F}$ in (2.8) as defining $\mathbf{B}$.
(Sparks, Lecture notes on "Electromagnetism", pp. 12-13)
Again, notice that Maxwell's equations are not mentioned. The meaning of
"the magnetic field at point $\mathbf{r}$"
is not implicitly defined in terms of Maxwell's equations. Rather it is explicitly defined using the notion of a force on a charged test particle.
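Read as an explicit, local definition, (2.8) is directly computable: given $\mathbf{E}$, $\mathbf{B}$, and the test particle's velocity, the force follows with no appeal to Maxwell's equations at all. A minimal Python sketch (the numerical values are my own illustrative choices, not from Sparks' notes):

```python
def cross(a, b):
    """Cross product of two 3-vectors given as (x, y, z) tuples."""
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

def lorentz_force(q, E, B, u):
    """Eq. (2.8): F = q E(r) + q u x B(r), componentwise."""
    uxB = cross(u, B)
    return tuple(q * E[i] + q * uxB[i] for i in range(3))

# Illustrative values (SI units):
q = 1.6e-19                 # test charge, C
E = (0.0, 0.0, 1.0e3)       # electric field at r, V/m
B = (0.0, 1.0e-2, 0.0)      # magnetic field at r, T
u = (1.0e5, 0.0, 0.0)       # velocity of the charge, m/s

F = lorentz_force(q, E, B, u)   # force points along z
```

The point is conceptual rather than computational: everything needed to evaluate $\mathbf{F}$ is supplied locally by the force law itself.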

This is not the end of the story, of course. But it casts considerable doubt on the claim that the meanings of theoretical terms are given by implicit definitions within global theories. The definitions are far more local. It is not even clear that these local definitions fit the mould of what a logician would call a genuine explicit definition. But, in any case, one does not use the whole apparatus of Maxwell's equations to define the expressions "electric field" and "magnetic field".