I’ll come right out and say that I’m feeling “deflationary” about this. The question, as I understand it, is how it comes to be that *y* can be freely substituted for *x* in “*y* > 0” without meaningfully changing the expression, but not into “*x* > *y*”.

I think this again comes down to which sorts of substitutions are *harmless renamings of the underlying set* and which *violate previously established constraints*. I see expressions like “*x* > 0”, “*y* > 0”, and “*x* > *y*” as boolean-valued expressions of real variables. That is, the first two define functions ℝ → {0, 1}, while the third defines a function ℝ² → {0, 1}. Each such function assigns to each point in the parameter space the truth value of the point’s real coordinates under the function’s associated boolean expression. Of course, each such function also naturally defines a subset of its domain: the preimage of 1, or, in other words, the subset of the domain consisting of points whose coordinates evaluate to *true*.
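As a toy illustration of this viewpoint (a sketch in Python, with 0 and 1 standing in for the two truth values):

```python
# Boolean-valued expressions of real variables, viewed as functions into {0, 1}.
def p(x):          # the function defined by "x > 0"
    return int(x > 0)

def q(y):          # the function defined by "y > 0": the *same* underlying map
    return int(y > 0)

def r(x, y):       # the function defined by "x > y", a map on pairs
    return int(x > y)

# p and q agree everywhere; the difference between them is purely notational.
assert all(p(t) == q(t) for t in [-2.5, 0.0, 0.1, 7.0])

# The "preimage of 1" is the subset of the domain where the expression holds.
points = [(-1.0, -2.0), (0.0, 1.0), (3.0, 2.0)]
truthy = [pt for pt in points if r(*pt) == 1]
print(truthy)  # the points whose coordinates make "x > y" true
```

The point of the sketch is only that `p` and `q` are one and the same map of sets, written with different handles, while `r` is a genuinely different object with a two-dimensional domain.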

It might help to back up a bit here and talk about what we actually mean when we coordinatize. Taking for example the expression “*x* > 0”, the corresponding function should be viewed as a map of the underlying sets, ℝ → {0, 1}. The same is so for functions defined by other coordinate expressions.

The role played by “*x*” in defining functions like these is actually somewhat tricky. It helps to view *x* separately from the underlying map of sets. There are many (*many*) maps ℝ → {0, 1}, and “*x* > 0” singles out just one of them. The fact that a coordinate is used is immaterial—at least for now (but see below). *x*, in other words, is merely a “handle” that allows us to latch onto an underlying element of the set ℝ, and “describe where we want to send it, through a simple rule”. In an expression like “*x* > 0”, this rule exploits ℝ’s (total) order structure, while in one like “*x*² > 1”, it exploits ℝ’s field (actually ring) structure. Again, though, a general map of ℝ into some set need not take any form expressible conveniently through an expression of ℝ’s canonical coordinate *x*.

The key, here, is that insofar as (any point of) ℝ² has two natural underlying coordinates, it should be no surprise that problems happen when we use a name already reserved for one to refer to the other. Viewing ℝ² as the cartesian (pardon the pun?) product ℝ × ℝ, the *underlying* constituents of any point are “its first coordinate” and “its second coordinate”. To call the canonical coordinate of a generic point of ℝ “*y*” as opposed to “*x*” is a harmless renaming of the same underlying object. But provided that one of ℝ²’s coordinates has already been called “*x*”, we shouldn’t call the second one “*x*” too. Beyond its being unclear in “*x* > *x*” which of ℝ²’s two coordinates “*x*” actually refers to, we must assume, at the very least, that both occurrences refer to the same one—at which point the discrepancy between this expression and “*x* > *y*” becomes clear.

The viewpoint one ought to take with respect to coordinatization is perhaps made more evident by the theory of smooth manifolds. Such a thing is an *underlying topological space* which, at least locally (that is, on a sufficiently small neighborhood around any given point), can be (homeomorphically) represented as an open domain in ℝⁿ (this representation is called a (local) *chart*). This so far has just been a statement about local topological structure, but more is true: we only consider local coordinatization schemes which are *mutually compatible*, in the sense that each pair of local coordinate charts which happens to overlap does so in a smooth way, so that these two charts, viewed as maps into ℝⁿ, differ from each other by a smooth (infinitely differentiable) map from (an open subset of) ℝⁿ to itself.

The first thing to realize here is that any function *f* from a manifold *M* into ℝ can be viewed, at least locally around any fixed point *p*, as a map from some open domain of ℝⁿ into ℝ, via a chart around *p* (whose existence is guaranteed by the axioms). The key consequence of compatibility of charts, though, is that if this function *f* happens to be smooth (as a function from an open domain of ℝⁿ to ℝ) in *one* such identification, then it will be smooth in *any* such, and that, as a consequence, we can talk about smooth functions *without talking about coordinates at all*. Many of the great achievements in Riemannian geometry (e.g., general relativity) center around this “coordinate-free” approach.

All of this is just to say that concentrating on the maps of underlying sets (with additional, e.g. smooth, structure) is the order of the day, and that coordinates, which should be used only when necessary (e.g. when performing local calculations), tend only to bring trouble.

This is immediately relevant to Kit Fine’s example, I contend. Insofar as we understand the underlying maps of sets suggested by the expressions “*x* > 0”, “*y* > 0”, and “*x* > *y*”, the relevant issues seem to arise around the use of the symbols “*x*” and “*y*” to refer to coordinates of these particular sets.

I like your approach via this idea of “loss of generality”. Indeed, it’s certainly conceivable that a fallacious proof could proceed along lines like those you suggest; namely, it could prove something under a special assumption and then illicitly claim that an analogous result holds even in the absence of that assumption. In such a case, the flaw would reside in the falsity of the claim that the assumption can be made “without loss of generality”; one imagines, to the contrary, that this particular assumption *does* restrict generality—in, moreover, a material way, that is, one for which the reduction of the general case to the special case is not obvious (or true).

Indeed, what it *means* to say that some assumption can be made “without loss of generality” is that the truth of some particular statement even in the absence of this assumption can be obviously, or easily, deduced from the truth of the corresponding statement in the presence of the assumption. “This assumption will therefore be made going forward,” the prover states. “Deduce the general result from the ensuing particular one yourself.”

As a side note, I dislike the use of “without loss of generality” (or “we may assume” or any of its other variants) in proofs; I personally avoid it, and consider it a sort of bad practice. If you’re going to impose an assumption, then explain, or gesture towards, how exactly the unconditional result is to be deduced from the conditional one. Often these reductions hide a lot of complexity, and “without loss of generality” is sometimes used as a crutch.

Here’s an example from the field which *does* feature ample justification, drawn from my favorite reference, the Stacks Project:

“Formation of normalization commutes with étale localization by More on Morphisms, Lemma 36.17.3. Checking that local rings are regular, or that a morphism is unramified, or that a morphism is a local complete intersection or that a morphism is unramified and has a conormal sheaf which is locally free of a given rank, may be done étale locally (see More on Algebra, Lemma 15.41.3, Descent, Lemma 34.20.28, More on Morphisms, Lemma 36.52.13 and Descent, Lemma 34.7.6).

By the remark of the preceding paragraph and the definition of normal crossings divisor it suffices to prove that a strict normal crossings divisor satisfies (2).”

This paragraph, in other words, shows that in proving the implication (1) → (2), one may assume without loss of (material) generality that D is in fact a *strict* normal crossings divisor (as opposed to a general one).

This is all well and good, but I’ve oversimplified things a bit. The refrain “without loss of generality”—and this is partly why I dislike it so much—can actually mean a few subtler things, having to do with naming. It can often mean something like “the naming scheme we are about to adopt does not violate any of the constraints it is subject to”. I myself am guilty of using this phrase (in this second sense) in, fittingly, the paper I wrote on this subject:

“Writing without loss of generality the adversary’s (real) numbers as a₁ ≤ a₂ ≤ ⋯ ≤ aₙ, we denote by…”

This statement does two things. For one, it gives names (namely a₁, …, aₙ) to the adversary’s previously unnamed *n* numbers; in addition, it “assumes” that these numbers are indexed in ascending order. This, of course, is not an *assumption* in the sense in which the assumption that *D* is a strict, as opposed to a general, normal crossings divisor is; to the contrary, it is merely a stipulation that the aforementioned naming will be carried out in a particular way. The significant thing is that this particular way (namely, that under which the numbers are sorted) doesn’t violate any constraints imposed by previously accepted assumptions. Of course it doesn’t, since these numbers haven’t even been named yet.

An example in which an issue *would* arise is the following:

“An adversary picks *n* numbers in order, say, a₁, a₂, …, aₙ. Without loss of generality, assume that a₁ ≤ a₂ ≤ ⋯ ≤ aₙ.”

This of course is a serious problem, as the "assumption" that the numbers are named so as to be sorted contradicts a previously imposed constraint, namely that according to which they're also named in the order in which the adversary picked them.

A more subtle example of the use of "we may assume" to usher in a "contradiction-free renaming" is visible further down in my paper:

"Each set in the above inequality differs from only in its lacking some particular element of the latter set (determined by the element ). This situation thus mimics that of expression (2), and for notational convenience (that is, up to a reindexing), we may adopt its setting in what follows."

I'll myself admit that I was a bit light on the details here. The content of the claim, though, is that the general case (corresponding to an arbitrary trailing code) differs merely by a constraint-satisfying renaming from the special case (corresponding to no trailing code).

Another example might be:

“Suppose that v₁, …, vₙ are linearly independent elements of an *n*-dimensional vector space *V* over ℝ. Without loss of generality, assume that *V* = ℝⁿ and that vᵢ is the *i*th standard basis vector eᵢ for each *i*.”

The “renaming” in this case is actually the vector space isomorphism induced by sending vᵢ ↦ eᵢ, whose injectivity (and surjectivity, by a dimension count) is guaranteed by the linear independence of the vᵢ. Hence the subtlety of “without loss of generality”: if this linear independence had not been assumed, then the “assumption” would have been fallacious, a loss of generality. (That is, it would have *introduced* the assumption of linear independence, which in this case is equivalent to this vector space map’s being an isomorphism.)
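A minimal numerical sketch of this renaming (assuming, purely for illustration, that the space is ℝ² and picking a hypothetical pair of independent vectors):

```python
import numpy as np

# The "renaming" sending v_i -> e_i is the linear map whose matrix is the
# inverse of the matrix V whose columns are the v_i; that inverse exists
# precisely because the v_i are linearly independent.
V = np.array([[1.0, 1.0],
              [0.0, 1.0]])        # columns v_1 = (1, 0), v_2 = (1, 1): independent
T = np.linalg.inv(V)              # the isomorphism, in matrix form
print(T @ V)                      # columns are T v_1, T v_2: the standard basis
```

Had the columns of `V` been dependent, `np.linalg.inv` would raise an error: exactly the “loss of generality” scenario, in which no such renaming exists.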

A very common situation “in the wild” in which this occurs is that in which a point *P* is translated to the origin. For example: “Assume without loss of generality that *P* = 0 (this simplifies the notation considerably, but does not change the mathematics)” (Greene and Krantz 2006, p. 81). For a much more sophisticated case, see Schmid: “Thus, localizing the problem, one arrives at the following situation: the period mapping is defined on a polycylinder, from which some coordinate hyperplanes have been removed; in other words, on a product of punctured discs and discs.”

“Assume that *P* = 0?” you might say. “That’s absurd! …*P* does not equal 0!” Well, sure, it doesn’t. The point, though, is that the underlying set (say the complex plane) can be renamed (that is, its coordinatization translated) in such a way that *P* becomes 0 when all is said and done, and no prior constraints are violated.

As if things weren’t bad enough, I’ll point out (what I think is) a still further sense in which “we may assume” is used. This is an inductive one. Vaguely, it goes like: “[Base case] is true. Thus, we may assume [case *n*] is true, provided that we prove the inductive step [case *n* implies case *n* + 1].” In fact, this sense is quite different, and it may just be a coincidence that the same words are used. (Really, it boils down to “it suffices to prove the implication case *n* → case *n* + 1”.) In any case, this too appears in my paper:

“By induction, therefore (where the base case is exactly the classical Proposition 1, in light of Remark 2) we may assume that, for each trailing code , each instance of the inner expression above features an inequality: [], leaving unproven, only the inductive step [], where, …”

In sum, I’ll conclude that these senses all appear at least vaguely related, in that they constitute reductions or simplifications from which the general result can nonetheless be deduced. Yet the first sense is a substantive (semantic?) reduction, while the second is a nominal (syntactic?) simplification, merely a renaming, which replaces one situation by an identical one in which things are named so as to better suit our purposes. The third is perhaps like a sequence of applications of the second: one for each inductive step in a complex chain, or rather hierarchy, of deductions which all, in virtue of the inductive formulation of the problem, share an identical, or at least analogous, structure, albeit admitting various instantiations thereof.

With all of this said, I’m not quite convinced that the fallacious switching argument in the “two boxes” paradox actually constitutes a spurious loss of generality of this sort. A loss of generality of the sort I (and, it seems, you) have described occurs, I take it, when a material *restriction*, or strengthening, of a set of assumptions is substituted illicitly for the general (or weaker) set. And yet the assumption that “Box A has $x in it”, and therefore that “Box B has either $2x or $x/2”, doesn’t, as I see it, carve out a strict subset of the set of situations that could actually arise. It’s completely separate from this set. Nowhere in the proper understanding of the box paradox does an asymmetry of this kind arise; the kind of asymmetry that *does* arise is that under which Box A has $x and Box B has $2x (for some fixed x). Yet were we to substitute this situation (which will occur half the time) for the general case, then the expected proceeds after switching would be exactly $2x, not $5x/4.

So yes, I think the box paradox can only be explained by an “antinomy of the variable” approach, and *not* as a spurious loss of generality. That’s how it seems to me.

Regarding the content, I actually have a lot to say and I preemptively apologize for making heavy use of references to published work of others.

The two envelopes problem really is a fascinating puzzle. Some philosophers have written on it too, as a puzzle for decision theory, and there are even those in both the philosophical and mathematical communities who refer to a “two envelopes paradox”, rather than merely a “puzzle”. I’ve had occasion myself to think about it more carefully recently, in conversation with you and with a colleague here in the department.

The more I’ve thought about it, the more the whole thing seems to involve, as you and others have already said in some way or another, some sort of fallacy of ambiguity, i.e. of illicitly treating an item in our reasoning (e.g. a term) as having multiple distinct logical roles, or meanings. A simple example is the famous “proof” that a ham sandwich is better than eternal happiness:

i. Nothing is better than eternal happiness. (Premise)

ii. A ham sandwich is better than nothing. (Premise)

iii. The relation ‘x is better than y’ is transitive. (Premise)

Therefore:

iv. A ham sandwich is better than eternal happiness.

The fallacy of ambiguity here is clear. In i ‘nothing’ is used as a quantifier (“there is no x such that x is better than eternal happiness”), but in ii ‘nothing’ is used as a term (“there is something, call it ‘nothing’, such that a ham sandwich is better than this thing”).

In the case of the two envelopes problem, in which one reasons to the conclusion that one should switch, I think something similar is going on. Here’s my (fairly inchoate) analysis – in which I discuss a variant of the problem (I’ll write about boxes rather than envelopes in case it’ll be convenient to distinguish the puzzle I’ll discuss from the one you’ve discussed, but obviously boxes vs envelopes is only a notational difference):

So, the following case involves fallacious reasoning.

(1) There are two boxes A and B. One box contains twice as much money as the other. Selecting one box, say A, we take ourselves to have $x. Since we have $x, switching seems to imply that we get either $x/2 with probability 1/2 or $2x with probability 1/2. Numerically, switching appears to have the expected utility (1/2)($x/2) + (1/2)($2x) = $5x/4. Obviously $5x/4 > $x, so it looks like we should switch, right? (WRONG!)

The fallacy here is much discussed. Reflecting on it a bit, it has always seemed to me that, on the surface, the problem is that we reason as though the situation is not symmetrical in a way that it clearly is. We fix (or determine or stipulate or assume) the value of our selected box, A, as being $x, and then we reason about the other box as being indeterminate in value (either half $x or twice $x, with equal probability). But we can apply the exact same reasoning to the other box, making our switching strategy illicitly dependent on which box we arbitrarily assign the value of $x to in the first place.

The underlying logical fallacy at this point is something we might call “the fallacy of the unqualified (or unconditional) conclusion”. I don’t know whether this fallacy has a pre-existing formal title already, but this is what I call it. Basically, it’s when you reason from some assumption to some conclusion, but then infer the conclusion as holding in an unqualified way, i.e. outside the scope (in the proof theoretic sense) of the initial assumption. In terms of the material conditional, the fallacy is the following illegal (and visibly bonkers) inference:

I. If P then Q

Therefore

II. Q

To elaborate, supposing the value of box A to be fixed (as $x), we can then reason about the value of box B in a way which makes it seem like switching to B is better, as in (1). But switching to B is better only in the context of the assumption that the value of A is fixed at $x. It is fallacious to conclude in an unqualified or unconditional way (outside of the scope of our antecedent assumption that the value of A is $x) that choosing B is the better option overall.

The fallacy of the “unqualified conclusion” is a simple scope violation fallacy really, but in this context it incorporates a fallacy of ambiguity too, because it sort of leads us to confuse the variable ‘x’ with a constant or non-variable. Our unqualified conclusion that it is better to switch boxes subtly treats ‘x’ as though it were not a true variable, which could just as well denote the amount in the other box, but rather as though it were a variable under some particular assignment or particular class of assignments (i.e. those assignments which make the expected value of the other box a fixed function of x). This also looks like a case of what might be called a “loss of generality”. Maybe I’m abusing terminology a bit here, but the thought is that we initially introduce a variable in an appropriate way because it has a certain generality, but we then conclude something that loses the generality which allowed us to introduce the variable in the first place.

The correct reasoning in the box case is visible when we don’t pick an arbitrary box first but rather think in fully general terms as follows:

(2) We’ve selected a box with either $x or $2x and the combined amount in both boxes is $3x. The gain in switching boxes is determined by noting that we will either gain an additional $x with probability 1/2 (if we initially chose the box with $x) or we will lose $x with probability 1/2 (if we initially chose the box with $2x). In other words, the expected utility of switching is:

EU(Switch) = (1/2)(+$x) + (1/2)(−$x) = ($x/2) − ($x/2) = 0

So we should actually be indifferent to switching.
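A quick simulation bears this out (a sketch in Python; the dollar range for the smaller amount is an arbitrary assumption of mine): the average proceeds of keeping and of switching coincide, rather than differing by a factor of 5/4.

```python
import random

def expected_proceeds(trials=200_000, seed=0):
    """Simulate the two-box game; return (avg proceeds if keeping, if switching)."""
    rng = random.Random(seed)
    keep_total = switch_total = 0.0
    for _ in range(trials):
        x = rng.uniform(1, 100)       # the smaller amount (range is arbitrary)
        boxes = [x, 2 * x]
        rng.shuffle(boxes)            # our pick is boxes[0]; the other is boxes[1]
        keep_total += boxes[0]
        switch_total += boxes[1]
    return keep_total / trials, switch_total / trials

keep, switch = expected_proceeds()
print(keep, switch)  # both approach 1.5 times the mean of the smaller amount
```

By symmetry the two totals estimate the same quantity, which is exactly the indifference conclusion of (2).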

The situation is the *opposite* of the situation in the famous Monty Hall problem, and it’s intuitively clear why. In the Monty Hall problem, switching doors fails to be a good idea only if one is as likely (or more likely) to have initially selected the winning door as to have selected a non-winning one. But since there are three doors in the Monty Hall problem, one is, subjectively, less likely to have selected the winning door than a non-winning one. So it’s not the case that switching doors is not a good idea, i.e. switching doors is a good idea.
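The Monty Hall asymmetry is easy to check numerically (another Python sketch, under the standard rules: the host always opens a non-winning door other than one's pick):

```python
import random

def monty_hall(trials=100_000, seed=1):
    """Return the empirical win rates of staying vs. switching."""
    rng = random.Random(seed)
    stay_wins = switch_wins = 0
    for _ in range(trials):
        car, pick = rng.randrange(3), rng.randrange(3)
        # The host opens a door that is neither our pick nor the car.
        opened = next(d for d in range(3) if d not in (pick, car))
        # Switching means taking the one remaining closed door.
        switched = next(d for d in range(3) if d not in (pick, opened))
        stay_wins += (pick == car)
        switch_wins += (switched == car)
    return stay_wins / trials, switch_wins / trials

print(monty_hall())  # roughly (1/3, 2/3): switching doubles the win rate
```

The two rates sum to one, since exactly one of the two strategies wins each trial; the 1/3 vs. 2/3 split is the asymmetry that the two-box problem lacks.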

Contrariwise, in the box-switching problem there are only two boxes, only two options, and so, subjectively, one is as likely to have chosen the better box as not at the outset. This is why switching is not preferable in the two-box case: there is no probabilistically significant difference between the two options; switching is just as good as keeping.

The philosopher of language Jeff Speaks, at Notre Dame, has done some interesting analysis of this too – though I actually think his comments go slightly awry at a crucial point. He considers multiple (exactly seven) versions of the problem, which vary on different parameters (e.g. whether you open your envelope or your opponent opens theirs, the means of randomization, etc.).

(Here’s a link to some of his notes: https://www3.nd.edu/~jspeaks/courses/2007-8/20229/_HANDOUTS/2-envelope.pdf)

Because Speaks views this puzzle as a puzzle about decision making, it’s tempting for him and similarly inclined philosophers to resolve it by appealing to or postulating some general principle for rational choice. He does this, postulating a principle he calls inference from an unknown:

“Suppose that you are choosing between two actions, act 1 and act 2. It is always rational to do act 2 if the following is the case: there is some truth about the situation which you do not know but which is such that, were you to come to know it, it would be rational for you to do act 2.”

This principle seems safe in many cases. For example, suppose you and I have gone to the Belmont Stakes, and you’re choosing which horse to bet on. I recommend a horse, saying: “there is some fact about Hoof-Hearted which is such that if you knew it then it’d be rational for you to bet on her.” Then, assuming I am perfectly reliable, Speaks’s principle requires that you bet on Hoof-Hearted.

But, as a general rule of inference, I think that some versions of the two-boxes/envelopes problem are counterexamples to Speaks’s principle. Speaks, however, doesn’t delve into why these versions are counterexamples, because he thinks those versions actually don’t involve an inference from an unknown.

To be specific, regarding what Speaks says about the choice open version and the choice open reverse version, he claims somewhere (here, at the bottom: https://www3.nd.edu/~jspeaks/courses/2009-10/20229/LECTURES/19-st-petersburg-2-envelope.pdf) that rejecting the principle of inference from the unknown still leaves us with the puzzle of how to process these two versions because (he contends) these versions don’t make use of the principle.

But I think they do make use of the principle, albeit subtly. Consider these two cases:

(3) Suppose I choose A and you then open B to reveal $20. I’m supposed to reason that my box A is worth (1/2)($20/2) + (1/2)(2 · $20) = $25, and so I should want to keep A. This is supposed to be puzzling for Speaks.
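Under a toy prior (an assumption of mine, not Speaks’s: the smaller amount is $10 or $20 with equal probability), a simulation of (3) does recover the $25 figure:

```python
import random

def avg_A_given_B_is_20(trials=400_000, seed=2):
    """Condition on the opened box B showing $20; average the amount in A."""
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(trials):
        small = rng.choice([10, 20])      # toy prior on the smaller amount
        a, b = (small, 2 * small) if rng.random() < 0.5 else (2 * small, small)
        if b == 20:                        # keep only trials where B shows $20
            total += a
            count += 1
    return total / count

print(avg_A_given_B_is_20())  # ≈ 25: A holds either $10 (B larger) or $40 (B smaller)
```

Of course, whether the two cases really are equiprobable given the observed $20 depends entirely on the prior, which is part of what makes the open versions delicate.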

(4) Suppose the same scenario obtains as in (3): I choose A and you open B, revealing the $20. But now I reason that my box, A, contains some amount, $x, and that, as a result, the other box, with $20, contains either $2x or $(x/2) with equal probability. So, although I know the “flat value” of box B, which is $20, I don’t know its relative value, i.e. I don’t know how it compares to the value of box A. So, by switching from box A to box B I get a 1/2 chance of doubling and a 1/2 chance of halving, so I should switch, because doing this has an expected value of $(x + x/4), which is always more than $x for non-zero x. So, the fact that I happen to know that B has $20 need not prevent me from reasoning that switching to B is the better option.

Thus, given (3) and (4) it appears we can have both conclusions at once: switch to B and keep A… which does look like a problem.

So, what are we to do? Can we avoid the paralysing choke of paradox? I think we can, as long as we do not unrestrictedly accept Speaks’s principle of inference from an unknown. Again (and this is tentative, so bear with me) I’m inclined to see the problem in (3) and (4) as one of confusing a variable with a constant, or perhaps of confusing variability with constancy. Sure, we know in the open version what the value of the open box/envelope is, e.g. $20. But this value is still variable in the sense that what is at issue for our decision is not just the box’s absolute value ($20) but its relative value. And when the other box is still closed, the relative value of $20 in this scenario is still an unknown. So, contrary to what Speaks says in his notes, the choice open version and the choice open reverse version, as with (3) and (4), are both subtly using the inference-from-an-unknown principle: reasoning to the conclusion that one can choose so as to maximize expected utility from the fact that there are some facts such that if one knew them one could choose so as to maximize expected utility. At least, this is how it seems to me.

Since having these thoughts I’ve actually gone back to read an older paper that Kit Fine – the NYU-based philosopher – wrote on the nature of variables in 2003

(https://www.pdcnet.org//pdc/bvdb.nsf/purchase?openform&fp=jphil&id=jphil_2003_0100_0012_0605_0631).

Fine has since published an entire monograph on the theory which he thinks solves the various puzzles to which variables apparently give rise (that’s his book, *Semantic Relationism*; Blackwell 2007). The chief puzzle goes back to Russell in his fantastic *Introduction to Mathematical Philosophy*, and Fine calls it the ‘antinomy of the variable’. Essentially, it is the puzzle that the semantic roles of distinct variables, e.g. ‘x’ and ‘y’ (or ‘A’ and ‘B’ as in the box/envelopes case), seem to be at once the same role and yet different roles. For example, here’s Fine in his 2007 book:

“Suppose that we have two variables, say “x” and “y”; and suppose that they range over the same domain of individuals, say the domain of all real numbers. Then it appears as if we wish to say contradictory things about their semantic role. For when we consider their semantic role in two distinct expressions – such as “x > 0” and “y > 0”, we wish to say that it is the same. Indeed, this would appear to be as clear a case as one could hope to have of merely “conventional” or “notational” difference; the difference is merely in the choice of the symbols and not at all in linguistic function. On the other hand, when we consider the semantic role of the variables in the same expression – such as in “x > y” – then it seems equally clear that it is different. Indeed, it would appear to be essential to the semantic role of the expression as a whole that it contains two distinct variables, not two occurrences of the same variable, and presumably this is because the roles of the distinct variables are not the same” (Fine 2007, p.7)

From a logician’s point of view this is actually quite a puzzle. In my mind, it relates to the puzzle of the two boxes/envelopes too. For Fine, variables have semantic/logical properties both individually, in virtue of being variables, and relationally, in virtue of their standing in semantic/logical relations to other variables. To clarify: ‘x’ denotes a variable which in an expression like ‘x > 0’ has precisely the same meaning over the reals as ‘y’ does in ‘y > 0’. Considered individually, then, ‘x’ and ‘y’ have the same semantic/logical role – i.e. that of variables which range unrestrictedly over the same set of values. Considered together, however, as in ‘x > y’, they have distinct semantic/logical roles – i.e. as variables which, in a given expression, can take different values. Thus, they seem to be at once purely variable and yet, in some expressions, non-purely variable, i.e. differently constrained in their possible values. In other words, in a given expression, ‘x’ can have a different value to ‘y’, but in no given expression can ‘x’ have a different value to ‘x’. Taken individually, they are purely variable, but taken collectively they are non-purely variable. Fine develops a whole semantic framework which allows him to provide a model theory for first-order logic around the idea that variables can have semantic properties in virtue of their relations to other variables, without this reducing to their purely individual (intrinsic) semantic properties. ‘Tis mighty stuff.

Now to apply Fine’s thoughts a bit: In the envelope case, (4) above, we are concerned with the constant term ‘$20’. This case, it seems to me, gives rise to the exact reverse of the phenomenon that arises in the antinomy of the variable. In the decision scenario of (4), the expression ‘$20’, when considered individually, is a constant denoting the utility amount of twenty dollars, that is, when taken individually it is purely non-variable. But, when the expression ‘$20’ occurs in our reasoning alongside a variable expression denoting the amount in box A (i.e. “$x”) then ‘$20’ is revealed as non-purely non-variable, i.e. as partly variable. Intrinsically, the semantic/logical role of ‘$20’ is that of a constant, denoting a fixed utility for the decision maker. But when set in relation to ‘$x’, the semantic/logical role of ‘$20’ is not purely that of a constant but of a variable, denoting a multiplicity of possible values (either half of what’s in A or double what’s in A). This is why I think it still makes sense to think of the choice open versions of the two envelopes problem as involving an inference from the unknown – and for anyone who thinks otherwise to be confusing constancy with variability.

So these puzzles really do constitute counterexamples to Speaks’s principle of inference from an unknown, in my view. This principle can lead us astray in exactly the way that the fallacy of ambiguity or the error of confusing the logical categories of variable and constant can lead us astray. Moreover, learning to avoid these confusions, to avoid confusing what is variable for what is constant and so avoid reasoning as though we know more than we do, is an essential part of what it takes for our lives to go well. If only Charles Swann had realized this in advance, the antinomy of the variable ‘Odette’ might not have tortured him so. We could all learn from this.

Now, finally, regarding whether mathematics can know us. First, I am inclined to think that the world instantiates mathematical structure (understatement!). That is, there is some sense in which the world is mathematical in nature. Now we, of course, are part of the world; thus we participate in its mathematical character, both in our apprehension of mathematics and in our physical and psychological instantiation of its structures. Our scientific understanding of the material world suggests that when certain mathematical structures are physically instantiated (whatever the hell that means), the result is a conscious entity capable of knowing this world and itself. It is, of course, a hoary metaphysical conundrum how the physical and the purely mathematical relate (the one is concrete, the other abstract, after all). But I am inclined to say that they do relate: the only conception of the world I possess is of a world which itself possesses mathematical character – my world is one in which things can be counted, ordered, measured, and in which space instantiates some geometry or other.

In a non-joking way, I hold that our nature too is, in a sense, mathematical. Mathematics makes us possible, because it is required for there to be a world. But it is also an inextricable part of our mental lives: we think mathematically, unity and plurality are inescapable categories of thought for us. Mathematics is part of what we are, in the same way that our world is – constitutively – part of what we are (we are part of it, at least, in that our parts overlap). Mathematics permeates our world and us with it. On that basis, I am inclined to say that mathematics “knows us” at least to the extent that we know ourselves.

On the other hand, the sense of “knows” here might not be the usual one. Given the usual notion of “knowing” it may be that the only things that can properly be said to know are epistemic agents, like persons. For example, consider the case of the soul-searching neuroscientist who has spent a lifetime studying the neurological substrata of memory. Occasionally, he laments the fact that although he has spent long years of study and practice coming to know the functions of the hippocampus, the functions of the hippocampus will never know him in return.

This lament might seem bizarre. But we can all recognize the idea of someone who loves their work, even considers themselves "married" to it, but who is told by others: "oh but your work will never love you back!" Irksome prattle, yes, but there's something to it. The functions of the hippocampus, conceived of abstractly as something instantiated by all normal brains of certain creatures, seem to be of the wrong logical category to be described as the subjects of knowledge states. The lament of not being known in return by abstracta thus looks like it is based on a simple category error, i.e. the error of confusing the logical category of an individual object, which can be the subject of knowledge attributions, with the logical category of a property or quality, which is what is attributed to individual objects (e.g. a brain function, or some mathematical structure).

Our thinking often goes amiss when we confuse the logical category or role of some expression or notion with that of some other expression or notion; witness the fallacy of treating a variable as a constant above! (Also, cf. Peter Hacker’s work on neuroscience and philosophy here! Whatever you think of Hacker’s views, similar methodological points are being made.)

To respond to the "category error" complaint: although we can say, with the likes of Hacker, Gilbert Ryle, and others, that the ordinary workaday concept of knowledge rules out as incoherent the thought that abstract or inanimate things can be the subjects of knowledge attributions, there is nevertheless some neighboring notion which we are trying to get at. This notion is not the same as that of ordinary knowledge, but it can serve a complementary role in our relationship with things like mathematics. Mathematics "knows" us in this neighboring sense in the same way that the world "knew" what to do in order to spawn our ancestors three billion years ago, to bring us forth from its primordial viscera. It is the same sense in which a creature's body "knows" what to do to fight an infection, or to gestate unborn offspring.

The information required to change or bring something into existence is all there, coded up in the nature of the organism. Similarly, one could say that the information required to make our world, and us who live in it, is "all there", eternally encoded in the realm of the abstract, Plato's realm of the Forms. So, maybe the Forms know us in the same unconventional way that the body knows its defense mechanisms, and the solar system knows its orbital mechanics. Strange stuff, maybe, but mathematical Platonism is difficult to reject once you get right down to it, and if Platonist views are to be endorsed, we must expect some sense in which the abstract comes into relation with the non-abstract.

For some medieval scholastic philosophers, mathematical objects were perfect ideas or thoughts in the mind of the deity. For them, to be "known" by mathematics might simply imply being an object constructed from those ideas. Given medieval Europe's definitions of 'God', however, I think it would be no stranger to be known by mathematics than to be known by God. And the world is a very strange place, after all.

Finally, in connection with the theme of memory, which is arguably the dominant theme in Proust’s titanic work, there’s a lot to be said. “Mental time travel” some have called it. It is something which can arouse feelings of nostalgia and a connection with the everlasting and unchanging past, whose presence in our minds somehow seems to give life a feeling of meaningfulness. I’m reminded of the following passage from Bertrand Russell’s essay ‘A Free Man’s Worship’:

“the reason why the Past has such magical power. The beauty of its motionless and silent pictures is like the enchanted purity of late autumn, when the leaves, though one breath would make them fall, still glow against the sky in golden glory. The Past does not change or strive; like Duncan, after life’s fitful fever it sleeps well; what was eager and grasping, what was petty and transitory, has faded away, the things that were beautiful and eternal shine out of it like stars in the night. Its beauty, to a soul not worthy of it, is unendurable; but to a soul which has conquered Fate it is the key of religion.”

It’d be cool to see a blog post more explicitly focused on the theme of memory in Proust – maybe in relation to the neuroscience of involuntary memory (Josh?). There’s definitely some connection between the altered mental states prompted by memory and the feeling of seeing the world anew which Proust suggests is a feeling that gives life meaning. I for one would like to think more about that.

]]>There’s not much neuroscience research on involuntary memory. One study suggests that voluntary and involuntary memory utilize similar neural pathways, but that involuntary memory in particular receives decreased input from the prefrontal cortex. Thus involuntary memory proceeds without the orders of the part of the brain responsible for executive function. It’s almost as if the stimulus itself behaves as the executor. As you mentioned, the outside world may wield a surprising degree of agency.

Hall, N. M., Gjedde, A., & Kupers, R. (2008). Neural mechanisms of voluntary and involuntary recall: A PET study. Behavioural Brain Research, 186(2), 261–272.

]]>Indeed, that’s the trouble with revenge: that no matter how decisively you might ultimately crush your tormenters, their torments will still remain. No revenge can undo the sorts of wrongs committed in The Musketeers.

Which is why, Christianity aside, I’m now wondering if Dumas was just a troubled, troubled guy. Whether the revenge is complete—or justified—or satisfying—or redemptive—or just—or Christian—or whatever—seems beside the point to me now. What really sticks out to me is the personal turmoil and twistedness that could/must have led Dumas to concoct such extreme revenge scenarios in the first place. Indeed, the philosophical details of these revenge acts seem irrelevant compared to the torments which originally made them necessary—torments which should perhaps themselves be considered the loci of philosophical intrigue.

]]>A naïve model could have it that the self, the seat of consciousness, inhabits the mind, which furnishes, in turn, additional faculties like memory, attention, and perception…This model too runs into problems.

However, your subsequent model, entailing multiple orders of the self, would seem to run into the same problems. Just as attention, memory, and perception are seamlessly integrated into the seat of consciousness, so would be the lower orders of the self in your second model. Indeed, the trouble with disorders of purportedly sound and willful thoughts is precisely that, although they may stem from lower selves, they feel like they’re sourced in higher selves, and so attempted treatment of these disorders may produce the same tension. Of course, you could just tell the patient that the selves are lower order, and that may provide some reassurance, but insight is often hard to come by in the mentally ill.

Here’s another angle, though. You point to OCD as an example of a willful disorder, but sufferers of OCD often consider their obsessions and compulsions to be enormously intrusive. In fact, personality disorders (in particular clusters A and C) are probably more likely to be deemed by those who experience them to be sound and willful. As you point out, the same is probably true of trauma and PTSD. Meanwhile, mental/psychiatric/behavioral/learning disorders (formerly Axis I in the DSM), such as depression, anxiety, and OCD, are probably more likely to be deemed unsound and non-willful. Axis I disorders are probably also closer to “lower selves”. This is suggested by Schachter and Singer’s famous experiments, in which an injection of adrenaline recapitulated the symptoms of anxiety. So our conclusion might be that disorders which are both “willful” and “of the lower self” are relatively rare.

This could be good for us. If disorders are unwelcome, there’s no philosophical difficulty to begin with. The tough part, though, is that something like obsessive compulsive personality disorder (OCPD), an overwhelming tendency towards neatness and orderliness, which you suggest would be lower order, is probably actually higher order than you grant (hence *personality disorder*). As “lower self” disorders are often “non-willful”, so “higher self” disorders probably tend to be willful. Thus treatment of them, if even permitted by the patient, will require fundamental changes of the self, along with the associated difficulties you describe.

Beating Magnus isn’t something Max Deutsch could do after ten thousand hours. (The incidental resemblance of your quote to Dr. Ericsson’s work and Malcolm Gladwell’s bundle of clichés on the matter doesn’t help clarify things.) Beating Magnus isn’t something *anyone on the planet*, for that matter, could do after ten thousand hours, as all of his lessers in the highest rungs of chess have surely already expended that many hours and indeed many more.

Beating Magnus in chess, in fact, would probably be impossible for Max even after ten thousand years. Indeed, if Max, and more cruelly Magnus, were forced to live in a cryogenic vacuum and play each other at chess eternally, I’d guess that millions of years could go by without a win by Max. It’s that bad.

Perhaps a better angle by which to drive home this point is to illustrate the sheer difficulty even of attaining much milder chess achievements. Having played my fair share of online chess, I can attest to the true, monumental difficulty of attaining even modest ratings—say 1800 in blitz.

In fact, I have an achievement to my name that would make Max proud. After hundreds of games on chess.com, about six months of concentrated work, and plenty of extraordinary frustration, I managed to surpass the 1000 mark in 5|5. Maybe Max would do better to start there.

]]>