Proust’s Envelopes

Monsieur Charles Swann is artistically inclined (but primarily as a collector), musically gifted (though sharpest as a critic), and “a particular friend of the Comte de Paris”. The appearance of a painting from his collection (on loan to a Corot exhibition) in the Figaro serves, en fin de compte, as nothing more than an occasion for his abasement at the hands of the narrator’s jealous great-aunt. His artistic talents are squandered on the decoration of old society ladies’ drawing rooms. In his occasional spare moments, he tinkers with an ever-unfinished essay on Vermeer of Delft.

Odette de Crécy, on the other hand, arouses in him—at least at first—nothing more than feelings of indifference.

It’s no wonder, then, that what finally moves Swann’s heart—what sets in motion a helpless, protracted infatuation—is Swann’s sudden recognition, in Odette, of a likeness to a figure with ancient significance: Zipporah, Jethro’s daughter, as she appears in Botticelli’s The Youth of Moses.


Detail of Moses and Jethro’s daughters, from Botticelli’s The Youth of Moses.

Swann’s newfound attachment to Odette quickly becomes a source of torment, as Odette’s reciprocation of his love falters. Where once Odette proclaimed, with delight, that “You know, you will never be like other people!”, she now sighs, “Ah! so you never will be like other people!”; where she once wondered, “I do wish I could find out what there is in that head of yours!”, she now exclaims in frustration, “Oh, I do wish I could change you; put some sense into that head of yours.”

The Verdurins, once “far more intelligent, far more artistic, surely, than the people one knows” (in virtue of their having facilitated Swann and Odette’s courtship), become “the most perfect specimens of their disgusting class” (after forsaking Swann and introducing Odette to a rival). While once Odette would reassure him, nightly, that “We shall meet, anyhow, to-morrow evening; there’s a supper-party at the Verdurins’,” she now pleads, “We sha’n’t be able to meet to-morrow evening; there’s a supper-party at the Verdurins’.”

Swann snoops outside Odette’s lighted window, captivated by its “mysterious golden juice”, hours after she’d sent him away for the night, pleading fatigue. He invokes the hospitality of an old friend with a country house, just to be near Odette on a weekend trip to which he was not invited, ultimately baffling his host as he spends each evening “inspect[ing] the dining-rooms of all the hotels in Compiègne”. Swann declines the Baron de Charlus’ offer to accompany him to the Marquise de Saint-Euverte’s glitzy party, entreating the Baron, instead, to check on Odette as “she goes to see her old dressmaker”.

As his jealousy reaches a breaking point, Swann, in one telling scene, scrutinizes a sealed envelope that Odette has entrusted him to deliver:

He lighted a candle, and held up close to its flame the envelope which he had not dared to open. At first he could distinguish nothing, but the envelope was thin, and by pressing it down on to the stiff card which it enclosed he was able, through the transparent paper, to read the concluding words… He took a firm hold of the card, which was sliding to and fro, the envelope being too large for it and then, by moving it with his finger and thumb, brought one line after another beneath the part of the envelope where the paper was not doubled, through which alone it was possible to read.

On this occasion, as on many, Swann is simply distraught: “he took off his spectacles, wiped the glasses, passed his hands over his eyes.”

Swann’s pursuit of Odette occupies the largest chapter of Swann’s Way, the first among the seven volumes comprising Marcel Proust’s enormous In Search of Lost Time. Though much of Swann’s Way chronicles the rich emotional memories of its narrator, this chapter alone drags, painfully, through the details of Swann’s trials.

Perhaps a profound message is to be extracted from this trying chapter.

The following problem, communicated to me by a friend, appears to be part of the internet-math folklore.

Player 1 writes down any two distinct numbers on separate slips of paper. Player 2 randomly chooses one of these slips of paper and looks at the number. Player 2 must decide whether the number in his hand is the larger of the two numbers.

Game. You are playing a game with an adversary, consisting of the following steps. The adversary chooses two unequal numbers, writes them down on separate pieces of paper, and then places these into separate sealed envelopes. The adversary then flips a fair coin to determine which of the two envelopes to give you. After seeing the number in your envelope, you must decide whether the hidden number is larger than or smaller than the number in your hand. You win if you can do this with probability strictly higher than \frac{1}{2}.

Here is the winning strategy:

Strategy. Choose any function which takes values between 0 and 1, and which is forever increasing in value. (Mathematically, this means some function f \colon \mathbb{R} \rightarrow [0, 1] which is strictly order-preserving in the sense that whenever a < b, f(a) < f(b). For example, the “logistic” function f \colon x \mapsto \frac{e^x}{1 + e^x} satisfies this property.)

Now call the number you’re shown x. State that the hidden number is smaller than x with probability f(x). Otherwise (that is, with probability 1 - f(x)), state that it is larger.

Claim. The Strategy satisfies the requirement put forth in the Game.

Proof: Supposing that the adversary has already chosen his two numbers, written them down, and placed them into the envelopes, let’s call these two numbers s and l, for smaller and larger. Breaking down the likelihood of winning along the two possible outcomes of the coin flip, we see that:

\begin{array}{r l} P(\text{win}) &= P(\text{you got } l) \cdot P(\text{you said ``smaller''}) + P(\text{you got } s) \cdot P(\text{you said ``larger''}) \\[\bigskipamount] & = \frac{1}{2} \cdot f(l) + \frac{1}{2} (1 - f(s)) \\[\bigskipamount] & = \frac{1}{2} + \frac{f(l) - f(s)}{2} \\[\bigskipamount] & > \frac{1}{2},\end{array}

where in the final step we used the fact that f is strictly increasing, so that f(l) > f(s). ♦
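For readers who like to see the Claim in action, here is a small Python sketch (an illustration, not part of the argument above). It uses the logistic f from the Strategy; the pair s = 0.5, l = 1.5 and the simulation parameters are arbitrary, hypothetical choices. It compares the exact formula from the proof with a Monte Carlo estimate in which a fair coin decides which envelope is handed over.

```python
import math
import random


def f(x: float) -> float:
    """Logistic function e^x / (1 + e^x), in a form that avoids overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def says_smaller(x: float, rng: random.Random) -> bool:
    """The Strategy: announce "smaller" with probability f(x), else "larger"."""
    return rng.random() < f(x)


def exact_win_probability(s: float, l: float) -> float:
    """1/2 * f(l) + 1/2 * (1 - f(s)), as in the proof of the Claim."""
    return 0.5 * f(l) + 0.5 * (1.0 - f(s))


def simulate(s: float, l: float, trials: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(win) against a fixed pair s < l."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        # The adversary's fair coin decides which envelope we are handed.
        shown, hidden = (l, s) if rng.random() < 0.5 else (s, l)
        if says_smaller(shown, rng) == (hidden < shown):
            wins += 1
    return wins / trials


if __name__ == "__main__":
    s, l = 0.5, 1.5  # an arbitrary, hypothetical pair chosen by the adversary
    print(exact_win_probability(s, l))  # about 0.5976
    print(simulate(s, l))               # should land close to the exact value
```

One caveat belongs to the floating-point sketch rather than to the mathematics: for large inputs the logistic function rounds to exactly 1.0, so exact_win_probability(100.0, 101.0) evaluates to exactly 0.5, even though the true advantage, while astronomically small, is positive.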

Here’s where things get interesting. Consider the bogus claim below (an asterisk denotes that a claim or proof is not valid).

Claim*: The Strategy does not meet the requirement of the Game.

Proof*: Without loss of generality, let’s call the number you’ve been shown x. Breaking down the likelihood of winning as in the proof of the Claim, we have that:

\begin{array}{r l} P(\text{win}) &= P(x \text{ is larger}) \cdot P(\text{you said ``smaller''}) + P(x \text{ is smaller}) \cdot P(\text{you said ``larger''}) \\[\bigskipamount] &= \frac{1}{2} \cdot f(x) +\frac{1}{2} (1 - f(x)) \\[\bigskipamount] &= \frac{1}{2} + \frac{f(x) - f(x)}{2} \\[\bigskipamount] &= \frac{1}{2}.\end{array}

Yet the Game demands a likelihood of winning strictly above \frac{1}{2}. ∗

There are various ways of explaining what exactly went wrong in this bogus proof. Perhaps the most transparent is to point out that the symbol x is being made to stand for either of two possible values (the higher or the lower of the adversary’s two numbers), depending on where exactly in the proof it turns up.

A better way of putting it is to point out that “the number you’ve been shown” hasn’t yet been defined at the time of the phrase’s utterance, or that, more accurately, it refers not yet to a number but rather to a random variable which can take either of two possible values with equal likelihood. (This is the same fallacy as that behind the related “Two envelopes paradox”.)
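The same slip can be made concrete by computing both versions exactly. In the Python sketch below (a hypothetical illustration), the logistic function is swapped for f(x) = (1 + x/(1 + |x|))/2, another strictly increasing function with values between 0 and 1, chosen only because it keeps the arithmetic exact with Python's fractions module. The bogus computation returns 1/2 for any fixed x; the correct one evaluates f at l in one branch of the coin flip and at s in the other.

```python
from fractions import Fraction


def f(x: Fraction) -> Fraction:
    """Hypothetical stand-in for the logistic: strictly increasing, values in (0, 1),
    and exact on rational inputs."""
    return (1 + x / (1 + abs(x))) / 2


def bogus_win_probability(x: Fraction) -> Fraction:
    # Proof*: one fixed x is plugged into both terms, so the answer is always 1/2.
    return Fraction(1, 2) * f(x) + Fraction(1, 2) * (1 - f(x))


def correct_win_probability(s: Fraction, l: Fraction) -> Fraction:
    # Enumerate the coin flip: "the number you've been shown" is l in one
    # branch and s in the other, so f gets evaluated at two different points.
    branches = [
        (Fraction(1, 2), l, "smaller"),  # shown l: the hidden number is smaller
        (Fraction(1, 2), s, "larger"),   # shown s: the hidden number is larger
    ]
    total = Fraction(0)
    for p_branch, shown, truth in branches:
        p_say_smaller = f(shown)
        p_correct = p_say_smaller if truth == "smaller" else 1 - p_say_smaller
        total += p_branch * p_correct
    return total


if __name__ == "__main__":
    s, l = Fraction(1), Fraction(3)
    print(bogus_win_probability(s), bogus_win_probability(l))  # 1/2 1/2
    print(correct_win_probability(s, l))                       # 9/16
```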

This game illuminates Swann’s quandary. Even in the face of complete uncertainty, it demonstrates, an advantage over random chance can be achieved, through a strategy which, after the resolution of the uncertain outcome, reacts accordingly.

On the other hand, by naming and making static the unknown variable before the resolution of the uncertainty, one precipitates an apparently unwinnable struggle against randomness. This move, of course, is faulty: it obscures how the facts of the matter change as the uncertain outcome is resolved.

Swann insists on understanding Odette as a fixed, static entity. In reality, he faces two: one who “would kiss him before the eyes of his coachman” and another who “in a towering rage, broke a vase”. In attempting to resolve an uncertainty—that of Odette’s heart—before its resolution’s appointed time, Swann recasts a surmountable game as doomed.

The solution, of course, is not to see, by candlelight, into the adversary’s envelopes. It’s to react properly once your envelope is revealed.

*                *                *

Early on in Swann’s Way, the reader is made aware of Swann’s having made “a most unsuitable marriage”, to a woman known to “dye her hair and redden her lips”. Even as the novel reaches back into Swann’s past, however, the identity of the future Mme. Swann—in other words, whether she’s in fact Odette de Crécy—is artfully concealed. This uncertainty is ultimately resolved.

Throughout the entire account of Swann’s love for Odette, meanwhile, one force remains constant: the magical role played by “the andante movement of Vinteuil’s sonata for the piano and violin” in his feelings. Swann had heard the piece once, many years prior, without learning its name; when, astounded, he hears it again at the Verdurins’, it quickly becomes “the national anthem of their love”.

Swann at one point utters a peculiar thought: that of being “agonised by the reflection, at the moment when [the piece] passed by him, so near and yet so infinitely remote, that, while it was addressed to their ears, it knew them not…”. I, too, have lamented that, no matter how deeply I may come to know mathematics’ abstract terrain, it will never be aware of my presence.

Much later, however, Swann reverses course. “For he had no longer, as of old,” the narrator remarks, “the impression that Odette and he were not known to the little phrase. Had it not often been the witness of their joys?”

Does not this particular bit of mathematics—this random, and yet winnable, game—suggest likewise that, to the contrary, math can know one’s presence? Though random chance may yet unfold, it leaves small clues in the process, small differences around us, features which would be altered had chance unfolded differently. By responding wisely to these subtle differences, we unwittingly interact with the great mechanisms of chance, and gain an imperceptible advantage. Perhaps they too detect our presence in turn.


6 comments on “Proust’s Envelopes”

  1. Josh says:

    In the neuroscience world, Proust is famous for popularizing the idea of “involuntary memory”, a term which he in fact coined. This is the phenomenon in which a certain stimulus–say, perhaps, a particular piece of music–triggers the spontaneous resurrection of memories long-forgotten, like listening to the piece as a child. A few nights ago, for instance, I bought a Peroni at a bar. Upon taking the first sip, I was suddenly sitting on a stone doorstep in a sun-drenched Florence alleyway.

    There’s not much neuroscience research on involuntary memory. One study suggests that both voluntary and involuntary memory utilize similar neural pathways, but that involuntary memory in particular witnesses decreased input from the prefrontal cortex. Thus involuntary memory proceeds without the orders of the part of the brain responsible for executive function. It’s almost as if the stimulus itself behaves as the executor. As you mentioned, the outside world may wield a surprising degree of agency.

    Hall, N. M., Gjedde, A., & Kupers, R. (2008). Neural mechanisms of voluntary and involuntary recall: A PET study. Behavioural Brain Research, 186(2), 261–272.

  2. Ben says:

    See this paper I wrote, inspired by these ideas.

  3. Chamomile says:

    You get a lot of respect from me for writing these helpful articles.

  4. Richard says:

    This is thought-provoking stuff, Ben. And fairly stirring at the end too. It’s a great pleasure to see art and mathematics interwoven like this – that is, in an appropriately understated way. Sometimes exercises in comparative thought can appear forced and implausible, having the air of an intellect abused for the sake of conspiratorial “spider-webs” of red threads. But here, I think you juxtapose ideas in a way that’s just right, stylistically speaking. For example, the mathematical parts are careful and limpid in a way that clearly respects the mathematical point being made (though it’s also accessible and well laid out), while the literary parts are textually specific and summarized quite gently into the form required for the analogy to be seen. It’s an enjoyable read.

    Regarding the content, I actually have a lot to say and I preemptively apologize for making heavy use of references to published work of others.

    The two envelopes problem really is a fascinating puzzle. Some philosophers have written on it too, as a puzzle for decision theory, and there are even those in both the philosophical and mathematical communities who refer to a “two envelopes paradox”, rather than merely a “puzzle”. I’ve had occasion myself to think about it more carefully recently, in conversation with you and with a colleague here in the department.

    The more I’ve thought about it, the more the whole thing seems to involve, as you and others have already said in some way or another, some sort of fallacy of ambiguity, i.e. of illicitly treating an item in our reasoning (e.g. a term) as having multiple distinct logical roles, or meanings. A simple example is the famous “proof” that a ham sandwich is better than eternal happiness:

    i. Nothing is better than eternal happiness. (Premise)
    ii. A ham sandwich is better than nothing. (Premise)
    iii. The relation x is better than y is transitive. (Premise)


    iv. A ham sandwich is better than eternal happiness.

    The fallacy of ambiguity here is clear. In i ‘nothing’ is used as a quantifier (“there is no x such that x is better than eternal happiness”), but in ii ‘nothing’ is used as a term (“there is something, call it ‘nothing’, such that a ham sandwich is better than this thing”).

    In the case of the two envelopes problem, in which one reasons to the conclusion that one should switch, I think something similar is going on. Here’s my (fairly inchoate) analysis – in which I discuss a variant of the problem (I’ll write about boxes rather than envelopes in case it’ll be convenient to distinguish the puzzle I’ll discuss from the one you’ve discussed, but obviously boxes vs envelopes is only a notational difference):

    So, the following case involves fallacious reasoning.

    (1) There are two boxes A and B. One box contains twice as much money as the other. Selecting one box, say A, we should take ourselves as having $x. Since we have $x, switching seems to imply that we get either $x/2 with 1/2 probability or $2x with 1/2 probability. Numerically, switching appears to have the expected utility of (1/2)($x/2) + (1/2)($2x) = $5x/4. Obviously $5x/4 > $x, so it looks like we should switch, right? (WRONG!)

    The fallacy here is much discussed. Reflecting on it a bit, to me it’s always seemed like, on the surface, the problem is that we reason as though the situation is not symmetrical in a way that it clearly is. We fix (or determine or stipulate or assume) the value of our selected box, A, as being $x, then we reason about the other box as being indeterminate in value (either half $x or twice $x, with equal probability). But we can apply the exact same reasoning to the other box, making our switching strategy illicitly dependent on which box we arbitrarily assign the value of $x to in the first place.

    The underlying logical fallacy at this point is something we might call “the fallacy of the unqualified (or unconditional) conclusion”. I don’t know whether this fallacy already has a formal title, but this is what I call it. Basically, it’s when you reason from some assumption to some conclusion, but then infer the conclusion as holding in an unqualified way, i.e. outside the scope (in the proof theoretic sense) of the initial assumption. In terms of the material conditional, the fallacy is the following illegal (and visibly bonkers) inference:

    I. If P then Q


    II. Q

    To elaborate, supposing the value of box A to be fixed (as $x), we can then reason about the value of box B in a way which makes it seem like switching to B is better, as in (1). But switching to B is better only in the context of the assumption that the value of A is fixed at $x. It is fallacious to conclude in an unqualified or unconditional way (outside of the scope of our antecedent assumption that the value of A is $x) that choosing B is the better option overall.

    The fallacy of the “unqualified conclusion” is a simple scope violation fallacy really, but in this context it incorporates a fallacy of ambiguity too, because it sort of leads us to confuse the variable ‘x’ with a constant or non-variable. Our unqualified conclusion that it is better to switch boxes subtly treats ‘x’ as though it were not a true variable which could just as well denote the amount in the other box, but rather as though it were a variable under some particular assignment or particular class of assignments (i.e. those assignments which make the expected value of the other box a fixed function of x). This also looks like a case of what might be called a “loss of generality”. Maybe I’m abusing terminology a bit here, but the thought is that we are initially introducing a variable in an appropriate way because it has a certain generality. But we then conclude something that loses the generality which allowed us to introduce the variable in the first place.

    The correct reasoning in the box case is visible when we don’t pick an arbitrary box first but rather think in fully general terms as follows:

    (2) We’ve selected a box with either $x or $2x and the combined amount in both boxes is $3x. The gain in switching boxes is determined by noting that we will either gain an additional $x with probability 1/2 (if we initially chose the box with $x) or we will lose $x with probability 1/2 (if we initially chose the box with $2x). In other words, the expected utility of switching is:

    EU(Switch) = (1/2)(+$x) + (1/2)(-$x) = $x/2 - $x/2 = 0

    So we should actually be indifferent to switching.
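    The two calculations, (1) and (2), can be put side by side in a short Python sketch (an illustration only). Note that, as in the text, x means "the amount in our box" in (1) but "the smaller amount" in (2); exact fractions keep the arithmetic free of rounding.

```python
from fractions import Fraction


def naive_switch_value(x: Fraction) -> Fraction:
    """Reasoning (1): fix our box at $x and treat the other box as $x/2 or $2x."""
    return Fraction(1, 2) * (x / 2) + Fraction(1, 2) * (2 * x)


def expected_gain_from_switching(x: Fraction) -> Fraction:
    """Reasoning (2): the boxes hold $x and $2x; average over which one we picked."""
    picks = [(x, 2 * x), (2 * x, x)]  # (our box, other box), each with probability 1/2
    return sum(Fraction(1, 2) * (other - mine) for mine, other in picks)


if __name__ == "__main__":
    print(naive_switch_value(Fraction(20)))            # 25, i.e. 5x/4 for x = 20
    print(expected_gain_from_switching(Fraction(10)))  # 0
```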

    The situation is the *opposite* to the situation in the famous Monty Hall problem, and it’s intuitively clear why it’s the opposite. In the Monty Hall problem, switching doors is only not a good idea if one is as likely (or more likely) to have initially selected the winning door as to have selected a non-winning door. But, subjectively, since there are three doors in the MH problem, one is not as likely (or more likely) to have selected the winning door as one is to have selected the non-winning door. So it’s not the case that switching doors is not a good idea, i.e. switching doors is a good idea.

    Contrariwise, in the box switching problem, there are only two boxes, only two options, and so, subjectively, one is as likely to have chosen the best box as not at the outset. This is why switching is not a preferable option for the two boxes: there is no probabilistically significant difference between the two options; switching is just as good as keeping.
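    The contrast with the Monty Hall problem can be checked by exact enumeration. The Python sketch below assumes the standard rules (the host always opens an unchosen door hiding a goat, picking uniformly when two qualify), which the comment does not spell out; two_boxes is a hypothetical helper for the case where nothing is revealed before deciding.

```python
from fractions import Fraction


def monty_hall(switch: bool) -> Fraction:
    """Exact P(win) in the three-door game, under the standard host rules (assumed)."""
    doors = range(3)
    pick = 0  # by symmetry, fix our initial pick at door 0
    total = Fraction(0)
    for car in doors:  # the car is behind each door with probability 1/3
        # The host opens a door that is neither our pick nor the car,
        # choosing uniformly at random when two doors qualify.
        options = [d for d in doors if d != pick and d != car]
        for opened in options:
            p = Fraction(1, 3) * Fraction(1, len(options))
            if switch:
                final = next(d for d in doors if d not in (pick, opened))
            else:
                final = pick
            if final == car:
                total += p
    return total


def two_boxes(switch: bool) -> Fraction:
    """P(ending with the better box) when nothing is revealed before deciding."""
    better = 1  # label box 1 as the one with more money
    total = Fraction(0)
    for pick in (0, 1):  # each initial pick has probability 1/2
        final = 1 - pick if switch else pick
        if final == better:
            total += Fraction(1, 2)
    return total


if __name__ == "__main__":
    print(monty_hall(switch=True), monty_hall(switch=False))  # 2/3 1/3
    print(two_boxes(switch=True), two_boxes(switch=False))    # 1/2 1/2
```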

    The philosopher of language Jeff Speaks, at Notre Dame, has done some interesting analysis of this too – though I actually think his comments go slightly awry at a crucial point. He considers multiple (exactly seven) versions of the problem which vary on different parameters (e.g. whether you open your envelope or whether your opponent opens theirs, the means of randomization etc.).

    (Here’s a link to some of his notes:

    Because Speaks views this puzzle as a puzzle about decision making, it’s tempting for him and similarly inclined philosophers to resolve it by appealing to or postulating some general principle for rational choice. He does this, postulating a principle he calls inference from an unknown:

    “Suppose that you are choosing between two actions, act 1 and act 2. It is always rational to do act 2 if the following is the case: there is some truth about the situation which you do not know but which is such that, were you to come to know it, it would be rational for you to do act 2.”

    This principle seems safe in many cases. For example, suppose you and I have gone to the Belmont Stakes, and you’re choosing which horse to bet on. I recommend a horse, saying: “there is some fact about Hoof-Hearted which is such that if you knew it then it’d be rational for you to bet on her.” Then, assuming I am perfectly reliable, Speaks’s principle requires that you bet on Hoof-Hearted.

    But, as a general rule of inference, I think that some versions of the two-boxes/envelopes problem are counterexamples to Speaks’s principle. Speaks, however, doesn’t delve into why these versions are counterexamples, because he thinks those versions actually don’t involve an inference from an unknown.

    To be specific, regarding what Speaks says about the choice open version and the choice open reverse version, he claims somewhere (here, at the bottom: that rejecting the principle of inference from the unknown still leaves us with the puzzle of how to process these two versions because (he contends) these versions don’t make use of the principle.

    But I think they do make use of the principle, albeit subtly. Consider these two cases:

    (3) Suppose I choose A and you then open B to reveal $20. I’m supposed to reason that my box A is worth [($20/2)/2+(2($20))/2] = $25 and so I should want to keep A. This is supposed to be puzzling for Speaks.

    (4) Suppose the same scenario obtains as in (3): I choose A and you open B revealing the $20. But now I reason that my box, A, contains some amount, $x, and that, as a result, the other box, with $20, contains either $2x or $(x/2) with equal probability. So, although I know the “flat value” of box B, which is $20, I don’t know its relative value, i.e. I don’t know how it compares to the value of box A. So, by switching from box A to box B I get a 1/2 chance of doubling or a 1/2 chance of halving, so I should switch because doing this has an expected value of $(x + (x/4)) which is always more than $x for non-zero x. So, the fact that I happen to know that B has $20 need not prevent me from reasoning that switching to B is the better option.

    Thus, given (3) and (4) it appears we can have both conclusions at once: switch to B and keep A… which does look like a problem.

    So, what are we to do? Can we avoid the paralysing choke of paradox? I think we can, as long as we do not unrestrictedly accept Speaks’s principle of inference from an unknown. Again (and this is tentative, so bear with me) I’m inclined to see the problem in (3) and (4) as one of confusing a variable with a constant, or perhaps of confusing variability with constancy. Sure, we know in the open version what the value of the open box/envelope is, e.g. $20. But this value is still variable in the sense that what is at issue for our decision is not just the box’s absolute value ($20) but its relative value. And when the other box is still closed, the relative value of $20 in this scenario is still an unknown. So, contrary to what Speaks says in his notes, the choice open version and the choice open reverse version, as with (3) and (4), are both subtly using the inference from an unknown principle: reasoning to the conclusion that one can choose so as to maximize expected utility from the fact that there are some facts such that if one knew them one could choose so as to maximize expected utility. At least, this is how it seems to me.
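    One way to make "the relative value of $20 is still an unknown" concrete is to assume something the comment does not: that the smaller amount is drawn from some prior distribution, with a fair coin deciding which box gets opened. Under that assumption (the priors below are purely hypothetical), the Python sketch computes what the closed box holds given that the open one shows $20. The 50/50 split between $10 and $40, and hence the $25 of case (3), appears only for particular priors.

```python
from fractions import Fraction


def closed_box_distribution(prior: dict[Fraction, Fraction], seen: Fraction) -> dict[Fraction, Fraction]:
    """Distribution of the closed box's amount, given the opened box shows `seen`.

    `prior` is a HYPOTHETICAL distribution over the smaller amount a, where the
    boxes hold a and 2a and a fair coin decides which box gets opened.
    """
    weights: dict[Fraction, Fraction] = {}
    # Opened box is the smaller one: a == seen, so the closed box holds 2 * seen.
    if seen in prior:
        weights[2 * seen] = prior[seen] * Fraction(1, 2)
    # Opened box is the larger one: 2a == seen, so the closed box holds seen / 2.
    half = seen / 2
    if half in prior:
        weights[half] = prior[half] * Fraction(1, 2)
    total = sum(weights.values())
    if total == 0:
        raise ValueError("the observed amount is impossible under this prior")
    return {amount: w / total for amount, w in weights.items()}


def expected_value(dist: dict[Fraction, Fraction]) -> Fraction:
    return sum(amount * p for amount, p in dist.items())


if __name__ == "__main__":
    even = {Fraction(10): Fraction(1, 2), Fraction(20): Fraction(1, 2)}
    lopsided = {Fraction(10): Fraction(9, 10), Fraction(20): Fraction(1, 10)}
    print(expected_value(closed_box_distribution(even, Fraction(20))))      # 25
    print(expected_value(closed_box_distribution(lopsided, Fraction(20))))  # 13
```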

    Since having these thoughts I’ve actually gone back to read an older paper that Kit Fine – the NYU-based philosopher – wrote on the nature of variables in 2003.


    Fine has since published an entire monograph on the theory which he thinks solves the various puzzles to which variables apparently give rise (that’s his book, Semantic Relationism; Blackwell 2007). The chief puzzle goes back to Russell in his fantastic book Introduction to Mathematical Philosophy, and Fine calls it the ‘antinomy of the variable.’ Essentially, it is the puzzle that the semantic roles of distinct variables, e.g. ‘x’ and ‘y’ (or ‘A’ and ‘B’ as in the box/envelopes case), seem to be at once the same role and yet different roles. For example, here’s Fine in his 2007 book:

    “Suppose that we have two variables, say “x” and “y”; and suppose that they range over the same domain of individuals, say the domain of all real numbers. Then it appears as if we wish to say contradictory things about their semantic role. For when we consider their semantic role in two distinct expressions – such as “x > 0” and “y > 0”, we wish to say that it is the same. Indeed, this would appear to be as clear a case as one could hope to have of merely “conventional” or “notational” difference; the difference is merely in the choice of the symbols and not at all in linguistic function. On the other hand, when we consider the semantic role of the variables in the same expression – such as in “x > y” – then it seems equally clear that it is different. Indeed, it would appear to be essential to the semantic role of the expression as a whole that it contains two distinct variables, not two occurrences of the same variable, and presumably this is because the roles of the distinct variables are not the same” (Fine 2007, p.7)

    From a logician’s point of view this is actually quite a puzzle. In my mind, it relates to the puzzle of the two boxes/envelopes too. For Fine, variables have semantic/logical properties both individually, in virtue of being variables, and relationally, in virtue of their standing in semantic/logical relations to other variables. To clarify: ‘x’ denotes a variable which in an expression like ‘x > 0’ has precisely the same meaning over the reals as ‘y’ does in ‘y > 0’. Considered individually, then, ‘x’ and ‘y’ have the same semantic/logical role – i.e. that of variables which range unrestrictedly over the same set of values. Considered together, however, as in ‘x > y’, they have distinct semantic/logical roles – i.e. as variables which, in a given expression, can take different values. Thus, they seem to be at once purely variable and yet, in some expressions, non-purely variable, i.e. differently constrained in their possible values. In other words, in a given expression, ‘x’ can have a different value to ‘y’, but in no given expression can ‘x’ have a different value to ‘x’. Taken individually, they are purely variable, but taken collectively they are non-purely variable. Fine develops a whole semantic framework which allows him to provide a model theory for first order logic around the idea that variables can have semantic properties in virtue of their relations to other variables without this reducing to their purely individual (intrinsic) semantic properties. ‘Tis mighty stuff.

    Now to apply Fine’s thoughts a bit: In the envelope case, (4) above, we are concerned with the constant term ‘$20’. This case, it seems to me, gives rise to the exact reverse of the phenomenon that arises in the antinomy of the variable. In the decision scenario of (4), the expression ‘$20’, when considered individually, is a constant denoting the utility amount of twenty dollars, that is, when taken individually it is purely non-variable. But, when the expression ‘$20’ occurs in our reasoning alongside a variable expression denoting the amount in box A (i.e. “$x”) then ‘$20’ is revealed as non-purely non-variable, i.e. as partly variable. Intrinsically, the semantic/logical role of ‘$20’ is that of a constant, denoting a fixed utility for the decision maker. But when set in relation to ‘$x’, the semantic/logical role of ‘$20’ is not purely that of a constant but of a variable, denoting a multiplicity of possible values (either half of what’s in A or double what’s in A). This is why I think it still makes sense to think of the choice open versions of the two envelopes problem as involving an inference from the unknown – and for anyone who thinks otherwise to be confusing constancy with variability.

    So these puzzles really do constitute counterexamples to Speaks’s principle of inference from an unknown, in my view. This principle can lead us astray in exactly the way that the fallacy of ambiguity or the error of confusing the logical categories of variable and constant can lead us astray. Moreover, learning to avoid these confusions, to avoid confusing what is variable for what is constant and so avoid reasoning as though we know more than we do, is an essential part of what it takes for our lives to go well. If only Charles Swann had realized this in advance, the antinomy of the variable ‘Odette’ might not have tortured him so. We could all learn from this.

    Now, finally, regarding whether mathematics can know us. First, I am inclined to think that the world instantiates mathematical structure (understatement!). That is, there is some sense in which the world is mathematical in nature. Now we, of course, are part of the world; thus we participate in its mathematical character, both in our apprehension of mathematics and in our physical and psychological instantiation of its structures. Our scientific understanding of the material world suggests that when certain mathematical structures are physically instantiated (whatever the hell that means), the result is a conscious entity capable of knowing this world and itself. It is, of course, a hoary metaphysical conundrum how the physical and the purely mathematical relate (the one is concrete, the other is abstract after all). But I am inclined to say that they do relate: the only conception of the world I possess is of a world which itself possesses mathematical character – my world is one in which things can be counted, ordered, measured, and in which space instantiates some geometry or other.

    In a non-joking way, I hold that our nature too is, in a sense, mathematical. Mathematics makes us possible, because it is required for there to be a world. But it is also an inextricable part of our mental lives: we think mathematically, unity and plurality are inescapable categories of thought for us. Mathematics is part of what we are, in the same way that our world is – constitutively – part of what we are (we are part of it, at least, in that our parts overlap). Mathematics permeates our world and us with it. On that basis, I am inclined to say that mathematics “knows us” at least to the extent that we know ourselves.

    On the other hand, the sense of “knows” here might not be the usual one. Given the usual notion of “knowing” it may be that the only things that can properly be said to know are epistemic agents, like persons. For example, consider the case of the soul-searching neuroscientist who has spent a lifetime studying the neurological substrata of memory. Occasionally, he laments the fact that although he has spent long years of study and practice coming to know the functions of the hippocampus, the functions of the hippocampus will never know him in return.

    This lament might seem bizarre. But we can all recognize the idea of someone who loves their work, even considers themselves “married” to it, but who is told by others: “oh but your work will never love you back!” Irksome prattle, yes, but there’s something to it. The functions of the hippocampus, conceived of abstractly as something instantiated by all normal brains of certain creatures, seem to be of the wrong logical category to be described as the subjects of knowledge states. The lament of not being known in return by abstracta thus looks like it is based on a simple category error, i.e. the error of confusing the logical category of an individual object, which can be the subject of knowledge attributions, with the logical category of a property or quality, which is what is attributed to individual objects (e.g. a brain function, or some mathematical structure).

    Our thinking often goes amiss when we confuse the logical category or role of some expression or notion with that of some other expression or notion; witness the fallacy of treating a variable as a constant above! (Also, cf. Peter Hacker’s work on neuroscience and philosophy here! Whatever you think of Hacker’s views, similar methodological points are being made.)

    To respond to the “category error” complaint: although we can say, with the likes of Hacker, Gilbert Ryle, and others, that the ordinary workaday concept of knowledge rules out as incoherent the thought that abstract or inanimate things can be the subjects of knowledge attributions, there is nevertheless some neighboring notion which we are trying to get at. This notion is not the same as that of ordinary knowledge, but it can serve a complementary role in our relationship with things like mathematics. Mathematics “knows” us in this neighboring sense in the same way that the world “knew” what to do in order to spawn our ancestors three billion years ago, to bring us forth from its primordial viscera. It is the same sense in which a creature’s body “knows” what to do to fight an infection, or to gestate unborn offspring.

    The information required to change or bring something into existence is all there, coded up in the nature of the organism. Similarly, one could say that the information required to make our world, and us who are in it, is “all there”, eternally encoded in the realm of the abstract, Plato’s realm of the Forms. So, maybe the Forms know us in the same unconventional way that the body knows its defense mechanisms, and the solar system knows its orbital mechanics. Strange stuff, maybe, but mathematical Platonism is difficult to reject once you get right down to it, and we must expect some sense in which the abstract comes into relation with the non-abstract if Platonist views are to be endorsed.

    For some medieval scholastic philosophers, mathematical objects were perfect ideas or thoughts in the mind of the deity. For them, to be “known” by mathematics might simply mean being an object constructed from those ideas. Given medieval Europe’s definitions of ‘God’, however, I think it would be no stranger to be known by mathematics than to be known by God. And the world is a very strange place, after all.

    Finally, in connection with the theme of memory, which is arguably the dominant theme in Proust’s titanic work, there’s a lot to be said. “Mental time travel,” some have called it. It is something which can arouse feelings of nostalgia and a connection with the everlasting and unchanging past, whose presence in our minds somehow seems to give life a feeling of meaningfulness. I’m reminded of the following passage from Bertrand Russell’s essay ‘A Free Man’s Worship’:

    “the reason why the Past has such magical power. The beauty of its motionless and silent pictures is like the enchanted purity of late autumn, when the leaves, though one breath would make them fall, still glow against the sky in golden glory. The Past does not change or strive; like Duncan, after life’s fitful fever it sleeps well; what was eager and grasping, what was petty and transitory, has faded away, the things that were beautiful and eternal shine out of it like stars in the night. Its beauty, to a soul not worthy of it, is unendurable; but to a soul which has conquered Fate it is the key of religion.”

    It’d be cool to see a blog post more explicitly focused on the theme of memory in Proust – maybe in relation to the neuroscience of involuntary memory (Josh?). There’s definitely some connection between the altered mental states prompted by memory and the feeling of seeing the world anew which Proust suggests is a feeling that gives life meaning. I for one would like to think more about that.

    • Ben says:

      I’ll try to address the broader points you make in a forthcoming comment. For now, I’ll take this bit by bit, and leave a few remarks (around the edges—nothing serious) about your treatment of the “two boxes” paradox.

      I like your approach via this idea of “loss of generality”. Indeed, it’s certainly conceivable that a fallacious proof could proceed along lines like those you suggest; namely, it could prove something under a special assumption and then illicitly claim that an analogous result holds even in the absence of that assumption. In such a case, the flaw would reside in the falsity of the claim that some particular assumption can be made “without loss of generality”; one imagines, to the contrary, that this particular assumption does restrict generality, and moreover in a material way, that is, in a way for which the reduction of the general case to the special case is not obvious (or not even true).

      Indeed, what it means to say that some assumption can be made “without loss of generality” is that the truth of some particular statement even in the absence of this assumption can be obviously, or easily, deduced from the truth of the corresponding statement in the presence of the assumption. “This assumption will therefore be made going forward,” the prover states. “Deduce the general result from the ensuing particular one yourself.”

      As a side note, I dislike the use of “without loss of generality” (or “we may assume” or any of its other variants) in proofs; I personally avoid it, and consider it a sort of bad practice. If you’re going to impose an assumption, then explain, or gesture towards, how exactly the unconditional result is to be deduced from the conditional one. Often these reductions hide a lot of complexity, and “without loss of generality” is sometimes used as a crutch.

      Here’s an example from the field which does feature ample justification, drawn from my favorite reference, the Stacks Project:

      “Formation of normalization commutes with étale localization by More on Morphisms, Lemma 36.17.3. Checking that local rings are regular, or that a morphism is unramified, or that a morphism is a local complete intersection or that a morphism is unramified and has a conormal sheaf which is locally free of a given rank, may be done étale locally (see More on Algebra, Lemma 15.41.3, Descent, Lemma 34.20.28, More on Morphisms, Lemma 36.52.13 and Descent, Lemma 34.7.6).

      By the remark of the preceding paragraph and the definition of normal crossings divisor it suffices to prove that a strict normal crossings divisor D = \cup_{i \in I} D_i satisfies (2).”

      This paragraph, in other words, shows that in proving the implication (1) –> (2), one may assume without loss of (material) generality that D is in fact a strict normal crossings divisor (as opposed to a general one).

      This is all well and good, but I’ve oversimplified things a bit. The refrain “without loss of generality”—and this is partly why I dislike it so much—can actually mean a few subtler things, having to do with naming. It can often mean something like “the naming scheme we are about to adopt does not violate any of the constraints it is subject to”. I myself am guilty of using this phrase (in this second sense) in, fittingly, the paper I wrote on this subject:

      “Writing without loss of generality the adversary’s (real) numbers as x_1 > \cdots > x_n, we denote by…”

      This statement does two things. For one, it gives names (namely x_1, \ldots , x_n) to the adversary's previously unnamed n numbers; in addition, it “assumes” that these numbers are indexed in descending order. This, of course, is not an assumption in the sense in which the statement that D is a strict, as opposed to a general, normal crossings divisor is one; to the contrary, it is merely a stipulation that the aforementioned naming will be carried out in a particular way. The significant thing is that this particular way (namely, that under which the numbers are sorted) doesn’t violate any constraints imposed by previously accepted assumptions. Of course it doesn’t, since these numbers haven’t even been named yet.
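To make the “renaming” reading concrete, here is a small Python sketch. It assumes, purely for illustration, that the quantity being computed is symmetric in the adversary’s numbers (the hypothetical `spread`, the gap between the largest and smallest pick). For any such quantity, relabeling the picks so that x_1 > \cdots > x_n cannot change the answer.

```python
from itertools import permutations

def spread(numbers):
    """A symmetric quantity (hypothetical example): it ignores how the inputs are named."""
    return max(numbers) - min(numbers)

def spread_sorted(numbers):
    """The same quantity, computed after renaming the inputs so that x_1 > ... > x_n."""
    xs = sorted(numbers, reverse=True)  # xs[0] plays x_1 (largest), xs[-1] plays x_n
    return xs[0] - xs[-1]

picks = [3.5, -1.0, 7.25, 0.0]
# Whatever order the numbers arrive in, the sorted renaming gives the same value.
for ordering in permutations(picks):
    assert spread(ordering) == spread_sorted(ordering)
```

The check passes for every ordering precisely because the numbers carried no names before the sorted naming was imposed.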

      An example in which an issue would arise is the following:

      “An adversary picks n numbers in order, say, x_1, \ldots , x_n. Without loss of generality, assume that x_1 > \cdots > x_n.”

      This of course is a serious problem, as the "assumption" that the numbers are named so as to be sorted contradicts a previously imposed constraint, namely that according to which they're also named in the order in which the adversary picked them.
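By contrast, here is a sketch of the illicit version, again in Python and with made-up names. The hypothetical property `first_pick_is_largest` depends on the order in which the adversary picked. “Assuming” the picks are sorted quietly makes it always true, whereas over all pick orders of three distinct numbers it holds only a third of the time.

```python
from fractions import Fraction
from itertools import permutations

def first_pick_is_largest(picks):
    """An order-dependent property: was the adversary's first pick the maximum?"""
    return picks[0] == max(picks)

values = [1, 2, 3]
orders = list(permutations(values))  # every order in which the adversary could pick

# Honest frequency over all pick orders.
honest = Fraction(sum(first_pick_is_largest(o) for o in orders), len(orders))

# The illicit "WLOG x_1 > x_2 > x_3": relabel each order as if it had been sorted,
# overriding the earlier constraint that indices record the order of picking.
illicit = Fraction(
    sum(first_pick_is_largest(sorted(o, reverse=True)) for o in orders), len(orders)
)

print(honest, illicit)  # 1/3 1
```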

      A more subtle example of the use of "we may assume" to usher in a "contradiction-free renaming" is visible further down in my paper:

      "Each set \{x_{\sigma^{-1}(1)}, \ldots , x_{\sigma^{-1}(m - 1)}\} in the above inequality differs from \{x_{\sigma^{-1}(1)}, \ldots , x_{\sigma^{-1}(m)}\} only in its lacking some particular element of the latter set (determined by the element d_m). This situation thus mimics that of expression (2), and for notational convenience (that is, up to a reindexing), we may adopt its setting in what follows."

      I'll admit that I myself was a bit light on the details here. The content of the claim, though, is that the general case (corresponding to an arbitrary trailing code) differs from the special case (corresponding to no trailing code) merely by a constraint-satisfying renaming.

      Another example might be:

      "Suppose that \{\mathbf{v}_1, \ldots, \mathbf{v}_n\} are linearly independent elements of an n-dimensional vector space V over \mathbb{R}. Without loss of generality, assume that V = \mathbb{R}^n and that \mathbf{v}_i is the standard basis vector \mathbf{e}_i for i = 1, \ldots , n.”

      The “renaming” in this case is actually the vector space isomorphism \mathbb{R}^n \rightarrow V induced by sending \mathbf{e}_i \mapsto \mathbf{v}_i, whose injectivity (and surjectivity, by a dimension count) is guaranteed by the linear independence of the \mathbf{v}_i. Hence the subtlety of “without loss of generality”: if this linear independence had not been assumed, then the “assumption” would have been fallacious, a loss of generality. (That is, it would have introduced the assumption of linear independence, which in this case is equivalent to this vector space map’s being an isomorphism.)
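A quick computational check of the point about linear independence, as a Python sketch. It assumes V is already handed to us with coordinates, so that each \mathbf{v}_i is a list of n numbers, and it uses the standard fact that \mathbf{e}_i \mapsto \mathbf{v}_i extends to an isomorphism exactly when the matrix built from the \mathbf{v}_i has nonzero determinant. The names `det` and `renaming_is_legitimate` are hypothetical.

```python
from fractions import Fraction

def det(rows):
    """Determinant by exact Gaussian elimination over the rationals."""
    m = [[Fraction(a) for a in row] for row in rows]
    n = len(m)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # no pivot in this column: the vectors are dependent
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result  # a row swap flips the sign
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result

def renaming_is_legitimate(vectors):
    """True iff e_i -> v_i extends to an isomorphism, i.e. the v_i are independent."""
    # Rows v_i form the transpose of the column matrix; the determinant is the same.
    return det(vectors) != 0
```

With dependent vectors, e.g. `[[1, 2], [2, 4]]`, the check fails: the would-be “renaming” collapses \mathbb{R}^2 onto a line, which is a genuine loss of generality.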

      A very common situation “in the wild” in which this occurs is that in which a point is translated to the origin. For example: “Assume without loss of generality that P = 0 (this simplifies the notation considerably, but does not change the mathematics)” (Greene and Krantz 2006, p. 81). For a much more sophisticated case, see Schmid: “Thus, localizing the problem, one arrives at the following situation: the period mapping is defined on a polycylinder, from which some coordinate hyperplanes have been removed; in other words, on a product of punctured discs and discs.”

      “Assume that P = 0?” you might say. “That’s absurd! …P does not equal 0!” Well, sure, it doesn’t. The point though is that the underlying set (say the complex plane) can be renamed (that is, its coordinatization translated) in such a way that P becomes 0 when all is said and done, and no prior constraints are violated.
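The translation P \mapsto 0 can be sketched the same way in Python, treating points of the complex plane as Python `complex` numbers. This assumes, for illustration, that the problem at hand depends only on translation-invariant data such as distances; the points below are arbitrary.

```python
def translate_to_origin(points, p):
    """Recoordinatize the complex plane by z -> z - p, so that p becomes 0."""
    return [z - p for z in points]

P = 2 + 3j
Q = -1 + 0.5j
R = 4 - 2j

new_P, new_Q, new_R = translate_to_origin([P, Q, R], P)
assert new_P == 0
# Distances survive the renaming, so anything built only from them is unchanged.
assert abs(abs(new_Q - new_R) - abs(Q - R)) < 1e-12
assert abs(abs(new_Q) - abs(Q - P)) < 1e-12
```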

      As if things weren’t bad enough, I’ll point out (what I think is) a still further sense in which “we may assume” is used. This is an inductive one. Vaguely, it goes like this: “[Base case] is true. Thus, we may assume [case n] is true, provided that we prove the inductive step [case n implies case n + 1].” In fact, this sense is quite different, and it may just be a coincidence that the same words are used. (Really, it boils down to “it suffices to prove the implication case n –> case n + 1”.) In any case, this too appears in my paper:

      “By induction, therefore (where the base case m = 3 is exactly the classical Proposition 1, in light of Remark 2) we may assume that, for each trailing code (d_{m + 1}, \ldots , d_n), m \in \{3, \ldots , n - 1\}, each instance d_m of the inner expression above features an inequality: [], leaving unproven, only the inductive step [], where, …”
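Schematically, and only as a gloss on the inductive sense of “we may assume”, the reduction is the principle of induction started at a base value m_0 (in the passage above, m_0 = 3), with P(m) standing for whatever statement is being proved at level m:

```latex
\Big( P(m_0) \;\wedge\; \forall m \ge m_0 \,\big( P(m) \Rightarrow P(m+1) \big) \Big)
\;\Longrightarrow\; \forall m \ge m_0 \; P(m)
```

Once the base case is in hand, “assuming” P(m) is just a license to spend the rest of the argument on the inductive step.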

      In sum, I’ll conclude that these senses all appear at least vaguely related, in that they constitute reductions or simplifications from which the general result can nonetheless be deduced. Yet the first sense is a substantive (semantic?) reduction, and the second is a nominal (syntactic?) simplification, merely a renaming, which replaces one situation by an identical one in which things are named so as to better suit our purposes. The third is perhaps like a sequence of applications of the second: one for each inductive step in a complex chain, or rather hierarchy, of deductions, which all, in virtue of the inductive formulation of the problem, share an identical, or analogous, structure, albeit admitting various instantiations thereof.

      With all of this said, I’m not quite convinced that the fallacious switching argument in the “two boxes” paradox actually constitutes a spurious loss of generality of this sort. A loss of generality of the sort I (and, it seems, you) have described occurs, I take it, when a material restriction, or strengthening, of a set of assumptions is substituted illicitly for the general (or weaker) set. And yet the assumption that “Box A has $x in it”, and therefore that “Box B has either $2x or $x/2”, doesn’t, as I see it, constitute a strict subset of the set of situations that could actually arise. It’s completely separate from this set. Nowhere in the proper understanding of the box paradox does an asymmetry of this kind arise; the kind of asymmetry that does arise is that under which Box A has $x and Box B has $2x (for some fixed x). Yet were we to substitute this situation (which will occur half the time) for the general case, then the expected proceeds after switching would be exactly $2x, not $5x/4.
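The arithmetic in this paragraph can be checked with exact fractions. The Python sketch below (hypothetical function names) contrasts the naive argument, which fixes Box A at x, with a symmetric model assumed here for illustration: the boxes hold x and 2x for some fixed x, and you are equally likely to be holding either.

```python
from fractions import Fraction

half = Fraction(1, 2)

def naive_switch_value(x):
    """The fallacious argument: fix Box A at x; Box B is 2x or x/2 with equal odds."""
    return half * (2 * x) + half * (x / 2)

def fixed_pair_values(x):
    """Symmetric model: the boxes hold x and 2x; you hold either with probability 1/2."""
    stay = half * x + half * (2 * x)    # expected contents of the box you hold
    switch = half * (2 * x) + half * x  # expected contents of the other box
    return stay, switch

x = Fraction(100)
print(naive_switch_value(x))   # 125, i.e. 5x/4
print(*fixed_pair_values(x))   # 150 150, i.e. 3x/2 either way
```

In the symmetric model, staying and switching have the same expectation, so no advantage to switching appears.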

      So yes, I think the box paradox can only be explained by an “antinomy of the variable” approach, and not as a spurious loss of generality. That’s how it seems to me.

    • Ben says:

      Bear with me here, as I try to work through my understanding of the relevance of Fine’s problem to this discussion.

      I’ll come right out and say that I’m feeling “deflationary” about this. The question, as I understand it, is how it comes to be that y can be freely substituted for x in “x > 0” without meaningfully changing the expression, but not in “x > y”.

      I think this again comes down to which sorts of substitutions are harmless renamings of the underlying set and which violate previously established constraints. I see expressions like “x > 0”, “y > 0”, and “x > y” as boolean-valued expressions of real variables. That is, the first two define functions \mathbb{R} \rightarrow \{0, 1\}, while the third defines a function \mathbb{R}^2 \rightarrow \{0, 1\}. Each such function assigns to each point in the parameter space the truth value of its associated boolean expression at the point’s real coordinates. Of course, each such function also naturally defines a subset of the domain \mathbb{R}^n: the preimage of 1, or, in other words, the subset of the domain consisting of points whose coordinates evaluate to true.
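A small Python sketch of this picture, with the caveat that a program can only sample finitely many points of \mathbb{R} or \mathbb{R}^2, so the “preimage of 1” below is taken over an arbitrary finite sample. The function names are hypothetical.

```python
def x_pos(x):
    """'x > 0' as a map R -> {0, 1}."""
    return int(x > 0)

def y_pos(y):
    """'y > 0': the same map, with a different handle on its argument."""
    return int(y > 0)

def x_gt_y(x, y):
    """'x > y' as a map R^2 -> {0, 1}."""
    return int(x > y)

def preimage_of_one(f, domain):
    """The part of a finite sample of the domain on which f evaluates to true."""
    return [p for p in domain if f(*p) == 1]

sample_R = [(t,) for t in (-2, -1, 0, 1, 2)]
sample_R2 = [(a, b) for a in (-1, 0, 1) for b in (-1, 0, 1)]

# Renaming the variable does not change the underlying map or its preimage...
assert preimage_of_one(x_pos, sample_R) == preimage_of_one(y_pos, sample_R) == [(1,), (2,)]
# ...whereas 'x > y' is a map on pairs, with a preimage in R^2.
print(preimage_of_one(x_gt_y, sample_R2))  # [(0, -1), (1, -1), (1, 0)]
```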

      It might help to back up a bit here and talk about what we actually mean when we coordinatize. Taking for example the expression “x > 0”, the corresponding function \mathbb{R} \rightarrow \{0, 1\} should be viewed as a map of the underlying sets. The same is so for functions like \mathbb{R} \rightarrow \mathbb{R}, x \mapsto x^2.

      The role played by “x” in defining functions like these is actually somewhat tricky. It helps to view x separately from the underlying map of sets. There are many (many) maps \mathbb{R} \rightarrow \{0, 1\}, and “x > 0” singles out just one of them. The fact that a coordinate is used is immaterial, at least for now (but see below). The symbol x, in other words, is merely a “handle” that allows us to latch onto an underlying element of the set \mathbb{R}, and “describe where we want to send it, through a simple rule”. In the former of the above two examples, this rule exploits \mathbb{R}’s (total) order structure, while in the latter, it exploits \mathbb{R}’s field (actually ring) structure. Again, though, a general map of \mathbb{R} into some set need not take any form expressible conveniently in terms of \mathbb{R}’s canonical coordinate x.

      The key here is that, insofar as (any point of) \mathbb{R}^2 has two natural underlying coordinates, it should be no surprise that problems happen when we use a name already reserved for one to refer to the other. Viewing \mathbb{R}^2 as the cartesian (pardon the pun?) product \mathbb{R} \times \mathbb{R}, the underlying constituents of any point are “its first coordinate” and “its second coordinate”. To call the canonical coordinate of a generic point of \mathbb{R} “y” rather than “x” is a harmless renaming of the same underlying object. But once one of \mathbb{R}^2’s coordinates has been called “x”, we shouldn’t call the other one “x” too. Beyond its being unclear in “x > x” which of \mathbb{R}^2’s two coordinates “x” actually refers to, we must assume, at the very least, that both occurrences refer to the same one, at which point the discrepancy between this expression and “x > y” becomes clear: “x > x” is false at every point, whereas “x > y” is not.
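In the same spirit, “x > x” can be sketched as the composite of “x > y” with the diagonal map \mathbb{R} \rightarrow \mathbb{R}^2, t \mapsto (t, t): both slots are forced to read the same coordinate. The Python below (hypothetical names, sample points chosen arbitrarily) checks that the composite is false at every sampled point while “x > y” is not.

```python
from itertools import product

def x_gt_y(x, y):
    """'x > y': a map R^2 -> {0, 1} reading the first and second coordinates."""
    return int(x > y)

def diagonal(t):
    """The map R -> R^2 sending t to (t, t)."""
    return (t, t)

def x_gt_x(t):
    """'x > x': both slots read the *same* coordinate, i.e. x_gt_y along the diagonal."""
    return x_gt_y(*diagonal(t))

sample = [-1.5, 0.0, 2.0]
# Restricted to the diagonal, the relation is identically false...
assert all(x_gt_x(t) == 0 for t in sample)
# ...while on R^2 itself it is true somewhere.
assert any(x_gt_y(a, b) == 1 for a, b in product(sample, repeat=2))
```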

      The viewpoint one ought to take with respect to coordinatization is perhaps made more evident by the theory of smooth manifolds. Such a thing is an underlying topological space which, at least locally (that is, on a sufficiently small neighborhood of any given point), can be represented homeomorphically as an open domain in \mathbb{R}^n; such a representation is called a (local) chart. So far this is just a statement about local topological structure, but more is true: we only consider coordinatization schemes that are mutually compatible, in the sense that wherever two charts overlap, they differ there, as maps into \mathbb{R}^n, by a smooth (infinitely differentiable) transition map between open subsets of \mathbb{R}^n.
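For a concrete instance of compatible charts, one could take (an illustrative choice, not one made in the text) the unit circle with the two stereographic projections u = x/(1 - y) from the north pole and v = x/(1 + y) from the south pole. On the overlap, u v = x^2/(1 - y^2) = 1, so the transition map is u \mapsto 1/u, which is smooth on \mathbb{R} \setminus \{0\}. The Python sketch below checks this numerically at a few angles.

```python
import math

def chart_north(x, y):
    """Stereographic projection of the unit circle from N = (0, 1); undefined at N."""
    return x / (1 - y)

def chart_south(x, y):
    """Stereographic projection from S = (0, -1); undefined at S."""
    return x / (1 + y)

# On the overlap (the circle minus both poles, so x != 0) the charts differ by
# the transition map u -> 1/u.
for theta in (0.3, 1.2, 2.5, 3.9, 5.8):  # angles chosen away from both poles
    x, y = math.cos(theta), math.sin(theta)
    u, v = chart_north(x, y), chart_south(x, y)
    assert math.isclose(v, 1 / u, rel_tol=1e-12)
```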

      The first thing to realize here is that any function f from a manifold M into \mathbb{R} can be viewed, at least locally around any fixed point x \in M, as a map from some open domain U \subset \mathbb{R}^n into \mathbb{R}, via a chart around x (whose existence is guaranteed by the axioms). The key consequence of compatibility of charts, though, is that if this function f happens to be smooth (as a function U \rightarrow \mathbb{R}) in one such identification, then it will be smooth in any other, and that, as a consequence, we can talk about the smoothness or lack thereof of functions M \rightarrow \mathbb{R} without talking about coordinates at all. Many of the great achievements in Riemannian and pseudo-Riemannian geometry (e.g., general relativity) center around this “coordinate-free” approach.
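As an illustrative example not taken from the text, let M be the unit circle with its two stereographic charts (from N = (0, 1) and S = (0, -1)), and let f(x, y) = y. Read in the north chart, f becomes (u^2 - 1)/(u^2 + 1); in the south chart, (1 - v^2)/(1 + v^2). Both are smooth, and substituting v = 1/u turns one into the other, so “f is smooth” does not depend on the chart. The Python sketch below (hypothetical names, arbitrary sample values) checks the agreement numerically.

```python
import math

def f(x, y):
    """An assumed example function on the unit circle: the height of a point."""
    return y

def inv_north(u):
    """Inverse of stereographic projection from N = (0, 1): R -> circle minus N."""
    return (2 * u / (1 + u * u), (u * u - 1) / (1 + u * u))

def inv_south(v):
    """Inverse of stereographic projection from S = (0, -1): R -> circle minus S."""
    return (2 * v / (1 + v * v), (1 - v * v) / (1 + v * v))

def f_north(u):
    """f in the north chart: (u^2 - 1)/(u^2 + 1), smooth in u."""
    return f(*inv_north(u))

def f_south(v):
    """f in the south chart: (1 - v^2)/(1 + v^2), smooth in v."""
    return f(*inv_south(v))

# Where the charts overlap (u != 0), v = 1/u names the same point,
# and the two local expressions of f agree.
for u in (-3.0, -0.5, 0.25, 2.0):
    assert math.isclose(f_north(u), f_south(1 / u), rel_tol=1e-12, abs_tol=1e-12)
```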

      All of this is just to say that concentrating on the maps of underlying sets (with additional, e.g. smooth, structure) is the order of the day, and that coordinates, which should be used only when necessary (e.g. when performing local calculations), tend only to bring trouble.

      This is immediately relevant to Kit Fine’s example, I contend. Insofar as we understand the underlying maps of sets suggested by the expressions “x > 0”, “y > 0”, and “x > y”, the relevant issues seem to arise around the use of the symbols “x” and “y” to refer to coordinates of these particular sets.
