*This article is part of a series on The Structure of Theorems. See also:*
*1. Theorems’ Almanack; 2. The Greatest Theorem; 3. Game of Theorems*

Theorems are the hallmark of the mathematical trade. In them reside mathematical rigor, mathematics’ unspoken holy grail. They also possess a kind of mystique. “I’m scared of upper-division math classes,” I once overheard a student say. “It’s all proofs.”

Theorems, independent of their mathematical content, are interesting in their own right, because they delineate the mathematical landscape. Open a textbook of mathematics, and you’ll find the difficult universe of mathematics, once uncharted and unknown, arrayed and packaged before your eyes. “**Theorem: ___________.** *Proof: _____________.*”

I attempt here to understand mathematical theorems, in a purely logical sense: for what they are, for what they do, and for what role they play.

**What are mathematical statements?**

A *mathematical statement*, roughly speaking, will be understood as that which is asserted by a theorem. We’ll have to develop a more precise definition for these statements before we can begin to understand the theorems which address them.

Our first, smallest, base-level ingredients will be mathematical *propositions*. I’ll understand propositions, here, as a mathematical statement’s atomic, irreducible parts. They can be either true or false. These mini-statements will assert simple facts — for example, that we’re considering an object of some certain type, or that the object we’re investigating satisfies some particular property. I’ve purposely left this definition somewhat murky. Things should become clearer in the course of time. But a discussion of the exact nature of these propositions, or of whether they can, in some sense, be said to be “purely logical” (a position associated with a philosophy of mathematics called *Logicism*) will have to be postponed to another day (or to the comments section).

Within some collection of propositions, there may exist dependencies. That is, the values of certain propositions within a collection will depend only on whether we presuppose them or not, while the values of other propositions within this collection will depend on the presupposed values of these initial propositions. Dependencies might even work in both directions!

We typically wish to investigate the precise nature of the dependencies between propositions. More concretely, if, for example, the truth value of some proposition *P* depends on the values of two other propositions *A* and *B*, we’d like to investigate *P*’s value under all possible combinations of *A* and *B*’s truth values. (“*A* and *B* are both true. What about *P*? *A* is true but *B* is not true. What about *P*?” And so on.)

More generally, given a collection of propositions, we may select a subset — *which contains no internal dependencies* — and designate this subset’s members as *independent propositions*. Presupposing in turn each possible combination of truth values over this set of independent propositions, we then may begin to fill out the truth values for those remaining.

We move from the independent towards the dependent propositions through good, old-fashioned mathematical reasoning. This takes work! This progress will often take the form of mathematical constructions. “Because *A* is true, we can construct this particular object, and because *B* is true, we can use *A*’s construction to assert the further statement *P*,” and so on. I’ll note that whether these mathematical processes — which take us from one proposition to another — can *themselves* be said to be purely logical is also a difficult question. Though this question, too, must be left for another day, I’m tempted to conclude that they can. If one proposition implies another, it’s either in virtue of pure logic, or in virtue of additional mathematical propositions. Incorporating these additional propositions into our collection, however, the implication becomes one of pure logic. Or so it would seem.

Mathematical statements can be understood as assertions about the completed truth tables of collections of independent and dependent propositions. One of these statements could, presupposing each of the many possible combinations of truth values for its independent propositions, assert corresponding truth values for each of its dependent propositions. More typically, a statement simply takes a “slice” of the possible combinations of truth values for its independent propositions and asserts the corresponding truth values for its dependent propositions. The rest are disregarded. (“If *A* and *B* are true, then *P* is true. Otherwise, we don’t care.”)
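To make this concrete, here’s a minimal sketch in Python. The propositions *A* and *B*, and the rule that *P* holds exactly when both do, are invented for illustration; the point is how a statement asserts only a “slice” of the full table:

```python
from itertools import product

# Hypothetical dependent proposition: P is true exactly when A and B both hold.
def P(A, B):
    return A and B

# Enumerate every combination of truth values for the independent propositions...
table = {(A, B): P(A, B) for A, B in product([True, False], repeat=2)}

# ...but a typical mathematical statement asserts only a "slice" of this table:
# "If A and B are true, then P is true. Otherwise, we don't care."
asserted_slice = {row: value for row, value in table.items() if row == (True, True)}
print(asserted_slice)  # {(True, True): True}
```

The full table has four rows here, but the statement commits itself to only one of them.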

Truth tables can help us understand what mathematical statements are. They’re not very good ways of organizing mathematical ideas, though. For one, they’re large and messy, and contain too much information. These tables must contain a “row” for each of the possible combinations of truth values for their independent propositions, and “filling out” one of these tables could require a number of mathematical assertions on the order of two raised to the power of the number of independent propositions! This is too many, and we frequently care only about a small subset of these possible assertions. Equally importantly, truth tables reveal little about the organizational structure of mathematical statements. Though they do distinguish between independent and dependent propositions, there’s much more to the story. Some independent propositions are mathematically more basic, trivial, or fundamental than others. Some dependent propositions, on the other hand, might depend on others, and others may depend on these, in a complex, hierarchical fashion. We’d like a way of organizing mathematical statements which tells us more about these statements’ intrinsic structure.

**Moving from truth tables to organizational structure**

First, we should fix a particular dependent proposition, say *P*. If our statement asserts not only *P*’s truth under certain conditions, but also its falsehood under others, we would create a new proposition *Q = not P* and perform the procedure below to establish *Q*’s truth in the same manner. Further distinct dependent propositions *R* must also be extracted and proven separately.

Given a statement — involving our single dependent proposition *P* and some independent propositions *A*, *B*, and *C*, say — we might try scanning the statement’s truth table only for rows in which this dependent proposition does hold. When we find a combination of truth values of a statement’s independent propositions for which the dependent proposition holds, we could then write out the mathematical reasoning which happened to lead from this particular combination of truth values — for example, *A, not B*, *not C*, in one case — to the ultimate truth of *P*. We could then repeat this process for each of the combinations of truth values for which the proposition *P* holds.

This process leads us to what computer scientists and logicians call the *Disjunctive Normal Form* of a boolean expression. *P*’s truth, here, is expressed as an “OR of ANDs”: *P* is true when, again for example, *A and B and C* is true *or* *A and not B and not C* is true. Each of these individual terms is an *and* statement, or a conjunction. They’re themselves being joined in an *or* statement, or a disjunction. *P* is expressed as a disjunction of conjunctions.
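The row-scanning procedure just described can be mechanized. A sketch (the target proposition `f` below is an arbitrary stand-in): collect every row of the truth table on which the proposition holds, and the resulting list of conjunctions *is* the DNF.

```python
from itertools import product

# A stand-in dependent proposition over A, B, C (hypothetical).
def f(A, B, C):
    return (A and B and C) or (A and not B and not C)

# Scan the truth table, keeping each row where f holds; each kept row
# is one conjunction of the DNF.
conjunctions = [row for row in product([True, False], repeat=3) if f(*row)]

# The DNF itself: an OR of ANDs, one AND per recorded row.
def dnf(A, B, C):
    return any(all(v == w for v, w in zip((A, B, C), row)) for row in conjunctions)

# The DNF agrees with f on every input.
assert all(dnf(*row) == f(*row) for row in product([True, False], repeat=3))
```

This also shows why DNF discards excess information: only the rows on which *P* holds survive.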

The Disjunctive Normal Form hints at how we can begin to better organize mathematical statements. *P* can be imagined here as the root of a tree, whose nodes are the conjunctions which imply it. Each node is a conjunction — for example, *A and not B and not C* — and the link from a node to its parent, *P*, can be understood as the bit of mathematical reasoning which leads from the conjunction to *P*’s truth. *P* is true when any of its conjunctions, or children, are true. The exposition of *P*’s truth dependencies now involves simply visiting each of *P*’s children in the tree and demonstrating the mathematical implication that led from child to parent. In doing so, we explain *P*’s truth table entirely.

Disjunctive Normal Form solves the problem of excessive information. We’ve included only the combinations of independent truth values which we needed in order to demonstrate *P*’s truth, and we’ve discarded all of the truth table’s other rows.

But Disjunctive Normal Form contributes very little to capturing or expressing a statement’s hierarchical organization. Each of the implication statements, which led from some conjunction to *P*, had to be performed *anyway* when we first filled out the truth table, and this process has merely recast them in a rarefied form. We need a way not only to economize on information, but also to respect hierarchy.

**Building trees which express mathematical hierarchy**

Logically speaking, all of a statement’s independent propositions reside on a single level, and all of its dependent propositions reside on another. Mathematically, though, we don’t think of things this way. Some independent propositions are very basic, while others are critical suppositions. Similarly, some dependent propositions follow immediately from hypotheses and definitions, while others follow only after an incredible chain of mathematical explanations.

I’ll illustrate this point first with an abstract example. Logically, the three statements *A* → (*B* → *C*), (*A* & *B*) → *C*, and *B* → (*A* → *C*) are indistinguishable. They have identical truth tables! Intuitively speaking, however, they’re quite different. The first statement, read as “*A* implies that *B* implies *C*,” suggests that *A* is more of a background condition, under which the implication *B* → *C* holds. The third statement, read as “*B* implies that *A* implies *C*,” suggests that *B* is the background condition, under which the implication *A* → *C* holds. Whether *A* or *B* makes more sense as a background condition — or whether they should, as in the second statement, be placed on equal footing — is not a logical fact, but a mathematical one. It requires mathematical knowledge.
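The claimed indistinguishability is easy to check by brute force. A quick sketch, with `implies` standing for material implication:

```python
from itertools import product

def implies(p, q):
    # Material implication: p -> q fails only when p holds and q does not.
    return (not p) or q

s1 = lambda A, B, C: implies(A, implies(B, C))  # A -> (B -> C)
s2 = lambda A, B, C: implies(A and B, C)        # (A & B) -> C
s3 = lambda A, B, C: implies(B, implies(A, C))  # B -> (A -> C)

# All three statements have identical truth tables.
assert all(s1(*r) == s2(*r) == s3(*r) for r in product([True, False], repeat=3))
```

The logic cannot tell the three apart; only mathematical judgment about what counts as a background condition can.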

A basic example from group theory can demonstrate this point more concretely. A mathematical statement could assert that “If a group *G* is abelian, then every subgroup of *G* is normal.” The independent proposition *A* — that *a set G has a group structure* — can be understood here as a background condition, under which the implication *B* → *C* — linking a group’s commutativity (*B*) to the normality of its subgroups (*C*) — holds. Reversing the roles of *A* and *B* here would be simply incoherent: How can we talk about a group’s commutativity before we’ve supposed the existence of a group structure? Placing *A* and *B* on equal footing, moreover, would be distracting. We’d be forced to consider “a set with a group structure which also happens to be a commutative one,” foreshadowing a mounting list of conditions which we’d prefer not to deal with simultaneously. By hierarchically organizing our independent propositions, we can ensure mathematical sense and sensibility.

Similarly, we might prefer to express the truth of a dependent proposition *Q* rather as a sequence of implications — which still begins with an independent proposition, but progresses through the collection’s dependent propositions as well. *A → Q*, for example, can be reexpressed as *A → P* and *P → Q*, where the intermediate dependent truths must be ascertained through our mathematical understanding. Taking an example again from group theory — and related to the last — the rather complex assertion “Every abelian group has a cyclic tower” should perhaps be broken up, by first, at the very least, establishing the previously discussed implication relating a group’s commutativity to the normality of its subgroups, and, only then, progressing to the construction of the desired subnormal chain.

Consulting not just logical structure but also mathematical structure, then, we can organize our statement’s “implication tree” in a more nuanced manner. We can imagine a tree in which each node is a conjunction of (possibly dependent) propositions, itself equivalent to the disjunction of its children. Each of these children is itself a conjunction of (again, possibly dependent) propositions, again implied by a disjunction of its own children, and so on. The leaves of the tree will be conjunctions of independent propositions. The individual propositions conjoined within each node might actually themselves be implication statements, or even entire trees of a similar form, though this shows up rarely in practice. The structure could be very complicated indeed.

As a side note, we should reasonably expect a complicated structure. We’ve posited a general classification of the structures of arbitrary mathematical statements. The mathematical world is very complex! To understand such a broad class of ideas, we should expect a complex general framework.

This organizational technique extends the advantages of the naive Disjunctive Normal Form discussed earlier. To prove some statement expressed in Disjunctive Normal Form, we simply had to prove that the dependent proposition was implied by each of its children. Now, to prove a statement, we must visit each node in the entire tree and prove every link from child to parent. To show that the tree’s root proposition happens to hold, we must show that one of its leaves holds.

This way of organizing mathematical ideas solves both of the problems we attributed to truth tables. It’s informationally compact and hierarchically sensible. This new organizational understanding will help us understand how mathematical theorems are built.

**How tree traversals induce mathematical proofs**

We’ve already learned that to prove a statement, we must visit each of the tree’s links from child to parent, mathematically demonstrating the implications as we go. Mathematical proofs must happen in some order, though. It would be strange (and require lots of paper) to arrange the expositions of a proof’s various implications in a tree-like formation. The next question becomes: In which order should we express the implications of this tree? We’ll see that this order is far from arbitrary.

One choice, indeed, could be to traverse a tree’s links in random order. This would lead to a proof that is very difficult to follow. We’d have to remember, for example, that *A → B* — as we jumped to the other corner of the tree to show that *P → Q* (a fact which we might use only much later) — only to come back again and, eventually, finally prove that *B → C*. *Why was B true again?*

Computer science has developed a library of systematic ways to visit each of a tree’s nodes exactly once; these procedures are called *tree-traversal algorithms*. (“Visiting” a non-root node, here, is understood as demonstrating the mathematical implication from this node to its parent.) We will see that most mathematical proofs look something like applying a tree traversal algorithm to a statement’s implication tree.

These algorithms can take many forms, and here I’ll only discuss two general classes of algorithms, *pre-order* and *post-order* traversals. Both of these are recursive algorithms. A pre-order traversal visits a node and then recursively visits each of the node’s subtrees. A post-order traversal recursively visits each of a node’s subtrees and then visits the node itself. If these procedures encounter a leaf, they visit the node and then terminate. These simple rules, as it turns out, are sufficient to completely specify traversals of complex trees!

The actual behavior of these traversal algorithms is not quite obvious, and understanding them requires some familiarity with recursion. Let’s consider a simplified “tree” in which each node except the last has only one child — in other words, a linked list. The pre-order traversal, beginning at the root, will instantly visit this node. It’ll then pass to the root’s single subtree, and begin by, of course, visiting the root of this subtree. It’ll then pass to *this* subtree’s root, making its way in sequence down finally to the leaf. The post-order traversal algorithm does the opposite. It’ll start at the root by ignoring this node and immediately beginning to traverse the root’s subtree. Traversing this subtree will involve ignoring *this* node, and immediately progressing to *its* subtree. The algorithm will keep delegating control to subtrees until it’s finally reached the lowest leaf. Having already accumulated a stack of recursive calls, it will finally perform its first visitation, on the leaf node, and only then return back one level to visit this leaf’s parent. Working its way back up the tree as the stack unwinds, the post-order traversal will visit the tree’s nodes in the opposite order as before.
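A minimal sketch of the two traversals, run on exactly the linked-list-shaped tree just described (the node names *P*, *B*, *A* are invented):

```python
class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

def preorder(node, visited):
    visited.append(node.name)    # visit the node first...
    for child in node.children:  # ...then recursively traverse its subtrees
        preorder(child, visited)
    return visited

def postorder(node, visited):
    for child in node.children:  # recursively traverse the subtrees first...
        postorder(child, visited)
    visited.append(node.name)    # ...then visit the node itself
    return visited

# A "tree" in which each node has a single child: a linked list.
chain = Node("P", [Node("B", [Node("A")])])

print(preorder(chain, []))   # ['P', 'B', 'A']
print(postorder(chain, []))  # ['A', 'B', 'P'] -- the reverse order
```

On this chain, post-order visits *A*, then *B*, then *P* — the “forward proof” order in which each basic fact is established before the fact it implies.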

These traversal methods can be applied to a mathematical statement’s implication tree. The post-order traversal algorithm induces the proof strategy we’re most familiar with. Skipping to the bottom of the tree, it will show that the first, most basic, proposition implies the next, which then implies the next, and so on, until the final root proposition is directly proven. A pre-order traversal induces the opposite approach. It will show that our final proposition follows from a slightly simpler sub-proposition, which *itself* is implied by some still-lower sub-proposition, and so on, until, ultimately, it’s shown that some intermediate proposition follows from the first proposition we started with. Similar reasoning leads to proof techniques in trees featuring more than one child per node.

These two methods generally produce coherent proofs, and the best method will depend on the details of the particular mathematical problem at hand. They’re unwieldy, though, because statements’ trees can be very large. For ease of exposition, sometimes it makes sense to compartmentalize our trees into sub-parts.

**Using theorems to consolidate trees**

These trees’ branches might stretch all the way back to a subject’s most fundamental and trivial regions, beginning from propositions which are either too basic, too obscure, or too numerous to bother with. Certain regions of the tree — which are, say, especially extensive, challenging, or useful — we’d like to traverse only once, and then “save” our progress under an abbreviation or encapsulation.

There’s in fact a systematic way to think about collapsing regions of a particular tree. Any contiguous collection of nodes — each pair of nodes must be connected through some path — can be replaced by a single node, if we make sure to connect each of this region’s former children directly to the new node. Traversing this so-called *subtree* just once, and then replacing the entire subtree by a symbolic “node” representing the contents of the successful traversal, we can reduce the size and complexity of our tree.

The entire tree, in fact, can be partitioned into a collection of disjoint contiguous regions. We can traverse each region independently, and then, collapsing each region, we can form a new, smaller, tree, whose nodes actually themselves represent entire collapsed sub-regions.
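The collapsing operation can be sketched in code. Everything here — the `Node` class, the `collapse` helper, the node names — is invented for illustration: a contiguous region of the tree is replaced by a single node, which inherits the region’s outside children.

```python
class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

def collapse(region_root, region_names):
    """Replace the contiguous region rooted at region_root by one node.

    Children of region members that fall outside the region become
    children of the new, collapsed node."""
    outside = []
    def gather(node):
        for child in node.children:
            if child.name in region_names:
                gather(child)       # still inside the region: keep descending
            else:
                outside.append(child)  # outside the region: reattach here
    gather(region_root)
    return Node("Theorem(" + region_root.name + ")", outside)

# A small implication tree: P implied by Q, Q implied by A and by B.
tree = Node("P", [Node("Q", [Node("A"), Node("B")])])

# Collapse the region {P, Q}; the leaves A and B now hang directly
# off a single "theorem" node.
theorem = collapse(tree, {"P", "Q"})
print([child.name for child in theorem.children])  # ['A', 'B']
```

The traversal of the region is done once; afterwards, only the collapsed node need ever be referenced.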

This is what mathematical theorems do. Because mathematical statements typically feature very large implication trees, theorems take smaller, manageable sub-sections of these trees, traverse them just once, and then put a “package around and a name on top”. The entire contents of this package can then be simply referenced by name, and the traversal need not be conducted again. Theorems permit us to compartmentalize and modularize mathematical progress.

Theorems bring the benefits of modularization to mathematics. Certain results are large and extensive. It saves time and energy to prove these results only once, and to later refer to these results simply by their names. Mathematical facts are also, relatedly, often conceptually challenging, and to work with them can demand significant mental resources. By proving a theorem and then handling its result independently of its proof, we can release the valuable mental resources necessary for further progress and understanding. Finally, these theorems make it easier to manage and detect mistakes. Compartmentalizing the ingredients of a proof, we can inspect each region for errors in an independent manner.

The analogy with computer science should be quite explicit here. The benefits, such as reusability, stability, and clarity, brought to computer science by its various encapsulation techniques (such as subroutines and objects) each feature quite precise analogues in the realm of mathematics.

Theorems, then, are encapsulation devices. They make mathematical results more digestible, comprehensible, and manageable.

But they can also obscure. Hiding away the details of our mathematical constructions, we can alienate ourselves from their origins. This could lead to problems ranging from misapplication of theorems to undue surprise at their consequences. “They’re marveling at the puppets dancing,” a friend at Hopkins once told me, “because they’ve forgotten that the strings are there.”

These are tradeoffs that the human mind must face as the mathematical world becomes increasingly complex. The phenomenon — whether progress or problem — will undoubtedly persist as math continues to advance.

There’s a dual notion at play here, related to the contraposition. Negating both sides of *P*’s or-of-ands expansion and then applying De Morgan’s laws to the left side, we can express *P*’s negation, denoted *¬P*, in *Conjunctive Normal Form* — a conjunction of disjunctions, dual to the DNF. *¬P* is now expressed as an and-of-ors, and to prove that *¬P* implies this entire conjunction, we must prove that *¬P* independently implies each of its terms. Extending this reasoning, we can create a generalized CNF tree. This time, *¬P* is on top, arrows point down, and each node implies the conjunction of its children. To prove the theorem, we must traverse the tree, demonstrating each downward mathematical implication from parent to child. Now, post-order traversals lead to “forward proofs” and pre-order traversals lead to “backward proofs”. To show that the theorem’s statement in fact holds, we must show that *¬P* holds, and thus each dualized, negated truth value combination of its independent propositions holds.

Previously, we handled statements asserting both *P*’s truth under certain conditions and its falsehood under others by breaking up the statement and proving an additional proposition, *¬P*, using Disjunctive Normal Form. Dually, we can also prove *¬P* by expressing *¬P* in Disjunctive Normal Form, negating both sides, dualizing using De Morgan’s laws, and proceeding using a CNF tree. This time *P* will be on top. The duality of this construction resembles that seen in category theory and its mathematical realizations.

Definitions also feature a hierarchical tree structure. In mathematics, definitions are (see this comment) little more than an extensive and elaborate system of shorthand. In any particular definition, the *definiens* generally itself includes definitions, which themselves encapsulate other definitions, and so on. Unwinding everything all the way, perhaps all assertions would be about set theory!
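The DNF-to-CNF dualization via De Morgan’s laws can be sanity-checked by brute force. A sketch, with an invented *P*:

```python
from itertools import product

# A hypothetical P in Disjunctive Normal Form: an OR of ANDs.
def P(A, B, C):
    return (A and B) or (not A and C)

# Its negation in Conjunctive Normal Form, via De Morgan's laws:
# not((A & B) or (~A & C)) == (~A or ~B) and (A or ~C) -- an AND of ORs.
def not_P_cnf(A, B, C):
    return (not A or not B) and (A or not C)

# The CNF of not-P agrees with the direct negation on every input.
assert all(not_P_cnf(*r) == (not P(*r)) for r in product([True, False], repeat=3))
```

Each conjunct of the CNF is one “downward” obligation in the dual tree: to establish the whole conjunction, each term must be established independently.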