The experimenter across from you, wearing a lab coat and an identification tag, places his clipboard on the table. He explains the day’s experiment. “I’m thinking of a rule which generates triples of integers,” he begins. “I’ll first give you one example of a triple conforming to the rule. Then, you must propose additional triples, and I’ll give immediate feedback: yes or no. You must try to guess the rule, asking as few questions as possible. Ready?”

The experiment begins. “2, 4, 6.”

- You: “4, 6, 8?” Experimenter: “Yes”.
- You: “6, 8, 10?” Experimenter: “Yes”.
- You: “10, 12, 14?” Experimenter: “Yes”.

“Ascending triples of consecutive even integers?” you blurt out. “No. Ascending sequences in general,” the experimenter responds, as he leans back and disdainfully scribbles something on his board. You’ve been defeated.

The famous *Wason’s 2-4-6 Task* was designed by P. C. Wason in 1960 [1] to investigate hypothesis-testing behavior in subjects. Wason’s experiment systematically demonstrated the prevalence of what’s today called *confirmation bias* – most subjects sought only to confirm their own hypotheses, and rarely to refute them. We never tried a sequence which did *not* consist of consecutive even numbers! The sequence (20, 26, 48) would have still produced a yes, as would have (1, 2, 3); only if we had tried, say, (8, 3, 6) or even (6, 4, 2) would we have finally received a no. In Wason’s original experiment, only 6 of 29 subjects guessed correctly on their first attempt. Confirmation bias is one of humans’ most prominent and misleading cognitive biases.

**Generality**

The hypothesis-testing strategy we demonstrated above was surely a poor one. We must test against our own pet hypotheses! But it’s not especially clear what a better strategy would have looked like.

How could we make a better strategy? We could imagine progressively expanding to higher degrees of generality until we reach a no.

- Subject: “4, 6, 8?” (Ascending consecutive even integers.) Experimenter: “Yes”.
- Subject: “4, 10, 100?” (Ascending even integers, but not necessarily consecutive.) Experimenter: “Yes”.
- Subject: “5, 30, 37?” (Ascending integers, but not necessarily even.) Experimenter: “Yes”.
- Subject: “70, 30, 53?” (Integers, not necessarily ascending.) Experimenter: “No”.

We’re finally in a position to make an educated guess. “Ascending sequences of integers?” “Correct”.

A more promising strategy emerges. We may imagine a sequence of candidate solutions, arranged in order of increasing generality. We march along our chain until the experimenter’s yes becomes a no. At this farthest juncture, the rule lies. We could also imagine moving in the other direction: steadily narrowing our generality until we finally break into acceptable ground. Better yet – and with an eye to the algorithms of computer science (this would work especially well if our ordered chain were quite long) – we could perform a binary search on our ordered sequence, repeatedly bisecting the candidate pool using the experimenter’s responses until we reached our critical point.
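The binary-search idea can be made concrete. Below is a minimal Python sketch, assuming – as an illustration, not as part of Wason’s setup – that we already hold a chain of candidate rules ordered by increasing generality, together with a “witness” triple for each rule: a triple that the rule generates but that no less general rule in the chain does. The rule names, the `witnesses` list, and the `oracle` are all my own inventions.

```python
# Sketch: binary search along a chain of rules ordered by increasing
# generality. Each rule is a predicate on triples; witnesses[i] is a
# triple generated by chain[i] but by no less general rule in the chain.
# All names here are illustrative, not part of Wason's experiment.

def ascending_consecutive_even(t):
    a, b, c = t
    return a % 2 == 0 and b == a + 2 and c == b + 2

def ascending_even(t):
    a, b, c = t
    return a < b < c and all(x % 2 == 0 for x in t)

def ascending(t):
    a, b, c = t
    return a < b < c

def any_triple(t):
    return True

chain = [ascending_consecutive_even, ascending_even, ascending, any_triple]
witnesses = [(4, 6, 8), (4, 10, 100), (5, 30, 37), (70, 30, 53)]

def find_rule(chain, witnesses, oracle):
    """Index of the most general rule whose witness still draws a 'yes'."""
    lo, hi = 0, len(chain) - 1
    while lo < hi:
        mid = (lo + hi + 1) // 2      # probe the more general half first
        if oracle(witnesses[mid]):    # 'yes': the rule is at least this general
            lo = mid
        else:                         # 'no': we overshot; retreat
            hi = mid - 1
    return lo

# Suppose the experimenter's hidden rule is "ascending integers".
print(chain[find_rule(chain, witnesses, oracle=ascending)].__name__)  # prints "ascending"
```

Each oracle call costs one question, so a long chain is pinned down in logarithmically many questions rather than linearly many.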

**Branching**

This strategy too seems strange. How do we choose the order of our successive generalizations? When we arrived at ascending even integers, we gathered that discarding evenness first was the next natural step, with discarding ascendingness to ultimately follow. But why didn’t we first discard ascendingness, and then discard evenness, ending as we did before on general arbitrary integers? Instead of a linear chain, we see two paths which split from each other and then immediately rejoin.

Worse still, choosing the other path would have gotten us into trouble. Upon first discarding ascendingness (guessing even integers, not necessarily ascending), we would have immediately received a no. We would then know that non-ascending sequences produce trouble. The test concerning evenness, though, would remain undone, and that extra no could offer us no additional clues about the role played by evenness. The role of evenness would remain mysterious. Our algorithm would have had us incorrectly submit *ascending even sequences* as our final answer, an excessively “safe” guess. The tests are out of order.

We could consider a more advanced strategy. When we receive a no, we move back into safe ground, and then try generalizing in a *different* way. When all avenues forward seem to fail, we mark the farthest point as a candidate.

We can arrange our various candidate rules in a complex web, with arrows linking less general rules to more general ones. We’ll follow this web as far as we can without passing into unsafe territory. If paths branch, we may perform the above procedure for each branch, accumulating a collection of “farthest outposts”. Because each of these rules individually is safe, we can submit, as our answer, the larger rule consisting of these rules’ *union* – that rule which generates all of these patterns.
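A hedged Python sketch of this branching procedure: the web below is a hand-built illustration (its rules, edges, and triples are my assumptions, not Wason’s), with each edge running from a rule to a one-step generalization, labelled by a triple that the generalization newly admits.

```python
# Sketch: exploring a web (DAG) of generalizations. From each safe rule we
# try every one-step generalization; a rule from which no generalization
# draws a 'yes' is a "farthest outpost". The final guess is the union of
# the outposts. All rules and triples below are illustrative.

def even(t):
    return all(x % 2 == 0 for x in t)

def ascending(t):
    a, b, c = t
    return a < b < c

def ascending_even(t):
    return ascending(t) and even(t)

def ascending_consecutive_even(t):
    a, b, c = t
    return even(t) and b == a + 2 and c == b + 2

# Edges run from a rule to its one-step generalizations, each labelled
# with a triple that the generalization newly admits.
web = {
    ascending_consecutive_even: [(ascending_even, (4, 10, 100))],
    ascending_even: [(ascending, (5, 30, 37)), (even, (8, 2, 6))],
    ascending: [],
    even: [],
}

def farthest_outposts(start, web, oracle):
    """Collect the safe rules from which no further generalization is safe."""
    outposts, frontier, seen = [], [start], {start}
    while frontier:
        rule = frontier.pop()
        advanced = False
        for successor, new_triple in web[rule]:
            if oracle(new_triple):            # 'yes': this branch is safe
                advanced = True
                if successor not in seen:
                    seen.add(successor)
                    frontier.append(successor)
        if not advanced:                      # dead end: mark an outpost
            outposts.append(rule)
    return outposts

def union_rule(rules):
    """The rule generating everything that any outpost generates."""
    return lambda t: any(r(t) for r in rules)

# Hidden rule: ascending integers. The only outpost reached is `ascending`,
# so the union is exactly the right guess.
guess = union_rule(farthest_outposts(ascending_consecutive_even, web, ascending))
assert guess((5, 30, 37)) and not guess((70, 30, 53))
```

Note the design choice: a branch that draws a no is simply abandoned, while the rule it departed from stays a candidate; only rules with *no* safe continuation enter the final union.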

The particular rule featured in the Wason task was very tractable. The problem of inducing an arbitrary rule, though, now seems virtually infeasible. This impression will only grow as we continue to explore.

**Naturalness**

What do we mean when we say that one rule is “more general” than another? Strictly speaking, each rule describes some subset of the mathematical set consisting of all triples (x, y, z) of integers x, y, and z; by *more general* we mean that the set consisting of sequences generated by the former rule is a strict superset of the set consisting of those generated by the latter. Given the candidate rule “ascending consecutive even integers”, for example, we felt that “ascending even integers” was a natural move towards generality.

There are many, *many* rules, though, which are more general than the rule *ascending consecutive even integers* in the sense that they encompass that rule’s subset and then some. *Ascending even integers* is one example. *Ascending even integers plus the single sequence (1,000, 10,000, 10,001)* would be another. Mathematically speaking, it’s difficult to distinguish between these two examples of movements towards generality. Both rules’ corresponding subsets strictly contain that subset described by our first rule. How on earth did we decide to propose the sequences that we did? We proposed those sequences because we felt that they were *natural* in some sense.

There are a number of ways in which we could understand the notion of *naturalness* in rules. One obvious way – and that which the psychologists probably expected – is what I might call *psychological complexity*. Psychological complexity favors those sequences which are simple to understand and describe. Guessing rules, we naturally gravitated towards those describable using grade-school arithmetic. These rules feature basic operations like counting up or down by fixed or non-fixed intervals. Which rules belong to this set, though, is not precisely defined.

The theory of Kolmogorov complexity, a large body of mathematics developed by the great Russian mathematician Andrey Kolmogorov, provides rigorous tools with which one can talk about the “complexity” of mathematical objects. Under Kolmogorov complexity, certain subsets of the space of integer triples can be said to be *simpler* than others, in the sense that they admit shorter mathematical descriptions. I’m not sure exactly which subsets would be declared under Kolmogorov’s work to be particularly simple. Whatever their identities, these subsets might emerge as reasonable candidates.

Finally, philosophers have worked extensively to understand the ways in which some subsets of the world – and the words or specifications which demarcate them – can be understood to be more *natural* than others – to “carve nature at its joints”, to use Plato’s words. It’s possible, analogously, that certain subsets of the set of integer triples can be said to carve this set at its joints, in the same manner as that described by the philosophers. These natural subsets would emerge as our candidates.

Investigating any of these avenues would demand significant resources, and I welcome comments related to them.

**Mathematics**

Without these supplementary notions of naturalness, though – and recall that the experiment’s environment didn’t specify any – we find ourselves surprisingly lost. There is no way to generate feasible rules; no rule can be distinguished above any other; *no strategy is better than any other*. A rule is simply a subset of the mathematical set of integer triples. The triples answered *yes* are members of the subset. The triples answered *no* are not. Beyond this, we can know very little, and after finitely many guesses, there remain (uncountably!) many subsets which contain the former and exclude the latter. None of these is distinguished above the others. Good luck to the subject! (The only truly bad strategy would be to guess the same triple multiple times.)
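The point can be made concrete with a toy check. The `yes` and `no` lists below record the session’s answers from the essay above; each of several visibly different candidate rules agrees with every recorded answer, yet the candidates disagree on unseen triples. The candidates themselves are illustrative picks from an endless supply.

```python
# Sketch: after finitely many answers, wildly different rules remain
# consistent with the transcript. The yes/no triples come from the
# session narrated above; the candidate rules are my own illustrations.

yes = [(2, 4, 6), (4, 6, 8), (6, 8, 10), (10, 12, 14), (5, 30, 37)]
no = [(70, 30, 53)]

candidates = [
    lambda t: t[0] < t[1] < t[2],                    # ascending
    lambda t: t[0] < t[2],                           # merely first < last
    lambda t: t[0] < t[1] < t[2] or t == (9, 9, 9),  # ascending, plus one stray triple
]

# Every candidate agrees with every recorded answer...
for rule in candidates:
    assert all(rule(t) for t in yes) and not any(rule(t) for t in no)

# ...yet they disagree on triples the experimenter was never shown.
assert candidates[1]((3, 99, 5)) and not candidates[0]((3, 99, 5))
```

Nothing in the transcript alone privileges any one of these candidates; only a prior notion of naturalness does.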

It’s not clear that the original experimenters thought through these things. The strategies pursued by the subjects were deemed by the authors to be foolish. These strategies are only foolish under the assumption of certain naturalness constraints, though, and absent these constraints, they are only as foolish as any of uncountably many others! Testing against one’s hypothesis is a good strategy – again, though, only within the environment engendered by the constraints associated with one of a number of possible notions of naturalness.

The moral is that an intelligent subject, sitting across the table from the haughty experimenter, should, perhaps, have responded to the instructions with simple disbelief: “You expect me to do *what?*”

- [1] Wason, P. C. (1960). On the failure to eliminate hypotheses in a conceptual task. *Quarterly Journal of Experimental Psychology*, 12, 129–140.

You spent only a few sentences talking about psychological complexity. But any mathematician – or, at least, any statistician – would have known that this was, with almost 100% certainty, the system of complexity upon which the experimenters relied. Not only are the testers themselves probably not familiar with any other notions of complexity, but, even if they were, they *themselves* would have known that their *subjects* were likely unfamiliar with any other notions of complexity. So, to even weigh and consider other systems begins to look ridiculous. Most important, then, might be the fact that your *mathematical* analysis of this *psychological* problem wasted mental, and – as others who were present that day can attest – temporal resources. So, it seems that your mathematical training actually put you at a disadvantage towards solving this task. And the irony seems to have been lost on you.

Perhaps Wason’s table has been turned once again.

On another note, this and my MBTI post should be put together in a series called “Tests that Fail.”

I do not think this was a waste of my temporal resources. It is amusing to recognise that a test like this one measures success and failure by standards that are difficult to pin down with much mathematical precision. But it would be disingenuous to insist that we are unable to recognise the intuitive sense in which some test answers either meet or fail to meet those standards.

I also did not consider it a waste of mental resources. But then, I’m a philosopher, not a scientist. I have the luxury of thinking about things without having the sort of concerns that would lead others to disregard all of this as mere trifling.