“Isn’t it true that example-sentences that people that you know produce are more likely to be accepted?” – De Roeck et al., 1982 
“The man the dog the cat scratched bit died.” – Dan Scherlis, a former linguistics classmate of my mother
Chomsky articulated the distinction between grammatical competence and what he called performance. Constructing a grammatical sentence is one thing. Transmitting it successfully is another, and many potential obstacles – from distracting noise to the capacity of the human mind – can get in our way.
In particular, certain sentences are grammatical, but effectively incomprehensible. These sentences are typically complex, and they might contain intricately nested clauses and phrases. The capacity of our minds is limited. Language’s capacity for recursion is not. Who could be surprised that space eventually runs out? (The two sentences above contain double center embeddings, which are notoriously difficult to parse.)
Some sentences, though, feature an inscrutability difficult to explain on account of their complexity alone. Within a collection of sentences similar in length, complexity and meaning – but different in organization – certain sentences can emerge as particularly difficult to understand. Further, these arcane sentences share distinctive commonalities. (The second sentence, though perhaps much simpler than the first, is typically found to be less comprehensible.)
Linguists seek to describe these commonalities. Which grammatical characteristics make a sentence more difficult to parse than we should expect it to be on account of its semantic complexity alone? The enumeration of these characteristics is a central task of psycholinguistics. Linguists have developed precise technical criteria which purport to predict when a sentence – despite its grammaticality – is liable to baffle the mind.
These criteria are as fascinating as they are technical. Edward Gibson’s seminal work introduced Processing Load Units (PLUs) – which represent units of mental parsing difficulty – and described grammatical constructions which induce the accumulation (or reduction) of PLUs. When, and only when, the present tally of PLUs exceeds four, Gibson found empirically, our mental parsers simply fold. Gibson’s system proved incredibly predictive. James David Thomas explains the grammatical constructions which generate Gibson’s PLUs:
Associate a PLU to each lexical requirement position that is obligatory in the current structure, but is unsatisfied.
Associate a PLU to each semantically null C-node category in a position that can receive a thematic role, but whose lexical requirement is currently unsatisfied. 
These constructions produce convoluted – and often amusing – example sentences. Again, quoting from Thomas’s thesis:
Claim 1: That embedding a relative clause inside a sentential complement is easier than the opposite embedding.
- The hunch that the serial killer who the waitress had trusted might hide the body frightened the FBI agent into action.
- The FBI agent who the hunch that the serial killer might hide the body had frightened into action had trusted the waitress.
Claim 2: That embedding a relative clause inside a sentential subject is easier than the opposite embedding.
- Whether the serial killer who the waitress had trusted might hide the body frightened the FBI agent into action.
- The FBI agent who whether the serial killer might hide the body had frightened into action had trusted the waitress. 
Welcome to the far-flung edge of the grammatical universe: sentences which are grammatical, but indecipherable; which feature regular grammatical structure, but organize it in such a ridiculous way that we lose all hope of comprehension.
Fun and Games
We use this setting to manufacture sentences with systematically absurd structure. How (little) are we constrained by the requirements of grammaticality? Though the sentences we produce won’t be comprehensible, they’ll be grammatical. I’ll pose a few games to the reader.
Game 1: Construct a family of sentences 1, …, n, … such that for some word, in the nth sentence this word appears n times consecutively.
Solution: Consider the family of sentences (I’ve bracketed embeddings for clarity):
- Proposition P is true.
- That [proposition P is true] is obvious.
- That [that [proposition P is true] is obvious] is obvious.
- That [that [that [proposition P is true] is obvious] is obvious] is obvious.
- That [… [that [proposition P is true] is obvious] …] is obvious.
Proceeding in this manner, we can construct grammatical sentences in which the word “that” appears in consecutive runs of arbitrary length. (Similar constructions can be achieved with other subordinating conjunctions, such as “whether”.) Each of these sentences is grammatical; given enough time, each could be understood.
This example works because the subordinator that repeatedly serves to embed an entire sentence into the subject of the next, larger sentence. The subject of any given sentence contains a descending chain of smaller, nested “copies” of itself.
After the second or third, these sentences would surely be judged incomprehensible.
The sentences in this family feature syntax trees which are skewed heavily towards the left (subject). Each sentence’s tree contains a large subject consisting of a long chain of subordinations; to each of these links (as well as to the root node representing the sentence itself), we also attach a copy of the small verb phrase is obvious. The size of the tree then – to use the language of computer science – grows linearly, or on the order of n.
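The Game 1 construction is mechanical enough to automate. Here is a minimal Python sketch (the function name game1_sentence is my own) that wraps the base proposition in “that … is obvious” the required number of times:

```python
def game1_sentence(n):
    """Return the nth Game 1 sentence: the base proposition
    wrapped n - 1 times in "that ... is obvious"."""
    s = "proposition P is true"
    for _ in range(n - 1):
        s = f"that {s} is obvious"
    # Capitalize the first word and close the sentence.
    return s[0].upper() + s[1:] + "."

print(game1_sentence(3))
# That that proposition P is true is obvious is obvious.
```

Each wrap adds three words, so the nth sentence has 3n + 1 words: linear growth, matching the size of the tree.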
Game 2: Construct a family of sentences 1, …, n, … such that for some word, in the nth sentence this word appears n times consecutively on two separate occasions, and such that the size of the sentences’ syntax trees grows exponentially, on the order of 2^n.
Solution: We use the coordinating conjunction and to join two copies of the earlier phrase within each successive embedding. Consider the family of sentences (embeddings are bracketed):
- Proposition P1 is true.
- That [proposition P1 is true] is obvious and that [proposition P2 is true] is obvious.
- That [that [proposition P1 is true] is obvious and that [proposition P2 is true] is obvious] is obvious, and that [that [proposition P3 is true] is obvious and that [proposition P4 is true] is obvious] is obvious.
- That [that [that [proposition P1 is true] is obvious and that [proposition P2 is true] is obvious] is obvious, and that [that [proposition P3 is true] is obvious and that [proposition P4 is true] is obvious] is obvious] is obvious, and that [that [that [proposition P5 is true] is obvious and that [proposition P6 is true] is obvious] is obvious, and that [that [proposition P7 is true] is obvious and that [proposition P8 is true] is obvious] is obvious] is obvious.
- I’ll leave this one out for your and my sake. It will contain 2^n propositions.
The syntax tree of the nth sentence, for any n, resembles a balanced binary tree of height n. Each new sentence embeds the previous sentence twice, in two separate clauses joined by the conjunction and. The words “that” and “is obvious” flank each embedding. The two n-length consecutive sequences of the word “that” occupy, respectively, the leftmost path of the entire tree and the leftmost path of the root’s right child. (Bonus question: what rule describes the comma placement?)
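As a sanity check on the exponential growth, here is a Python sketch of the Game 2 construction (the name game2_sentence, and the omission of commas and capitalization, are my own simplifications): each level conjoins two embedded copies of the previous level, numbering propositions left to right.

```python
def game2_sentence(n, counter=None):
    """Return the nth Game 2 sentence (lowercase, no commas).

    Level 1 is a bare numbered proposition; level n conjoins two
    embedded copies of level n - 1, each flanked by "that ... is obvious".
    """
    if counter is None:
        counter = [0]  # running proposition number, shared across the recursion
    if n == 1:
        counter[0] += 1
        return f"proposition P{counter[0]} is true"
    left = game2_sentence(n - 1, counter)
    right = game2_sentence(n - 1, counter)
    return f"that {left} is obvious and that {right} is obvious"

print(game2_sentence(2))
# that proposition P1 is true is obvious and that proposition P2 is true is obvious
```

Counting propositions confirms the doubling: each successive sentence contains twice as many numbered propositions as the last, and it opens with a consecutive run of “that”s one longer than before – with a second such run beginning just after the top-level “and”.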
This exercise surely seems ridiculous. But behind the investigation and the games, an important point stands: grammaticality is quite a different condition from comprehensibility. Past that small region enclosed by the demands of comprehensibility, a much larger realm lies, where – as the complexity mounts – grammar continues to operate.
This world of the grammatical marches off far past our minds’ horizon.
- De Roeck et al. (1982) provide counter-examples to the “myth” that native speakers reject double center embeddings.
- James David Thomas’s excellent master’s thesis, “Center-embedding and Self-embedding in Human Language Processing”, was an invaluable resource in the writing of this post.