Sunday, October 8, 2017


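Imagine a language with 10 words, in which every sentence is exactly 10 words long. This language has no syntactical rules, so the words can come in any order. Since each of the 10 positions can be filled by any of the 10 words, the number of sentences in this language is 10 multiplied by itself 10 times: 10,000,000,000, or 10 billion.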


Now imagine our language, with thousands of words, not 10.  Our sentence length has no fixed limit, but let's say a lot of sentences are in the 30-word range. If we didn't have rules of combination (syntax) and the rule that sentences had to make sense on the semantic level, then we could have utterances like

pig pencil the the the and snare strives Houdini motion over arbitrary the contraption whiggish...

If my ten-word example, which is really just counting in base ten with 10 digits, yielded 10 billion, then there is no practical way to write out the number of possible 30-word strings. I would be typing zeros all day long.
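The counting rule behind these figures is simple: with V words and n positions, and any word allowed in any position, there are V to the power n possible strings. Here is a small Python sketch of that arithmetic. The 10,000-word vocabulary is an assumption chosen only to make the point, not a measured size of English. If sentences of any length from 1 to 10 words are counted, the total for the toy language rises to a little over 11 billion.

```python
# Count the possible word strings when any word may fill any slot (no syntax).


def exact_length_strings(vocab_size: int, length: int) -> int:
    """Number of strings of exactly `length` words drawn from `vocab_size` words."""
    return vocab_size ** length


def up_to_length_strings(vocab_size: int, max_length: int) -> int:
    """Number of strings of any length from 1 to `max_length` words."""
    return sum(vocab_size ** n for n in range(1, max_length + 1))


# The 10-word language with 10-word sentences.
print(f"{exact_length_strings(10, 10):,}")  # 10,000,000,000

# Counting every length from 1 to 10 words adds the shorter strings too.
print(f"{up_to_length_strings(10, 10):,}")  # 11,111,111,110

# Hypothetical vocabulary of 10,000 words, 30-word strings: 10**120.
big = exact_length_strings(10_000, 30)
print(len(str(big)) - 1, "zeros")  # 120 zeros
```

Python integers have arbitrary precision, so even 10,000 to the 30th power is computed exactly; writing it out really would mean typing 120 zeros.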

But we have syntactical rules, so we might think of utterances as syntactical patterns that we can plug lexical items into. Take the game of filling in the pattern adjective, noun, transitive verb, adjective, noun.

Fat pigs devour small toads. There are five slots, and each slot can hold any of the adjectives, nouns, or transitive verbs that exist in English. We are still talking about large numbers. If we filter out the combinations that don't make any sense, the numbers are still huge. With numerous syntactical patterns available, and relatively long utterances, the count of sentences that make sense runs to many zeros, so it is fairly easy to produce unique utterances like
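The same multiplication applies to a syntactic pattern: the number of candidate sentences is the product of the number of words available for each slot. Below is a minimal Python sketch for the five-slot pattern "Fat pigs devour small toads." The vocabulary sizes (5,000 adjectives, 20,000 nouns, 3,000 transitive verbs) are hypothetical round numbers, not counts from any dictionary, and the result counts every fill, sensible or not.

```python
# Count the fills of one syntactic pattern:
# adjective noun transitive-verb adjective noun.
from math import prod

# Hypothetical vocabulary sizes, chosen only for illustration.
SLOT_SIZES = {"adjective": 5_000, "noun": 20_000, "transitive_verb": 3_000}

PATTERN = ["adjective", "noun", "transitive_verb", "adjective", "noun"]


def pattern_count(pattern: list[str], sizes: dict[str, int]) -> int:
    """Each slot is filled independently, so the per-slot counts multiply."""
    return prod(sizes[slot] for slot in pattern)


total = pattern_count(PATTERN, SLOT_SIZES)
print(f"{total:,}")  # 30,000,000,000,000,000,000 (3 x 10**19)
```

Filtering out the nonsensical fills would shrink this figure, but even discarding 99.9% of them would leave 3 x 10^16, about 30 quadrillion sentences from a single pattern.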

The small vase contained a mixture of coffee beans from numerous regions and a specially trained cat was charged with the absurd task of separating them into fastidious piles.

This is what Chomsky called the creativity of language. There are other limits, I suppose. For example, language consists not only of words but of phrases in statistically probable combinations, so there will be hundreds of instances of "charged with the absurd task" on Google. The more improbable the combination, the more unique and the less cliché-like it is. It is easy to see why statistically common phrases exist: language is not random, and has no obligation to be. There are conventions of discourse and easy shortcuts like "this book makes a significant contribution to the field."
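One way to make "statistically probable combinations" concrete is to count how often each run of n consecutive words (an n-gram) occurs in a body of text: a stock phrase turns up again and again, while a surprising combination turns up once or not at all. Below is a minimal Python sketch over a tiny corpus; the four corpus lines are invented for illustration, and a real measurement would need a very large corpus.

```python
from collections import Counter


def ngram_counts(text: str, n: int) -> Counter:
    """Count every run of n consecutive words in text (case-insensitive)."""
    words = text.lower().split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


# A tiny invented corpus: one stock phrase repeated, one unusual phrase.
corpus = """
this book makes a significant contribution to the field
her study makes a significant contribution to the field
the essay makes a significant contribution to the field
a specially trained cat was charged with the absurd task
"""

counts = ngram_counts(corpus, 3)
print(counts[("significant", "contribution", "to")])  # 3: a stock phrase
print(counts[("the", "absurd", "task")])               # 1: a rarer combination
```

A `Counter` returns 0 for a sequence it has never seen, which is where a truly original combination would land.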

Some factual information is difficult to state in a unique way: so-and-so was born in this year, in this place. An attempt to state it differently would sound affected. But the chances of several sentences in a row being identical without direct copying (or a common source) are infinitesimal. Imagine Thomas waking up today and offering, as his own example of a unique sentence, "The small vase contained a mixture of coffee beans from numerous regions and a specially trained cat was charged with the absurd task of separating them into fastidious piles." We would call it a psychic phenomenon. We wouldn't believe it was a coincidence.
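One simple way to measure "identical without direct copying" is to find the longest run of consecutive words two texts share; long shared runs are exactly what chance is unlikely to produce. The sketch below is the standard dynamic-programming longest-common-substring computation, applied to words instead of characters. It is an illustration, not a plagiarism detector, and the tokenizing rule (lowercase letters and apostrophes, punctuation dropped) and the example texts are assumptions.

```python
import re


def words(text: str) -> list[str]:
    """Lowercase word tokens; punctuation is dropped (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def longest_shared_run(a: str, b: str) -> list[str]:
    """Longest run of consecutive words that appears in both texts."""
    x, y = words(a), words(b)
    best_len, best_end = 0, 0
    # prev[j] / cur[j]: length of the common run ending at y[j - 1] and at
    # the previous / current word of x.
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return x[best_end - best_len:best_end]


source = ("The small vase contained a mixture of coffee beans "
          "from numerous regions.")
other = "Yesterday the small vase contained a mixture of tea leaves."
shared = longest_shared_run(source, other)
print(len(shared), " ".join(shared))
# 7 the small vase contained a mixture of
```

A shared run of a few words may still be a stock phrase; runs spanning several full sentences are the case the argument above is about. Where to set that threshold is a judgment call this sketch does not make.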

It gets worse for the plagiarism defenders. When you are writing carefully and thoughtfully about something, your ideas and your language will grow more unique and original, and probably more statistically improbable for that reason. Your writing will have an autonomous voice. This is what we expect from Mr. or Dr. or Ms. Prominent Writer. Will a good writer avoid all statistically common patterns? No. Good, fluid writing has to rely on some expected chunks of language that don't call attention to themselves. But if you never, ever find a luminously surprising adjective-noun combination or a semi-original turn of phrase, you simply aren't a writer.
