Insights from statistical analysis of great (and not-great) literature

Loknath Das

Ben Blatt’s Nabokov’s Favorite Word Is Mauve: What the Numbers Reveal About the Classics, Bestsellers, and Our Own Writing takes advantage of the fact that so much literature has now been digitized. Blatt runs statistical analyses on writers old and new and draws inferences about the empirical nature of writing, some of them fun and some of them meaningful.

Blatt’s book covers everything from James Joyce’s use of exclamation points (1,105 !’s per 100,000 words) to the most distinctive words appearing in erotica whose authors hail from New York City (“subway, popsicle, senator, butthole, museum, landlord, thrusted, Jacuzzi, sin, and shrugs”).

Dan Piepenbring’s review of Mauve highlights the implicit social insights that can be gleaned from this sort of analysis (“Male authors are far likelier to write ‘she interrupted’ than ‘he interrupted’”), as well as how the book made him feel about his own writing.

“The written word and the world of numbers should not be kept apart,” Blatt writes, and I think he’s right; what’s frustrating is that no one has yet figured out how they might productively collaborate. Like last year’s “The Bestseller Code,” which described an algorithm that predicted the plots of popular novels, “Mauve” wagers that the “digital humanities,” as they’ve uneasily come to be known, can instruct audiences outside of the academy. The book’s finest moments prove that they can—but to what end? Blatt argues that his work is “not an attempt to ‘engineer’ art as much as a way to understand it”: “If you were a band in the 1960s you would want to know how the Beatles were recording their songs.” Maybe so, but is that really what this book professes to teach? Knowing the rate at which Ringo hits his snare drum does not a Beatle make.

Reading “Mauve,” I began to imagine two duelling schools of authorship, both motivated by statistics. In one, writers would cultivate their tics, inhabiting themselves so thoroughly that to encounter them on the page would be like finding their footprints in wet cement. In the other, writers would aspire to defy the data, styling their prose with such intricate, chameleonic grace that no statistician could betray their identity. A few decades ago, the advent of the word processor made it easier than ever to revise on the fly; it also made it easy to dwell on one sentence ad infinitum, gilding the lily where once one would’ve advanced to the next thought. The glut of data is another mixed blessing—past a certain point, writers would do better in a state of blissful ignorance. Otherwise, they might end up with work like my ninth-grade term papers, mannered and overwrought.

Nabokov’s Favorite Word Is Mauve: What the Numbers Reveal About the Classics, Bestsellers, and Our Own Writing [Ben Blatt/Simon & Schuster]

The Heretical Things Statistics Tell Us About Fiction [Dan Piepenbring/New Yorker]

[“Source-boingboing”]