Wednesday, June 20, 2012

The Math Behind The Culture

From John Milton's Paradise Lost to YouTube, there is beautiful math behind cultural phenomena.

Consider the long s (ſ), that archaic letter which looks like an f or an integral sign. It once appeared wherever an s fell at the beginning or in the middle of a word.

Using Google's Ngram Viewer, we can pinpoint the moment the long s fell out of style: 1800. It was then that "ſaid" became "said," that "Paradise Loſt" became "Paradise Lost," and that, more broadly, modern English as we mostly know it today took shape.

What is so striking, even beautiful, about the demise of the long s is that the time series documenting the disappearance of "loſt" and its replacement by "lost" look almost exactly like a pair of logistic functions, one falling as the other rises.

Here is a graph of the time series from 1750 to 1850, with a "smoothing" of 3 (each year's value averaged with the three years on either side):

The same pattern repeats for most other words once spelled with the long s.
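
To make "a pair of logistic functions" concrete, here is a minimal Python sketch. It treats "loſt" and "lost" as shares of a single word that sum to 1 (an assumption: the Ngram Viewer actually reports each spelling's frequency relative to all words in the corpus), centers the crossover on 1800 as above, and uses a steepness `k` of 0.15 per year that is purely hypothetical, not fitted to the Ngram data. The function names are illustrative.

```python
import math

def logistic(t, t0=1800.0, k=0.15):
    """Logistic curve rising from 0 to 1 and passing 0.5 at t0.

    k (per year) is a hypothetical steepness chosen for illustration.
    """
    return 1.0 / (1.0 + math.exp(-k * (t - t0)))

def spelling_shares(year, t0=1800.0, k=0.15):
    """Return (long-s share, modern share), assuming the two spellings sum to 1."""
    modern = logistic(year, t0, k)
    long_s = 1.0 - modern  # the old spelling falls exactly as the new one rises
    return long_s, modern

for year in range(1750, 1851, 25):
    long_s, modern = spelling_shares(year)
    print(f"{year}: long s {long_s:.3f}   modern s {modern:.3f}")
```

Since 1 - 1/(1 + e^(-x)) = 1/(1 + e^x), the falling curve is itself a logistic function with the sign of `k` flipped, which is why the two series look like mirror images crossing at the midpoint.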

Compare that to an example of modern culture -- the YouTube video of President Obama speaking at this year's White House Correspondents' Dinner:

Look at the view statistics for many other YouTube videos and the logistic function turns up again and again, following each major link a video receives.
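
A common explanation for that shape, offered here as an assumption rather than something the view counts prove, is word-of-mouth growth with a limited audience: each day's new views are proportional both to the views already accumulated and to the fraction of potential viewers who have not yet watched. The Python sketch below iterates that difference equation, V[n+1] = V[n] + r*V[n]*(1 - V[n]/K). The starting views `v0`, daily rate `r`, and audience size `K` are hypothetical values, not YouTube data.

```python
def simulate_views(v0=100.0, r=0.3, K=1_000_000.0, days=60):
    """Iterate the discrete logistic equation V[n+1] = V[n] + r*V[n]*(1 - V[n]/K).

    v0: cumulative views on the day of the link (hypothetical)
    r:  daily growth rate while the audience is untapped (hypothetical)
    K:  total potential audience (hypothetical)
    """
    views = [v0]
    for _ in range(days):
        v = views[-1]
        # new views grow with current views but shrink as the audience is used up
        views.append(v + r * v * (1.0 - v / K))
    return views

views = simulate_views()
daily = [b - a for a, b in zip(views, views[1:])]  # new views on each day
peak_day = max(range(len(daily)), key=daily.__getitem__)
print(f"views after 60 days: {views[-1]:,.0f}")
print(f"busiest day: day {peak_day + 1}, with {daily[peak_day]:,.0f} new views")
```

In this model the busiest day comes when cumulative views reach about half of `K`, the steepest point of the S-curve; after that, daily views taper off as the untapped audience shrinks.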


  1. I would suggest that the beauty is not in the logistic functions themselves, but rather in the differential (or difference) equation defining them.

  2. Interesting.

    Maybe one for you to look at - the derivative of smoothed interest rates (fed-funds rate) in the US since the 50s looks like a logistic function too. I've been trying to work out why this might be, maybe a young guy like yourself might have better success.

  3. Correction - the *integral* of smoothed interest rates.

  4. Well, one interpretation of why these curves show up is nice as well. Think of an author's choice of the long s (vs. "s") as having some probability, say 99% in 1700. Convert that into odds (e.g. 99:1 in favor of the author using the long s). If those odds change at a constant rate (e.g. decrease by 5% per year), the probabilities tracked over time will follow a logistic curve.

    Sure, this doesn't answer why the odds should change at a constant rate, but when you see actual data doing anything governed by a constant, it's a pretty cool start.

  5. What is curious is not the curve, though that is neat in itself, but that other changes in spelling were occurring simultaneously. For instance, "rein deer" became "reindeer" and "ring dove" became "ringdove". As you say, that appears to be when English became modern. But why then?

  6. Why then? Starting in ...
    Louie 16th was the King of France
    In 1789.
    He was worse than Louie the 15th.
    He was worse than Louie the 14th.
    He was worse than Louie the 13th.
    He was the worst, since Louie the first.

    King Louie was living like a king,
    but the people were living rotten.

    First the US Revolution, too far from London, but then the French Revolution -- the end of elitism. The long s is elitist.

  7. Housing makes up a good bit more of CPI, and of the average person's cost of living, than it does of NGDP. I'm a proponent of NGDPLT, but this appears to be a weakness in that policy prescription.

  8. Interesting. I wonder what the transition from Þ (thorn) to "th" would look like.
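
The odds argument in comment 4 can be checked numerically. This Python sketch takes that comment's illustrative numbers (99:1 odds in favor of the long s in 1700, shrinking by 5% per year) and confirms that converting the odds back to probabilities gives exactly a logistic curve. It also connects to comment 1: because the log-odds fall by the same amount every year, the probability p satisfies the logistic differential equation dp/dt = -k*p*(1 - p), with k = -ln(0.95). The constants and helper names are illustrative, not taken from any library or dataset.

```python
import math

START_YEAR = 1700
START_ODDS = 99.0     # 99:1 in favor of the long s (illustrative, from comment 4)
YEARLY_FACTOR = 0.95  # odds shrink by 5% per year (illustrative, from comment 4)

def prob_from_odds(year):
    """Probability of the long s when its odds shrink by a constant factor each year."""
    odds = START_ODDS * YEARLY_FACTOR ** (year - START_YEAR)
    return odds / (1.0 + odds)

# The same curve written as a logistic function of time: the log-odds fall
# linearly with slope k, crossing zero (p = 0.5) at year t0.
k = -math.log(YEARLY_FACTOR)
t0 = START_YEAR + math.log(START_ODDS) / k

def prob_logistic(year):
    return 1.0 / (1.0 + math.exp(k * (year - t0)))

for year in (1700, 1750, 1800, 1850):
    print(year, round(prob_from_odds(year), 4), round(prob_logistic(year), 4))
print(f"50% point: {t0:.1f}")
```

With these illustrative numbers the 50% point lands roughly 90 years after 1700; what the sketch shows is the shape of the curve, not a predicted date.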