Sunday, January 17, 2016

Does iTunes Compete with Spotify?

iTunes and Spotify are both leaders in digital music. For that reason they are in competition. Right?

Well, to an economist, there's a bit of a problem. Both iTunes and Spotify, at least in their basic versions, are free -- and when goods are free, the notion of competition gets cloudy.

Economists, for example, assess whether two products are substitutes or complements with a concept called the cross-elasticity of demand: when the price of one product rises, how does demand for the other respond? If demand for the other rises, the products are substitutes. If it falls, they are complements. Simple.
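To make the definition concrete, here is a toy calculation. The numbers are hypothetical, purely for illustration:

```python
def cross_elasticity(q1, q2, p1, p2):
    """Cross-price elasticity: percent change in quantity demanded of good B
    divided by percent change in the price of good A."""
    return ((q2 - q1) / q1) / ((p2 - p1) / p1)

# Hypothetical: Toyota raises its price 10% and Ford's sales rise 5%.
# Positive elasticity (0.5) -> substitutes; negative would mean complements.
e = cross_elasticity(100, 105, 20000, 22000)
print(e)  # 0.5
```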

The idea can't get off the ground without a price. Most of the time, that's not a problem. Ford competes with Toyota, apples with oranges. But when a good is free, there is no price to vary: no price, no scarcity -- and, it might seem, no economics. The growing amount of economic activity on the Internet, where competition is so obvious and yet so many products are free, requires a more flexible notion of competition.

Let's stick with iTunes and Spotify as a case study of two high-profile Internet competitors. In the data analysis that follows, I find that iTunes loses a user for every five new users of Spotify and that the introduction of Spotify has so far reduced iTunes use by 15 percent globally. There is, in other words, competition on the Internet -- and competition even when goods are free.

[See update below: More data, bigger results.]

Using Google Trends, I downloaded weekly Google search data for iTunes and Spotify for 57 countries between 2004 and 2015. These, I think, are good proxies for signups for iTunes and Spotify, since people will search on Google to download either. (In my conversation with Josh Gans, I found that searches with "Spotify" or "iTunes" are usually about downloading them.) And the search data seem to match what we see in the limited public data on iTunes and Spotify use. I picked the countries based on where iTunes and Spotify had launched, looking for their target markets.

iTunes was initially released in 2001, and Spotify in October 2008, both in select countries only. What we see in the Google data is that iTunes has been trending downward since 2012. Perhaps not coincidentally, that was also when Spotify began expanding beyond its initial markets in northern Europe. That, of course, hardly proves that Spotify was responsible.

For iTunes or Spotify to enter a country, each must strike deals with the big music labels in the context of local intellectual-property law. Spotify negotiated for two years, for instance, before entering the US. As a result of this legal thicket, Spotify rolled out gradually across my sample of countries -- and I can exploit that gradual expansion to measure the effect of Spotify on iTunes. Using official Spotify press releases, like this one, I added Spotify's entry dates (or lack of entry, for some of the 57 countries) to my dataset. You can download my data here.

To measure the effect of Spotify on iTunes, I'll use an instrumental variables regression, one simple enough that even people who aren't stats nerds or economists can follow.

This method requires an important but believable assumption: the only way the timing of Spotify's market entry affects iTunes use is through Spotify use. Said differently, nobody stopped listening to music, or switched to Pandora, merely because Spotify became available -- which wouldn't make much sense. Formally, for the economists, my claim to instrument exogeneity follows from the independence of irrelevant alternatives.

The first step is to build a simple regression model that predicts Spotify use in a given country in a given week from whether Spotify was available in that country, whether it had launched in that country in that week, and how many weeks it had been available in that country.

I found, not surprisingly, that Spotify use was higher in countries where Spotify had entered, higher in the launch week, and growing in the number of weeks since launch. (I also included a term to capture the fact that this growth slowed eventually.) What is perhaps surprising is that my simple model explains 75 percent of the variance in Spotify use among the 57 countries from 2004 to 2015.
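That first stage can be sketched with simulated stand-ins for the real search series. All variable names and numbers below are mine, for illustration, not from the actual dataset:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500  # simulated country-week observations

# Entry-timing variables: has Spotify entered, is this the launch week,
# and how many weeks it has been available (zero before entry).
entered = rng.integers(0, 2, n)
weeks_since = entered * rng.integers(0, 200, n)
launch_week = ((entered == 1) & (weeks_since == 0)).astype(float)

# Simulated Spotify search index: a jump at entry, a launch-week spike,
# and growth that decelerates over time.
spotify = (10 * entered + 5 * launch_week + 0.5 * weeks_since
           - 0.001 * weeks_since ** 2 + rng.normal(0, 2, n))

# First stage: regress Spotify use on the entry-timing variables
# (including a squared term to capture slowing growth).
X = np.column_stack([np.ones(n), entered, launch_week,
                     weeks_since, weeks_since ** 2])
coef, *_ = np.linalg.lstsq(X, spotify, rcond=None)
spotify_hat = X @ coef  # fitted values feed the second stage
ss_res = ((spotify - spotify_hat) ** 2).sum()
ss_tot = ((spotify - spotify.mean()) ** 2).sum()
r2 = 1 - ss_res / ss_tot
print(r2)
```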

Here, for instance, is what the model's predictions look like versus actual data for the US:

Next, I take my model's predicted values for Spotify use for the 57 countries and use those to predict iTunes use. That regression, under our assumptions, actually measures the causal effect of Spotify on iTunes.

I find that, for every one new user of Spotify, iTunes loses 0.23 users -- although the 95-percent confidence interval on the effect is quite wide, at -0.40 to -0.05. Roughly speaking, for every five users Spotify gains, iTunes loses one.
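Here is a minimal two-stage least squares sketch of that second step, with simulated data in which I plant a "true" effect of -0.23 to mirror the estimate. The setup and numbers are mine, for illustration only:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
# Instrument: weeks since Spotify's (staggered) entry, assumed to affect
# iTunes only through Spotify use itself.
weeks_since = rng.integers(0, 200, n).astype(float)
spotify = 0.5 * weeks_since + rng.normal(0, 2, n)   # first-stage relationship
itunes = 50 - 0.23 * spotify + rng.normal(0, 2, n)  # planted "true" effect

# Stage 1: fitted Spotify use from the instrument.
Z = np.column_stack([np.ones(n), weeks_since])
spotify_hat = Z @ np.linalg.lstsq(Z, spotify, rcond=None)[0]

# Stage 2: regress iTunes use on fitted Spotify use (2SLS).
X = np.column_stack([np.ones(n), spotify_hat])
beta = np.linalg.lstsq(X, itunes, rcond=None)[0]
print(round(beta[1], 2))  # recovers roughly -0.23
```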

I can also use that conversion rate to estimate the total effect of Spotify on iTunes use over time: iTunes use is 15 percent lower because of Spotify than it would have been in a world without Spotify.

Spotify thus explains around a third of the decline in iTunes use since its peak in 2012. There seems to be more to iTunes' decline than competition alone. Yet competition does exist, quite clearly, online and among goods without prices. It would be worth doing a similar analysis for MySpace and Facebook, or Netflix and Hulu and HBO GO.

Note about graphs: It doesn't affect any of the results, but the year labels are slightly off.

Update (1/18/16): I have expanded my dataset to 72 countries, and the marginal effect of Spotify on iTunes is -0.33, with a 95% confidence interval of (-0.51,-0.16). This implies that Spotify has caused a 20-percent drop in iTunes use, and I am thus able to explain about half of the decline of iTunes since 2012. The rest, presumably, must be related to the fact that iTunes is just an awful piece of software.

Friday, December 18, 2015

The First Count

An unofficial count of anti-Muslim hate crimes since the Paris and San Bernardino attacks is now available, and it looks like the model Seth and I built got it right.

Updating the search data to the most recent week, the model estimates there were about 37 hate crimes against Muslims since the Paris attacks. There were 38. And the model estimates there were about 16 hate crimes against Muslims since San Bernardino. There were 18.

Had anti-Muslim sentiment been at normal levels, there would have been about 8 and 4, respectively, so the model did a remarkable job of capturing the magnitude of the deviation from normal. These were all "out-of-sample" forecasts -- that is, none of our estimates used the last month of data.

A few caveats. First, the model's performance was unusually accurate this time. In future surges like this one, it would be better to expect misses of five to ten hate crimes, not one or two; the model leaves more unexplained than this single performance suggests. Second, this is an unofficial count; as we said in the article, we will wait until year-end 2016 for the official numbers, which might differ.

Wednesday, December 16, 2015

Rise of Hate Search: Follow-Up

by Evan Soltas and Seth Stephens-Davidowitz

We are adding more detail than we could fit in our op-ed in The New York Times. For all data files, refer to Seth's website.

Simplest Prejudice Measure

The simplest prejudice measure we used was the volume of searches of the form “Muslims are ___” completed with a negative adjective.

We estimate that the top five negative searches of this form are “Muslims are evil,” “Muslims are terrorists,” “Muslims are bad,” “Muslims are violent,” and “Muslims are dangerous.”

We started with this measure because a similar measure can be constructed for other groups -- which we also did, and will explicate more fully in a piece in January.

Racial Threat

To test racial threat versus the contact hypothesis, we looked at anti-Muslim searches in the 10 counties with the highest proportion of Muslims.

These were found here.

We used the negative-adjective searches discussed in the previous section, including “are Muslims evil?” and “Muslims are evil.” The volumes came from Google AdWords. Unfortunately, Google AdWords, the only source that gives county-level data, reports volumes rather than search rates. So we estimated total searches in each county from the volumes of the 10 most common Google searches in the United States, as found on Google Trends.

The search volumes and calculations are in the file MuslimsUSRates.csv.

Note that, even if this suggests that proximity does not lower discrimination, there is strong evidence that organized and facilitated intergroup contact may reduce biases, as a meta-analysis by Pettigrew and Tropp finds.

Islamophobia and Anti-Muslim Hate Crimes

We compared anti-Muslim hate crimes to a set of searches that may indicate both that Muslims are in the news and that Islamophobia is high.

The simplest approach is just to use the measure of prejudice we developed previously: search volume for “Muslims are evil,” “Muslims are terrorists,” “Muslims are bad,” “Muslims are violent,” and “Muslims are dangerous.” It was the measure we used to compare prejudice against other groups and across locations.

At the weekly level, the two series were highly correlated (r=.16; t=3.7). The correlation is even higher when restricting to data since 2008, when the Google data became less noisy (r=.26; t=4.78).

They were also highly correlated at the monthly level (r=.37; t=4.57).
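For readers checking these t-statistics: the t-statistic for a Pearson correlation is t = r·sqrt((n - 2)/(1 - r²)). The sample sizes below are my back-of-the-envelope guesses (about a decade of weekly data, eleven years of monthly data), not figures from the post:

```python
import math

def corr_t(r, n):
    """t-statistic for testing a Pearson correlation of r against zero,
    computed from n observations."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# n values are my assumptions, chosen to show the formula matches
# the reported statistics.
print(round(corr_t(0.16, 523), 1))  # ~3.7, the weekly t-statistic
print(round(corr_t(0.37, 134), 2))  # ~4.58, close to the monthly t-statistic
```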

Anti-Muslim hate crimes were not similarly correlated to any other prejudice, using this simple, blunt measure.

In addition, the relationship was not explained by trends or monthly factors. (All this data is available at WeeklyPrejudicePlusHateCrimes.csv and MonthlyPrejudicePlusHateCrimes.csv.)

However, to best predict which searches matter and which don't, we downloaded a large set of weekly searches and compared them to weekly hate crimes. Since the goal was prediction, we wanted to let the data speak to which searches best predict hate crimes. We used about 35 common search phrases related to Muslims or Islam, found by using Google Correlate, the “top searches” feature within Google Trends, and Google auto-complete. In pre-processing, we normalized all series to means of zero and standard deviations of one. The LASSO selected 12 terms, yielding an L1-norm of 0.82.

Some were obviously Islamophobic, such as “I hate Muslims.” Others had some clearly non-Islamophobic uses. One of the striking things in the data, however, is that even seemingly innocent searches, such as Koran, include many potentially Islamophobic searches, such as those related to burning the Koran.

We put all the data into a LASSO model (OLS with an L1 penalty). The LASSO generally chose shorter searches -- one- or two-word searches, rather than many-word searches -- probably because these data were much less noisy at the weekly level. We chose the constraint on the L1-norm by 10-fold cross-validation, minimizing mean squared error. We also tested a Poisson LASSO regression model, which is more appropriate for count data; it yielded virtually identical results and predictive power.
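The authors' R code isn't reproduced here, but the idea -- an L1-penalized regression selecting a sparse subset of predictors -- can be sketched in miniature with coordinate descent and a soft-threshold update. The data are simulated; all names and numbers below are mine:

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """LASSO fit by cyclic coordinate descent with a soft-threshold update.
    Assumes the columns of X are standardized and y is centered."""
    n, p = X.shape
    beta = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            partial = y - X @ beta + X[:, j] * beta[j]  # partial residual
            rho = X[:, j] @ partial
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_ss[j]
    return beta

rng = np.random.default_rng(2)
n, p = 300, 35  # 35 candidate search phrases, as in the text
X = rng.normal(size=(n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)  # normalize: mean 0, sd 1
true_beta = np.zeros(p)
true_beta[:5] = [2.0, -1.5, 1.0, 0.8, -0.5]  # only 5 phrases truly matter
y = X @ true_beta + rng.normal(0, 1, n)
y = y - y.mean()

beta = lasso_cd(X, y, lam=80.0)
print((beta != 0).sum(), round(np.abs(beta).sum(), 2))  # sparse selection
```

In a full analysis, the penalty `lam` would be chosen by 10-fold cross-validation, as the post describes.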

We could explain about 10 percent of the weekly variation with Google searches and about 25 percent of the monthly variation.

Our initial data ran through 2013, which was all that was available online at the time. However, we recently obtained new monthly data from 2014. The model was just as strong in predicting this new, out-of-sample data, which is strong evidence of its reliability.

The data and R code can be found at LassoData.csv and LassoCode.csv and hatecrimepredictors.csv.

We are writing a full paper on these results. We are also examining to what extent prejudiced searches towards other groups can predict hate crimes against those groups.

Response to Obama’s Speech

Searches During Obama’s Speech.csv includes data for the minute-by-minute search response to Obama’s speech.

AthletesTerroristsSoldiers.csv includes the hourly data on searches for “Muslim terrorists,” “Muslim athletes,” and “Muslim soldiers.” It was the data used to make the accompanying graphic.

Response to San Bernardino

MuslimsBelieveKill.csv shows the paths of a likely-hateful search (“Muslims kill”) and a likely-informative search (“What do Muslims believe?”) after the San Bernardino attacks. Both rise, but “Muslims kill” rises far more.

We also have data on a large set of such searches, available upon request.

Political Responses to Terror Attacks

SyrianRefugees.xlsx includes data on daily search volumes for several common positive and negative searches about Syrian refugees from September 7, 2015 to December 2, 2015.

CloseMosques.xlsx includes hourly data on all searches including the word “mosques” and several common searches that suggest support for closing mosques.

Monday, November 23, 2015

Going to Oxford

I'm overjoyed to announce that I have been selected as a 2016 Rhodes Scholar. I'm deeply grateful to the selection committee and am excited for this wonderful opportunity to study at Oxford. More details to come.

Monday, October 26, 2015

The Swiss Shock: A Case Study

In January 2015, the Swiss central bank removed its floor on the exchange rate between the Swiss franc and the euro, allowing its currency to appreciate without limit. The immediate effect was a 20-percent increase in the value of the Swiss franc relative to the euro -- one of the largest revaluations of a developed-world currency in recent history.

The move sent tremors through the financial markets, which had been using the Swiss franc as a funding currency for carry trades and Swiss banks as a haven from the chaos of the Eurozone. Swiss exporters and the tourism industry screamed that the central bank's move would render the country uncompetitive.

I've been fascinated by this move -- whose proximate cause, I have suggested, was the resumption of capital inflows after a two-year pause, and the Swiss central bank's latent unwillingness to sterilize further inflows -- and so I've been waiting to do some post-mortem work.

What are the effects of changes in exchange rates on the macroeconomy? Switzerland provides a beautiful, clean case study. The revaluation was unanticipated before it happened and huge when it did.

To do my analysis, I'll use the synthetic control method that has been pioneered by Alberto Abadie at Harvard. You can read about that method here, but the basic intuition is that you can construct a comparison for the treated unit (in our case, Switzerland) by taking the weighted average of untreated units, where the weights are optimized so that the "synthetic Switzerland" matches actual Switzerland before the treatment as closely as possible.
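A stripped-down version of that weight-fitting step, on simulated donor series: find non-negative weights summing to one that best match the treated unit's pre-treatment path. The optimizer, data, and numbers here are my stand-ins, not Abadie's implementation:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
T0, J = 60, 8  # pre-treatment months, donor countries

# Simulated donor price indexes (random walks), and a "Switzerland" that
# is truly a 0.5/0.3/0.2 mix of the first three donors, plus noise.
donors = rng.normal(0, 1, (T0, J)).cumsum(axis=0)
true_w = np.array([0.5, 0.3, 0.2] + [0.0] * (J - 3))
treated = donors @ true_w + rng.normal(0, 0.1, T0)

def loss(w):
    """Pre-treatment fit: squared gap between treated and synthetic paths."""
    return ((treated - donors @ w) ** 2).sum()

res = minimize(loss, np.full(J, 1 / J), method='SLSQP',
               bounds=[(0, 1)] * J,
               constraints=[{'type': 'eq', 'fun': lambda w: w.sum() - 1}])
w = res.x
print(np.round(w, 2))  # weight concentrates on the first three donors
```

After the treatment date, the gap between the actual series and `donors @ w` is the estimated treatment effect.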

The two macro variables I want to look at are stock prices and consumer prices. What I find is that the revaluation has reduced consumer prices by 1.5 percentage points but had no significant real effect on Swiss stocks.

I construct synthetic Switzerlands for consumer prices and stock returns separately. For consumer prices, the algorithm says that synthetic Switzerland is a mix of nine European countries, but is mostly a mix of Slovakia, Sweden, the Netherlands, and Denmark. I use monthly data from 2004 to 2014 to do the matching. I found it interesting that it picked smaller countries, many of which have their own currencies.

What we find is that, in both actual and synthetic Switzerland, prices were flat prior to the exchange-rate shock. Such is Europe in 2014. Then, starting in January 2015, we see actual Swiss prices begin to diverge from consumer prices in synthetic Switzerland. As of September 2015, Swiss prices are now 1.5 percentage points lower than they would have been absent the revaluation.

I use data from the iShares MSCI indexes for stock returns, and I find that synthetic Switzerland is mostly a mix of the Netherlands, Belgium, Sweden, and the United Kingdom. (Worth noting: on a totally different dataset, the algorithm picks roughly the same countries.) It turns out we can predict daily stock returns in Switzerland quite well, as this scatterplot of actual versus synthetic Switzerland shows.

But I'm not finding any significant effect on Swiss stocks. Here are the cumulative returns for actual versus synthetic Switzerland from January 2014 to present, and any effect should appear starting in January 2015.

Perhaps this should make American investors less concerned about the effect of the appreciating U.S. dollar on their portfolios: the implication of these findings is that any nominal decline in stock prices is offset by currency appreciation.

Next I'll try to look at Swiss unemployment, the trade balance, and other real macroeconomic variables.

Monday, September 28, 2015

Boom and Bust and Biotech

Biotechnology and pharmaceutical stock prices have declined about 20 percent in the last week, wiping out hundreds of billions of dollars in market capitalization. The drop comes in the wake of popular outrage at the headline-grabbing Martin Shkreli, whose firm acquired the antimalarial drug Daraprim and planned to raise its price 50-fold, as well as rumblings of a substantive public-policy response to pharmaceutical prices from the Hillary Clinton campaign.

It seems to me this market reaction raises two possibilities.

First, is this decline just the "inevitable" correction (in the sense of Blanchard and Watson's rational bubbles) for five years of strong performance from biotech stocks?

Blanchard and Watson proposed a model of rational bubbles in which an asset price rises faster than other asset prices most of the time but has a small chance of collapsing catastrophically. As crazy as that sounds, it works out fine in expected-value terms, relative to other investments, because the excess growth and the crash risk cancel each other out.
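The arithmetic behind that claim, with toy numbers of my choosing rather than anything from Blanchard and Watson's paper:

```python
# Rational-bubble arithmetic, toy numbers: each period the bubble survives
# with probability pi and grows by a factor of (1 + r) / pi, or bursts to
# zero with probability 1 - pi. The expected return still equals r.
r, pi = 0.05, 0.9
growth_if_survive = (1 + r) / pi  # ~1.167: beats the market while it lasts
expected_growth = pi * growth_if_survive + (1 - pi) * 0.0
print(growth_if_survive, expected_growth)  # expected growth is just 1 + r
```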

There is some evidence for this proposition in the stylized fact that it's been the riskiest, best-performing biotech and pharma stocks that have corrected most sharply. For example, just compare the new-school Valeant, Celgene, Regeneron, and Biogen with the old-school Pfizer, Merck, Novartis, and Eli Lilly.

Alternatively, let's suppose that some of the decline reflects actual changes in the expected future profits of biotech. Shouldn't it be disturbing that a rough draft of a policy proposal to restrict drug prices caused biotech to implode? What does that say about the social value of these biotech innovations?

Not good things, I think. To the extent that price regulations hit firms differentially, they will hit firms most dependent on the high-price business model that regulators find objectionable. The market has just told us that new-school biotech is built around this model.

Intuitively, a good product does not depend on which way the regulatory winds blow. A lot of the new-school biotech firms just proved that theirs absolutely does. That should concern anyone who is hopeful about the future of health.