Opportunities for Empirical Research on Copyright

In contrast to patents and trade marks, text and other materials protected by copyright do not necessarily end up being registered. That, it would seem, curtails the possibilities for research on the effects of copyright protection, or on any other question you might have regarding copyright.

Nonetheless, there has been quite a bit of empirical research on copyright, most of it collected in CREATe’s wonderful Copyrightevidence Wiki. As you browse down that list, it becomes apparent that much of the work listed focuses on software and piracy in the digital realm. One reason for this is that data to analyse piracy can be obtained, while data for many other questions regarding copyright protection are much harder to come by.

Consider, for instance, the incentive effect of copyright term extension. Here the prospects for good empirical research would seem to be much worse, because these events lie in the past and were not mediated by the internet. There are ways around this, for instance using information on airplay time for music. But the opportunities are fewer.

The point of this blog post is to illustrate that Google’s ngram viewer is another source of data on the “cultural impact” of a work. The ngram viewer is a search engine that searches various corpora of digitized books for text strings. The resulting data can illustrate how frequently titles of books or songs appear in the corpus of a particular language at a given time, which provides a measure of the popularity or impact of that title. Just like the airplay measure, the data cover both the time of creation and the years after it.

Consider Figure 1 below, which shows the frequency of references to two songs that were first published between 1912 and 1914, and to the popular Happy Birthday, which arose somewhat earlier but was first published in books around 1911.

Figure 1: Three popular songs from the 1910s



The next graph displays the impact of titles of novels that were written by British authors and entered the top ten bestselling novels in the US before and after 1911, the year in which the copyright term was extended in the United Kingdom.

Figure 2: Four successful books published around 1911



Here only Septimus is from before 1911; the other three novels were published after this date. Of course these snapshots prove nothing, save that the data are there to be analysed. It should be possible to establish whether more British-born novelists succeeded in the United States before or after 1911, or whether the copyright extension had no effect at all. An interesting comparison would be with the period 20 years earlier, when the costs of printing were falling dramatically, which increased demand for printed works significantly.

The main challenge in pursuing this type of analysis lies in putting together lists of relevant titles and then querying the ngram viewer repeatedly to obtain the time series data. For a related effort using data from Google Trends, it may be helpful to look at the program used. It should not be too hard to adapt it for the purposes of querying the ngram viewer.
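To give a flavour of what the repeated querying could look like, here is a minimal Python sketch. It is not the Google Trends program mentioned above. It assumes that the ngram viewer answers requests at an undocumented JSON address (`https://books.google.com/ngrams/json`) with the same parameters that appear in the viewer's own URLs, and that each result carries an `ngram` label and a `timeseries` list with one value per year. The corpus identifier `en-2019` and the function names are likewise assumptions for illustration, so check all of this against the live service before relying on it.

```python
import json
import urllib.parse
import urllib.request

# Assumption: undocumented endpoint, not described in this post. It may
# change, disappear, or be rate-limited without notice.
NGRAM_JSON_URL = "https://books.google.com/ngrams/json"


def build_query_url(titles, year_start=1900, year_end=1950,
                    corpus="en-2019", smoothing=0):
    """Build one request URL for a batch of titles.

    Titles are joined with commas, so a title that itself contains
    a comma would need special handling.
    """
    params = {
        "content": ",".join(titles),
        "year_start": year_start,
        "year_end": year_end,
        "corpus": corpus,        # assumed corpus identifier
        "smoothing": smoothing,  # 0 = raw yearly values, no moving average
    }
    return NGRAM_JSON_URL + "?" + urllib.parse.urlencode(params)


def parse_response(payload, year_start):
    """Turn the (assumed) JSON payload into {title: {year: frequency}}."""
    series = {}
    for entry in json.loads(payload):
        series[entry["ngram"]] = {
            year_start + offset: value
            for offset, value in enumerate(entry["timeseries"])
        }
    return series


def fetch_series(titles, year_start=1900, year_end=1950, **kwargs):
    """Query the service once and return the parsed time series."""
    url = build_query_url(titles, year_start, year_end, **kwargs)
    with urllib.request.urlopen(url, timeout=30) as response:
        payload = response.read().decode("utf-8")
    return parse_response(payload, year_start)
```

Calling `fetch_series(["Septimus"], 1900, 1930)` would then return a dictionary of yearly frequencies for that title, provided the assumptions above hold. For a long list of titles it would be sensible to send them in small batches and to pause between requests.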


