
Tuesday, May 27, 2014

Lessons from Piketty and Reinhart & Rogoff

People were quick to draw parallels between the data problems of Thomas Piketty and those faced by Reinhart and Rogoff. I think there are a few lessons that can be drawn from these episodes, even though the problems with Piketty’s data appear much less serious. Since the wealth distribution is not a priority research topic for me, I will not comment on the details of Piketty's alleged errors. Instead, my observations here are about methodology.

The first amusing parallel is the role of Excel spreadsheets in these episodes. The second issue is more controversial, as I argue that both sets of authors have overstretched neoclassical modelling frameworks in an attempt to explain the operation of modern welfare states.

Spreadsheet Problems

These episodes confirm my long-held view that spreadsheets are an inherently bad means of doing serious analysis. Spreadsheets are fine for managerial tasks like playing with budget scenarios, simple calculations, and possibly data entry. (There was a transcription error in the Piketty spreadsheet, which shows that even for data entry, care needs to be taken.)

Spreadsheet formulae are largely hidden, cryptic and fragile. It is very difficult to tell if the spreadsheet has been broken by an inadvertent key press.

Using a database of some form to store the raw data, and a statistical package (I use R for this blog) to manipulate that data is definitely preferable. Advantages of avoiding spreadsheets are:

  • All manipulations are easily seen within the text of the code. How you are calculating averages, for example, is easily seen.
  • It is easy to see whether calculations are being done over the correct sample intervals.
  • Productivity is enhanced as common tasks can be moved to library code.
  • It is possible to enforce coding standards, such as requiring comments.
  • It is easy to share work amongst multiple analysts within a source control system.
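
As a minimal sketch of the first two points, here is how an average over an explicit sample interval might look in code. It is written in Python rather than R purely for illustration; the `observations` data and the `average_over` helper are hypothetical and do not come from either of the spreadsheets discussed here.

```python
from datetime import date
from statistics import mean

# Hypothetical quarterly observations (date, value); illustrative numbers only.
observations = [
    (date(2013, 3, 31), 1.0),
    (date(2013, 6, 30), 2.0),
    (date(2013, 9, 30), 3.0),
    (date(2013, 12, 31), 4.0),
    (date(2014, 3, 31), 5.0),
]


def average_over(data, start, end):
    """Mean of the values whose date falls in [start, end], inclusive."""
    sample = [value for when, value in data if start <= when <= end]
    if not sample:
        # Fail loudly rather than silently averaging over nothing.
        raise ValueError("no observations in the requested interval")
    return mean(sample)


# The sample interval is stated in the call itself, not hidden in a cell range.
avg_2013 = average_over(observations, date(2013, 1, 1), date(2013, 12, 31))
print(avg_2013)
```

The interval is visible to anyone reading the code, so a reviewer can check it without clicking through cells to find out which rows a formula covers.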

The disadvantage of using statistical packages is that people who want to check your results need to know how to program. But as long as you know how to program, that's their problem, not yours.

Analytical Overreach

I also feel that in both cases, neoclassical models were stretched to cover modern economies.

In the case of Reinhart and Rogoff, their analysis was econometric, and lacked a theoretical argument for why high debt levels would slow growth. (Ricardian Equivalence presumably.) But by ignoring the reality that slow growth causes deficits in welfare states to increase, their analysis missed the obvious problems of causality. (Since slow growth mechanically causes high debt-to-GDP ratios, statistically testing whether high debt-to-GDP ratios “cause” slow GDP growth makes little sense.)
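
The mechanical link can be sketched with a toy calculation. This is an illustration under simplifying assumptions, not a model taken from Reinhart and Rogoff: the function name, the starting values, and the growth and deficit figures are all hypothetical, and the deficit is held at a fixed share of GDP (including interest) in both scenarios.

```python
def debt_ratio_path(growth, deficit_share, years, debt0=50.0, gdp0=100.0):
    """Debt-to-GDP ratio per year for a fixed nominal growth rate and deficit.

    All inputs are hypothetical. The deficit is a constant share of GDP,
    so the two scenarios below differ only in their growth rate.
    """
    debt, gdp = debt0, gdp0
    path = []
    for _ in range(years):
        gdp *= 1.0 + growth          # nominal GDP grows
        debt += deficit_share * gdp  # this year's deficit adds to the debt stock
        path.append(debt / gdp)
    return path


# Same deficit share, different nominal growth rates.
fast = debt_ratio_path(growth=0.05, deficit_share=0.03, years=10)
slow = debt_ratio_path(growth=0.01, deficit_share=0.03, years=10)

for year, (f, s) in enumerate(zip(fast, slow), start=1):
    print(f"year {year}: fast growth {f:.3f}, slow growth {s:.3f}")
```

Even with an identical deficit, the slow-growth path has the higher debt ratio in every year. In a welfare state the deficit itself also widens when growth is slow, which would push the ratio up further.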

In the case of Piketty, his book was filled with a lot of details. But the core of the theory revolved around some national accounting identities. This tight analytical focus made it possible to attempt to discuss centuries of data. But this leaves open questions about the institutional details of welfare states. For example, government-provided old age pensions reduce the need of the middle and lower classes to save for retirement, and so we should expect a higher level of wealth inequality than income inequality. And the post-1980 period saw a generational bull market in financial markets, which will mechanically raise inequality over that period. Given the market backdrop, I have some doubts that we have enough data to conclude whether inequality will keep increasing, or whether the forces of dispersion (conspicuous consumption, multiple heirs, divorce) will counteract the cumulative effect of asset returns above income growth (“r>g”).

In conclusion, I am fairly skeptical about structural macroeconomic theories that use economic data over long time intervals. Developed economies after World War II act quite differently than in earlier periods, and the different countries followed the same fads in macroeconomic policy. Correspondingly, we end up with an effectively limited data set from which we can draw conclusions.


(c) Brian Romanchuk 2014


  1. I used to work at a central bank and did a lot of coding/data manipulation on the job. I observed a lot of mistakes in the handling of data by economists. The majority of them can't code and resort to spreadsheet models. I've become so used to seeing mistakes in the data manipulation process that I have become highly skeptical of most empirical papers.


