One of the more bizarre things to pop up in the past couple of days is the idea that data revisions by a statistical agency are a “mistake,” and thus cause to fire the head of that agency. No, that is not how data revisions work.
Why Revise?
Not all economic data are revised. The most important non-revised data are those in consumer price index reports: prices during the month are measured, and then the monthly price indices are calculated in one shot. (An error could still cause a revision.) Since the CPI is used in contracts, the lack of revisions is important. Surveys with random sampling also tend not to be revised, since data gathering is a one-shot operation.
Other data are revised because the data used are collected with varying delays. In that case, the data are released with a schedule of revisions, with each revision adding new data to those already collected.
Two key examples are the American Nonfarm Payrolls report (NFP) and GDP.
The Nonfarm Payrolls report is based on “a monthly survey of approximately 121,000 businesses and government agencies representing approximately 631,000 worksites throughout the United States.” That is a large sample, which appears to be a good thing. The problem is that not all of the data arrive at the same time, and it is not a truly random sample: it covers known workplaces, and it takes time for new workplaces to start reporting. This means that the estimated employment number for a month changes as the data roll in. If a lot of new firms are starting up, job creation is only measured with a lag. This created a large upward revision to employment in 1994, helping crater the bond market. In response, the Bureau of Labor Statistics (BLS) created the birth/death model (a favourite of economic statistics conspiracy theorists) to try to estimate this effect, which is added to the earlier reported figures.
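A toy Python sketch can make the mechanics concrete. It is not the BLS estimator, which uses sample weights and other adjustments; it simply assumes a hypothetical set of worksites, each reporting its job change at a hypothetical lag, scales up the average change among worksites that have reported so far, and adds a flat, made-up birth/death figure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Report:
    job_change: int  # change in employment at one worksite for the reference month
    lag: int         # release at which this report first arrives (1 = first release)


def payroll_estimate(reports: List[Report], release: int,
                     universe_size: int, birth_death: float = 0.0) -> float:
    """Toy estimate of the total job change as published at a given release.

    Scales the average change of the worksites that have reported by `release`
    up to the full (hypothetical) universe of worksites, then adds a flat
    birth/death adjustment standing in for new firms the sample cannot see.
    """
    received = [r.job_change for r in reports if r.lag <= release]
    if not received:
        return birth_death
    average_change = sum(received) / len(received)
    return average_change * universe_size + birth_death


# Hypothetical sample: in this made-up case the late reporters are adding jobs,
# so each successive release revises the estimate upward.
sample = [Report(2, 1), Report(-1, 1), Report(0, 1), Report(5, 2), Report(3, 3)]
for release in (1, 2, 3):
    estimate = payroll_estimate(sample, release, universe_size=1_000, birth_death=200)
    print(release, round(estimate))
```

With these made-up numbers the three releases print 533, 1700 and 2000: the same month, three different “best estimates,” none of them a mistake.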
Gross Domestic Product represents all production in the domestic economy, and thus calculating it is a major data-gathering exercise. The reported GDP numbers are produced in passes: the fastest-arriving data are used directly, while models fill in a lot of missing information. In a process that can take years, the missing pieces are filled in or corrected as other data become available. Meanwhile, calculation methodologies change, and it is possible for episodes of declining real GDP to be replaced by increases years after the fact (making recession dating based on declines in real GDP impractical).
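The “passes” can be sketched in Python under heavy simplifying assumptions: GDP is treated as a plain sum of a few components, each missing component is filled by a hypothetical model value, and hard data replace the fill-ins as they arrive. All component names and figures are made up, and methodology changes are not modelled.

```python
from typing import Dict

# Hypothetical model fill-ins for one quarter's GDP components (arbitrary units).
MODEL_FILL_INS: Dict[str, float] = {
    "consumption": 100.0,
    "investment": 30.0,
    "government": 25.0,
    "net_exports": -5.0,
}


def gdp_estimate(reported: Dict[str, float],
                 fill_ins: Dict[str, float] = MODEL_FILL_INS) -> float:
    """Sum the components, using hard data where it has arrived and the model fill-in otherwise."""
    return sum(reported.get(name, fill_in) for name, fill_in in fill_ins.items())


previous_quarter = 150.0  # hypothetical level of the prior quarter

# Successive passes: more components arrive as hard data each time.
passes = [
    {"consumption": 101.0},
    {"consumption": 101.0, "investment": 27.0},
    {"consumption": 101.0, "investment": 27.0, "government": 26.0, "net_exports": -3.0},
]
for i, reported in enumerate(passes, start=1):
    level = gdp_estimate(reported)
    print(f"pass {i}: level {level:.1f}, change vs previous quarter {level - previous_quarter:+.1f}")
```

With these numbers the quarter shows growth in the first pass (+1.0), a decline in the second (-2.0), and growth again in the third (+1.0), which is why dating recessions from early GDP prints is fragile.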
Economic series that rely on population estimates will be revised when a new census rolls out, which can create large revisions on the census-taking interval. (For example, the household report gives employment and unemployment levels based on rates determined by a random survey, extrapolated to the whole population. If the total population is revised because of a new census, the levels change, although the survey percentages remain unchanged.)
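The census effect is simple arithmetic. The sketch below assumes that the survey yields a participation rate and an unemployment rate, and that levels are just those rates times a population estimate; the actual household survey weighting is more involved, and all numbers are hypothetical.

```python
from typing import NamedTuple


class HouseholdLevels(NamedTuple):
    labour_force: float
    employed: float
    unemployed: float


def household_levels(population: float, participation_rate: float,
                     unemployment_rate: float) -> HouseholdLevels:
    """Extrapolate survey rates to levels using a population estimate."""
    labour_force = population * participation_rate
    unemployed = labour_force * unemployment_rate
    return HouseholdLevels(labour_force, labour_force - unemployed, unemployed)


# Same survey rates, two hypothetical population estimates (before and after a census).
old = household_levels(260_000_000, 0.625, 0.04)
new = household_levels(262_000_000, 0.625, 0.04)

print(f"employment level revised by {new.employed - old.employed:,.0f}")
print(f"unemployment rate: {old.unemployed / old.labour_force:.1%} -> "
      f"{new.unemployed / new.labour_force:.1%}")
```

Here the employment level rises by 1.2 million while the unemployment rate stays at 4.0%: the levels were revised, the survey results were not.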
Are These Changes Mistakes?
The fact that the estimates for these numbers change as new data come in should not surprise anybody. The statistical agencies are not in the forecasting business; they are saying: “Here is the best estimate we have for an economic variable based on the data we have available. More data will be available later, and the estimate will most likely change.”
The sensible way to interpret this is to treat each revision of a data series as a separate economic time series (e.g., the time series of the first revision, the second revision, etc.). However, most people find that cumbersome, and standard practice is to mix apples and oranges: splice together the latest available revision for each period and just look at the combined series (which means that recent data are not strictly comparable to the back history).
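The two approaches can be compared on a small “real-time” data set. This Python sketch assumes a hypothetical layout in which each reference month maps to the list of its successive published estimates; the months and values are invented for illustration.

```python
from typing import Dict, List

# Hypothetical real-time data for monthly job gains (thousands): for each reference
# month, the successive published estimates (index 0 = first published estimate).
VINTAGES: Dict[str, List[float]] = {
    "Jan": [150.0, 170.0, 175.0],
    "Feb": [200.0, 180.0, 185.0],
    "Mar": [120.0, 140.0],
    "Apr": [90.0],
}


def release_series(vintages: Dict[str, List[float]], n: int) -> Dict[str, float]:
    """The time series of n-th estimates; months not yet revised n times are omitted."""
    return {month: ests[n] for month, ests in vintages.items() if len(ests) > n}


def spliced_latest(vintages: Dict[str, List[float]]) -> Dict[str, float]:
    """The usual spliced series: the latest available estimate for every month."""
    return {month: ests[-1] for month, ests in vintages.items()}


print("first estimates:", release_series(VINTAGES, 0))
print("third estimates:", release_series(VINTAGES, 2))
print("spliced latest: ", spliced_latest(VINTAGES))
```

In the spliced series, January and February are third estimates while April is a first estimate, which is the apples-and-oranges problem in miniature. Each `release_series` output, by contrast, is internally consistent.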
If you want the “absolute best” estimate of an economic variable, wait for the final revision (although that is not really possible for GDP due to methodology revisions). Waiting years for employment data is not useful for real-time economic cycle analysis, and so is really only an option for economic historians.
The belief that revisions should not happen is a belief that we can guarantee that a subset of data will always correctly “predict” the values of the rest of the data. If that were true, why collect the rest of the data?
(As an aside to the “are these changes mistakes?” point, one can have good-faith arguments about the methods used to come up with extrapolations from early revision data. However, it is extremely difficult for outsiders to discuss this topic intelligently, since the raw data typically cannot be released to the public.)
How to Deal with This?
Credible economic analysts know about the revision process, and work around it in their analysis. If they are data mavens, they can attempt to predict which way revisions will go based on other data (or vibes). They know that they cannot wait for the final revisions if they want to talk about what is happening in the economy right now. If you know about the nuts and bolts of the data gathering process, it is entirely sensible to come up with arguments as to why the initial releases of data are “wrong” (that is, poor estimates of the final revision). However, you should at least have an inkling as to why the initial versions are off.
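One crude way to form a view on which way revisions will go is to look at the average past revision. The Python sketch below assumes hypothetical vintage data (each month's list of successive estimates) and simply adds the historical mean revision to the newest first estimate. It is an illustration, not a method any particular analyst uses, and it assumes that past revision bias persists, which need not hold.

```python
from statistics import mean
from typing import Dict, List


def mean_revision(vintages: Dict[str, List[float]]) -> float:
    """Average change from first estimate to latest estimate, over months revised at least once."""
    revisions = [ests[-1] - ests[0] for ests in vintages.values() if len(ests) > 1]
    if not revisions:
        raise ValueError("no month has been revised yet")
    return mean(revisions)


def bias_adjusted(first_estimate: float, vintages: Dict[str, List[float]]) -> float:
    """Naive guess at where a new first estimate will end up: add the historical mean revision."""
    return first_estimate + mean_revision(vintages)


# Hypothetical history of monthly job gains (thousands).
history = {
    "Jan": [150.0, 170.0, 175.0],
    "Feb": [200.0, 180.0, 185.0],
    "Mar": [120.0, 140.0],
}
print(mean_revision(history))        # average first-to-latest revision
print(bias_adjusted(90.0, history))  # naive guess for a new month's eventual value
```

With three months of made-up history the mean revision is +10, so a first print of 90 becomes a guess of 100. With real data one would want a much longer history, and would check whether the bias is stable before leaning on it.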
Based on the recent comments that surfaced, it is clear that many outsiders do not really grasp why data are revised (yet they have strong opinions on the topic). My guess is that this is partly because there are few similar procedures in the private sector, so there is little experience with the logic behind the process. One of the few areas where revisions come up is equity analysis. Although people track revisions to equity analysts’ earnings estimates, those estimates are not “hard” earnings data, but rather “soft” analyst guesses. At least some firms try to estimate earnings/sales based on interim hard data, but those internal estimates generally are not released publicly; instead, firms wait for the official accounting reports.