
Monday, October 19, 2020

Empirical Testing Of Macro Models

From a bond market practitioner's perspective, model testing is straightforward: does it make money? The ability to make money after model implementation (and not just "out of sample") is a simple quantitative metric -- although one might need to wait for a large enough sample to run the test. Not everyone is a market participant, and non-participants will want to evaluate models on other metrics (e.g., does the model help guide policy decisions?). However, the key insight of the "does it make money?" metric is that it is related to the vaguer question: "does it offer useful information about the future?" It is entirely possible for a model to have statistical properties that are seen as "good" -- yet offer no useful information about the future.

Usefulness in Forecasting and Pricing

Although one might be able to make money from a mathematical model in any number of ways, I am considering two types of financial models that are of interest.
  • Forecasting models that generate buy/sell signals.
  • Pricing models. Although this sounds unusual in a macroeconomics context, this is related to DSGE models, given their similarity to arbitrage-free pricing models (link to previous discussion).
Forecasting models are probably what most people would think of, but the structure of DSGE models implies a need to worry about pricing concepts. The key observation one can make about financial forecasting models is that they are not evaluated based on statistical tests (r-squared, whatever), but rather on the profits they generate after model creation. That is, passing statistical tests does not translate into a useful model.
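
To make the distinction concrete, here is a minimal sketch (in Python, with simulated returns and a toy moving-average rule, both purely hypothetical) of evaluating a buy/sell signal by the P&L it generates after the rule is frozen, rather than by in-sample fit statistics.

  # Minimal sketch: judge a buy/sell rule by the P&L it generates after
  # implementation, not by in-sample fit statistics. The returns and the
  # rule itself are made up for illustration; the rule's parameters are
  # assumed to have been chosen before the "live" sample starts.
  import numpy as np

  rng = np.random.default_rng(0)
  returns_live = rng.normal(0.0001, 0.005, 250)  # hypothetical daily returns

  def moving_average_signal(returns, window=20):
      """Toy rule: long (+1) if the trailing mean return is positive, else short (-1)."""
      signals = np.zeros_like(returns)
      for t in range(window, len(returns)):
          signals[t] = 1.0 if returns[t - window:t].mean() > 0 else -1.0
      return signals

  # Apply yesterday's signal to today's return over the post-implementation sample.
  signals = moving_average_signal(returns_live)
  pnl = signals[:-1] * returns_live[1:]

  print(f"Cumulative post-implementation P&L: {pnl.sum():.4f}")
  print(f"Rough annualised Sharpe ratio:      {pnl.mean() / pnl.std() * np.sqrt(252):.2f}")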

Since there is less to say about pricing models, I will discuss them first.

Pricing Models

For a pricing model, the question is whether we can calibrate them to available data to generate a probability distribution that is internally consistent. Being arbitrage-free is the usual criterion, but the test is more complex. In fixed income, one needs to be able to price benchmark instruments (e.g., standard spot swaps, volatility cube for vanilla swaptions) as well as more complex instruments in an arbitrage-free manner. (This is tricky because the pricing of benchmark instruments is not enough to pin down the distribution, further assumptions about behaviour are needed, such as how to smooth forward rates.)
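
As a toy illustration of the benchmark-pricing requirement, here is a minimal sketch (Python, hypothetical par swap quotes, annual fixed payments, no day counts or multi-curve details) that solves for discount factors repricing each benchmark swap exactly; the "flat extension" used to fill unquoted maturities is a crude stand-in for the smoothing assumptions mentioned above.

  # Minimal sketch: solve for discount factors so that each benchmark par swap
  # (annual fixed payments, hypothetical quotes) reprices exactly. Real curve
  # construction adds day counts, interpolation choices, and multi-curve details.

  # Hypothetical par swap rates keyed by maturity in years.
  par_swap_rates = {1: 0.010, 2: 0.012, 3: 0.015, 5: 0.018, 10: 0.022}

  def bootstrap_discount_factors(par_rates):
      """Return annual discount factors D(t) consistent with the quoted par rates.
      Unquoted maturities are filled by flat extension of the last quote -- a
      crude stand-in for a proper interpolation/smoothing scheme."""
      max_maturity = max(par_rates)
      rates, last = {}, None
      for t in range(1, max_maturity + 1):
          last = par_rates.get(t, last)
          rates[t] = last

      discount = {}
      annuity = 0.0  # running sum of discount factors on the fixed leg
      for t in range(1, max_maturity + 1):
          s = rates[t]
          # Par condition: s * (annuity + D_t) = 1 - D_t  =>  D_t = (1 - s*annuity) / (1 + s)
          discount[t] = (1.0 - s * annuity) / (1.0 + s)
          annuity += discount[t]
      return discount

  for t, d in bootstrap_discount_factors(par_swap_rates).items():
      print(f"D({t:2d}y) = {d:.4f}")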

Although DSGE models resemble fixed income pricing models -- they are projecting forward probability distributions -- the issue is that there is no observable curve for many key variables (mainly wages, although one can also question the relationship between breakeven inflation and the model inflation). There is no forward market in wages and physical consumer goods baskets -- so what are we going to arbitrage?

Where things might get interesting in this context is if we used economic arguments to enforce a relationship between nominal forward rates and forward breakeven inflation rates. (This would likely be based on assumed properties of real rates.) Affine term structure models for inflation-linked bonds are already models of this forward structure. Could one create models that imply "economic arbitrage" between the markets? Since I have not run out to set up a hedge fund based on this concept, it is clear that I am skeptical. (One of the stumbling blocks is the illiquidity of inflation options.)
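
To illustrate the type of cross-market relationship involved, here is a minimal sketch (Python, hypothetical nominal and index-linked zero curves, continuous compounding) that backs out forward rates from each curve and treats their difference as an approximate forward breakeven inflation rate, compared against an assumed inflation anchor. This is purely illustrative, not the "economic arbitrage" model discussed above.

  # Minimal sketch (hypothetical curves): derive forward rates from nominal and
  # inflation-linked (real) zero curves, and take their difference as an
  # approximate forward breakeven inflation rate. The comparison to a fixed
  # inflation anchor at the end is purely illustrative.
  import numpy as np

  maturities = np.array([1.0, 2.0, 3.0, 5.0, 7.0, 10.0])
  nominal_zeros = np.array([0.010, 0.013, 0.016, 0.020, 0.022, 0.024])  # hypothetical
  real_zeros = np.array([-0.008, -0.006, -0.004, 0.000, 0.002, 0.004])  # hypothetical

  def forward_rates(mats, zeros):
      """Forward rates between successive quoted maturities, assuming
      continuously compounded zero rates."""
      return {(mats[i], mats[i + 1]):
              (zeros[i + 1] * mats[i + 1] - zeros[i] * mats[i]) / (mats[i + 1] - mats[i])
              for i in range(len(mats) - 1)}

  nominal_fwd = forward_rates(maturities, nominal_zeros)
  real_fwd = forward_rates(maturities, real_zeros)

  inflation_anchor = 0.02  # assumption: central bank target as a long-run anchor
  for span in nominal_fwd:
      breakeven = nominal_fwd[span] - real_fwd[span]  # Fisher approximation
      print(f"{span[0]:.0f}y-{span[1]:.0f}y forward breakeven = {breakeven:.2%}, "
            f"gap to anchor = {breakeven - inflation_anchor:+.2%}")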

Otherwise, we can ignore the complexity of nonlinear DSGE model solutions, and just use their outputs as a forecasting model. This is presumably how most people will look at them. We turn to forecasting models next.

Judging Forecast Models: How?

Using profitability as a measure of success of models has the advantage of simplicity. For most macro model users, there is no obvious replacement. However, I think the metric of success has to be related to how you intend to use the model, and what forms of errors you are concerned about. Standard statistical tests may not take these concerns into account.

For example, a model that does a good job of tracking real GDP growth during an expansion, but cannot predict a recession, is obviously useless as a recession forecasting tool, even if the average (in some sense) forecasting error is lower than competing models.

Another concern is the time frame for forecasts. Some models may only offer a forecast for an upcoming time period. Such a short forecast horizon can be useful if the target is an asset price, but it is unclear how useful it is as a macroeconomics tool -- other than for forecasters who have to submit results to surveys. However, such a model may be the result of an econometric exercise, such as fitting a linearisation of a DSGE model.

If we have a model that generates forward trajectories, we presumably want to compare them to realised data. The issue is that the forecast covers multiple time periods, and so we are no longer tracking data that has a single value at a given time point (which is the case for a buy/sell signal in a trading model). There seem to be a few broad approaches one could take to develop a metric.
  • Fix a forecast horizon (e.g., six months), and compare the error between forecast and realised.
  • Generate an error measure for the forward forecast (up to some horizon limit) versus realised. This leaves open how to weigh forecast errors.
  • If we are looking for particular events (e.g., recessions), compare the realised events versus the forecast.
My concern is that just looking at an average error may not be meaningful. During an expansion, economic variables tend to follow smooth trajectories. If a model always generates smooth forecast trajectories, it might be able to get reasonably low average errors during the expansion. However, it might fail miserably during the short periods of economic turbulence around recessions.
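
A minimal sketch of that concern, with made-up quarterly growth numbers and a deliberately naive forecaster that always extrapolates smooth 2% growth: the average error over the full sample looks respectable, yet the model flags none of the recession quarters.

  # Minimal sketch of the point above, with made-up data: a forecaster that
  # always extrapolates smooth trend growth scores reasonably on average error,
  # yet completely misses the recession quarters.
  import numpy as np

  rng = np.random.default_rng(1)

  # Made-up quarterly real GDP growth (annualised): a long expansion around 2%,
  # then a short, sharp recession, then recovery.
  expansion = rng.normal(2.0, 0.5, 36)
  recession = np.array([-1.0, -4.0, -2.0])
  recovery = rng.normal(2.5, 0.5, 9)
  realised = np.concatenate([expansion, recession, recovery])

  # "Smooth" forecaster: always predicts 2% growth, one quarter ahead.
  forecast = np.full_like(realised, 2.0)

  errors = np.abs(forecast - realised)
  in_recession = realised < 0.0

  print(f"Average absolute error, full sample:  {errors.mean():.2f} pp")
  print(f"Average absolute error, expansions:   {errors[~in_recession].mean():.2f} pp")
  print(f"Average absolute error, recessions:   {errors[in_recession].mean():.2f} pp")
  print(f"Recession quarters correctly flagged: {(forecast[in_recession] < 0).sum()} "
        f"of {in_recession.sum()}")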

Nonlinear DSGE Model Evaluation

I will just make a few point form comments, since I expect to re-visit this topic.
  • I was quite unimpressed by the methods used to evaluate the empirical effectiveness of nonlinear DSGE models. 
  • I am not considering large-scale models ("Frankenmodels") that are used by central banks to generate forecasts. Those models are a mixture of theoretically incompatible components, and forecasts are being generated by teams of analysts. It is unclear how many ad hoc adjustments are being made on the fly to get the "right" numbers.
  • It is unclear to me how a nonlinear DSGE model is supposed to be recalibrated to generate a probability distribution that evolves from period to period; most analyses I have seen are analyses of shocks (used in "all else equal" analysis). It seems to me that the model would need to be calibrated against observed yield curves to have the model match current cyclical conditions. I have not seen research that has attempted such a fitting.
  • The structure of the models implies that the forward state trajectory will smoothly converge to a steady state (as happens in forward curves); see the sketch below. Just eyeballing historical data tells us that this might be a good fit during an expansion, but fail miserably around recessions. Is this the behaviour we want in a forecasting model?
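
To illustrate the last point, a minimal sketch (assuming a one-variable linearised model with a hypothetical persistence parameter) of how every forward trajectory generated by such a model decays smoothly back to the steady state, no matter the starting point.

  # Minimal sketch of the convergence point: in a linearised model of the form
  # x(t+1) = rho * x(t), where x is the deviation from steady state and
  # |rho| < 1 (hypothetical parameter), every forward trajectory decays
  # smoothly back to the steady state regardless of the starting point.
  import numpy as np

  rho = 0.85      # assumed persistence of the deviation from steady state
  horizon = 12    # quarters

  for x0 in (-3.0, -1.0, 2.0):   # current deviations (e.g., output gap in %)
      path = x0 * rho ** np.arange(horizon + 1)
      formatted = ", ".join(f"{x:+.2f}" for x in path[:6])
      print(f"start {x0:+.1f}: {formatted}, ... -> 0")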
(c) Brian Romanchuk 2020

3 comments:

  1. Okay, so I've been trying to dig through how exactly DSGE models are constructed, but before we get there, I'm trying to recall everything from my computer science "theory of computation" class that I took years ago, as well as similar courses. I recall specifically the halting problem (essentially, that you can't determine, in the general case, whether a program will terminate without actually running it), as well as reductions, where you reduce something like 3SAT to another NP-complete problem, etc.

    The point of reductions, if I recall correctly, was not necessarily to solve a problem by mapping it to another domain, although that is possible, but rather to demonstrate that it is "at least as hard" as another problem, so that you don't expect an overly simplistic answer, as any polynomial solution to an NP-complete problem would demonstrate P=NP. It seems like economists

    Also, I do recall from my differential equations class that "most" diff eqs lack a closed-form solution; about halfway through the course I quickly realized they were giving us essentially all the easy problems.

    On top of all that, chaos theory dictates unpredictability even for deterministic systems based on a high degree of sensitivity to initial conditions, so any measurement error means that long run outcomes are not really

    Also, from diff eq, I recall certain problems like the "3 body problem" not having any kind of stable solution.

    As for some more approachable "simulations", I have played around with cellular automata a great deal, and while Conway's Game of Life may be the most famous, Wolfram's 8-bit rules are very simple and very instructive about the great diversity of structure that is possible even under simple rules.

    On top of all that, I would expect society, and not only individuals within society, to function as a "learning system" such that groups and people adapt on the fly to solve problems in new and innovative ways all the time. This perspective would mean that we should not limit our agent behavior to a stable set of rules.

    Even still, agent simulations could be instructive for understanding a broad range of possible emergent behaviors, even if they were not compelling as realistic simulations. Just like one can learn a lot by observing all the possible interactions in Wolfram's 256 elementary cellular automaton rules (google rule 30, rule 110, etc.), one could gain insights on possible economic phenomena by observing

    But this whole notion of performing "statistical fitting" on such models seems misplaced. The whole purpose of "deep neural networks", for example, is to learn complex relational structure, which is difficult to model statistically, and really does not lend itself to statistical analysis.

    This whole endeavor seems deeply flawed. One of the most insightful parts of our "theory of computation" class was that it showed us all sorts of unproductive problems we could try to solve, and gave us a taste of the sheer mathematical difficulty of some very hard problems (NP computational classes and beyond). For example, in a compiler, you don't want to try to bake in too much static analysis, because you could easily inadvertently end up trying to solve the "halting problem", in which case your solution would either be likely to run a long time or be outright incorrect.

    I just don't really expect that economists trying to get mileage out of DSGEs have much sophisticated understanding of the theory of computation, tractability and stability issues in dynamical systems, etc. I don't really see the point. Computer scientists don't try to simulate computer systems for a very good reason: they run them and they debug them, at most creating sandboxed tests based on the real thing.

    1. You want a solution? The standard practice is to just assume its existence.

      If we go back 15 years, the nonlinear models were largely a red herring; the literature only dealt with the linearisations. There is allegedly the ability to solve the models numerically, but I’ve not looked that carefully into how this is supposedly being done. My academic work was in nonlinear partial differential inequalities, and yes, they were all intractable. There were a few ways around this, but not entirely satisfying.

    2. differential inequalities, wow, never heard of that. I did a bit of optimization, linear programming, basic interior point stuff, etc.

      In general I don't see much point of statistical regressions in macroeconomics, because by my assessment it only finds very simple relationships between variables. But that seems like all mainstream ever does.

