Feb 10, 2020

My first kinda botched attempt at backward inducting spending via SDP

Preliminaries and Intro

The purpose of this post is to write up my attempt to use an "optimal control theory" technique (i.e., stochastic dynamic programming and backward induction - BI/SDP) to evaluate lifecycle spending choice (or the decumulation half, anyway).  I had tried this BI/SDP technique once before with "asset allocation choice" when, a couple of years ago, I tried -- with a modestly successful outcome -- to replicate Gordon Irlam's description of the method in his article Portfolio Size Matters [2014].

The goal here is not replication (I'm not sure I've actually ever seen this kind of BI thing done before for spending), nor is the goal necessarily usable, functional results. No, I am mostly just trying to: 1) build new skills or stretch old ones, 2) see if I can do it at all, and 3) maybe provide another avenue of confirmation for the shape of spend rates in the mid-to-late-age retirement process.  Since the method is considered quantitatively and intellectually robust in some circles of academic econ, it is probably worthy, in my mind, of some examination. It can then be placed in the toolbox I have for "triangulating" around my understanding of the retirement spending problem.


Disclaimer on my data and methodological corruption

I use the word corruption here not for humility's sake but for a couple of reasons related to my skepticism about what I did, how I did it, and what I got at the end.  If I take my own work with a grain of salt, I expect you to do so as well.  Here are some reasons I am skeptical, which may or may not make more sense further down in this piece:

1. I have engaged in the worst practice of nudging my coding and parameter settings in order to get the result I wanted in the first place. That is textbook data science corruption. On the other hand, the goal was learning something new-ish so we might have to forgive that sin for the greater good of self-ed.

2. I have weighted the value function that underlies the whole thing -- and I have forced amendments to the economic method -- in a way that is pretty much off-road from what I see in the academic lit. That will make me either wrong or naive or both.

3. The output is unstable in a way I don't yet understand. It's possible that the internal mini-sim is not long enough to get a good sample but TBD.

4. My technique for chaining backward induction year-to-year may be naive. I am assuming I am doing it well enough but I have no idea. I am borrowing the method from the Irlam article and what I did two years ago and handing it to this project...and then hoping I got it right.

5. Because the method necessarily reduces to evaluating single periods working backwards, it turns out that I had to make some odd assumptions about the residual probability of me being alive outside "the period" in question. Those assumptions may or may not be warranted within the formal method itself.  But I also need my common sense to work for me personally.  In a conclusion I describe later, I came to realize that I probably have to assume that, in any period t, the marginal utility of bequest is almost always greater than the marginal utility of consumption (except in the last year).  When it comes to bequest, that also means I turned Yaari's "hump shaped subjective bequest weighting" (used in a full multi-period life-cycle model) into a more or less binary model, i.e., it's either high (all years except the last) or low (in the last year). Who knows if that makes sense. TBD

6. All of this means that the "shape" of the result I got in Figure 2 may have more to do with my original intent and my biases and less to do with anything mathematically "necessary."  That kind of comment would ordinarily trouble me as it should you.


On Bellman Equations

I can't pretend to know this stuff but I can at least regurgitate what I read.  Here are some wiki excerpts on the relevant topic of BI/SDP for background:
A Bellman equation, named after Richard E. Bellman, is a necessary condition for optimality associated with the mathematical optimization method known as dynamic programming. It writes the "value" of a decision problem at a certain point in time in terms of the payoff from some initial choices and the "value" of the remaining decision problem that results from those initial choices. This breaks a dynamic optimization problem into a sequence of simpler subproblems, as Bellman's “principle of optimality” prescribes.  ... [and then more specifically] ...
Richard Bellman showed that a dynamic optimization problem in discrete time can be stated in a recursive, step-by-step form known as backward induction by writing down the relationship between the value function in one period and the value function in the next period. The relationship between these two value functions is called the "Bellman equation". In this approach, the optimal policy in the last time period is specified in advance as a function of the state variable's value at that time, and the resulting optimal value of the objective function is thus expressed in terms of that value of the state variable. Next, the next-to-last period's optimization involves maximizing the sum of that period's period-specific objective function and the optimal value of the future objective function, giving that period's optimal policy contingent upon the value of the state variable as of the next-to-last period decision. This logic continues recursively back in time, until the first period decision rule is derived, as a function of the initial state variable value, by optimizing the sum of the first-period-specific objective function and the value of the second period's value function, which gives the value for all the future periods. Thus, each period's decision is made by explicitly acknowledging that all future decisions will be optimally made.
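To compress that into one line of notation (my paraphrase of the quote, not a formal statement), the value of having wealth W at time t is the best spend choice c you can make, given what the next period's value function already tells you:

V(t, W) = max over c of { u(c) + E[ V(t+1, (W - c)*R) ] }

where R is the random portfolio return and the last period's V is specified in advance. Everything below is some rougher version of that recursion.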

Basic method

The method, which I may or may not have naively corrupted a bit, follows the script in the quoted paragraph above.  It takes two distinct forms over a given horizon:

1. In the last year (of, say, 30): I evaluate -- for many wealth levels and different spend rates -- a value function for the utility of the joint outcome of spending and terminal wealth. This is done via a simulation (a longish one in this final year) of a fixed/mixed allocation along vectors of different wealth levels and different spend rates. The spend utility in this final year is weighted more heavily than the bequest, on which more later.[1]  The optimal spend rate for each wealth level is then selected and remembered along with its utility score.

2. For the next-to-last year (and then recursively backwards for all previous years) -- for each wealth level and spend rate -- a similar simulation is performed, except shorter now for time's sake.  The terminal wealth of the simulated net wealth process at each wealth+spend pair (in t(29) now) maps to a score already computed for the following year (t(30) here, and t+1 generally at each backward step) at that level of starting wealth. The output of the sim is combined with the spend utility evaluated in t(29) itself, those combined scores are remembered for each spend rate within each wealth level, and the max utility is picked to determine the optimal spend rate at, say, t(29). Factoring current-period spend utility into the score this way might be what I called "off-road" above, idk. Then the whole process is repeated backwards for each year until we get to the first. In the first year we know that year 2 and beyond were all optimal choices. This is what makes the year-one decision (by making use of the optimal choice in year t+1) tractable, versus the near impossibility of doing it in forward mode, where there might be something like 630^30 combinations of spend rates and years to evaluate.

Note that I have also forced wealth and consumption constraints.  Given the single-period nature of the task, I end up, in any period t, with a distribution of terminal wealth values coming out of the net wealth process (W(t)*r - c(t)).  But I constrain wealth to be >= 0 at the end of t. That in turn means that consumption is constrained to be either the planned "c" or, if wealth were to hit the zero constraint, the available wealth. There is no income in this model so, in the extreme, consumption would also theoretically have a floor of zero or at least some subsistence level.
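For the curious, here is a minimal sketch in Python of what that loop looks like as I understand it. All of the names, the normal return draw, the linear interpolation of next year's value function, and the exact way the weights get applied are my own illustrative choices (the actual runs used the parameters listed further down), so treat this as a cartoon of the method, not the production code:

```python
import numpy as np

rng = np.random.default_rng(0)

W_GRID = np.arange(100_000, 3_000_001, 100_000)  # wealth levels, 100k to 3M
C_GRID = np.arange(0.02, 0.651, 0.001)           # spend rates as a fraction of wealth
N_SIM  = 500                                     # mini-sim iterations per (wealth, spend) pair
GAMMA  = 2.0                                     # CRRA risk aversion
MU, SD = 0.04, 0.12                              # fixed-allocation real return assumptions
YEARS  = 30
BETA_LAST  = 0.20                                # bequest weight in the final year
BETA_OTHER = 0.98                                # bequest/continuation weight in all other years

def crra(x, gamma=GAMMA, eps=1e-9):
    """CRRA utility; the eps floor keeps zero wealth/consumption finite."""
    x = np.maximum(x, eps)
    return x ** (1.0 - gamma) / (1.0 - gamma)

def simulate_period(w0, spend_rate):
    """One-period net wealth process with the constraints described above:
    spend the planned amount unless wealth runs out, and wealth >= 0 at period end."""
    c = np.minimum(spend_rate * w0, w0)
    r = 1.0 + rng.normal(MU, SD, N_SIM)
    w1 = np.maximum((w0 - c) * r, 0.0)
    return c, w1

value  = np.zeros(len(W_GRID))            # next-period value function, filled in as we go back
policy = np.zeros((YEARS, len(W_GRID)))   # optimal spend rate by year and starting wealth

for t in reversed(range(YEARS)):
    beta = BETA_LAST if t == YEARS - 1 else BETA_OTHER
    new_value = np.empty_like(value)
    for i, w0 in enumerate(W_GRID):
        best_score, best_rate = -np.inf, np.nan
        for rate in C_GRID:
            c, w1 = simulate_period(w0, rate)
            if t == YEARS - 1:
                cont = crra(w1)                       # final year: bequest utility of terminal wealth
            else:
                cont = np.interp(w1, W_GRID, value)   # other years: look up next year's (optimal) value
            score = np.mean((1.0 - beta) * crra(c) + beta * cont)
            if score > best_score:
                best_score, best_rate = score, rate
        new_value[i], policy[t, i] = best_score, best_rate
    value = new_value

# spend path, by year, for $1M of starting wealth
print(policy[:, W_GRID == 1_000_000].ravel())
```

I used a fixed mini-sim length here for simplicity; in the actual attempt the final-year sim was longer than the sims for the earlier years.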

Confronting the bequest concept

One of the interesting things about the project was that, for the first time, I had to directly confront the concept of bequest, which I have not really done on this blog before. My hand was forced for the reasons below.  Before we go any further, though, let's quote Yaari (1965) on bequest, as well as Yaari quoting Fisher (1930):
"A consumer very likely values a bequest  not only according to its size but also according to the time at which it is made. To accommodate this possibility we introduce a subjective weighting for bequests..." B(T) [emphasis added] Y  
"...where B(T) is defined for all values of the random variable T. In most cases one would expect B to be  to be a hump-shaped function, because the importance of bequests is foremost when the consumer dies in his middle years." Y  
"...it is reasonable to say that, other things being equal, the consumer prefers leaving a larger estate to leaving a smaller one." Y
"Marshall gave particular emphasis to "family affections."" Y  
"Uncertainty of human life increases the rate of preference for present over future income for many people, although for those with loved dependents it may decrease impatience." Y quoting F [emphasis added] 
This is important because, in a utility-evaluation context for this project, even though we have single periods, we still have differential net wealth outcomes in each period that we test, outcomes which may or may not be zero at the end, and where consumption is more likely than not to be non-zero, at least in this model.  The result is that if I were to focus only on consumption utility within a period, then the highest consumption will always win. If I focus only on terminal wealth utility, the lowest spend rate will always win.  Neither of these seems all that great outside of a single-period decision, so apparently there needs to be a tradeoff between the current period and future periods if we are planning to be even vaguely aware of the other periods as opposed to hyper-myopic.

We see something like this in Yaari's "Case B" where, while he is not doing single-period backward induction, the utility function (21) is similar to, but not exactly the same as, what I had to use. The left term on the right-hand side is a weighted consumption utility; the right term is a type of weighting applied to bequest or savings:

Figure 1


In sum, for me, spending 2% every year in order to favor bequest doesn't make sense. Spending 60-100% each year in order to favor consumption utility doesn't make sense either. And then also the Yaari "hump shaped" weight that is "defined for all T" is something that is utterly unknown to me. I tried to wing it but could not, by hook or by crook, make a weighting scheme work the way I imagined it would; otherwise the framing of the hump idea appeals to my intuition.   

The epiphany, if we can call it that (and not corruption, as above), came when I realized that I was only working in single periods going backwards, not a full multi-period lifecycle.  The issue of bequest then shifts in that world (single-period links in a chain) from the full lifecycle model of "pre-death consumption over 30 years and then give to loved dependents thereafter" to the legatee of each period really being oneself after the "death" of each and every separate period...and let's ask: who, frankly, is a "loved dependent" if not one's self?  So I decided to do the Yaari "hump" except here as a binary: high for all periods except the last, and then low but not zero in the last period.  It makes sense now that I'd want not only to bequeath to myself while living (and expecting to live some more) but also to think about all my descendant-generational-selves as well. That idea gives "bequest" some strength and probably explains why it turned out that I had to weight it so heavily here. TBD tho.

So for this run, which I consider awfully simplistic even if we are not totally off-road, I modified Eq 21 in Figure 1 to be a relative-weighted version of itself across the two terms, especially since I don't know the "authorized" subjective weightings...or the economics, for that matter. I did something roughly akin to this, if I got the parens and notation even close:

E[U(c)] = (1/n) Σ { (1-β(t))*g[c(t)] + β(t)*ϕ[S(t)] }   

Can't remember if I need a double sum here; I'll try to recall that later. Both functions (g and ϕ) are CRRA and so have a positive first derivative and a negative second.  ϕ is a function that draws in and combines the t+1 utility, consistent with the Bellman discussion above, but is otherwise unexplained here. If there's an economist in the house I would take redirection. Generally speaking, though, I only have my cats to guide me, so for now it stands until I try again later.
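In code, and stripped of everything that would make it defensible, the weighted value above is just a two-term average where β(t) is the binary "hump" I described: high in every year but the last, low in the last. This slots into the scoring step of the sketch earlier. Here ϕ[S(t)] is whatever bequest/continuation utility has already been evaluated (CRRA over end-of-period wealth in the last year, the looked-up t+1 value otherwise); the function names and the exact weights are my own illustration:

```python
import numpy as np

def crra(x, gamma=2.0, eps=1e-9):
    """g and ϕ are both CRRA here: positive first derivative, negative second."""
    x = np.maximum(x, eps)
    return x ** (1.0 - gamma) / (1.0 - gamma)

def beta_of_t(t, last_year=29):
    """Binary stand-in for Yaari's hump-shaped bequest weight B(T)."""
    return 0.20 if t == last_year else 0.98

def period_value(c, phi_of_s, t):
    """Monte Carlo estimate of the weighted E[U(c)] above. c is the simulated
    consumption for the period; phi_of_s is ϕ[S(t)] already evaluated, i.e. the
    bequest/continuation utility of end-of-period wealth."""
    b = beta_of_t(t)
    return np.mean((1.0 - b) * crra(c) + b * phi_of_s)
```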

Some parameters and assumptions

To set this up I had to make some assumptions. If I can remember what I did, this is the basic gist:

- 30 years going from back to front
- wealth from 100k to 3M in 100k increments
- spending from .40 to .70 in t(30) in .001 increments
- spending from .02 to .65 in t(29) to t(0) in .001 increments
- weight to bequest 20% in final year t(30)
- weight to bequest 98% over t(29) to t(0)
- risk aversion coefficient of 2
- standard CRRA utility function
- fixed allocation = 4% real return and 12% sd
- mini sim to test the net wealth process = 500 iterations
- as above there are wealth and consumption constraints
- what I display below is only for $1M for consistency
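Collected into one block (my transcription of the list above; the key names are mine):

```python
PARAMS = {
    "years": 30,                                   # horizon, induced back to front
    "wealth_grid": (100_000, 3_000_000, 100_000),  # start, stop, step
    "spend_grid_final_year": (0.40, 0.70, 0.001),  # t(30)
    "spend_grid_other_years": (0.02, 0.65, 0.001), # t(29) .. t(0)
    "bequest_weight_final_year": 0.20,
    "bequest_weight_other_years": 0.98,
    "crra_risk_aversion": 2.0,
    "real_return_mean": 0.04,                      # fixed allocation
    "real_return_sd": 0.12,
    "mini_sim_iterations": 500,
    "display_wealth": 1_000_000,                   # the output shown is for $1M only
}
```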

The output

After fiddling around with this over a few days of despair, I finally came up with figure 2. I wasn't happy that the data points are all over the place but the basic regressed shape (red) gives me some hope if I ever try it again.  It generally conforms -- if we ignore for a minute that I was trying to make it conform -- to the basic shape of the spending curves with which I am familiar.

The legend:

Grey dots are the output of the induction/dynamic programming
Red dashed line is one fit to the grey dots in R
Grey dashed is the Merton optimum tuned to a 90th-percentile annuitant table, age 60
Black is my RH40 rule based on a construct from Evan Inglis

Figure 2

Discussion

- Like I said, this is not very clean but at least it looks familiar if we squint. And oh, btw, there's the 4% rule in a way in year 1; whattya know?

- I did not show it, but this is also fairly confirmatory of my machine learning experiment.

- I have a long way to go to make this work but I'm satisfied that I at least have something here.

- Not sure about the wide variance in rates over ages but that's for another day.

- Curve might conform even more if I tune the terminal year assumptions a bit...but not for this post.

- Quite consistent with my goals, I do not have a strong, usable, functional outcome to take with me on this but I did learn (again) how to chain optimal control stuff backwards. I also did stretch some coding skills. I got some general confirmation of past intuition along the way. And...it looks like I more or less pulled off a very rough version of BI/SDP for spend rates.  That's enough for now.




--- Notes --------------------------------

[1] This was a weird result and I have not taken the investigation of it very far, but when I tried different weights for spend vs bequest utility in my jury-rigged, possibly wrong, equation, this is what happened. As I raised the weight on spending (we're in the last year, btw) there was a limit of sorts around a 60% spend. Then, at ~99-100%, it had an odd pop. Hmmm.  It looked like Figure 3 below. This is why I picked, for the last year, a 20% weight to bequest rather than 0. I guess maybe one never knows if the last year is really the last. Maybe we probably shouldn't spend our last dime in any period. idk. TBD.

Figure 3. 


--- References --------------------------

Bellman, R. (1953). "Bottleneck Problems and Dynamic Programming." Proceedings of the National Academy of Sciences of the United States of America, 39(9), 947. [I didn't read this; it was G. Irlam's reference in the paper cited below.]

Fisher, I. (1930). The Theory of Interest. [Haven't read; this was from Yaari's paper and I presume it's the source of the quotes.]

Irlam, G. (2014). Portfolio Size Matters. The Journal of Personal Finance. 13(2), 9-16.

Wikipedia, Bellman equation, https://en.wikipedia.org/wiki/Bellman_equation

Yaari, M. (1965). "Uncertain Lifetime, Life Insurance, and the Theory of the Consumer." The Review of Economic Studies, 32(2), 137-150.

