-----
Interview with RiversHedge
INT: Is this thing you just did really reinforcement learning?
RH: Um, no. Maybe? Idk. It's code that "does stuff." I tried to make something that evaluated what I call fuzzed-out actions based on a value function and then adjusted a policy over many training iterations. It's probably not "real" reinforcement learning...yet. Might be a more general "machine learning" thing. Either way I hope it's an amateur hobbyist baby step in the right direction.
INT: Does the output have any functional purpose?
RH: No, not really. I was just trying to make a program teach itself something. I was using a domain I happen to know as a proximal reason to do it. If it works, it is merely confirmatory and only mildly interesting.
INT: Can we trust the outcomes?
RH: No. This has too many shortcuts, lacunae, and elisions...and even some innumeracy, I think. The training boundaries are too narrow and the training is deficient. The parameters are limited. I only have a desktop, so I am bound by processing speed. Also, I took some liberties with suppressing direct recursion in training, which probably doesn't help. I have also only used log utility for risk aversion. The mini-sims are too short and thus erratic. All of these missing pieces were a way to keep it simple and lean "just so it works" and doesn't take forever to build or run.
INT: So, tell me again why this project?
RH: Idk. I just wanted to check it out and prove to myself I'm not mentally dead yet. Anyway, it seems like the direction of things these days. If I was 25 I'd be working on this kind of thing with more fervor and purpose. Perhaps I just wanted to see if I could still play the game. Plus it impresses my kids and I need their good favor when I run out of money in a few years. Also, I read a couple articles on this, in particular from G. Irlam [see references]. I got motivated by that content to see if I could wing something simple.
INT: Any other sandbagging you want to get off your chest since you seem so focused on that?
RH: Sure. Between the kluges I used to patch code I didn't try hard enough to understand (I was in a hurry) and my ignorance of the machine learning topic, I'm not sure the training will ever stabilize even if I ran it a bazillion times. Not sure how much effort and resources I want to put in to find out, though. I do have other interests, you know.
INT: Well, good luck with that, then.
Intro to the Project
My machine learning project started after reading a couple of papers on the topic (see references at the bottom), none of which I really, truly understood except in a superficial way. I was motivated to take a shot, as a personal challenge, to see if I could do a baby version of this machine learning (ML) thing with what I know in Ret-fin. I don't really have the tools and skills to do it the way the big boys do, since I only know R-script and Excel, and my only platforms are an HP desktop and my iPhone. But this is an amateur blog and we try stuff just for the hell of it. Whether it is "real" ML or not is TBD. My goal was not to win any awards; it was more of a late-age challenge just to see if I could do it at all.
For my project I wanted to keep it very very very simple and doable. To explain what I was trying to do, let's start with a schematic, the center of which I lifted from a Kolm & Ritter [2019] paper. This will be the edifice off which I hang the explanation of what I was (am) trying to do...so far. Things might change.
Schematic
Figure 1. The Reinforcement Learning (RL) process [Note: I've revised this image since first posted]
The Functional Goal
While I was more focused on the machine as a machine, the machine needs a purpose. Since I have come to understand via my blogging that spending is a "strong force" in retirement and asset allocation is a relatively weak one (except maybe at the edges), and since I have no control over longevity or markets, I have no annuitized income, and I have already set my retirement date (to 10 years ago), I have a strong interest in the spending topic. So, the question for me became "hmmm, can I get a machine, given a few inputs and some room to explore, to teach itself what it would (should) spend at given ages for given wealth levels?" That's an interesting question of course, but keep in mind as we go that I am less concerned with whether my answer is functionally great than with whether I can get a program to do some adaptive processing. The spending question -- once described as the nastiest problem in economics -- is an area I have studied before and where I had some preexisting tools and knowledge. This topic area makes it easier for me than if I were to dive into, say, handwriting recognition, where I'd flail for years and years. Took me three days to rough it out. My daughter said "wow Dad, three days is fast for something like that." I reminded her that it wasn't really three days, it was six years of prep.
The Learning Process in Motion
1. I start with a "setup" (a quick R sketch of these parameters follows the list):
- $1M endowment
- Age = 60
- Longevity expectations are probabilistic [1]
- Return expectations are generic (and stochastic) and consistent with a blended allocation [2]
- The initial spend is 40k, real... but randomized at each iteration to be between 30 and 50k.
- Risk aversion is low (ra=1). I did this so I could use log utility in the mini-sim to simplify things.
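For reference, here is roughly what that setup looks like in R. This is just a sketch: the variable names and the subsistence-floor dollar amount are my own placeholders, not the actual code.

    # Baseline setup (illustrative; names and the floor value are placeholders)
    set.seed(1)
    start_wealth <- 1e6      # $1M endowment
    start_age    <- 60
    ra           <- 1        # risk aversion; ra = 1 means log utility in the mini-sim
    ret_mean     <- 0.04     # generic real return assumption (note [2])
    ret_sd       <- 0.12
    spend_floor  <- 5000     # minimal subsistence floor (placeholder amount)

    # initial spend randomized at each iteration between 30k and 50k real
    init_spend <- runif(1, min = 30000, max = 50000)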
2. I walk through a simple sim-life (iteration) year by year and then repeat a bunch of times
This simple meta-sim is the "agent" in Figure 1. It follows a standard net wealth process (state change in the figure) over a multi-year lifetime with a probabilistic horizon where returns are earned, resources are consumed and then the chips fall where they may over time. Spending can crash, of course, when wealth depletes but then I also keep a minimal subsistence floor to keep myself away from the pernicious effects of zero consumption in R-script and the utility math. The philosophy here is that no one really consumes at zero -- jobs are sought; family steps in; institutions of government, associations, or religious affinity are theoretically available. Spending will be > 0 one way or another, even if not by much.
2A. I diffuse the spend at each age step for discovery
At each life start, spending is uniformly randomized over a 3-5% interval. For each age after 60, the then-current spending rate is randomly diffused by +/- 1% of the then-current spend. This lets the program "explore the territory." This is the "action" in Figure 1.
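A minimal sketch of that diffusion in R (I am reading "+/- 1%" as plus-or-minus one percentage point on the rate; the function name is mine):

    # Exploration "action": propose a spend rate for this age (sketch only)
    propose_spend_rate <- function(current_rate = NULL) {
      if (is.null(current_rate)) {
        runif(1, 0.03, 0.05)                                  # life start: 3-5%
      } else {
        runif(1, current_rate - 0.01, current_rate + 0.01)    # later ages: +/- 1%
      }
    }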
2B. I evaluate the action in 2A via a value function and then decide what to do
Given any "then current" environment (age, wealth, the conditional survival probability operating at that age, the spend rate "action" offered by the diffusion, and the spend rate carried from the prior period), the alternative spend is tested with a mini-simulation that calculates the "expected discounted utility of lifetime consumption" at age+1, using the conditional survival probability for age+1. The current-period utility is then added to that, and the combined score is compared to the score determined last period for age+1 and last period's spend (so the intervals match). If there is a "reward" for that particular action, then the policy weight for that age+wealth+spend combo is adjusted and the new spend rate is put into play...otherwise the original spend moves forward. A more complex version of the value function and process, minus the RL, is described here.
My best guess for the notation is as follows in Eq1, where the right side of the sum can be ignored and the middle term of the left is moot for now; g is the utility function of consumption (CRRA for ra>1 and log(c) for ra=1) and omega is a stand-in for the conditional survival probability. I am willing to be corrected if I am off here. This also does not capture the way I split the function into "current and forward" pieces, but it does give a flavor.
Eq1. Machine value function. In simplified form (dropping the terms noted above as ignorable):

V ≈ (1/S) * Σ_s Σ_t [ alpha^t * Omega(t) * g(c(t)) ]

where:
S = iterations of the mini-sim
g = utility-of-consumption function
Omega = conditional survival probability weight
alpha = subjective discount factor
c = consumption
The agent makes sure that a next-period "advantage baseline" (the result of the mini-sim value calc) is available for the next period's comparison no matter which spend rate is used, old or new.
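To make 2B a little more concrete, here is a heavily stripped-down R sketch of the kind of mini-sim value calculation described above. It is not my actual code: the log utility, the Gompertz survival weights (note [1]), and the return assumptions (note [2]) come from the setup, while everything else -- the names, loop sizes, the 121-year age cap, the floor amount -- is illustrative, and it does not show the "current and forward" split mentioned above.

    # Illustrative "expected discounted utility of lifetime consumption" mini-sim,
    # roughly in the spirit of Eq1. Not the blog's actual code.
    log_utility <- function(c) log(pmax(c, 1))   # ra = 1 -> log utility; avoid log(0)

    # Conditional survival from age x to x+t, Gompertz mode m / dispersion b (note [1])
    gompertz_surv <- function(x, t, m = 88, b = 9) exp(exp((x - m) / b) * (1 - exp(t / b)))

    value_of_spend <- function(age, wealth, spend_rate, n_sims = 50,
                               horizon = 121 - age, mu = 0.04, sd = 0.12,
                               alpha = 1.0, floor_spend = 5000) {
      scores <- numeric(n_sims)
      for (s in seq_len(n_sims)) {
        w <- wealth
        total <- 0
        for (t in 0:(horizon - 1)) {
          spend <- max(w * spend_rate, floor_spend)              # subsistence floor
          total <- total + alpha^t * gompertz_surv(age, t) * log_utility(spend)
          w <- max(w * (1 + rnorm(1, mu, sd)) - spend, 0)        # net wealth process (note [3])
        }
        scores[s] <- total
      }
      mean(scores)    # the score used to judge the proposed action
    }

    # Decide: keep the diffused rate only if it beats the carried rate's score
    choose_rate <- function(age, wealth, carried_rate, proposed_rate) {
      if (value_of_spend(age, wealth, proposed_rate) >
          value_of_spend(age, wealth, carried_rate)) proposed_rate else carried_rate
    }

In the actual process the comparison is against the baseline stored from the prior period rather than a fresh recalculation, but the idea is the same.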
3. The state changes; the net wealth process evolves.
This is a way of saying we then randomize returns and record the change to wealth that comes from the random returns and in-period spending. That means net wealth adjusts recursively [3] over time. Note that the "effective" spend rate will almost always be different in each period due either to agent action or natural changes in wealth.
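In code terms that state transition is about as simple as it gets. A one-function sketch (names are placeholders; note [3] gives the same recursion):

    # One year of the net wealth process; illustrative only
    step_wealth <- function(wealth, spend, mu = 0.04, sd = 0.12) {
      growth <- 1 + rnorm(1, mean = mu, sd = sd)   # random real return for the year
      max(wealth * growth - spend, 0)              # earn returns, consume, never below zero
    }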
Then the new state, or environment, is returned to the agent for the next action cycle. That means that, in theory, the policy should evolve over many iterations as it learns what works better. I am reluctant to admit that I have not made the link between the policy and the action as direct as possible. That means the reinforcing recursive effect is mostly, but not entirely, missing, which is why I hedged in my interview above about reinforcement learning. Idk. I think in another version of the code I can do it; for now it may mean that this exercise is less of a machine learning thing and more of "it's just a program" -- or alternatively: less of a reinforcement thing and more of a machine thing.
A Note on Continuous Processes
The astute and experienced in process methodology in manufacturing will recognize the process in the schematic as a Deming cycle -- plan, do, check, act -- or something akin to a six-sigma program...except that the agent here is not a factory tech or operations researcher. It's a program. Similar idea, though. Worth mentioning.
Some Too-Preliminary Results
It takes a ton of training to make this work. For example, when I read Mr Irlam's paper, the scope of his training was in the many billions of sim-years. His program ran for days on a professional platform spread across many CPUs. Me? I ran mine (so far) for a combined total of maybe 9-10 hours and 500k sim years on my home office PC. The results are erratic. I'm not sure how far I want to take this project so this post is pretty much like just me jumping in at the first mile of a marathon to ask whether someone is going to win. Premature. So, it's a wee bit early but worthy of discussion nonetheless.
Caveat -- In the results below, note that I am not describing a net wealth progression when I chart spend rates against ages. The results show, rather, the "policy" spend rates from the machine (or benchmarks) at each age for, in this case, W=$1M. So if I had $1M at some age, what would I spend given remaining life? In real life, if I started at 60 with $1M, my wealth might be either $0 or $10M by age 75 or 80, so my actual spend rate at 75 or 80? Well, then it depends. I needed a way to show the policy, though, and I've used this method of display before. Also note, if I haven't mentioned it, that this only reflects very low levels of risk aversion. Higher risk aversion would have different implications, as would the presence of life income.
Benchmarks and Chart Key
For comparison purposes I have assembled some other methods for playing the same game when it comes to charting age-adjusted spend recommendations. Because of differences in how these methods are used, the parameters are not precisely apples-to-apples, so my project probably wouldn't withstand academic scrutiny. But that is not really my goal. The goal, rather, is for me to step back, squint my eyes, and say "hmmm, are these things even playing the same general game at all?"
In the ring are the following:
1. Merton Optimum math. This is tuned to low risk aversion (ra=1), and the horizon is redefined at each age based on the mean SOA (healthy cohort) longevity by attained age. Returns are .04 real with .12 standard deviation. There is a wiki page that explains the Merton Optimum math for the fearless.
Merton is Light Blue
2. RMD-style calc. This is not the IRS RMD table you might have seen, but it is similar in that I am using the SOA annuitant table results that give a "mean years left" value for each age to determine the horizon [4]; i.e., the spend rate = 1/horizon in years. I read somewhere that RMD calcs are good proxy rules of thumb that approach other optimization schemes. They've worked well in past modeling for me.
RMD is Dark Blue
3. Kolmogorov PDE (partial differential equation) for lifetime probability of ruin. This is described in Milevsky's "7 Equations" book. It's a 1930s-era variant of the heat equation that can, via a finite differences approach, be used to solve for or evaluate the ruin rate. I have a spreadsheet with the FD approach I got from Prof Milevsky. I also re-wrote it in R just for fun. If I "fix" a constant ruin risk at each age to, say, 20%, I can then solve for the spend rate that gets us there at each age if we adjust our assumptions along the way. Fwiw, in simulation mode, the Kolmogorov PDE is equivalent to, or can be satisfied by, a simulation process where the probability distribution for portfolio longevity in years is weighted by a conditional survival probability. The intuition on that approach is less rigorous than the PDE but produces a similar outcome and is more adaptable. Easier to understand, too. I built a tool for that called FRET (flexible ruin estimation tool) because fret is what we do when we think about this stuff too much.
Kolmogorov Lifetime Probability of Ruin (rendered as spend rate here) is Orange
4. PMT method. In Excel, the PMT function can be used to calculate the payment amount over a given horizon (e.g., years...or an estimate of remaining life at each age in our case), usually for loans and other obligations. PMT is, if I understand it correctly, an inverted form of the 13th-century Fibonacci formula for portfolio longevity. A number of pension academics have promoted PMT as a reasonable tool for age-adjusted and state-adjusted recalculations of spending over time. Here I am thinking mostly of Waring and Siegel's annually recalculated virtual annuity (ARVA) [references].
PMT method is Green
5. RH40. This is a custom rule of thumb I made up using Evan Inglis' "divide-by-20" rule as a base, adjusted to reflect the effect of age and changing longevity probability. It was just for fun, but it has held up well against other methods in my past research. I threw it in here because this is my blog. Note that RH40 is, by my intent, very conservative. The formula is: withdrawal % = Age/(40 - Age/3) + n, where n is an adjustment for risk takers. (A quick R sketch of the simpler benchmark rates -- RMD-style, PMT, and RH40 -- follows this list.)
RH40 is Grey
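Where the benchmark rules are simple enough to state in a line, here is roughly how they look in R. This is a sketch based on my descriptions above, not the code that actually produced the chart, and the example inputs at the end are made up.

    # Illustrative benchmark spend rates as a function of age; not the chart code

    # 2. RMD-style: spend rate = 1 / mean remaining years of life (from an SOA table, note [4])
    rmd_rate <- function(years_left) 1 / years_left

    # 4. PMT method: level payment that exhausts wealth over the remaining horizon,
    #    i.e. the standard annuity-payment formula behind Excel's PMT()
    pmt_rate <- function(r, years_left) {
      if (r == 0) 1 / years_left else r / (1 - (1 + r)^(-years_left))
    }

    # 5. RH40: withdrawal % = Age / (40 - Age/3) + n, n = adjustment for risk takers
    rh40_rate <- function(age, n = 0) age / (40 - age / 3) + n   # returns a percent

    # e.g., age 70, ~17 mean years remaining, 2% real discount rate (made-up inputs)
    rmd_rate(17); pmt_rate(0.02, 17); rh40_rate(70)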
The Machine
The results of what the machine taught itself to spend, and the values we are comparing to the benchmarks, are based on a small set of training runs, approx. 25,000 iterations and 500k sim-years, as described above [after this post I later upped that to 57,000 iterations and 1.1M sim-years]. The current output I have (again, we're only looking at the $1M wealth level) is overlaid on the others in red. Because of the erratic results at this point I've added a trend line (Excel exponential trend) as a dashed red line to see how my results, if I'm lucky, might play out with infinite iterations in the machine.
Also note that there was a minor "cheat" at this point, depending on how you look at it. The internal mini-sim is run very, very briefly to keep processing time down. That means there is a slight sampling problem, and there is wide dispersion in the reward values for any given iteration and sim-year. The policy is very jumpy. I have been running the machine in bursts of an hour here and an hour there for a cumulative 9-10 hours (that's short). So, since I was starting and stopping, I kept "generations" of the policy as it evolved...for restart and recovery. Each generation (5 of them so far [later up to 16]) was pretty idiosyncratic and shows poorly individually due to the erratic results. But, for the purposes of this post (I am confessing to a little data massaging here...I am corrupt!) I went back and averaged the results across the 5 generations at each age -- via a weighted average that biased a bit towards the later generations -- to smooth the line a bit and to allow me to make my point (this is for $1M only, btw). Not sure that is legit. I think I am averaging averages here, hence likely flirting with averaging paradoxes...eh, whatever.
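For what it's worth, the smoothing is nothing more exotic than this kind of weighted average (the data and weights below are made up for illustration):

    # Smooth the policy across "generations" with weights biased to later runs
    set.seed(2)
    gen_policy <- matrix(runif(5 * 40, 0.03, 0.07), nrow = 5,    # fake data: 5 generations
                         dimnames = list(NULL, 60:99))           # by ages 60-99
    w <- c(1, 1, 2, 3, 5)                        # made-up weights favoring later generations
    w <- w / sum(w)
    smoothed_policy <- colSums(w * gen_policy)   # one smoothed spend rate per age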
The Machine is Red. A trend line for Machine is in red dashes.
The Chart
See the above narrative for the Color Key
Figure 2 (W = $1M only)
[note: this (red) smooths out a little bit in the subsequent posts listed below]
Discussion
For the purposes of my late-age finance blog I am going to declare a minor victory here relative to my modest goals when I set out. I count it like this:
1. It worked. It looks like my personal challenge goal of getting a snippet of code to teach itself something worked at least a little bit.
2. It "escaped." The program was seeded with a spend rate of 4% (give or take a percent). But it then figured out on its own, by exploring territory and evaluating the results, that it could "jump the fence" that I had set for it...and then come to its own conclusions. Speaking of which, have you ever read any books on AI? I highly recommend Sanction (novel) by Roman McClay for some intuition on how AI really plays out...
3. It is playing the same game as the other players. It appears that the policy coming out of the machine is starting to line up with the well known benchmarks that I chose. Of course, you don't know if I stacked the benchmark deck...but I tried not to; you'll have to trust me. My own take-away is that the machine policy appears to be "playing the same game" as the other tools that I've used in the past and that I trust. I hadn't specifically planned on that happening so it was gratifying.
4. I learned something new. While the functional applicability of the project is more or less irrelevant at this point, I learned something new and the method might be useful to me in the future at least for research if for no other reason. Another tool in the tool belt, if you will. I'll need a bigger computer, though.
I doubt I'll run this project to ground too much further at this point [then I did] since I achieved my goals and there are no big payoffs beyond here that I can see. But TBD. We'll see. For the future: something I did not cover this go-round -- the machine's recommendations for spending at wealth levels other than $1M at different ages -- was interesting and a bit counter-intuitive (smaller spend rates for higher wealth) vs. what I might have expected. That is probably worth some thought. I have a theory on that.
----- Subsequent UPDATES to this Post ---------------------------------------
- An interesting side effect in my machine-learning project
- Goosing the reinforcement element in my machine-learning toy
- Update on my Reinforcement Learning Experiment
- Trying to Increase the learning speed of my naive RL machine
- A peek into the learning process of my machine
- My machine digesting some higher risk aversion
- Machine v Merton at RA=2
----- Notes ------------------------------------------------------
[1] For the mini-sim I used a Gompertz model with mode 88 / dispersion 9. This is maybe halfway between an average expectation using an SSA life table and a healthy expectation using an SOA annuitant table. For simple purposes the Gompertz conditional survival probability can be expressed as p(x,t) = exp( e^((x-m)/b) * (1 - e^(t/b)) ), where x is the age at evaluation, t is the future year, m is the mode of the distribution, b is the dispersion of that distribution, and p is the survival probability conditional on having reached x.
[2] I've been tending to use .04 real return and .12 standard deviation lately. That's roughly like a 60/40 allocation maybe. The distribution is normal which I know is not real but I had no desire to engineer anything more complex here.
[3] for the interested, a net wealth process can be described like this though in the code it's just: wealth(t+1) = wealth(t)*returns(t) - spending(t)
Net wealth process, continuous-time form: dW(t) = (mu*W(t) - c(t)) dt + sigma*W(t) dB(t), where B is the Brownian motion term, mu and sigma are the return and volatility, and c is consumption.
[4] I have a bunch of versions of the SOA tables and can more or less manipulate them...I think, along with the model in note 1, but I use aacalc.com for this exercise.
----- References ------------------------------------------------------
Inglis, E. "The 'Feel Free' Retirement Spending Strategy." SOA, 2016?
Irlam, G. "Financial Planning via Deep Reinforcement Learning AI." SSRN, 2018.
Irlam, G. [topic: machine learning re financial planning], not yet published.
Kolm & Ritter. "Modern Perspectives on Reinforcement Learning in Finance." SSRN, 2019.
Milevsky, M. "The 7 Most Important Equations for Your Retirement: The Fascinating People and Ideas Behind Planning Your Retirement Income." Wiley, 2012.
Waring, M.B. & Siegel, L. "The Only Spending Rule Article You'll Ever Need." 2014.