Jan 24, 2020

What happens when you try to improve the machine by suppressing outliers

Intro

The short answer to the title is that it looks like the machine's output shifts from finance to economics. That confused me at first, but I think I have a bead on it now. First we'll look at where we've been with: a) lower risk aversion (small, error-prone sampling), and b) slightly higher risk aversion (again with small sampling). Then I'll change the sampling a bit to see what happens. Then, finally, I'll try to explain what I think I'm seeing.

What do I mean by sampling and outliers?

In the machine/model as it walks through the meta-sim -- where "1 iteration = 1 life" and the walk is year by year within a life -- at each age, for whatever wealth level and spend rate it happens to be at, it runs a forward consumption-utility simulation to estimate lifetime consumption utility. It does this in order to compare a course of action (changing the spending) to a baseline (what it would have done notwithstanding the change). Since that is a heavy use of the processor, and since I was just playing around, I originally kept the iterations for that internal mini-sim low, say 100. That is "the sample," and since it is technically sampling from infinity, it is a laughably low sample. In this post I increased that to 300, which is still laughably low but also painfully slow. On AWS with 4x4 cores, so 16 CPUs, it takes about 50 minutes for 1,000 iterations of the meta-sim. I later nudged it down to 200 out of impatience, but that didn't change the conclusions much.
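To make that concrete, here is a minimal sketch of the nested structure as I read it. Everything in it -- the CRRA parameter, the return and vol numbers, the Gompertz-style mortality, and the function names -- is my own illustrative stand-in, not the machine's actual code.

```python
import numpy as np

GAMMA = 2.0                # assumed CRRA risk aversion (not from the post)
MU, SIGMA = 0.05, 0.11     # assumed portfolio return and volatility
MAX_AGE = 120

def crra(c, gamma=GAMMA):
    """CRRA utility of consumption (log utility when gamma == 1)."""
    return np.log(c) if gamma == 1 else c ** (1 - gamma) / (1 - gamma)

def q_death(age, mode=88.0, disp=9.0):
    """One-year death probability from a Gompertz-style hazard (illustrative)."""
    hazard = np.exp((age - mode) / disp) / disp
    return 1.0 - np.exp(-hazard)

def mini_sim_utility(age, wealth, spend, n_iter=100, seed=None):
    """Estimate expected lifetime consumption utility by forward simulation.

    n_iter is 'the sample' discussed in the post: 100 originally, later 300/200.
    """
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_iter):
        w, u = wealth, 0.0
        for t in range(age, MAX_AGE):
            c = min(spend, w)              # if wealth is depleted, consume what's left
            u += crra(max(c, 1e-3))        # small floor avoids -inf utility at zero
            w = (w - c) * (1 + rng.normal(MU, SIGMA))
            if rng.random() < q_death(t):  # life ends this year
                break
        total += u
    return total / n_iter

# Inside the meta-sim, a proposed spend change is scored against the baseline:
baseline  = mini_sim_utility(age=65, wealth=1_000_000, spend=40_000, seed=1)
candidate = mini_sim_utility(age=65, wealth=1_000_000, spend=44_000, seed=1)
keep_change = candidate > baseline
```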

The main difference, an obvious statistical thing, is that the dispersion of the sampling distribution narrows a bit and the outliers (in lifetime consumption utility) start showing up in the sample often enough to matter. I'll try to interpret that later.
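A quick numerical aside on that narrowing, with a made-up dispersion for per-path utility:

```python
import numpy as np

# The standard error of the mini-sim's estimate of expected lifetime utility
# falls roughly with 1/sqrt(n); sigma here is a made-up per-path dispersion.
sigma = 50.0
for n in (100, 200, 300):
    print(n, round(sigma / np.sqrt(n), 2))
# 100 -> 5.0, 200 -> 3.54, 300 -> 2.89: tripling the sample trims the noise
# by about 42%, but a rare depletion path can still be missed entirely.
```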


Where we've been, from before: smaller samples / shorter mini-sims

I think I put this chart out before. This is the machine output relative to the low risk aversion benchmarks (RA=1, higher on the chart) described in the original post that kicked off the project here, and the higher risk aversion benchmarks (RA=2, lower on the chart in green) described in this post here. I was comfortable at this point that the machine was more or less finding financial landmarks with which I was familiar.
Figure 1. Low sample simulations



What happens if we up the number of mini-sims (samples) 

As mentioned, I upped the internal sim number. The goal in my mind at the time, something I did not achieve, was to get more consistency and a smoother line. What I got was a different answer. This (figure 2) is after 53,000 iterations and over 1 million sim life-years (still pretty small, but that took more than 24 hours of processing). The lower green benchmarks from figure 1 are here along with a new one (Merton tuned to higher risk aversion and a longer life, using years from the SOA annuitant 95th percentile -- the highest grey line). The red dashed line is the R predictive regression of the machine's output. The dots are, of course, the output.
Figure 2. Upping the mini-sim iterations

Huh, weird. Not converging. I thought at 10,000 iterations that maybe more iterations would train it up, and it did a bit, but not much. Here is the difference between early and late in figure 3: the lower grey dashed line is the predictive regression at 10k, the red dashed line at 53k.

Figure 3. Early vs late learning

But after about 30,000 iterations it stopped learning much[1]. Did I mention I was paying for AWS hours? I stopped the training and figured I needed to understand the divergence (at this point anyway) from the benchmarks.

Creating a new benchmark

The only other benchmark I had not attempted was to use a lifetime consumption utility simulator directly, since the machine uses something similar internally. So, in this step I used my WDT tool to estimate expected discounted utility of lifetime consumption at each age for the $1M wealth level. I tried to tune it to similar assumptions: no social security, similar return and vol, similar conditional mortality (though here it uses SOA data rather than Gompertz), etc. I could have tested asset allocations, but here I just pegged the AA to a 50/50 portfolio, which is similar to the embedded assumptions above. This raises the question of why bother to run a sloppy, slow machine when one can access the insight directly, accurately, and faster. That's a really, really good question that I'll hit in another post.
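For what it's worth, here is a sketch of the general idea behind that benchmark as I read it -- not the WDT tool itself. The spend-rate grid, the discount rate, and the pick-the-maximizing-rate step at each age are my assumptions, and it reuses crra, q_death, MU, SIGMA, and MAX_AGE from the sketch above.

```python
import numpy as np

def expected_discounted_utility(age, wealth, spend, discount=0.005,
                                n_iter=4000, seed=1):
    """Average discounted CRRA utility of consumption over simulated lives."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_iter):
        w, u = wealth, 0.0
        for k, t in enumerate(range(age, MAX_AGE)):
            c = min(spend, w)
            u += crra(max(c, 1e-3)) / (1 + discount) ** k
            w = (w - c) * (1 + rng.normal(MU, SIGMA))
            if rng.random() < q_death(t):
                break
        total += u
    return total / n_iter

def benchmark_spend_rate(age, wealth=1_000_000):
    """Spend rate that maximizes expected discounted utility at a given age."""
    rates = np.arange(0.02, 0.125, 0.005)
    edus = [expected_discounted_utility(age, wealth, wealth * r) for r in rates]
    return rates[int(np.argmax(edus))]

# The blue benchmark line: one optimal-ish spend rate per age for $1M wealth.
blue_line = {age: benchmark_spend_rate(age) for age in range(60, 91, 5)}
```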

When I assemble the new benchmark and plot it (blue) on top of figure 3, we get figure 4. 

Figure 4. A new benchmark



So, the machine seems to be finding what it's supposed to find. I guess that doesn't surprise me too much since the mini-sim is the same basic consumption utility approach that is used to construct the benchmark. The only magic is that the machine got there without anyone telling it to do so up front and without too much labor.


Am I able to interpret this?

Perhaps. This is what I think goes on in there. For the lower mini-sim count (smaller sample size, say 100), the chance that all 100 iterations have no wealth-depletion state is moderate and probably happens often enough to matter. When it does, the machine interprets that as a consumption utility "win" and remembers the spend rate associated with it in its policy.

When I up the sims, that scenario becomes less likely, though not impossible. It's a little like setting expectations about red on a roulette wheel by looking at the last 3 spins vs the last 12. You might be more cautious on the latter since you're likely to see more black in that history.
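Putting made-up numbers on that:

```python
# Back-of-the-envelope version of the same point, with a made-up 2% chance
# that any single simulated life hits a wealth-depletion state:
p_deplete = 0.02
for n in (100, 200, 300):
    print(n, round((1 - p_deplete) ** n, 4))
# 100 -> 0.1326, 200 -> 0.0176, 300 -> 0.0023: the chance that the whole
# sample shows no depletion at all, and so looks like a "win", drops fast.
```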

So, my hypothesis (and this might be a stretch) is that with lower sample sizes the machine finds a solution nearer to financial theory, where portfolios and portfolio longevity, with or without the mortality contingency, are the main game. I'll call that "finance theory," which appears more optimistic.

For the larger sample size, on the other hand, the hard convexity of the consequences of wealth-depletion states comes into play more often, and lower spend rates tend to be remembered in machine policy. I'll call that "economic theory," which appears more pessimistic. Either way, the machine is finding something that seems reasonable to me, depending on how we frame it.
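And a one-liner on what I mean by hard convexity, again with illustrative numbers and the RA=2 utility function:

```python
# The 'hard convexity' in miniature, using CRRA with gamma = 2 (u = -1/c):
# a near-depleted $1k year is 40x worse in utility terms than a $40k year,
# so one depletion path can drag down a whole mini-sim average once the
# sample is big enough to catch one. Numbers are illustrative only.
u = lambda c: -1.0 / c
good, bad = u(40_000), u(1_000)
print(good, bad, bad / good)   # -2.5e-05 -0.001 40.0
```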

"Why bother" with all this is a whole other question coming up in a bit. 

-------------------------------------
[1] - It's possible that with billions of sim-years the machine would train itself higher, given what I describe as happening inside it.






