Jan 18, 2020

A peek into the learning process of my machine

See the original post that I started with here:

  - An early look (too early) into my amateur mini-machine-learning project


With a revised (minor changes) schematic like this:

Intro

When I first embarked on this project I had more or less one goal: get a slice of code to teach itself something. That, I think, I’ve done. After that, I wanted to get it to at least move toward a smooth line in the way I wanted to present it (i.e., like the benchmarks) rather than an ugly, choppy line. For a bunch of reasons, I now think that will be harder than I thought, or maybe impossible. For example:

- Not enough training of the machine due to processing and time limitations
- The spend optima by age are in “a zone”; the machine may not care exactly where within it it lands
- Corner cutting in the model/machine design
- Naivete 

The machine is still far from good, if it will ever be anything but a rough, blunt idea, but I’ve made a few mods to see if I can nudge it a bit. The changes so far:

- I made the "action" (the spend diffusion) continuous and random rather than discrete (see the sketch just below this list)
- Tied the “policy” to the full force of the history of “advantage” traced over all generations
- Some minor tweaks and bug fixes, and probably something I'm missing
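
Just to make those first two changes concrete, here is a minimal sketch of the mechanics. The names here (propose_spend, update_policy, the advantage-record fields) are mine, invented for illustration, not pulled from the actual project code:

```python
import numpy as np

rng = np.random.default_rng(0)

def propose_spend(policy_spend, sigma=0.005):
    """Change 1: a continuous, random 'action' -- perturb the current
    policy spend rate with Gaussian noise rather than stepping through
    a discrete grid of candidate rates."""
    return max(policy_spend + rng.normal(0.0, sigma), 0.0)

def update_policy(advantage_history):
    """Change 2: tie the policy to the full history of 'advantage' --
    the new policy spend is the advantage-weighted mean of every spend
    rate ever found advantaged, across all generations so far."""
    spends = np.array([rec["spend"] for rec in advantage_history])
    weights = np.array([rec["advantage"] for rec in advantage_history])
    return np.average(spends, weights=weights)
```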

So, the changes since last time are not really all that material for a general reader. But just for fun, since I committed 10 hours to running it again, here are some observations on the changes made and what I think is or isn’t happening inside the machine:

1. It’s still choppy and not converging as much as I'd prefer, so, barring some super new insight, I’ll still lean on the reasons I listed above for the choppiness, and

2. The machine, kinda like a person, seems to be quite influenced in early adulthood (the end of the iterations I ran, which are young by AI-training standards) by what it learned as a child (the first or early iterations). Great metaphor! As a parent, I know that how a kid is nurtured and raised from 0-5 has a profound impact on their early adulthood, which is why I stayed as present as possible when my kids were young. But this also means that the machine output, painfully won over long training hours, feels conditional on the inner workings of a black box that does not reveal too many secrets. TBD.

Given that, I wanted to try to peer inside the machine to see what’s going on. I mean, I wrote the code, so I kinda know, but I also kinda don’t. Let’s take a look.

Assumptions:

The main assumptions are as in the original linked post; you'll have to go look, but basically (a rough encoding in code follows the list):
- stochastic, volatile returns of 4/12 (4% mean real return, 12% SD)
- random, probability-weighted lifetime using an artificial longevity model
- $1M endowment and 40k real spend (the 4% in the set-up) at start by a 60yo
- very low risk aversion
- subsistence-level non-SS income available after a wealth-fail
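
To pin those down a bit, here is roughly how I'd encode the setup, with a Gompertz draw standing in for the "artificial longevity model" (Gompertz is one common choice; I'm not claiming it's exactly what the machine uses). All names and parameter values below are illustrative assumptions on my part, not the project's actual code:

```python
import numpy as np

# Illustrative encoding of the setup; names/values are mine.
params = {
    "real_return_mean": 0.04,      # the "4" in 4/12: 4% mean real return
    "real_return_sd":   0.12,      # the "12": 12% annual volatility
    "start_age":        60,
    "endowment":        1_000_000,
    "start_spend":      40_000,    # 4% of the endowment, in real terms
    "risk_aversion":    1.0,       # very low (log-ish utility)
    "fail_income":      None,      # subsistence-level non-SS income after
                                   # a wealth-fail; level not given in the post
}

def sample_death_age(age, mode=88.0, dispersion=9.5,
                     rng=np.random.default_rng()):
    """Draw one random lifetime, conditional on being alive at `age`,
    from a Gompertz model. The mode/dispersion values here are
    illustrative, not the post's."""
    u = rng.uniform()  # target conditional survival probability
    b = dispersion
    # Invert S(t | age) = exp(exp((age-mode)/b) - exp((t-mode)/b)) = u
    return mode + b * np.log(np.exp((age - mode) / b) - np.log(u))
```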

The run time this time was, in batches, about 12 elapsed hours and 16,000 iterations, so ~300-400k sim-years. Not sufficient, but I got bored, and the run time of the batches per iteration is not linear -- due to my ham-fisted amateur memory management -- so I shut it down. Oh well. So things are still not lining up the way I’d prefer, but I don’t want to corrupt the machine with my preferences, do I? But I do want to look.

1. Policy vs Benchmarks

Here is the current spend policy, after an initial training session, set against the policy benchmarks we established in the first post, plus a couple more (the upper 2) not articulated or defined here (think less conservative longevity).

Figure 1. Provisional Spend Policy after partial training

Some things to note here:

 - this was a very low risk-aversion setup
 - machine model uses moderate longevity assumptions 
 - the endowment value in the chart is constant 
 - benchmarks have mis-matched parameters but are mostly playing the same game

Some observations:

 - not very different from prior posts
 - same curve as last post and more or less the same as the benchmarks
 - still choppy, but maybe for the reasons delineated above
 - the early years and the critical point at ~65 look suspicious to my eye
 - the curve looks generally consistent with the theory, something it was not told to do

2. The Spread of What the Machine Found by Age

This chart shows each spend rate that the machine found "advantaged" in any given iteration/sim-life-year ...advantaged, that is, over what it was otherwise going to do. So at each age, for $1M, the dots are the distribution of spend rates the machine discovered, over a bazillion sim-life-years, as having a higher expected discounted utility of lifetime consumption. This is only for the $1M wealth level, which is held constant across all ages in the chart. It is rendered as a heat map so you can see the central concentration (a rendering sketch follows). I suppose I could do this as a surface but probably won't.
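
The rendering itself is nothing exotic; a 2D histogram does the trick. A minimal sketch, assuming `ages` and `spend_rates` are parallel arrays of the (age, advantaged spend) pairs at the $1M level (my names, not the project's):

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_advantage_heatmap(ages, spend_rates):
    """Bin the (age, advantaged-spend-rate) scatter into cells and
    color by count so the central concentration is visible."""
    ages = np.asarray(ages)
    spend_rates = np.asarray(spend_rates)
    fig, ax = plt.subplots()
    *_, img = ax.hist2d(ages, spend_rates, bins=(35, 50), cmap="hot")
    # Overlay the mean advantaged spend at each age (the blue line)
    uniq = np.unique(ages)
    means = [spend_rates[ages == a].mean() for a in uniq]
    ax.plot(uniq, means, color="blue", label="mean advantaged spend")
    ax.set_xlabel("age")
    ax.set_ylabel("spend rate")
    ax.legend()
    fig.colorbar(img, ax=ax, label="count")
    plt.show()
```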
Notes

 - at each age this is a scatter of the spend rates that the machine found "advantaged."
 - the blue line is the mean of the scatter at each age
 - the black line is the Merton Optimum with risk aversion (ra) = 1 and years set to conditional life expectancy using SOA data (a sketch of that calculation follows these notes)
 - The divergence of blue and black at early ages bugs me but I don't have a theory yet. 
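
For reference, the Merton finite-horizon consumption rule is straightforward to compute. A minimal sketch with illustrative inputs (the risk-free rate r and subjective discount rate rho here are my assumptions, not necessarily the post's); with gamma = 1, the rate collapses to rho / (1 - e^(-rho*T)):

```python
import numpy as np

def merton_spend_rate(mu=0.04, sigma=0.12, r=0.01, rho=0.02,
                      gamma=1.0, horizon=25.0):
    """Merton finite-horizon optimal consumption rate, zero bequest.
    gamma = 1 is log utility ('very low risk aversion'); `horizon`
    would be the SOA conditional life expectancy at each age.
    Parameter values are illustrative, not the post's exact inputs."""
    nu = (rho - (1 - gamma) * (r + (mu - r) ** 2
                               / (2 * gamma * sigma ** 2))) / gamma
    if abs(nu) < 1e-12:          # limiting case: spend rate = 1/horizon
        return 1.0 / horizon
    return nu / (1.0 - np.exp(-nu * horizon))

# e.g., ~25 conditional years at 60 with rho = 2% -> ~5.1% spend rate
print(round(merton_spend_rate(horizon=25.0), 4))
```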

This is an interesting look into the machine because it tells me that there is probably no "real" answer at each age (for $1M) but rather a distribution of almost-right answers. Fundamentally it is not known ...unless one believes in central tendencies. But there is, evidently, at least "a zone." That's for another post, but the intuition would not be totally out of sync with human intuition, academics be damned ;-)

The heat scatter is interesting enough, but it is every spend ever found in the entire history of the training of the machine. What that obscures is the evolution of what the machine finds interesting (the stuff it was not specifically told to look at) over the full training interval. Here is another type of peek into the machine.

3. The Difference in What the Machine Knows Early in Training vs Late in Training

In this chart I took all of the spend rates traced as having had an advantage over all 16,000 iterations and 340,000 sim-life-years (about 300,000 of those "advantages" across all wealth levels), which we charted in section 2 above. Then I cut out the first 500 (adv 1:500, in blue) and the last 500 (adv 299501:300000, in red). This is still for the $1M wealth level. (A sketch of the slicing is just below.)
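
Mechanically, the cut is trivial. A minimal sketch, assuming the advantage trace lives in a pandas DataFrame with a 'wealth' column (my column name, not necessarily the project's):

```python
import pandas as pd

def early_vs_late(adv_log: pd.DataFrame, n: int = 500):
    """Split the full advantage trace into the first and last n records
    at the $1M wealth level -- the blue and red slices in the chart."""
    million = adv_log[adv_log["wealth"] == 1_000_000]
    return million.head(n), million.tail(n)   # (early/blue, late/red)
```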


I'm hoping the view is obvious. If not, I'll point out some things I see:

- Data are a little thin using 500 but it'll do,

- The machine, especially at the later ages, seems to want to spend more, whereas in the early iterations it spent less, closer to where the meta-simulation started, i.e., more toward the 4% in the set-up. So, that looks like a little machine learning, right?

- The red line is ever so slightly smoother and more convex than the blue one, which is pretty jumpy.

That separation between blue and red, and the "curving," is, to my eye, the "learning," so I'm still thinking I haven't lost my mind yet and that I have a machine that is actually doing the work. These two points give me hope that, with some adjustments and more training, the machine might move toward more consistency, maybe even toward the benchmark. No idea, though.


