Jan 9, 2020

Goosing the reinforcement element in my machine-learning toy

The first instantiation of my machine-learning toy had suppressed the reinforcement aspect because I was chicken to run it with a "policy" that was empty of anything learned at the beginning. But that meant it was learning more or less anew each time, with some weighting going on that was a little like reinforcement-lite. This go-round I made it direct: the spend policy for each age+wealth state is based on the optimal policy found so far. In theory this "reinforces" and should move us toward a more stable solution, where my last one was a little jumpy.
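Roughly, the mechanism looks like this -- a stripped-down sketch, not the actual sim code; the state keys, chunk grid, and exploration rate here are all illustrative:

```python
import random

# Illustrative sketch: states are (age, wealth_bucket) pairs,
# actions are spend rates in .005 chunks. All names are made up.
SPEND_CHUNKS = [round(0.005 * k, 3) for k in range(1, 21)]  # 0.5% .. 10.0%

best_score = {}   # (age, wealth_bucket) -> best mini-sim score seen so far
best_policy = {}  # (age, wealth_bucket) -> spend chunk that produced it

def choose_spend(age, wealth_bucket, explore=0.10):
    """Mostly exploit the best-so-far policy; sometimes explore a new chunk."""
    state = (age, wealth_bucket)
    if state in best_policy and random.random() >= explore:
        return best_policy[state]        # "reinforce" what has worked
    return random.choice(SPEND_CHUNKS)   # otherwise try something else

def record(age, wealth_bucket, spend, score):
    """After a mini-sim scores a (state, spend) pair, keep only the best."""
    state = (age, wealth_bucket)
    if score > best_score.get(state, float("-inf")):
        best_score[state] = score
        best_policy[state] = spend
```

The point of the change is the `if state in best_policy` branch: instead of starting from nothing each pass, the sim leans on whatever has scored best so far for that age+wealth state.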

This is the revised schematic: 



I ran about 13 more hours, or around 28,000 iterations, which over the interval of interest is a bit over half a million sim years. That, I hear, is a little thin. Also, I did this with the policy data from the last effort still in place, so I was starting with a base that probably has some idiosyncrasies and/or errors. Next up will be an effort to do it with a zero-base policy and a longer mini-sim, which will destroy my CPU. Inside the meta-sim we do at least two mini-sims of forward life. I had it set to 100 iterations which, if you have ever done this, you know is ridiculously small. It's faster, though. The price for that speed is consistency, which is bad so far in both efforts. Might try it on a different platform at some point.
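For orientation, the nested structure reads something like this -- a toy sketch, with made-up return and scoring assumptions standing in for the real model:

```python
import random

def mini_sim(policy_fn, n_lives=100, start_age=60, end_age=90, start_wealth=1.0):
    """One mini-sim: score n_lives random forward lives under a policy.
    The return process and the score here are toy placeholders."""
    total = 0.0
    for _ in range(n_lives):
        wealth, score = start_wealth, 0.0
        for age in range(start_age, end_age):
            spend = policy_fn(age, wealth)
            score += spend * wealth                    # toy stand-in for utility
            wealth = max(0.0, wealth * (1.0 - spend))
            wealth = max(0.0, wealth * (1.0 + random.gauss(0.05, 0.12)))
        total += score
    return total / n_lives

def meta_step(candidate_fn, incumbent_fn):
    """One meta-sim iteration: at least two mini-sims, keep the better policy."""
    return candidate_fn if mini_sim(candidate_fn) > mini_sim(incumbent_fn) else incumbent_fn
```

With only 100 lives per mini-sim, each `meta_step` comparison is noisy, which is one plausible reading of the consistency problem noted above.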

I'm still running cycles, but this is what I have so far. The grey lines are explained in the prior post. Red is the RL output; red dashed is a 2nd-order polynomial trend from Excel. The x-axis is years because I forgot to label it with ages; assume year 1 is age 60:

(at $1M level only)

There is not much to conclude from this, relative to the last effort. The most I can say is:

1. The choppiness is down and the output is getting closer to a line or curve. But I am working in rounded .005 spend chunks, so choppiness will always be present unless I get more granular, which I won't: (a) it would be too much work and too slow, and (b) real spending variance in real life means recommendations for spending calculated down to 1 or 2 decimals, like 4.21, are BS. I'll stick with chunks.
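In case the chunking isn't clear: every candidate spend rate gets snapped to the nearest half-percent, so the 4.21 above would come out as a flat 4.0. A one-liner shows it (function name is mine):

```python
def to_chunk(rate, chunk=0.005):
    """Snap a spend rate to the nearest .005 chunk."""
    return round(round(rate / chunk) * chunk, 3)

to_chunk(0.0421)  # -> 0.04  (4.21% snaps to 4.0%)
to_chunk(0.0437)  # -> 0.045 (4.37% snaps to 4.5%)
```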

2. The line is rising relative to the last effort. It is certainly rising toward the top of the reference range. If I were to be (convenient for me, eh?) skeptical of the output at 60 and 80, and I am, then the red line looks like it wants to hew to the Merton Optimum. This, if true, would be interesting, though there's not much I can say there yet. Failing that, it is also pretty close -- if we leave 60 and 80 in play -- to the PMT method. One way or another, we are confirming existing knowledge, which is boring. Maybe I was hoping the machine would tell me I could buy a Lambo this year without guilt or fear ;-) This whole project has been a lot of work, by the way, for something I can do in my head -- with a similar level of accuracy and, no doubt, economic utility -- using a rule of thumb.
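For reference, here's what those two benchmarks compute, sketched in Python. The PMT method is just the level-payment annuity formula (Excel's PMT, sign flipped); the Merton line is the common infinite-horizon CRRA consumption rate, which is only one version of the "Merton Optimum" -- parameter names are mine, and the formula is worth checking against a reference:

```python
def pmt_spend(wealth, r, n_years):
    """Level payment that exhausts wealth over n_years at discount rate r
    (Excel PMT with the sign flipped)."""
    if r == 0:
        return wealth / n_years
    return wealth * r / (1 - (1 + r) ** -n_years)

def merton_rate(mu, sigma, r, gamma, rho):
    """Infinite-horizon Merton consumption rate for CRRA utility.
    mu, sigma: risky return and vol; r: risk-free rate;
    gamma: risk aversion; rho: time preference.
    One common form -- verify before relying on it."""
    return rho / gamma + (1 - 1 / gamma) * (r + (mu - r) ** 2 / (2 * gamma * sigma ** 2))
```

At a zero discount rate, PMT over a 25-year horizon is just wealth / 25 = 4%, which is part of why these benchmarks land so close to the familiar rules of thumb.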

3. There is less convexity. That could be due to the low number of runs, or errors, or modeling choices, or whatever. I have no idea if that observation is real or meaningful. TBD.

4. I keep forgetting that this is for very low risk aversion.  Gotta remind myself to try it with something a little more timid.



