Jan 24, 2020

Some Observations on my Machine Learning Project

The original post and all the associated links and references, for context, are here:
  • https://rivershedge.blogspot.com/p/machine-spending.html
A recent question from a correspondent, and one that I had also asked myself in the last post, was:
"Why bother to run a sloppy, slow, imprecise machine when one can access the insight directly, accurately and faster by other means?" [paraphrased h/t to David Cantor]
Good question; it cuts to the core of my project. Let's see if I can rationalize what I call "riding a bike in first gear": a lot of motion and heat for very little forward progress.

What happens when you try to improve the machine by suppressing outliers

Intro

The short answer to the title is that it looks like the machine's output shifts from finance to economics. That confused me at first, but I think I have a bead on it. First we'll look at where we've been with: a) lower risk aversion (small, error-prone sampling), and b) slightly higher risk aversion (again with small sampling). Then I'll change the sampling a bit to see what happens. Then, finally, I'll try to explain what I think I'm seeing.

What do I mean by sampling and outliers?

In the machine/model, as it walks through the meta-sim -- where "1 iteration = 1 life" and then year by year within a life -- it is, at each age and for whatever wealth level and spend rate it happens to be at, running a forward consumption-utility simulation to estimate lifetime consumption utility. It does this in order to compare a course of action (changing the spending) to a baseline (what it would have done notwithstanding the change). Since that is a heavy use of the processor, and since I was just playing around, I originally kept the iterations for that internal mini-sim low, say 100. That is "the sample," and since it is technically sampling from infinity, it is a laughably low sample. In this post I increased that to 300, which is still laughably low but also painfully slow. On AWS with 4 x 4-core instances (16 CPUs) it takes about 50 minutes for 1000 iterations of the meta-sim. I later nudged it down to 200 out of impatience, but that didn't change the conclusions much.
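To make the mechanics concrete, here is roughly what one of those internal mini-sims amounts to, in Python. This is a generic sketch, not my actual code: the return assumptions, the fixed horizon, and the crude consumption floor are all placeholders.

    import numpy as np

    def crra_utility(c, ra):
        """CRRA utility of consumption; log utility when ra == 1."""
        return np.log(c) if ra == 1 else (c**(1 - ra) - 1) / (1 - ra)

    def minisim_lifetime_utility(age, wealth, spend, ra=2, n_iter=300,
                                 mu=0.05, sigma=0.11, horizon=95):
        """Estimate expected lifetime consumption utility by simulating n_iter
        forward 'lives' from the current age/wealth/spend state."""
        utilities = np.empty(n_iter)
        for i in range(n_iter):
            w, total_u = wealth, 0.0
            for _ in range(age, horizon):
                c = min(spend, w)                            # can't spend what isn't there
                total_u += crra_utility(max(c, 1e-6), ra)    # floor avoids utility of zero
                w = max((w - c) * (1 + np.random.normal(mu, sigma)), 0.0)
            utilities[i] = total_u
        return utilities.mean()

    # compare a proposed spend change to the baseline at the same state, e.g.
    # baseline = minisim_lifetime_utility(65, 1_000_000, 40_000)
    # proposal = minisim_lifetime_utility(65, 1_000_000, 45_000)

The n_iter of 100 vs. 300 discussed above is the sample that gets averaged in that last step.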

The main difference, an obvious statistical point, is that the dispersion of the sampling distribution narrows a bit (the standard error of the sample mean shrinks roughly with 1/sqrt(n), so going from 100 to 300 draws cuts it by a bit over 40%) and the relative impact of outliers (in lifetime consumption utility) is pulled in. I'll try to interpret that later.

Jan 21, 2020

Machine v Merton at RA=2

see original post for set-up and assumptions


  • Blue is the learning machine at around 300,000 sim-years, with the risk aversion coefficient set to 2
  • Grey is my RH40 rule of thumb
  • Orange is the Merton optimum with RA=2 and the horizon tuned to the SOA annuitant 90th percentile (a rough sketch of this benchmark follows below)
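As I understand it, the orange benchmark is the standard finite-horizon Merton consumption rule. A minimal Python sketch is below; the capital-market assumptions shown are made-up placeholders, not the ones behind the chart, and T in practice would be tuned to something like the SOA annuitant 90th percentile per the bullet above.

    import math

    def merton_spend_rate(ra, r, mu, sigma, rho, T):
        """Finite-horizon Merton optimal consumption as a fraction of wealth.
        ra: risk aversion; r: risk-free rate; mu, sigma: risky drift and vol;
        rho: subjective discount rate; T: remaining horizon in years."""
        nu = (rho - (1 - ra) * (r + (mu - r)**2 / (2 * ra * sigma**2))) / ra
        return 1.0 / T if abs(nu) < 1e-9 else nu / (1 - math.exp(-nu * T))

    # e.g. RA=2 with made-up assumptions and a 30-year horizon
    # merton_spend_rate(ra=2, r=0.01, mu=0.05, sigma=0.12, rho=0.02, T=30)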


Choppy output but still looks like it's getting it done one way or another...

...although in runs after this one, I'm noticing that the bend up at later ages here may be more pronounced than in my later runs, because the mini-sim is prone to sampling errors and the high value/utility "errors" are more likely to be captured as an advantage. With more mini-sim cycles added (which makes it really, really slow) it looks like it might not curve up as much as in this chart. On the other hand, I've noticed that late iterations are pretty strongly related to what unfolds in the first few, so maybe that's part of the problem. No idea yet. TBD



Jan 20, 2020

Digging a bit more into the Machine-derived chart for higher risk aversion

Start here for background:

------------

I ran a few more cycles (up to 16,000 now, so ~320,000 sim-years) of my machine with the Risk Aversion Coefficient (RA) now set to 2. So, at this point in my iterations, I thought I'd hazard some opinions on what I think the machine is doing at this RA level. Here is the current chart, as of my most recent run, of the spending policy at different ages for $1M of starting endowment at each of those ages:


Jan 19, 2020

My machine digesting some higher risk aversion

Start here:

I've been playing with this thing just for fun to see where it goes. The first versions were too discrete in their approach to spending exploration and therefore a little unstable. Also, I had only designed it for log utility (i.e., low, or RA=1 in CRRA math). This increment of coding added RA > 1, where the formula is [C^(1-ra)-1]/(1-ra), if I recall correctly.
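Since the formula is short, here it is in Python, with the usual log special case, plus a quick check that the normalized form converges to log utility as RA approaches 1. The $40,000 consumption figure is just an arbitrary example, not anything from the model.

    import math

    def crra(c, ra):
        """[C^(1-ra) - 1]/(1-ra), reducing to log(C) at ra = 1."""
        return math.log(c) if ra == 1 else (c**(1 - ra) - 1) / (1 - ra)

    print(crra(40_000, 1.0001), math.log(40_000))   # nearly identical
    print(crra(40_000, 2))                          # the RA=2 case used in this run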

For this really fast, too-short, too-few-iterations run I flipped RA to 2. Not much of a change, but: a) going up even a bit in RA has convex and significant effects, b) in my own work an RA of 2 is about how I behave, based on what I see, and c) my opinion is that really high RA needs fewer models and more counselling. Think of it this way: if I wear a seat belt, I am prudently risk averse; if I am agoraphobic and never leave the house, I am risk averse and I need help. I have an unfounded opinion that over about RA=3 it starts to get a little odd, but that's just me.

Jan 18, 2020

A peek into the learning process of my machine

See the original post that I started with here:

  - An early look (too early) into my amateur mini-machine-learning project


With a revised schematic (minor changes) like this:

Intro

When I first embarked on this project I had more or less one goal: get a slice of code to teach itself something. That, I think, I've done. After that, I wanted to get it to at least move toward a smooth line in the way I wanted to present it (i.e., like the benchmarks) rather than an ugly, choppy line. For a bunch of reasons, I think that will be harder than I thought...or impossible. For example:

Jan 14, 2020

Trying to increase the learning speed of my naive RL machine

In the last go-round, starting with this post, along with some enhancements related to goosing the reinforcement aspect, my machine was slow and the output was choppy and unstable. Partly this is due to the rough, amateur nature of the experiment: I have an agent taking fuzzy actions in quite discrete chunks, with a small internal dynamic mini-simulation.

It dawned on me, though, that the mini-sims are effectively a sampling-from-infinity process, and my small sample size causes problems when evaluating advantage/reward. Effectively, the machine remembers too much about optimal or advantaged spend outliers, especially on one side of the tail.
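One generic way to dampen that kind of memory would be to clip (winsorize) the mini-sim utilities before averaging them into an advantage estimate. This is just an illustration of the idea, not necessarily what I did, and the 5th/95th percentile cutoffs are arbitrary:

    import numpy as np

    def clipped_mean(utilities, pct=5):
        """Winsorize both tails of the mini-sim lifetime utilities, then average,
        so a handful of extreme draws can't dominate the advantage estimate."""
        lo, hi = np.percentile(utilities, [pct, 100 - pct])
        return np.clip(utilities, lo, hi).mean()

    # advantage of a proposed spend vs. the baseline, using the dampened estimates
    # advantage = clipped_mean(proposal_draws) - clipped_mean(baseline_draws)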

Jan 11, 2020

Update on my Reinforcement Learning Experiment

I've now run my reinforcement learning experiment through close to 40 hours across 2 rounds of training and maybe around 1.1 million sim-years. That's evidently thin for training these kinds of things, but maybe enough for me to evaluate where I am.

I've kept my data in generations for restart-recovery purposes, but that also allows me a window into the evolution of what it is finding. And, pretty much, what it is finding is a choppy result that isn't changing too much anymore but is still imprecise or inconsistent in its policy recommendations by age. That inconsistency is what I wanted to think about today.

Jan 9, 2020

Goosing the reinforcement element in my machine-learning toy

The first instantiation of my machine-learning toy had suppressed the reinforcement aspect since I was chicken to do it with a "policy" that was empty of anything learned at the beginning. But that meant it was learning more or less anew each time, with some weighting going on that was a little like reinforcement-lite. This go-round I made it direct: the spend policy for each age+wealth state is based on the optimal policy found so far. In theory this "reinforces" and should move us toward a more stable solution where my last one was a little jumpy.
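In code terms, the "direct" version amounts to keeping a table keyed by (age, wealth bucket) and letting later iterations start from the best spend found so far for that state. A loose Python sketch with made-up names, not the actual implementation:

    # policy table: (age, wealth_bucket) -> (best spend rate found so far, its value)
    policy = {}

    def reinforce(age, wealth_bucket, trial_spend, trial_value):
        """Keep the best (spend, value) pair seen so far for each state; later
        iterations explore around this stored spend instead of starting fresh."""
        key = (age, wealth_bucket)
        best_spend, best_value = policy.get(key, (trial_spend, float("-inf")))
        if trial_value > best_value:
            policy[key] = (trial_spend, trial_value)
        return policy[key][0] if key in policy else best_spend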

This is the revised schematic: 


Jan 7, 2020

An interesting side effect in my machine-learning project

I was doing some follow-up on my mini-machine learning project.  I put out some caveats in that post so I won't repeat them here. Basically the project was sketchy enough and premature enough that I'd advise taking a grain of salt or three here.

In this look, I had noticed in the data that at higher levels of wealth at some time t, the machine liked lower spend rates (we'd only looked at W(t)=$1M, by the way). That was counter-intuitive to me since I feel like I'm personally closer to "the edge" than I'd prefer, and I feel like if I had more $ I'd perhaps loosen up a bit in both absolute and relative terms. But the machine is the machine and we obey the machine in our dystopian sci-fi ret-fin world. Let's look at what he/she is telling us.

Jan 6, 2020

An early look (too early) into my amateur mini-machine-learning project

This content has been put into a page (Machine Spending) at the top of the blog and will be maintained there...

-----

Interview with RiversHedge

INT: Is this thing you just did really reinforcement learning?

RH: Um, no. Maybe?  Idk. It's code that "does stuff." I tried to make something that evaluated what I call fuzzed-out actions based on a value function and then adjusted a policy over many training iterations. It's probably not "real" reinforcement learning...yet. Might be a more general "machine learning" thing. Either way I hope it's an amateur hobbyist baby step in the right direction.