Oct 23, 2016

Fail Rate Estimates vs. Number of Simulation Runs





I realize that when running Monte Carlo simulations for retirement finance, elementary statistics -- things like the law of large numbers and the central limit theorem -- will nudge me towards using larger numbers of runs. Each set of runs is really just a sample from an effectively infinite population of possible retirement outcomes. But I wanted to see for myself what doing fewer or more sim runs looked like and how that affects my fail rate estimates. I did this for a couple of reasons:

- I have such an inefficient platform that it takes a pretty long time to run 10,000 simulations -- close to 20 minutes, which says way more about my programming skills and choice of software than anything else. On the other hand, this is a good example of the cost of simulation: more runs cost more in time and effort, and fewer runs risk fuzzy, imprecise, or easily misread results. In my case I get aggravated waiting for a long run when I could be doing other things instead. I wanted to see what I lose when I dial back the number of simulations.

- I just wanted to do it for the hell of it and see what it looked like for my model. It's been on my to-do list to check it out.

- I wanted to get a sense of whether there was any realistic practical difference between, say, 1,000 runs and 10,000 runs. Monte Carlo simulation is very, very easy to misunderstand if the model, the assumptions, and the technique are not well known. There is a strong case that, except for research purposes or when simulation is used in proper context alongside other methods, fail rates are less than useful an awful lot of the time. I wanted to gauge for myself the possible balance or imbalance between the conceptual flaws of fail-rate simulation and the precision or imprecision of more or fewer runs. Basically it's a "should I really care about any of this" kind of question.

- The statistical "population" from which I am sampling is more or less infinite if we say that the number of runs, each of which represents a different possible state of nature and outcome, can be infinite (or at least infinite within the fake world created by the model). So I wanted to play this out, because I still have a hard time wrapping my head around it.

- I have kind of a quirky model, and while it can't escape statistical gravity, I wanted to see how it shook out when using different scales of sim runs. Again, just for "fun."

So here is what I did: I took some basic assumptions for a retiree: a) some that looked like me, like age, and b) others I made up. I just wanted some reasonable assumptions that could plausibly describe a real person (note that in this post the underlying assumptions really don't matter much; we're just looking at the effects of different run scales). I then ran the simulator 10 times with 500 runs each, 10 times with 1,000 runs, 10 with 2,000, 10 with 5,000, and 10 with 10,000. Ten samples is not much of a sampling distribution, but I knew I'd be aggravated on the last round of 10,000-run sims: 10 x 20 minutes is more than three hours in total. I have laundry and dishes and kids and stuff; doing 20 or 30 samples would have been a big problem.

Each of the 10 samples at a given level generated a fail rate equal to the number of times the plan ran out of money (with no metric for the magnitude of the failure, like the number of years failed) divided by the number of runs. I then did the stats on the sampling distribution, i.e., the average over the 10 samples at each level. I'm not totally sure I have the statistics right, but maybe someone can correct me, without flaming too much, if they see flaws. I'm not a prof or a pro.
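My simulator and its assumptions aren't really the point here, but for anyone who wants to see the shape of the experiment, here is a minimal sketch in Python. It uses a toy model -- fixed real spending against lognormal real returns -- and every parameter in it is made up for illustration; my actual model is quirkier than this.

    import numpy as np

    rng = np.random.default_rng(12345)

    def fail_rate(n_runs, start=1_000_000, spend=45_000, years=30,
                  mu=0.05, sigma=0.12):
        """Fraction of n_runs paths where the portfolio runs out of money.

        Toy model only: fixed real spending against lognormal real
        returns; all parameters here are hypothetical.
        """
        fails = 0
        for _ in range(n_runs):
            wealth = start
            for _ in range(years):
                wealth -= spend                          # withdraw for the year
                if wealth <= 0:                          # plan failed
                    fails += 1
                    break
                wealth *= np.exp(rng.normal(mu, sigma))  # apply a random return
        return fails / n_runs

    # 10 samples at each run level, mirroring the table below
    for n in (500, 1_000, 2_000, 5_000, 10_000):
        sample = [100 * fail_rate(n) for _ in range(10)]
        print(n, " ".join(f"{r:.1f}" for r in sample))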

Here were the fail rates (in percent) at each level; rows are runs per sample, columns are the 10 samples:

Runs      1    2    3    4    5    6    7    8    9   10
   500    4.8  5.6  3.0  4.2  4.0  4.4  3.8  4.6  4.4  3.4
 1,000    4.1  4.6  3.2  4.3  4.1  5.2  4.7  4.0  4.7  4.1
 2,000    4.3  3.9  4.1  5.2  4.6  4.5  4.2  4.3  4.5  4.5
 5,000    4.3  4.1  3.8  4.7  4.1  4.3  4.3  4.4  4.2  4.0
10,000    4.2  4.1  4.5  4.5  4.2  4.2  4.1  4.3  4.1  4.0
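As a sanity check on the spread: a fail rate estimate is just a binomial proportion, so its standard error should be roughly sqrt(p(1-p)/n). With p around 0.043, that works out to about sqrt(0.043 x 0.957 / 500) ≈ 0.009, or ±0.9 percentage points per sample at 500 runs, versus about ±0.2 points at 10,000 runs -- which looks consistent with the shrinking spread down the rows.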


Conclusions?

- This all matches intuition and expectations based on basic statistical math. This is stats 101. I already knew roughly what the table would look like before I did it.

- Run more sims if I can deal with it, I guess. Or alternatively, figure out the confidence intervals for some level of runs and use that (see the sketch after this list). Or maybe sample more, with something like 500 or 1,000 runs x 10 or 20 samples, and then use the average. The averages look pretty consistent here. But then there is the third point...

- Does it matter? Probably not. The difference in precision is, I think, totally dominated by whether or not one is in a good position to understand what simulation and fail rates mean in the first place, how they should or shouldn't spur action, and what kind of action should be taken. See Moshe Milevsky on simulation fail-rate abuse.
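For the confidence-interval idea in the second bullet, here is a minimal sketch using the normal approximation for a binomial proportion; the 43-failures-in-1,000-runs numbers are hypothetical:

    import math

    def fail_rate_ci(fails, n_runs, z=1.96):
        """Approximate 95% confidence interval for a simulated fail rate
        (normal approximation to the binomial)."""
        p = fails / n_runs
        se = math.sqrt(p * (1 - p) / n_runs)
        return p - z * se, p + z * se

    # e.g., 43 failures observed in 1,000 runs
    lo, hi = fail_rate_ci(43, 1_000)
    print(f"fail rate 4.3%, 95% CI roughly {lo:.1%} to {hi:.1%}")

At 1,000 runs that interval already spans roughly 3.0% to 5.6%, and since the standard error shrinks like 1/sqrt(n), it takes about a 4x increase in runs to cut the interval width in half.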





  



