Feb 8, 2018

Deconstructing one non-normal SPY distribution into two normals -- just for fun -- and then putting it back together again

I thought I'd take the distribution-mix concept I've been working on out for a test drive. I realize this is kindergarten-level work for a quant, but it's a new tool to me and I'm trying to get the hang of it. To do that I took the full Yahoo history of SPY returns and used expectation maximization (EM) to deconstruct it into two normal distributions, just to see what came up and what it looked like. I then used the two normal distributions to reconstruct a fake fat-tailed, skewed distribution like SPY's. The idea is not necessarily that I want to model SPY (though I might); it's that if this works, I could model a pretty broad range of other things, both ones that exist and ones that don't. Which might be fun.
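For anyone curious what expectation maximization is doing under the hood, here is a minimal sketch of EM for a two-component normal mixture in plain Python (standard library only). It is not normalmixEM() from the R package mixtools and not my spreadsheet; the function name em_two_normals, the starting values, the iteration cap and the tolerance are all illustrative assumptions, and it runs on synthetic data rather than the SPY series.

```python
import math
import random


def log_normal_pdf(x, mu, sd):
    """Log density of N(mu, sd**2) at x."""
    z = (x - mu) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def em_two_normals(xs, iters=1000, tol=1e-8):
    """Fit a two-component normal mixture to xs by expectation maximization.

    Hypothetical helper for illustration. Returns (lam, mu1, sd1, mu2, sd2),
    where lam is the weight on component 1 and 1 - lam the weight on component 2.
    """
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    # Arbitrary starting values (an assumption of this sketch): equal weights,
    # means half a standard deviation either side of the sample mean.
    lam, mu1, sd1, mu2, sd2 = 0.5, mean + 0.5 * sd, sd, mean - 0.5 * sd, sd
    prev_ll = -math.inf
    for _ in range(iters):
        # E-step: posterior probability ("responsibility") that each point
        # came from component 1, computed in log space to avoid underflow.
        resp, ll = [], 0.0
        for x in xs:
            a = math.log(lam) + log_normal_pdf(x, mu1, sd1)
            b = math.log(1.0 - lam) + log_normal_pdf(x, mu2, sd2)
            m = max(a, b)
            log_total = m + math.log(math.exp(a - m) + math.exp(b - m))
            resp.append(math.exp(a - log_total))
            ll += log_total
        # M-step: responsibility-weighted mixing weight, means and std devs.
        w1 = sum(resp)
        w2 = n - w1
        lam = w1 / n
        mu1 = sum(r * x for r, x in zip(resp, xs)) / w1
        mu2 = sum((1.0 - r) * x for r, x in zip(resp, xs)) / w2
        sd1 = math.sqrt(sum(r * (x - mu1) ** 2 for r, x in zip(resp, xs)) / w1)
        sd2 = math.sqrt(sum((1.0 - r) * (x - mu2) ** 2 for r, x in zip(resp, xs)) / w2)
        # Stop once the log-likelihood stops improving meaningfully.
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return lam, mu1, sd1, mu2, sd2


# Usage on synthetic data (not SPY): 70% N(0, 1) mixed with 30% N(5, 0.5).
rng = random.Random(1)
xs = [rng.gauss(0.0, 1.0) if rng.random() < 0.7 else rng.gauss(5.0, 0.5) for _ in range(2000)]
fit = em_two_normals(xs)
print(fit)
```

Each pass alternates an E-step (how much does each component "own" each data point?) with an M-step (refit each normal using those ownership weights). EM only guarantees a local maximum of the likelihood, so different starting values can land on different decompositions.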

Running the algorithm (not my own Excel or R model; this is an R function called normalmixEM()) came up with SPY being ~83% a normal distribution [EM dist1] with mu=.0166 and sd=.0333 and ~17% a normal distribution [EM dist2] with mu=-.0302 and sd=.0525. The original SPY series has mu=.0085 and sd=.0414. These are monthly series, by the way. When you reconstruct it you get a Gaussian mix with mu=.0072 and sd=.0446. I haven't measured the other moments (skew, kurtosis) yet because there is something wanky with my understanding of the two competing R functions I was using to do that, and I didn't want to get it wrong. Visually, on the other hand, it works out nicely...like this:


  - black is the original SPY density for monthly returns
  - blue is the normally distributed "EM dist 1" (high) - random return generation
  - red is the normally distributed "EM dist 2" (low) - random return generation
  - black dotted is the artificially/mathematically reconstructed non-normal Gaussian mix
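The reconstruction can be checked two ways: analytically, from the standard moment formulas for a mixture of normals, and by simulation, drawing random returns from the two components. Below is a minimal sketch of both in plain Python. It uses the rounded ~83%/~17% weights quoted above, so it will not match my R output exactly, and the helper names mixture_moments and sample_mixture are hypothetical. It also computes skewness and excess kurtosis, the moments I haven't measured yet.

```python
import math
import random
import statistics


def mixture_moments(weights, mus, sds):
    """Mean, std dev, skewness and excess kurtosis of a mixture of normals.

    Assumes the weights sum to 1. With mixture mean m, d = mu_i - m, s = sd_i:
      E[(X-m)^2] = sum w (s^2 + d^2)
      E[(X-m)^3] = sum w (d^3 + 3 d s^2)
      E[(X-m)^4] = sum w (d^4 + 6 d^2 s^2 + 3 s^4)
    """
    m = sum(w * mu for w, mu in zip(weights, mus))
    m2 = m3 = m4 = 0.0
    for w, mu, s in zip(weights, mus, sds):
        d = mu - m
        m2 += w * (s * s + d * d)
        m3 += w * (d ** 3 + 3 * d * s * s)
        m4 += w * (d ** 4 + 6 * d * d * s * s + 3 * s ** 4)
    sd = math.sqrt(m2)
    return m, sd, m3 / sd ** 3, m4 / m2 ** 2 - 3.0


def sample_mixture(n, weights, mus, sds, seed=None):
    """Draw n values: pick a component by weight, then draw from its normal."""
    rng = random.Random(seed)
    comps = rng.choices(range(len(weights)), weights=weights, k=n)
    return [rng.gauss(mus[i], sds[i]) for i in comps]


# EM parameters as quoted in the post (weights rounded to ~83% / ~17%).
W, MU, SD = [0.83, 0.17], [0.0166, -0.0302], [0.0333, 0.0525]

mean, sd, skew, ex_kurt = mixture_moments(W, MU, SD)
print(f"analytic:  mean={mean:.4f} sd={sd:.4f} skew={skew:.2f} excess kurt={ex_kurt:.2f}")

# Simulated "fake SPY" monthly returns drawn from the two-normal mix.
fake_spy = sample_mixture(100_000, W, MU, SD, seed=7)
print(f"simulated: mean={statistics.mean(fake_spy):.4f} sd={statistics.pstdev(fake_spy):.4f}")
```

With these rounded inputs the analytic mean comes out at about .0086 and the sd at about .0412, close to the original SPY's .0085 and .0414. The skewness comes out negative and the excess kurtosis positive, i.e. the left-skewed, fat-tailed shape the chart shows. If the reconstructed curve was built from randomly generated returns, as the legend suggests, some gap between these figures and the .0072/.0446 above is to be expected.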

Works pretty well (it's a smallish data series, so not perfect). I thought that was pretty slick. In fact, after I posted this I was thinking about it a little more. I am, for better or worse, an amateur, or perhaps a tourist visiting the land of retirement finance and probability theory. I have my camera and my ugly tourist shorts, but that's about it; I don't speak the language and I don't live there. So for me, while it is one thing to know some basic stats, like the various moments of a distribution, how to generate a CDF, or how to integrate a PDF, it's another thing altogether to look at a distribution and see multiple other distributions hidden inside trying to get out. I'll probably never look at a data distribution in quite the same way again. I'll call this whole exercise a "net add" to my trip.





