Mar 19, 2019

On Outliers or a response that got away from me

I received a decent comment from a reader to which I started to reply. But (1) the reply got away from me, and (2) I can't really answer it very well. This is my response and, if anyone is interested, a challenge to the world to help me articulate a better answer. The comment was in response to what I considered a casual exploratory post, "Another half-a** look at trend following and retirement portfolios." The "half-a**" was the tell on my casualness.

Here is the comment:
"I guess we can confirm that there is a fat tail, ..." 
Is it really a fat tail, or is it a normal distribution with outliers? Would a Shapiro–Wilk, Kolmogorov–Smirnov, Lilliefors, or Anderson–Darling test run on your raw data help you infer an answer?  
Also, you're binning data. That could increase the assailability of your "fat tail" conclusion, couldn't it? 
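
For readers wondering what running such a test on raw (unbinned) data might look like, here is a minimal sketch. It uses synthetic samples, not the return data from the original post, and it hand-rolls a Lilliefors-style test with a Monte Carlo p-value using only the Python standard library. The function names, sample sizes, and seeds are all illustrative assumptions.

```python
import random
from statistics import NormalDist, fmean, stdev


def lilliefors_stat(data):
    """Largest gap between the empirical CDF and a normal CDF fitted to the data."""
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist(fmean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        cdf = fitted.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


def null_distribution(n, n_sims, seed):
    """Statistic values for truly normal samples of size n (Monte Carlo)."""
    rng = random.Random(seed)
    return [lilliefors_stat([rng.gauss(0.0, 1.0) for _ in range(n)])
            for _ in range(n_sims)]


def p_value(data, null_stats):
    """Share of normal samples that look at least as non-normal as the data."""
    observed = lilliefors_stat(data)
    exceed = sum(s >= observed for s in null_stats)
    return (exceed + 1) / (len(null_stats) + 1)


def student_t3(rng):
    """One draw from a Student t with 3 degrees of freedom (a fat-tailed stand-in)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3))
    return z / (chi2 / 3) ** 0.5


rng = random.Random(42)
N = 2000  # hypothetical sample size, e.g. roughly eight years of daily returns

normal_sample = [rng.gauss(0.0, 1.0) for _ in range(N)]  # synthetic, not real data
fat_sample = [student_t3(rng) for _ in range(N)]         # synthetic, not real data

null_stats = null_distribution(N, n_sims=200, seed=1)
p_normal = p_value(normal_sample, null_stats)
p_fat = p_value(fat_sample, null_stats)

print(f"normal sample:     p = {p_normal:.3f}")
print(f"fat-tailed sample: p = {p_fat:.3f}")
```

A small p-value says the sample is hard to reconcile with a normal distribution. By itself it does not say whether the cause is a fat-tailed process or a normal one with a few outliers mixed in.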

Here is my tentative response, tentative because it should properly be the start of a dialogue rather than an answer. I mean, really, how much do I really know about this kind of thing...yet.
----------------------------

Hi. I’m not 100% convinced I’m well equipped to answer the comment.  All three of my readers know that I have no formal background in stats or probability theory unless we consider 1 bad MBA stats class in 1988 a “formal background.” I’ve also mentioned here before that I am recording a learning journey (starting more or less from scratch) rather than teaching anything of serious use to anyone...without a reader knowing that disclaimer first. I realize that that kind of response could be considered an un-attractive form of sandbagging but I guess the least I can say is that the question would probably be part of a good learning dialogue so I take the comment seriously and welcome it.
 
I have to also note that I feel like I am (lazily?) following a really large herd of people in a lot of domains, that range from naïve to sophisticated, that tend to call market return data not-normal and fat tailed. I haven’t spent much time (yet) thinking about challenging that language.  Maybe others have. In the end, as a retiree and an amateur, my thought is that outcomes and consequences in retirement have a hard time distinguishing between “causes with improperly labeled fat-tails” and “causes with a better-tuned description of a statistical process.”  Here, quick, try this thought experiment: in some year (or years, if we are thinking about sequence risk) a black swan of returns happens to meet a black swan of spending (with asset liquidation to fund the consumption) in which case we have, to a certain degree*, now permanently corrupted in the present our capacity to consume in the future.  Was that corruption due to an outlier or a fat tail or something that a Shapiro–Wilk, Kolmogorov–Smirnov, Lilliefors, or Anderson–Darling test could test? Doesn’t matter. Still kinda screwed.  If it’s not hedge-able, that is.

Speaking of spending… Spending is an example of where I think you are quite correct. In a past look at spending distributions that are asymmetrical to the right (like mine…something I called “fat tailed” at the time) we can now look at this through your lens.  And, in fact, this lens (call it “outliers”) was the conclusion I also came to back then.  Spending is not really “fat tailed” as such. What it is is something that is made up of two things: (a) a normally and narrowly distributed and reasonably predictable process that can be described** by modern statistical “stuff,” and (b) another thing that is either: i.) some kind of “outlier” thing, or ii.) some other unrelated non-typical-normal-statistical process that can’t be modeled or, if modeled, we are in Mandelbrotian world, or iii.) it’s just funky curveballs from the universe that we call outliers and then attach to a statistical “process” for our convenience…and then ignore.  As you say, maybe we can then test the “object,” however it’s defined, with sophisticated stats techniques.  Or maybe not. I think this kind of thinking and questioning is useful and as applicable to market returns as it is to spending. I’m sure there is lit out there on this kind of thing, I just haven’t seen it yet. Granted I’m not looking very hard.   
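
The two-part picture of spending described above can be illustrated with a toy simulation. This is only a sketch under made-up assumptions: the dollar amounts, the 3% monthly shock probability, and the uniform shock size are hypothetical and are not taken from my own spending data.

```python
import random
from statistics import fmean


def skew_and_excess_kurtosis(xs):
    """Moment-based sample skewness and excess kurtosis (both near 0 for normal data)."""
    m = fmean(xs)
    m2 = fmean((x - m) ** 2 for x in xs)
    m3 = fmean((x - m) ** 3 for x in xs)
    m4 = fmean((x - m) ** 4 for x in xs)
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


rng = random.Random(7)
MONTHS = 5000  # far more months than any retirement, purely to steady the estimates

# (a) routine spending: narrow and roughly normal (hypothetical numbers)
routine = [rng.gauss(5000.0, 300.0) for _ in range(MONTHS)]

# (b) the "other" thing: a 3% chance each month of a large one-off bill
shocks = [rng.uniform(5000.0, 20000.0) if rng.random() < 0.03 else 0.0
          for _ in range(MONTHS)]

total = [r + s for r, s in zip(routine, shocks)]

routine_skew, routine_kurt = skew_and_excess_kurtosis(routine)
total_skew, total_kurt = skew_and_excess_kurtosis(total)

print(f"routine only: skew = {routine_skew:+.2f}, excess kurtosis = {routine_kurt:+.2f}")
print(f"with shocks:  skew = {total_skew:+.2f}, excess kurtosis = {total_kurt:+.2f}")
```

The routine part should look close to normal on its own, while the combined series should show a long right tail, even though nothing "fat tailed" was built into the routine process.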

Maybe one other thing to contemplate in all of this is to also consider the evaluation of the consequences of things, to which statistics is typically blind, or at least deaf.  Think about it in trading terms or gambling with uncertainty. One has not only to deal with edge (that can be more or less subjected to statistics) but also odds (that can sometimes be a little more subjective). For example, in trading, I can be objectively wrong in a statistically predictable way 70% of the time but still make money because my payoffs and risk-management are good (and this can be reversed, btw).  Risk management is the big game here in retirement rather than stats because consequences matter quite a bit.  This, for me, is what makes a hard distinction between fat tails and outliers fade a bit. I gotta eat tomorrow, too.
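
The edge-versus-odds point can be put into a few lines of arithmetic. The win rates and payoff sizes below are hypothetical, picked only to show that being wrong 70% of the time can still pay, and that the reverse can also be true.

```python
def expectancy(win_rate, avg_win, avg_loss):
    """Average profit per trade, in units of risk, given a win rate and payoff sizes."""
    return win_rate * avg_win - (1.0 - win_rate) * avg_loss


# Wrong 70% of the time, but winners are three times the size of losers.
mostly_wrong = expectancy(win_rate=0.30, avg_win=3.0, avg_loss=1.0)

# Right 70% of the time, but losers are three times the size of winners.
mostly_right = expectancy(win_rate=0.70, avg_win=1.0, avg_loss=3.0)

print(f"wrong 70% of the time: {mostly_wrong:+.2f} per trade")  # +0.20
print(f"right 70% of the time: {mostly_right:+.2f} per trade")  # -0.20
```

The sign of the result depends on the payoffs as much as on the hit rate, which is why the consequences side of the ledger gets so much of my attention.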

But let me be clear. I very much agree with the push in the comment and take it as constructive. A casual reader should be aware of the issues implied here.  And the “binning” thing, when one has only discrete data, is a really good thing to contemplate. I also agree that I’d like to know more formal mathematics so that, at the very least, I do not embarrass myself too much in the future. On the other hand, at 60, I’m not sure how much motivation I still have for advanced prob/stats/calc etc. I had a ton of motivation for the last four years but it’s starting to fade.  But maybe this is an opportunity to up my game.

So, let me make a counter offer:  (a) you, or anyone, make a digestible “connect” between awareness and manipulation of “Shapiro–Wilk, Kolmogorov–Smirnov…” and “retirement outcomes and consequences” and I’ll guest-post it, or (b) show me how to do it and I’ll post it if I can figure it out, or (c) link me to the lit that an amateur can read and absorb and I might try.  Maybe.


* ignores lifetime income and human capital among other things.
** assuming inter-temporal stability and no state-jumps

7 comments:

  1. Wow! I don't know whether your reply "got out of hand," is an outlier, or is part of your own platykurtic-normal distribution of wandering thoughts in retirement ;-)
    [a pattern I, also a retiree, very much share].

    I've pretty much spent my "reply" time today, but a couple of things struck me in your comments. First, yes: we've all got to eat tomorrow ("Risk management is the big game here in retirement rather than stats because consequences matter quite a bit."). In that regard, and in conjunction with your comment that, "Maybe one other thing to contemplate in all of this is to also consider the evaluation of the consequences of things, to which statistics is typically blind, or at least deaf," I think you may be conflating a few things that you might consider separating.

    First, the black swan concept as you seem to be describing it is an event occurring in the context of an *individual's* portfolio. There is a lot of research going on trying to apply prediction to the ends of distributions (prediction of rare events), but I think it's good to remember that statistics deals with inference from groups, and generally is used to make conclusions about sub-groups, or like populations. When we get down to making predictions about individuals, context and specificity are more important than false positive or negative rates. It's tough, or even impossible, to predict the impact of a macroeconomic event on my personal microeconomics. That's not the fault of statistics: it's a matter of not using the correct tools for the job (I don't know of any single tool that works for *every* individual's portfolio in the "black-swan/individual-portfolio" case; maybe one exists? I'm not an economist.).

    [More in next comment: I've exceeded the character limit]

    Best regards,

    Francis

  2. [continuing]

    Second, I don't think statistics is blind or deaf. People, yes; statistics, not so much. The purpose of statistical inference is to assist in the making of better decisions. As the saying goes, "The alternative to good statistics is not 'no statistics,' it’s 'bad statistics'." (Bill James). The purpose of good statistics is to help clarify, and in that regard it's either the analyst or the statistics interpreter who may be deaf or blind. That goes along with the tongue-in-cheek comment by Ernest Rutherford that, "If your experiment needs statistics, you ought to have done a better experiment."

    So, on to your counter-offer. I am busy being retired, and the topic of market return is not so fascinating to me that I am willing to take on the task of creating a digestible connect between awareness and the various formulae for testing normality.

    As to a way to link you up, you might try this: https://blog.jamovi.org/2018/10/25/learning-statistics-with-jamovi.html That link takes you to a site where you can download a free 500+ page book providing a tutorial for a wonderful statistics package that also is free. A foray into the book and program might work to provide you with the resource you're asking for in terms of testing normality. And, if you are interested in examining binning, you might try this resource (also free): https://www.blueskystatistics.co.uk/comparing-popular-procedures-r-bluesky-statistics/ As to binning's quantitative effects, you can take a look at this (https://medium.com/@peterflom/why-binning-continuous-data-is-almost-always-a-mistake-ad0b3a1d141f), but there is a load of stuff on binning that Prof. Google could refer you to, as well.

    Regarding the motivation to dig in to the whole data analytics field, who knows? I find the topic fascinating enough to study it in retirement (with some reservations, I recommend the free course, "The Analytics Edge," an MIT course offered on edx.org). I guess my thought is, you're clearly good at math. You're also developing and testing your theories (or hypotheses, if "theory" is too grand a word). As long as those things are true of you, it seems to me that Andrejs Dunkels' observation applies: "It is easy to lie with statistics. It is hard to tell the truth without statistics."

    Best regards,

    Francis

    Replies
    1. Cool. Useful for readers. I'm on a minor hiatus to work on phys conditioning and relationships vs. math for a bit so I'll circle back on this after that go-round. I have the capacity for it and interest, just not 24/7 like last year. My statistics comments are envisioned like this: Gaming like Roulette or Craps is statistics. Poker is part stats, part other. Retirement is part stats, part other. Spending is part stats, part other. Market returns, if everything stays more or less stable, is part stats, part other. The "other" will never be 100% domesticated mathematically and that interests me more than market returns as you say above for yourself (esp not 100% in the market else it'd be too easy to arbitrage). There is some good stuff out there on "other." Taleb comes to mind but he is not the only or earliest. I like McGoun '95 on this (The History of Risk “Measurement,” Critical Perspectives in Accounting 6, 511-532 Academic Press Limited). There are a ton of others. re black swan comment, a lot of ret-fin seems to make conclusions from fake groups and artificial worlds (e.g., via simulation and 20k paths) and forget we personally have only one "whack at the cat" (a phrase from M Zwecher's book) which others sometimes correctly and sometimes incorrectly refer to as path dependency so I'm with you. But I'll also say the whole point of Black swans, in retirement, in my opinion, is that they are deeply connected to individual experience, financially and psychologically and behaviorally. A body blow is a body blow and is to be avoided if possible.

      The counter-offer was mostly tongue-in-cheek to you but in reality it was offered to the world of my three readers and beyond. A large percent of my reading list over 4 years has come from others with whom I have engaged. I'll call out people like Dave Cantor, Corey Hoffstein, Dirk Cotton, Moshe Milevsky who helped me solve a problem in person, etc. I'll stagnate for a while and then someone randomly shows up to prime a new pump. I offer challenges not to be a churl but to re-vivify my reading list and my interest in new things. Plus, while statistics may not be blind (I'll hang on to deaf though) *I* certainly have blind spots and have fallen on my face publicly here more than a few times. Regards. WS

  3. Phys conditioning and relationships rank higher than math for me, too. As they should.

    I enjoyed your reply. Your reading list relates a bit to my interests, and when I have some spare time I'll browse around the authors you mention.

    As to your envisioning of stats (“Gaming like Roulette or Craps is statistics. Poker is part stats, part other. Retirement is part stats, part other”), I see those games as probability rather than statistics, but for our purposes that’s a semantic quibble. My description for stats, FWIW, would be outcome, predictors and error term. That would be akin to whether someone chooses Duracell or Kirkland brand batteries at Costco based on price, location in the aisle, impatience level, presence or absence of a free-food person in the area, size of display and unknowns.

    I like the notion of "one whack at the cat," although as a person with two cats in-house and one outdoors, I need to find a different species to whack, metaphorically. My choice, metaphorically, is the fox I live-trapped and let go out by the creek this morning. My goal was for him/her to live out his/her life in a different, but pleasant, neighborhood, rather than living the laggard’s life of eating our outdoor cat’s kibble. Instead, s/he was awfully aggressive toward me while trapped, despite the fact that I was just trying to do us both a good turn. You can take that as a metaphor for what the market does, too, I suppose.

    I absolutely agree with you that, "Black swans, in retirement . . . are deeply connected to individual experience, financially and psychologically and behaviorally,” and to be avoided if possible. I'd conceptualize this by putting the emphasis on "psychologically and behaviorally" as independent variables, and setting “financially” as the dependent variable, with an error term in the linear model. That brings us back to trying to predict how macroeconomic events affect individuals. Again, I’m not an economist. I’m putting concepts into a framework that works for me.

    You’ve called out a list of folks who mostly are unknown to me. If they’re all of the caliber of Dirk Cotton, however, then you’ve had great success in finding helpful problem solvers.

    All these words having been written, it still would be fun to have someone look at your market return data and tell us whether a normal distribution fits the data. I know from his blog that Dirk Cotton has great skills with R. I’d love to have him take a metaphorical whack (species of his choice) at the question.

    Best regards,

    Francis

  4. Most commentators and quants seem to avoid the normal distribution assumption. I follow that out of habit. Fwiw, I just read this yesterday and find it to be wise https://blog.thinknewfound.com/2019/02/no-pain-no-premium/ as well as something that briefly addresses the non-normal thing. The finance lit is replete with more examples of this kind of thing but I can't conjure them here.

    I have two cats and do not actually whack them. You are ahead of me on stats and probability. I have a minor, temporary, and loosely-guarded edge on Ret-fin compared to some (amateur) cohorts. My intuition is now tapping me on the shoulder and saying "hey...William, a potentially productive correspondence in the future here." So, if you want to email my blog email I can pass on my real one.

    Note I had two reading lists, one on literature (https://rivershedge.blogspot.com/2018/04/a-life-in-books.html) and one Ret fin (https://rivershedge.blogspot.com/p/abbas-matheson-normative-target-based.html).

    WS

  5. William, I'm sorry, I can't figure out how to follow up on this: "My intuition is now tapping me on the shoulder and saying "hey...William, a potentially productive correspondence in the future here." So, *if you want to email my blog email I can pass on my real one.*" [emphases added] However, I clicked on the "notify me" option, so that may solve the problem.

    TIA,

    Francis
