The definition of insanity is doing something over and over again and expecting a different result.
Variations of this captivating expression have circulated since the early 1980s (though misattributed, a generation earlier, to the late physicist Albert Einstein). The underlying idea, however, has roots etched into major religious traditions, such as the ideology of Taoism.
The world is more complicated, however, and natural probability models can clash with well-worn expressions like “this time it’s different.” In this article we explore specific settings where closed-form probability frameworks can be leveraged to help think through risk choices when the assumption of normality breaks down.
Sometimes life events present explicitly abnormal risks, as in a continuous model; other times we see experiences that behave like an annuity; and still other times we show modern techniques for thinking through lump sums, random walk models, and the manipulation of multiple normal distributions. The latter two topics can be thought of as life phenomena having ordered chaos versus creative chaos. Regardless, here we show that in many situations people overestimate the accumulation of these abnormal risks over their lives (while people generally underestimate the tails of any one risk, treating it as normal-tailed when it can often be fat-tailed).
We’ll discuss many examples in this article in order to develop the appropriate probability theory and concepts. The first is the lethal game of Russian roulette, except here we have a modified version of the game where the player is unaware of the number of rounds (say 0, 1, or 2) that were loaded into the 6-chamber revolver. After a few safe attempts, where the trigger was pulled a few times, can we always assert that future pulls of the trigger will mimic the past? The few historical observations we would have in this case would, after all, be consistent with an argument that no bullets will be fired in the future. Instead it should be obvious in this example that it would not be insanity to think that the next pull of the trigger could be different. In real life we often witness just a few historical observations of an event, lack the appropriate context, and still prematurely conclude with extraordinary confidence that it would be insane to think next time is different. Yet the next time could be very different (and also arrive with irreversible consequences!)
Since this is a probability-focused site, it should be noted that the model for the expected value of Russian roulette depends on whether the chamber is spun between attempts of the game (“with repetition” in probability parlance) or not spun (“without repetition”). Without spinning, the expected number of attempts before a bullet is discharged, assuming a single bullet was loaded into the chamber, is clearly 3.5 (the mean of the six equally likely chamber positions). The 20th century civil rights leader Malcolm X wrote in his autobiography that he had played this game once to show his criminal peers that he was ready to die, pulling the trigger of a pistol against his head a few times.
The “with repetition” framework is similar to a simple transition-matrix model that we often see in biological and economic systems. We can see this analysis here for an example of fund managers wanting to remain in a certain performance quartile. For our simplified two-state Russian roulette, we have a special form of the Pascal probability model (named after the dazzling 17th century French mathematician). When the target number of “failures” is an integer, this becomes a negative binomial distribution. And when we only care about the game ending with the first such failure (a released round to the head), the probability model further collapses to a geometric distribution.
p(k) = (1−p)^(k−1) · p
where k is the number of attempts, and p is the probability of failure on each pull (here 1/6). Now for the expected number of attempts in this “with repetition” example, we have this calculus:
E(K) = Σ k=1→∞ (1−p)^(k−1) · p · k
= p · Σ k=1→∞ k · (1−p)^(k−1)
= −p · Σ k=0→∞ d/dp (1−p)^k
= −p · d/dp [ Σ k=0→∞ (1−p)^k ]
= −p · d/dp [ 1 + (1−p) + (1−p)² + (1−p)³ + … ]
= −p · d/dp [ 1/(1−(1−p)) ]
= −p · d/dp [ p^(−1) ]
= −p · (−p^(−2))
= p^(−1) = 6
Of course the geometric distribution above is memoryless, similar to the property of exponential decay models. In other words, after 6 safe attempts, the expected number of remaining attempts before death is neither suddenly infinite (expecting that this time won’t be different) nor suddenly imminent. It continues to remain 6, as though the previous attempts never happened. So given a constant y prior successes, the expected total number of attempts is:
E(K | y) = Σ k=y+1→∞ (1−p)^(k−y−1) · p · k
= p · Σ k=y+1→∞ (1−p)^(k−y−1) · (k−y) + p·y · Σ k=y+1→∞ (1−p)^(k−y−1)
(substituting j = k−y)
= p · Σ j=1→∞ j · (1−p)^(j−1) + p·y · [ 1 + (1−p) + (1−p)² + … ]
= −p · d/dp [ Σ j=0→∞ (1−p)^j ] + p·y · p^(−1)
= −p · d/dp [ p^(−1) ] + y
= −p · (−p^(−2)) + y
= p^(−1) + y
= 6 + y
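To make the memoryless property concrete, here is a minimal Python sketch of the “with repetition” game; the function name, trial count, and seed are illustrative choices of this sketch, not from the derivation above:

```python
import random

def simulate(p=1/6, trials=300_000, seed=7):
    """'With repetition' Russian roulette: the cylinder is re-spun before
    every pull, so each pull fires independently with probability p."""
    rng = random.Random(seed)
    overall, after_three = [], []
    for _ in range(trials):
        k = 1
        while rng.random() >= p:       # a safe pull; pull again
            k += 1
        overall.append(k)
        if k > 3:                      # the first 3 pulls happened to be safe...
            after_three.append(k - 3)  # ...count only the pulls still remaining
    return sum(overall) / len(overall), sum(after_three) / len(after_three)

e_all, e_after3 = simulate()
# Both averages land near 1/p = 6: surviving 3 pulls does not change the
# expected number of pulls still to come.
```

Running this, both estimates hover around 6, matching E(K) = p^(−1) and the conditional result p^(−1) + y above.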
Next, let’s shift to the risks associated with a compounding growth vehicle; say a lump-sum payment or an annuity-due (payment at the start of each period). Say we have scenarios where one has $32 at risk over 32 years of a lifetime. There are a couple of different ways this can occur. The first is similar to someone having been gifted $32 at the start of a career, and then investing those proceeds over the subsequent 32 years. In this first way, the ending risk associated with the growth of this payment would be normally distributed with a standard deviation equal to σ√t, where σ is the independent, annual standard deviation of the returns, and t equals 32 years. The assumption of independent and identically distributed (i.i.d.) returns is important here, since independence implies zero covariance (the converse does not hold, as we can see with y=x².) So now the variance and standard deviation of the sum of 2 i.i.d. variables, each with standard deviation σ, are:
Variance(X1+X2)
= Covariance(X1,X1) + Covariance(X1,X2) + Covariance(X2,X1) + Covariance(X2,X2)
= Variance(X1) + 2·Covariance(X1,X2) + Variance(X2)
= 2·Variance(X1) + 2·0

Standard deviation(X1+X2)
= √2 · √Variance(X1)
= √2 · σ
With t i.i.d. variables, the covariance matrix above (shown for t=2) simply collapses to the generalized version σ√t, since all of the covariances between different pairs of variables equal 0. In these cases, it’s also easy to see how to cut this overall volatility risk in half:
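As a quick check on the σ√t rule, the sketch below (an illustrative simulation of this sketch, not from the article) sums t i.i.d. normal shocks and compares the resulting dispersion against σ√t:

```python
import random
import statistics

def terminal_sd(sigma=0.10, t=32, paths=40_000, seed=1):
    """Standard deviation of the sum of t i.i.d. N(0, sigma) annual shocks."""
    rng = random.Random(seed)
    totals = [sum(rng.gauss(0.0, sigma) for _ in range(t))
              for _ in range(paths)]
    return statistics.pstdev(totals)

print(terminal_sd())             # close to 0.10 * sqrt(32), about 0.57
print(terminal_sd(t=8))          # cutting t to a quarter halves the risk
print(terminal_sd(sigma=0.05))   # halving sigma also halves the risk
```

Both halving routes (σ/2, or t/4) land at roughly the same terminal standard deviation, as the closed form predicts.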
In other words, the $32 lump sum could be invested at half the annual risk level (clearly the expected return of this growth would also be cut meaningfully), or the time horizon would need to come down from 32 years to ¼·32, or just 8 years. Now most people would not be interested in cutting their risk exposure in half by investing only in the first 8 years of their career, followed by not investing for the remaining 24 years. We are also assuming here that one lives for the entire 32 years, a problem that is undoubtedly more complicated, as we’ll show, when this assumption is false.
The second way one could have $32 at risk over 32 years is by accumulating this level steadily during that time. Say $1 at time 0, $1 at time 1, another $1 at time 2, etc. This is a more realistic view of how one accumulates risk exposure over a career. In this case there is no simple closed-form model for the final risk distribution. But we will see and rationalize that this resulting, more practical risk exposure is not normally distributed. See the illustration below, where we generate hundreds of simulations to demonstrate the various distributions of outcomes.
In the illustration above we see (in blue) the outcome distribution for the first, lump sum example, using both a 10% annual σ and a 40% annual σ. And (in red) we see the outcome distribution for the second example, the annuity-due, again using a 10% σ and a 40% σ. Let’s review the summary statistics for all 4 distributions above. The first 2 distributions are normally distributed (so lacking skew, with normal tails). The fourth distribution (40% annuity) is platykurtic, with a visibly tighter distribution than the fatter tails associated with the normal distribution (as seen in the 40% lump sum distribution above). The third distribution (10% annuity) follows the same family as the 40% annuity, but interestingly, given the lower σ, it is exceptionally difficult to distinguish its risk profile from a normal distribution (e.g., see the overlap with the 10% lump sum distribution).
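Since the article’s exact simulation settings aren’t specified, here is a hedged sketch of the two accumulation schemes, assuming lognormal annual growth factors; the moment helper and sample sizes are assumptions of this sketch:

```python
import math
import random

def excess_kurtosis(xs):
    """Sample excess kurtosis (a normal sample gives roughly 0)."""
    n = len(xs)
    m = sum(xs) / n
    var = sum((x - m) ** 2 for x in xs) / n
    return sum((x - m) ** 4 for x in xs) / (n * var ** 2) - 3.0

def simulate_paths(sigma, t=32, paths=20_000, seed=3):
    """Terminal wealth of (a) a $32 lump sum and (b) a $1/year annuity-due,
    both compounding over t years with the same lognormal annual factors."""
    rng = random.Random(seed)
    lump, annuity = [], []
    for _ in range(paths):
        w_lump, w_ann = 32.0, 0.0
        for _ in range(t):
            g = math.exp(rng.gauss(0.0, sigma))  # shared annual growth factor
            w_ann = (w_ann + 1.0) * g            # $1 contributed at year start
            w_lump *= g
        lump.append(w_lump)
        annuity.append(w_ann)
    return lump, annuity

lump, ann = simulate_paths(sigma=0.40)
# The annuity-due outcomes are meaningfully tighter, relative to their mean,
# than the lump-sum outcomes at the same 40% sigma.
```

Under these assumptions the lump sum is strongly leptokurtic while the steadily accumulated annuity-due is comparatively tighter, qualitatively in line with the illustration.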
So the risk distribution for this second, more realistic way of being exposed to $32 of risk over one’s career can be modeled as a normal distribution in some annuity cases! Notably, when there is a steady flow of risk accumulation that is subject to very low risk. We’ll see that theme throughout this article, and it has implications for answering the same question of how to reduce one’s risk, where only in this case can we borrow some of the same risk formulas shown in the lump sum section above (albeit the definition of σ would be different).
In both of these cases, we can better understand how the answer to the question of how much risk to hold in one’s portfolio based upon age will vary from the simple rule of thumb of investing 100 minus your age in risk, or zero if one is fortunate enough to be a centenarian. So if one is 20 years old, this would be 100−20, or 80% of the portfolio in risk. And 32 years later, at age 52, this would be 100−52, or 48% of the portfolio in risk. One need not take these portions of risk and career length too literally (obviously everyone’s personal situation, and the market offerings at any given time, would influence these choices), but rather think of this framework as a directional impression for how one might think about risk over time depending on the different approaches, especially for any lower-σ volatility scenario. One should consider what possible outcome someone is protecting himself or herself from that leads to a noticeably linear risk model, and such a lower risk profile in most of the early years (e.g., in 2/3 of the initial years one is underinvested, and in 1/3 of the final years one is overinvested)? Incidentally, this latter point of being overinvested in the final years also applies to most forms of the recently in-vogue target-date or “lifestyle” investing.
This also bears on the concept of what this risk means, because if the growth coefficient of variation (articles on this coefficient here, here, here) is say 2, then the recovery from a “risk” in the later years of one’s career would be made up relatively quickly (e.g., in about 1/6 of the final years instead of 1/3). This would rotate the figurative blue distribution of risk upwards, as if pivoting the entire curved blue line from the initial value of 100% in a counter-clockwise fashion, such that the blue line currently positioned at (A) comes up to about where (A’) currently is. Mortality and valuations would both work to only adjust this (A’) level slightly.
Increasing or decreasing annuities
Other serious mathematical entanglements ensue in the real world (and we’ll intrepidly tackle them here), where there are somewhat ordered changes in the annuity progression over time. For example, the level of the annuity can increase or decrease in a step-wise function, making modeling from past observations generally more difficult. The motivations are clear, as one can have an escalating amount of risk during much of one’s life, and other than through open-ended simulation there is no way to model this. One can also have declining balances, where the amount of risk reduces over time: for example, a mortgage paid off during one’s career, or a retirement nest egg drawn down on a fixed schedule. Both of these changing annuity structures, with or without a fixed amount of overall variability, are ideas I have written about in a top annual publication of the Society of Actuaries (Dollar Cost Averaging Risk, p.17). Equity valuations on individual companies operate with similarly unfitting approximations here, as an income stream is brought to present value with a constant discount rate, and then projected into the future against various risk scenarios.
Additionally, as with the traditional annuity-due example, the overall risk profile is not always growing in a smooth, convex manner, as it is in the lump sum example. The rationale is that the amount of risk grows with time, but the balance behind that risk can be marginalized on an annual basis. These opposing forces can create a maximum risk value in the middle of the life cycle.
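To see how opposing forces can create a mid-lifecycle maximum, here is a rough sketch; the half-career contribution/withdrawal schedule and the dollar-risk proxy below are illustrative assumptions of this sketch, not the article’s exact setup:

```python
import random

def dollar_risk_by_year(sigma=0.10, years=32, paths=4_000, seed=11):
    """Contribute $1/yr for the first half of a career, withdraw $1/yr for
    the second half; proxy the one-year dollar risk at each year-end as
    (average balance) * sigma."""
    rng = random.Random(seed)
    avg_balance = [0.0] * years
    for _ in range(paths):
        w = 0.0
        for y in range(years):
            flow = 1.0 if y < years // 2 else -1.0
            w = (w + flow) * (1.0 + rng.gauss(0.0, sigma))
            avg_balance[y] += w / paths
    return [b * sigma for b in avg_balance]

risk = dollar_risk_by_year()
peak = max(range(len(risk)), key=risk.__getitem__)
# The dollar risk rises during accumulation and falls during drawdown,
# peaking near the middle of the 32-year lifecycle.
```

With time working to grow risk and the shrinking balance working against it, the simulated risk level indeed tops out near mid-career.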
Even here, we revert to the article’s common theme: in a few closed-form situations we’ll show that we can model low-risk-relative-to-absolute-return experiences differently from most risk events. Another common lesson is that the amount of risk one can take on during a career or lifecycle is generally higher (though not so much as to employ leverage, where other risk/return considerations and costs come into play), unless there are truly abnormal, asymmetric risks so severe (e.g., Russian roulette) that “different” (to be argued later) is something that cannot be “naturally” recovered from. This is unlike situations such as financial market or economic shocks, where excess leverage or risk in one’s career is not necessary; instead one needs to stay focused on the correct level of risk to take on, conditional on the context of happening to be in a high-return and low-risk period.
Advanced random walk
Now let’s look at modeling the uncertainty associated with a lifetime, interest-compounding stream of income so that we can answer important questions about the level of risk we should be exposed to through our life. To do this, we must consider two critical aspects of this stream of future random events. The first is when repetition generally shows similar results, and the second is when repetition fails and then shows dissimilar results. This is not being insane, but rather a prudent process within risk management. And the results here provide supplemental understanding to our previous discussion of why one should not have an age-based rule for how much of their portfolio should be in risk during their career.
Having a better understanding of what factors would influence a typical decision on portfolio risk, averaging across all equal-aged people and across all economic cycles, poses difficult questions. Yet this is ultimately more fundamental than the popular media and financial advisor debates hanging off to the sides, which ignore these probability models and simply advise citizens based upon an inconsistent set of quick rules that confusingly mix in a guess at current asset valuation levels.
The point of this article is that we should always be thinking differently about this sort of risk modeling framework, given the types of probability ideas that can be expanded upon, as we are learning to do here. It is instead insanity not to think something different can occur, and not to independently think about life choices reflecting unknown unknowns. This includes the insanity of recycling awkward spiritual advice from counselors.
Helping fuel this crazed drive for thinking about how much risk to take on is a void where no closed-form theoretical actuarial models can assist with two important tweaks to this problem. One is incorporating a probability of death randomly between when one starts a career and when one retires. Speaking of Russian roulette odds, most seem to ignore that in some mortality tables a 20-year-old has about a 1/5 chance of dying before his or her 65th birthday. The other is the ability to incorporate high real income growth during one’s career, for those to whom this applies. Of course there are numerous unique considerations one also needs to think about for personal risk modeling, such as family structure.
It’s therefore easy, but sloppy, to discard some of these and try open-form modeling to derive what our best decisions should be. The issue is that we too often allow our computer skills to conflate with random creative biases in order to come up with a risk decision. We’ve seen this in other examples, from public policy to, unfortunately, even academia. The focus should instead be on thinking through the problems, as we do with the Monty Hall problem in Chapter 2 of Statistics Topics.
Some random walk processes that assist in modeling assume a random interest rate (this could be reversed to look at variations in risky asset returns) and imply that the logarithmic function applied to the returns is normally distributed [e.g., ln(1+I_t) − ln(1+I_{t−1}) ~ ε_t]. Of course this helps with modeling, since the addition or subtraction of two i.i.d. normal distributions is also normally distributed. And as we showed in the initial annuity-due examples, this probability assumption could be repeated numerous times to reflect the length t of a lifetime process.
Other random walk processes make the assumption that the combination of two normal distributions through multiplication (and we’ll also show this here through division) can also result in an approximately normal distribution (e.g., ln I_t − ln I_{t−1} ~ ε_t). This is akin to how a continuous return and a discrete arithmetic return are very similar when the return is near zero. These other forms of normal distribution combinations provide a greater number of closed-form modeling functions that can be performed to better understand the resulting distribution.
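A small sketch of the first, additive assumption (the sizes and seed are choices of this sketch): summing normal log-returns keeps the log of terminal wealth normal, while terminal wealth itself becomes lognormal, right-skewed and fat-tailed:

```python
import math
import random

def moments(xs):
    """Sample skew and kurtosis (a normal sample gives roughly 0 and 3)."""
    n = len(xs)
    m = sum(xs) / n
    sd = (sum((x - m) ** 2 for x in xs) / n) ** 0.5
    skew = sum((x - m) ** 3 for x in xs) / (n * sd ** 3)
    kurt = sum((x - m) ** 4 for x in xs) / (n * sd ** 4)
    return skew, kurt

rng = random.Random(5)
t, sigma, paths = 32, 0.15, 50_000
log_wealth = [sum(rng.gauss(0.0, sigma) for _ in range(t))
              for _ in range(paths)]
wealth = [math.exp(lw) for lw in log_wealth]

# ln(terminal wealth): skew near 0, kurtosis near 3 -- normal, as assumed.
# terminal wealth itself: clearly right-skewed and leptokurtic.
```

This is why the additive (log-return) framework is so convenient for lifetime-length processes: normality survives the repeated addition, even though the wealth level itself does not stay normal.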
Even though a variation of this unconventional version immediately above is noted at the end of the popular Bowers’ Actuarial Mathematics book, it has limitations that we’ll see. In general we should not be able to multiply or divide two normal distributions and have the result be a normal distribution. If both distributions are centered about zero, then we’ll see here that one popular exception applies (for the case of multiplication, of course!) One of the easier normality tests to apply, to see the range-bound nature of other exceptions, is the Jarque-Bera test, named after two 20th century economists wrestling with how to approximate the limits of higher-order moments relative to those of a normal distribution.
The formula is simpler to use than Wilks’ test, and asymptotically follows a χ² distribution with 2 degrees of freedom (which is heavily right-skewed).
(n−k+1)/6 · [Skew² + ¼·(Kurtosis−3)²]
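As a sketch of this statistic (with the kurtosis term entering with a plus sign, as in the standard form of the test), here is a quick implementation; the sample choices and seed are assumptions of this sketch:

```python
import random

def jarque_bera(xs, k=0):
    """JB = (n - k + 1)/6 * [Skew^2 + (Kurtosis - 3)^2 / 4]; under
    normality it is approximately chi-squared with 2 degrees of freedom."""
    n = len(xs)
    m = sum(xs) / n
    sd = (sum((x - m) ** 2 for x in xs) / n) ** 0.5
    skew = sum((x - m) ** 3 for x in xs) / (n * sd ** 3)
    kurt = sum((x - m) ** 4 for x in xs) / (n * sd ** 4)
    return (n - k + 1) / 6 * (skew ** 2 + 0.25 * (kurt - 3) ** 2)

rng = random.Random(9)
jb_normal = jarque_bera([rng.gauss(0, 1) for _ in range(5_000)])
jb_skewed = jarque_bera([rng.expovariate(1.0) for _ in range(5_000)])
# jb_normal typically sits below the ~5.99 cutoff (the 95th percentile of a
# chi-squared with 2 df); jb_skewed is orders of magnitude larger.
```

A normal sample scores near 0, while a skewed, fat-tailed sample is flagged emphatically.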
One can examine articles here on the higher-order moments, such as skew and kurtosis. A search of this blog, for example, shows this great reference on the theory of the small-sample approximation of the normal distribution, the Student’s t, named after a fledgling 20th century statistician whose employer (the Guinness brewery in Dublin) would not allow him to publish under his real name. And, unquestionably, at this point in the article you might be pouring yourself a well-earned glass of Guinness. But this is where we’ll show some important mathematical properties you won’t see elsewhere about the theoretical relationship between the normal trajectory of lump sum risk and the application of the Jarque-Bera test for the normal distribution. This distribution, having a skew of 0 and a kurtosis of 3 (excess kurtosis of 0), results in a Jarque-Bera lower bound of 0. We can assume the k degrees-of-freedom adjustment will be minimal for the simulation:
≈ n/6 · [0² + ¼·(0)²] = 0
To understand how high the absolute level of the expected return needs to be (relative to the underlying standard deviation) for normally distributed returns, multiplied or divided, to also yield an approximately normal distribution, we run a simulation with a high n per graphic data point, in the thousands. The result, within the range shown, is that the return needs to be nearly twice the value of the standard deviation. See position (C) on the illustration below. This is not typically the case, though the convex function does reflect that for very high or very low reward-to-risk ratios, one can be more certain of the forward risk distribution of their outcomes. See position (B) on the illustration below for a representative area of a high risk-to-absolute-reward ratio.
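This pattern can be sketched in a few lines; the specific ratios, sample size, and seed below are illustrative choices of this sketch rather than the article’s exact simulation grid:

```python
import random

def jb(xs):
    """Jarque-Bera: n/6 * [Skew^2 + (Kurtosis - 3)^2 / 4]."""
    n = len(xs)
    m = sum(xs) / n
    sd = (sum((x - m) ** 2 for x in xs) / n) ** 0.5
    s = sum((x - m) ** 3 for x in xs) / (n * sd ** 3)
    k = sum((x - m) ** 4 for x in xs) / (n * sd ** 4)
    return n / 6 * (s ** 2 + 0.25 * (k - 3) ** 2)

rng = random.Random(2)

def product_jb(mean_to_sd, n=20_000):
    """JB of the product of two independent N(mu, 1) draws, for a given
    mean-to-standard-deviation ratio mu."""
    return jb([rng.gauss(mean_to_sd, 1.0) * rng.gauss(mean_to_sd, 1.0)
               for _ in range(n)])

# The product looks less and less normal as the mean-to-sd ratio shrinks
# toward 0, and improves steadily as the ratio grows.
for ratio in (0.0, 0.5, 1.0, 2.0, 4.0):
    print(ratio, round(product_jb(ratio)))
```

The printed JB values fall sharply as the reward-to-risk ratio rises, consistent with the convex pattern between positions (B) and (C) described above.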
Now we’ve covered a variety of applications of lump sum and annuity math, and we see that even here there are outcome differences not subject to quick and lazy “rules of thumb”. For example, the lump sum model originally looked like a good reflection of summing normal distributions, and it is still good here, but for the random walk we are geometrically multiplying or dividing risks. And annuity-dues were bad applications of the normal distribution, but here they can be fine in either of two situations: first, where we have fixed risk (see the increasing or decreasing annuities), or second, in the advanced random walk example, only when there is a high return/risk ratio. This is a lot to keep track of initially! But it shows the beautiful and complicated nature of nature again, and how less polished people would unknowingly sweep over these details with detrimental guidance.
What we further see from the simulations illustrated above is that the closer these reward-to-risk ratios are to 0, the more uncertain we are in describing the future risk distribution of lifetime performance outcomes. The dark blue dots, which we often see in the addition and subtraction diagram, reflect a Jarque-Bera of <3, or a mostly uniform degree of normality in the resulting distribution, as is theoretically the case. The green dots, on the other hand, are for a Jarque-Bera of between 3 and 10. The remaining space is the most non-normal range of dots; those are in red.
Staying with Jarque-Bera for our article, let’s thoroughly explore the important differences and similarities between Jarque-Bera and other normality tests. The key to the approximation is that Jarque-Bera uses only 2 higher-order statistics, which may be best for small data sets, but it still does not consume any other information concerning the shape of the distribution beyond those two moments. Note that we already have Tchebysheff’s inequality (named after the 19th century Russian mathematician Пафну́тий Льво́вич Чебышёв) for a lower-order test using variance (or even σ). Other, broader tests include Wilks’. But here again, a larger dataset would be needed to compensate for the non-parametric nature of that test.
We should also note that in a number of other blog articles here we have explored in depth other important abnormal estimations and manipulations. Convolution theory is one of those stimulating applications, particularly in risk modeling. These examples (here, here, here, here) are generally leptokurtic, though that is a lesser point when dealing with the development of risk where growth factors are involved, as they are here with most of our annuity and lump sum examples.
As we noted before with the annuity example, open-form simulations and advanced analytics are not always best. They don’t train one’s brain to think outside the box. This is another form of looking at something incorrect over and over, and rationalizing it as correct. One should instead strive to see things differently if one hopes to ever be prepared for the “unexpected”. During the recent financial crisis we saw examples of how people were caught off-guard by thinking homogeneously with the herds, and then seeing risk unfold for the second time in a decade. That got Lehman Brothers’ Fuld into trouble (before we rescued a number of similar institutions with TARP) in ways he still doesn’t know (here, here). Having more nuanced approaches allows one to better think “around the corner” and, similar to tending a bonsai, be cautious in applying just the right amount of modeling attention where needed (not too much and not too little).
A fifth and final example is where we have an annuity-type model that adjusts (e.g., increasing or decreasing) and the risk disturbances over time are also changing in scale. Such cases lead to extreme models where we have abnormal risks that must be modeled through a Monte Carlo type of simulation (a probability method akin to some of the earliest probability applications, invented to consider schemes used in the Monte Carlo gambling halls of Monaco). On a related note, one can enjoy this sophisticated gambler’s ruin answer here or here. However, we can imagine that normality may still exist on the fringes if the volatility is kept low! Think about that, since it isn’t intuitive, and yet it’s a powerful insight to consider. One can think about the distributions in life from accumulated fallen-leaf patterns, or the distribution of passengers among train cars, or how much risk is encumbered in a small amount of space.
Similar to the chaotic assembly above, in which the leaves were generated and have fallen all over (though mostly nearer the middle), the philosophical notion that in life we’ll never see something that is different this time is neither always true nor always false. On the contrary, our life experiences will always depend on knowing the context we personally come across, since life patterns always unfold in ways perhaps different from the last. And to parents’ delight everywhere, some of this understanding only comes with aging.
The meaning of “different”, in expecting something different, is also a source of vagueness. Risk is different from non-risk, but it’s also different from other risks we’ve seen in the past. Risks do not always unfold in predictable ways (in terms of frequency, severity, distribution shape, or the way the trend develops). One can see Chapter 4 of Statistics Topics, concerning stochastic modeling.
Surely there will be many times when future repetitions continue to produce generally similar results. But we should also always expect that unknown, non-normal risks will occur, which are more difficult to properly model and protect against. The future will not simply always even itself out. To be clear, the location of (B) and (C) in the diagram is where we need to understand where we currently are in the financial markets. It is not insane here, therefore, to expect different results based on context, while being cautious towards those who claim to offer such well risk-adjusted opportunities.
So it may only be insanity after all to be either overly fearful or overly confident that life occurrences will be different “next time” in some sort of predictable way!
Lastly, in other news: there are many updates worth noting. The market call made in our last article, published in a WSJ network MarketWatch column, came true to the day, with what we noted would be a 1.3% S&P drop to 2103. And the article before that, on the “big data” numbers behind facial recognition, was featured by many news outlets, helping catapult my face(s) recently to the top of the viral #HowOldRobot.
Additionally, I have just joined the board of FutureAdvisor, alongside colleagues such as Yale’s Barry Nalebuff and the former Stanford endowment CIO. Also, in addition to board advisory work on both coasts, I have just announced that all proceeds from my bestselling book Statistics Topics have been donated to the American Statistical Association. We will continue to work on bringing first-class and free (even of 3rd-party commercials) probability and statistics education to anyone who wants to benefit from it. Lastly, this blog has reached a new readership milestone to be proud of, crossing >1/3 million reads on the site and >1 million reads across all sites. Below we can again enjoy some of the current top-decile articles written on this blog that received notable attention from leading professionals around the globe. Thanks much for the support; what a ride!
Source: Statistical Ideas, written by Salil Mehta