Portfolio optimization should provide large benefits to investors, but standard mean-variance optimization (MVO) works so poorly in practice that optimization is often abandoned. (Page 1)

We identify the portfolios that cause problems in standard MVO and present a simple enhanced portfolio optimization (EPO) method. Applying EPO to industry momentum and time series momentum across equities and global asset classes, we find significant alpha beyond the market, the 1/N portfolio, and standard asset pricing factors. (Page 1)

Investors seek to construct portfolios that optimally trade off risk and expected return. A standard tool to achieve this goal is mean-variance optimization (MVO) (Markowitz 1952), but MVO often produces large and unintuitive bets that perform poorly in practice (Michaud 1989). (Page 2)

Likewise, standard academic factors that bet on such characteristics as value (HML), size (SMB), and momentum (UMD) are constructed without the use of optimization or, in fact, any use of volatility or correlation information (e.g., the factor models of Fama and French 1993, 2015). (Page 2)

This paper seeks to demystify optimization by addressing these questions. In short, we show (1) where the problem with standard optimization arises, (2) how to fix it in a simple way, (3) how the fix explains and unifies a number of enhanced optimization methods in the literature, and (4) that the fix works surprisingly well. (Page 2)

Finally, we find empirically that the EPO method improves industry momentum and time series momentum performance in an economically and statistically significant way relative to standard benchmarks. For example, the EPO time series momentum portfolio in global equities, bonds, currencies, and commodities shows a large improvement in Sharpe ratio and statistically significant alpha relative to equal-notional-weighted or equal-volatility-weighted time-series momentum portfolios. (Page 2)

To understand the poor performance of standard MVO, consider how optimization works in practice. An investor first identifies the securities that she likes and dislikes, or, said differently, estimates securities’ expected returns. Then she estimates securities’ risks (volatilities and correlations). All these estimates naturally have measurement errors, which can lead MVO to take large unintuitive bets that work poorly in practice. (Page 3)

Further, we show that increasing the ex-ante volatilities of the problem portfolios is exactly the same as shrinking correlations of the original assets toward zero! Thus, correlation shrinkage directly reduces the estimated Sharpe ratios of the problem portfolios. (Page 3)

This method is what we call the “simple EPO”. The simple EPO first shrinks all correlations toward zero and then computes the standard MVO portfolio. The two key insights are: (i) that correlation shrinkage can fix errors in both risk and expected returns, and (ii) that this can be achieved by choosing the shrinkage parameter to maximize the portfolio’s Sharpe ratio (out of sample), in contrast to the existing literature that chooses correlation shrinkage to maximize the fit of the correlation (or variance-covariance) matrix. (Page 3)

This insight – the power of tuning correlation shrinkage to maximize risk-adjusted returns, not just risk – has deeper theoretical foundations based on Bayesian estimation and robust optimization. (Page 4)

In addition to unifying these approaches, a key contribution is to explain why these methods work, namely because they shrink correlations, which fixes the problem portfolios. (Page 4)

To see how the simple EPO works in practice, consider a shrinkage parameter $w \in [0,1]$. First, the off-diagonal correlation $\Omega_{ij}$ between any pair of assets $i$ and $j$ is replaced by $(1 - w)\Omega_{ij}$, and then we perform MVO using this modified variance-covariance matrix. (Page 4)
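The two steps can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code: the function names, the risk-aversion default $\gamma = 1$, and the two-asset inputs are hypothetical. With a 0.95 correlation, standard MVO ($w = 0$) takes a leveraged long-short bet, while shrinking correlations by 75% leaves two moderate long positions.

```python
import numpy as np

def shrink_correlations(cov: np.ndarray, w: float) -> np.ndarray:
    """Multiply every off-diagonal correlation by (1 - w); volatilities are unchanged."""
    vol = np.sqrt(np.diag(cov))
    corr = cov / np.outer(vol, vol)
    shrunk = (1.0 - w) * corr + w * np.eye(len(vol))  # diagonal stays exactly 1
    return shrunk * np.outer(vol, vol)

def simple_epo(signal: np.ndarray, cov: np.ndarray, w: float, gamma: float = 1.0) -> np.ndarray:
    """Standard MVO, (1/gamma) * Sigma_w^{-1} s, on the correlation-shrunk covariance matrix."""
    return np.linalg.solve(shrink_correlations(cov, w), signal) / gamma

# Hypothetical inputs: two assets with 20% volatility and correlation 0.95.
cov = np.array([[0.040, 0.038],
                [0.038, 0.040]])
s = np.array([0.05, 0.04])  # signals, treated as expected excess returns

print(simple_epo(s, cov, w=0.0))   # standard MVO: long one asset, short the other
print(simple_epo(s, cov, w=0.75))  # simple EPO: both positions long
```

Setting $w = 0$ recovers standard MVO, and $w = 1$ ignores correlations entirely, so each asset is sized by its own signal divided by its own variance.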

How much shrinkage is needed? The simple answer is that this is an empirical question. We empirically choose $w$ out-of-sample as follows: each time period, we estimate what choice of $w$ would have produced the highest SR in the time period up until today, and then use this estimate in the next time period. (Page 4)
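A minimal sketch of this out-of-sample selection, assuming signals and covariance matrices are known at the start of each period and returns are realized over it. The grid of 11 values, the 24-period burn-in, and the synthetic data are hypothetical choices, not the paper's settings. At each date the rule only looks at returns realized strictly before that date, so there is no look-ahead.

```python
import numpy as np

def epo_weights(signal, cov, w):
    """Simple EPO with gamma = 1: MVO on the covariance with correlations shrunk by (1 - w)."""
    vol = np.sqrt(np.diag(cov))
    corr = cov / np.outer(vol, vol)
    cov_w = ((1 - w) * corr + w * np.eye(len(vol))) * np.outer(vol, vol)
    return np.linalg.solve(cov_w, signal)

def sharpe(pnl):
    return pnl.mean() / pnl.std(ddof=1)

def choose_w_out_of_sample(signals, covs, returns, grid, min_history=24):
    """For each period t >= min_history, pick the w whose portfolio had the highest
    Sharpe ratio over periods 0..t-1, and record that portfolio's return in period t."""
    T = len(returns)
    # pnl[j, t]: return in period t of the portfolio built with grid[j]
    pnl = np.array([[epo_weights(signals[t], covs[t], w) @ returns[t] for t in range(T)]
                    for w in grid])
    chosen = np.full(T, np.nan)
    oos_pnl = np.full(T, np.nan)
    for t in range(min_history, T):
        best = int(np.argmax([sharpe(pnl[j, :t]) for j in range(len(grid))]))
        chosen[t] = grid[best]
        oos_pnl[t] = pnl[best, t]
    return chosen, oos_pnl

# Synthetic example: 5 assets with pairwise correlation 1/3, 120 periods.
rng = np.random.default_rng(0)
T, N = 120, 5
cov = 0.001 * np.ones((N, N)) + 0.002 * np.eye(N)
covs = np.repeat(cov[None, :, :], T, axis=0)
signals = rng.normal(0.0, 0.01, size=(T, N))
returns = 0.3 * signals + rng.multivariate_normal(np.zeros(N), cov, size=T)
grid = np.linspace(0.0, 1.0, 11)
chosen, oos_pnl = choose_w_out_of_sample(signals, covs, returns, grid)
```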

To “fix” the correlation matrix (i.e., to fix errors in the risk model alone), we typically need to shrink the correlation matrix only about 5% to 10%. So why do we need a much larger shrinkage of around 75%? (Page 5)

We also develop a more general form of EPO, which allows the investor to control how close the solution stays to an “anchor portfolio.” For example, an investor benchmarked to a certain stock index may desire to control how much his optimized portfolio deviates from this benchmark, hence using the benchmark as an anchor. (Page 5)

Empirically, we apply our EPO method to optimize momentum portfolios using several realistic data sets, showing that EPO produces significant performance gains relative to standard benchmarks in the literature. When applied to a universe of global equity indices, bonds, currencies, and commodities, the EPO time series momentum portfolio substantially outperforms several benchmarks that are known to be difficult to beat. Indeed, EPO outperforms 1/N portfolios, equal-notional-weighted time series momentum factors, equal-volatility-weighted time series momentum, standard MVO, and MVO with enhanced risk models. (Page 5)

Furthermore, in the context of equity industry portfolios, the EPO industry momentum portfolio significantly outperforms the market portfolio, 1/N portfolios, standard MVO, MVO with an enhanced risk model, and standard industry momentum. (Page 5)

Despite the fame of this paper, it remains mysterious to many readers who find it difficult to apply and difficult to understand where the result is coming from, including what is being assumed and what the parameters mean. (Page 6)

Third, we link our approach to the literature on robust optimization (see the survey by Fabozzi, Huang, and Zhou 2010 and references therein) by showing how to solve a problem with a general “ellipsoidal uncertainty” set on the mean, and by showing, perhaps surprisingly, the exact equivalence between this form of robust optimization and the Bayesian estimator. (Page 6)

Fifth, our empirical results extend and enhance standard factor models, in particular industry momentum (Moskowitz and Grinblatt, 1999) and time series momentum (Moskowitz, Ooi, and Pedersen, 2012). See also Baltas (2015), Yang, Qian, and Belton (2019), and Baltas and Kosowski (2020) for other enhancements of time series momentum based on risk parity methods. (Page 7)

We first lay out the standard portfolio choice framework and then show how to identify problem portfolios. The appendix contains a summary of our notation. (Page 7)

For now, we assume that the investor ignores potential noise in the signal. Further, rather than considering an abstract signal, we assume for simplicity that the signal is already scaled to be the conditional expected excess return, that is, $\alpha = s$. Similarly, the investor computes the conditional variance-covariance matrix of excess returns, $\Sigma = \operatorname{var}(r \mid s)$. (Page 7)
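For reference, a sketch of the standard MVO portfolio with $\alpha = s$. It assumes the usual quadratic objective $x'\alpha - \frac{\gamma}{2}x'\Sigma x$ with risk aversion $\gamma$, whose maximizer is $x^* = \frac{1}{\gamma}\Sigma^{-1}\alpha$; this section does not spell out the objective, and the numbers are hypothetical.

```python
import numpy as np

def mvo(alpha, cov, gamma):
    """Maximizer of x'alpha - (gamma/2) x'Sigma x, i.e. x* = (1/gamma) Sigma^{-1} alpha."""
    return np.linalg.solve(cov, alpha) / gamma

def mv_objective(x, alpha, cov, gamma):
    return x @ alpha - 0.5 * gamma * x @ cov @ x

s = np.array([0.04, 0.02, 0.03])   # hypothetical signal, used directly as alpha
cov = np.array([[0.040, 0.010, 0.012],
                [0.010, 0.020, 0.006],
                [0.012, 0.006, 0.030]])
gamma = 2.0
x_star = mvo(s, cov, gamma)
print(x_star)
```

Using `np.linalg.solve` rather than forming $\Sigma^{-1}$ explicitly is the usual numerically safer choice.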

We first show how the “problem portfolios” for standard MVO can be identified using principal components of the correlation matrix. (Page 8)

Focusing on the correlation matrix essentially means that we first scale all the original assets to have equal volatility (but we could also use the variance-covariance matrix itself). (Page 8)

By way of background on principal components, we note that the first principal component maximizes the function $h'\Omega h$ subject to $h'h = 1$. In other words, it maximizes the variance $h'\Omega h$ over all portfolios $h$ with a unit sum of squared weights (in the space of assets that have been scaled to unit volatility, given that we are working with the correlation matrix instead of the variance-covariance matrix). (Page 8)
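A short NumPy sketch of both steps: converting a (hypothetical) variance-covariance matrix to a correlation matrix, which is the same as rescaling every asset to unit volatility, and then extracting principal components with `np.linalg.eigh`, which returns eigenvalues in ascending order for symmetric input. The eigenvector with the largest eigenvalue attains the maximum of $h'\Omega h$ over unit-norm $h$, and that maximum equals the eigenvalue.

```python
import numpy as np

def correlation_from_cov(cov):
    """Rescale every asset to unit volatility: Omega = V^{-1/2} Sigma V^{-1/2}."""
    vol = np.sqrt(np.diag(cov))
    return cov / np.outer(vol, vol)

def principal_components(corr):
    """Eigenvalues D (largest first) and eigenvectors P (columns) with corr = P diag(D) P'."""
    D, P = np.linalg.eigh(corr)       # ascending order for a symmetric matrix
    order = np.argsort(D)[::-1]
    return D[order], P[:, order]

cov = np.array([[0.040, 0.010, 0.012],
                [0.010, 0.020, 0.006],
                [0.012, 0.006, 0.030]])
omega = correlation_from_cov(cov)
D, P = principal_components(omega)
h1 = P[:, 0]                          # first principal component portfolio
print(D)                              # PC variances, most important first
print(h1 @ omega @ h1)                # equals D[0]
```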

The last principal components are exactly those portfolios that potentially give trouble to the standard mean-variance optimization. These portfolios have, by definition, the smallest possible variance among all portfolios (relative to their sum of squared portfolio weights), but not necessarily a small magnitude of estimated expected returns. In other words, for these portfolios, the noise can easily swamp the signal and, what is worse, standard MVO tends to take large leveraged bets on these noise-driven portfolios. (Page 8)
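To see why MVO bets heavily on the last PCs, write $\Omega = P D P'$. With assets at unit volatility, the MVO portfolio $x = \frac{1}{\gamma}\Omega^{-1}s$ has exposure $\frac{p_i's}{\gamma D_i}$ to the $i$-th PC portfolio $p_i$, so a small $D_i$ magnifies whatever (possibly noisy) expected return that PC carries. The sketch below uses a hypothetical pair of assets with correlation 0.95: the long-short PC has a far smaller expected return than the common PC, yet MVO bets roughly four times as much on it.

```python
import numpy as np

rho = 0.95
omega = np.array([[1.0, rho],
                  [rho, 1.0]])        # assets already scaled to unit volatility
s = np.array([0.05, 0.04])            # hypothetical signals (expected excess returns)
gamma = 1.0

D, P = np.linalg.eigh(omega)          # ascending: D[0] belongs to the last (least important) PC
x = np.linalg.solve(omega, s) / gamma # standard MVO portfolio

pc_signal = P.T @ s                   # expected return of each unit-norm PC portfolio
pc_exposure = P.T @ x                 # MVO position in each PC
print(D)
print(pc_signal)
print(pc_exposure)                    # equals pc_signal / (gamma * D)
```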

The least important principal components are those with the lowest volatilities, $\sqrt{D_i}$. Any error in the estimation of risk will likely lead to an underestimation of the risk of these portfolios (because they have been chosen as the lowest-risk portfolios). (Page 9)
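A small simulation illustrates this selection effect under an assumed setting: 20 assets whose true correlation matrix is the identity, so every unit-norm portfolio truly has variance 1. The sample correlation matrix from 60 observations still produces a "lowest-risk" PC whose estimated variance is below 1, precisely because it was selected as the minimum. The sizes and seed are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
N, T = 20, 60                              # hypothetical: 20 assets, 60 observations
returns = rng.standard_normal((T, N))      # true correlation matrix is the identity
sample_corr = np.corrcoef(returns, rowvar=False)

D, P = np.linalg.eigh(sample_corr)         # ascending eigenvalues
last_pc = P[:, 0]                          # estimated lowest-risk portfolio
ex_ante_var = D[0]                         # its estimated variance
true_var = last_pc @ np.eye(N) @ last_pc   # its true variance (1 for any unit-norm portfolio)
print(f"ex ante variance {ex_ante_var:.3f} vs true variance {true_var:.3f}")
```

Because the eigenvalues of a sample correlation matrix always sum to $N$, the smallest must fall below 1 (and the largest above 1) whenever they are not all equal, so the understatement is systematic rather than a quirk of the seed.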

Having identified the problem with MVO, the solution is straightforward: we need to increase the estimated risk of the problem portfolios, which can be achieved by shrinking estimated correlations of assets, leading to the simple EPO as shown in Section A below. (Page 10)

As discussed above, principal components can be viewed as portfolios that are ordered by their degree of troublesomeness for portfolio optimization. In essence, the problem is that the estimated variances are likely to be too low for the safest portfolios (and too high for the riskiest ones). (Page 10)

Observation: Adjusting the volatilities of PC portfolios corresponds to adjusting the correlations of the original assets. Specifically, increasing the volatility of problem portfolios while lowering the volatility of the important PC portfolios is the same as multiplying all the correlations of the original assets by $1 - w$. (Page 10)
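A quick numerical check of this observation on a hypothetical 3×3 correlation matrix. Multiplying every off-diagonal correlation by $1 - w$ gives $\Omega_w = (1-w)\Omega + wI$, which has the same eigenvectors as $\Omega$, so each PC portfolio's variance moves from $D_i$ to $(1-w)D_i + w$: it rises for the problem portfolios ($D_i < 1$) and falls for the important ones ($D_i > 1$).

```python
import numpy as np

omega = np.array([[1.0, 0.8, 0.6],
                  [0.8, 1.0, 0.7],
                  [0.6, 0.7, 1.0]])        # hypothetical correlation matrix
w = 0.5
omega_w = (1 - w) * omega + w * np.eye(3)  # every off-diagonal correlation times (1 - w)

D, P = np.linalg.eigh(omega)               # PC variances before shrinkage (ascending)
pc_var_after = np.array([P[:, i] @ omega_w @ P[:, i] for i in range(3)])
print(D)
print(pc_var_after)                        # equals (1 - w) * D + w
```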

We next address the fact that the investor’s signal $s$ is observed with noise. This section considers a Bayesian approach following Black and Litterman (1992), although with a different way of expressing the solution (and different notation). (Page 11)

The investor must try to estimate the true expected return $\mu$ based on this noisy signal $s$. While standard MVO estimates the true expected return simply as the signal $s$, which contains measurement errors, we consider instead a Bayesian investor who updates her “prior beliefs” about $\mu$ to make a better estimate of true expected returns using the observed signal, that is, $E(\mu \mid s)$. (Page 11)
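The posterior mean $E(\mu \mid s)$ can be sketched with a standard Gaussian conjugate update. The parameterization here is an assumption in the Black-Litterman spirit, not a formula quoted from this section: the prior is $\mu \sim N(\gamma\Sigma a, \tau\Sigma)$, whose mean makes the anchor $a$ the MVO-optimal portfolio, and the signal is $s = \mu + \varepsilon$ with $\varepsilon \sim N(0, \Lambda)$. Then $E(\mu \mid s) = m + \tau\Sigma(\tau\Sigma + \Lambda)^{-1}(s - m)$ with $m = \gamma\Sigma a$. All inputs are hypothetical.

```python
import numpy as np

def posterior_mean(s, Sigma, Lam, a, gamma, tau):
    """E(mu | s) under the ASSUMED model mu ~ N(gamma Sigma a, tau Sigma), s = mu + noise,
    noise ~ N(0, Lam)."""
    m = gamma * Sigma @ a                 # prior mean: expected returns that make anchor a optimal
    P = tau * Sigma                       # prior covariance of mu
    return m + P @ np.linalg.solve(P + Lam, s - m)

Sigma = np.array([[0.040, 0.010, 0.012],
                  [0.010, 0.020, 0.006],
                  [0.012, 0.006, 0.030]])
s = np.array([0.04, 0.02, 0.03])          # hypothetical noisy signal
a = np.full(3, 1 / 3)                     # hypothetical 1/N anchor
gamma, tau = 2.0, 0.5
mu_hat = posterior_mean(s, Sigma, 0.01 * np.eye(3), a, gamma, tau)
print(mu_hat)
```

As the measurement error $\Lambda$ shrinks, the estimate approaches the raw signal $s$ (standard MVO's assumption); as $\Lambda$ grows, it collapses to the anchor-implied prior mean.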

Intuitively, this model means that the investor is aware that her signal is estimated with error and has a framework for the nature of this error. (Page 12)

To explain the mysterious parameters, the anchor portfolio is basically the investor’s typical portfolio or strategic asset allocation, $\tau$ indicates the variation in the investor’s optimal portfolio, and $\Lambda$ is the amount of measurement error. (Page 12)

We also consider an “anchored EPO”, which makes all the mysterious parameters disappear, except the anchor – since having an anchor can be useful in practice, e.g., to control how much an optimized portfolio deviates from a benchmark. (Page 12)

This result shows how robust optimization can be done via shrinkage of the mean vector and the variance-covariance matrix. Surprisingly, the optimal portfolio (12) is exactly the same as the solution in Section II.B! This result provides a new link between robust optimization and Bayesian optimization. What is the intuition behind this link? Both methods capture the ideas that the signal $s$ contains imperfect information about the conditional expected returns, that the amount of noise in the signal is related to $\Lambda$, and that there exists an anchor portfolio $a$ that one might not want to deviate too much from. (Page 14)

We have discussed above that estimation errors occur in both the variance-covariance matrix and in expected returns. Hence, we first fix the problem with the variance-covariance matrix using simple shrinkage as in Section II.A (or using random matrix theory, discussed in the appendix), giving rise to the enhanced risk estimate $\tilde{\Sigma}$, and, second, enhance expected returns as described in Sections II.B-C, leading to the general EPO solution: (Page 14)

The general EPO solution in (13) depends on several parameters, some of which are straightforward to estimate, while others are trickier, so it is useful to provide some guidance on the tricky ones. Let us start with the easier ones: the variance-covariance matrix, $\tilde{\Sigma}$, can be estimated in the standard way based on the sample counterpart, possibly enhanced with shrinkage as discussed in Section II.A. (Page 14)

The EPO shrinkage parameter. The shrinkage parameter $w$ plays a key role in our empirical implementation. We see from (14)-(15) that the EPO shrinkage parameter controls the shrinkage of both (i) expected returns toward the anchor, and (ii) the correlations toward zero. (Page 15)
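Equations (14)-(15) are not reproduced in these notes, so the sketch below assumes the form $x(w) = \left((1-w)\Sigma + wV\right)^{-1}\left(\frac{1-w}{\gamma}s + wVa\right)$, where $V$ is the diagonal matrix of variances. This form has the two properties just described: $(1-w)\Sigma + wV$ keeps variances fixed while multiplying correlations by $1 - w$, and the bracket shrinks the signal toward returns that favor the anchor $a$. It is a hypothetical illustration; treat the exact formula as an assumption to check against the paper.

```python
import numpy as np

def general_epo(s, Sigma, a, gamma, w):
    """ASSUMED form: x(w) = ((1-w) Sigma + w V)^{-1} ((1-w)/gamma * s + w V a)."""
    V = np.diag(np.diag(Sigma))            # variances only, zero correlations
    Sigma_w = (1 - w) * Sigma + w * V      # correlations times (1 - w), variances unchanged
    return np.linalg.solve(Sigma_w, (1 - w) / gamma * s + w * V @ a)

Sigma = np.array([[0.040, 0.010, 0.012],
                  [0.010, 0.020, 0.006],
                  [0.012, 0.006, 0.030]])
s = np.array([0.04, 0.02, 0.03])           # hypothetical signal
a = np.full(3, 1 / 3)                      # hypothetical 1/N anchor
gamma = 2.0
for w in (0.0, 0.5, 1.0):
    print(w, general_epo(s, Sigma, a, gamma, w))
```

Under this form, $w = 0$ gives standard MVO and $w = 1$ returns the anchor itself. Choosing $a = \frac{1}{\gamma}V^{-1}s$, as in the simple-EPO case described in the text, turns the bracket into $\frac{1}{\gamma}s$, so the portfolio becomes MVO on the correlation-shrunk matrix.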

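Simple EPO. A particularly simple expression arises if we choose the anchor portfolio as $a = \frac{1}{\gamma}V^{-1}s$, where $V$ is the diagonal matrix of variances. In this case, we recover the simple EPO already discussed in Section II.A, that is, standard MVO computed with the correlation-shrunk variance-covariance matrix: $x = \frac{1}{\gamma}\left((1-w)\Sigma + wV\right)^{-1}s$. (Page 15)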

Anchored EPO. Some investors prefer their portfolio to be tied to an anchor, so it is useful to consider a practical implementation of an anchored EPO. For example, an investor may have a signal $s$ about the assets’ expected returns based on their momentum, and an anchor $a$ based on the 1/N portfolio or on a benchmark portfolio. (Page 16)

Our empirical implementation constructs optimized industry momentum and time series momentum portfolios using 11 different samples that differ in terms of their test assets and methodology as summarized in Table 1. (Page 18)

Test assets and data. Our data for Global 1-3 consists of 55 liquid futures and forwards described in Moskowitz, Ooi, and Pedersen (2012). Specifically, we include every equity, commodity, and bond futures contract used in Moskowitz, Ooi, and Pedersen (2012), as well as the nine currency pairs that involve the US dollar (USD). (Page 18)

The samples for Equity 1-7 are the 49 value-weighted US equity industry portfolios from Ken French’s website. Equity 8 splits each industry portfolio into two components, for a total of 2 × 49 = 98 test assets. Specifically, using the CRSP data on the underlying stocks, we compute a “high-momentum” and “low-momentum” portfolio within each of these 49 industry portfolios. (Page 19)

Performance of EPO vs. Benchmark Portfolios. Turning to our empirical results, we first consider the performance of optimized TSMOM portfolios relative to key benchmarks for global assets such as long-only portfolios and standard TSMOM factor portfolios, as shown in Table 2. The first portfolio that we consider is the 1/N portfolio which invests an equal notional exposure across all assets. (Page 22)

The risk-weighted TSMOM factor already has a very high SR since it already does several of the things that an optimizer can hope to achieve: it takes into account expected returns by trading on TSMOM, and it takes into account volatility differences across assets and over time by scaling positions accordingly. (Page 22)
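A minimal sketch of how such a risk-weighted TSMOM position can be formed: go long assets with a positive trailing return and short those with a negative one, sizing each position inversely to its ex ante volatility. The 12-month lookback is a common choice assumed here, and the 10% volatility target and all inputs are hypothetical, not the paper's settings.

```python
import numpy as np

def tsmom_positions(past_12m_returns, ex_ante_vols, target_vol=0.10):
    """Sign of the trailing return times target_vol / volatility, so every
    position has the same ex ante volatility (target_vol is hypothetical)."""
    return np.sign(past_12m_returns) * target_vol / ex_ante_vols

past = np.array([0.12, -0.05, 0.30, -0.20])  # hypothetical trailing 12-month excess returns
vols = np.array([0.15, 0.05, 0.40, 0.10])    # hypothetical annualized ex ante volatilities
x = tsmom_positions(past, vols)
print(x)
print(np.abs(x) * vols)                       # ex ante volatility of each position
```

Note that this scaling ignores correlations entirely, which is exactly the gap an optimizer such as EPO can try to fill.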

Looking at the realized volatilities of these portfolios, we see that the realized volatilities are also decreasing in the PC number, with volatility levels that roughly match their average ex ante counterparts, reflecting that the risk model works reasonably well. However, we do see systematic errors: the least important PCs (those with the highest numbers) have higher realized volatilities than their average ex ante volatility. (Page 24)

We next consider principal component returns, plotted in Figure 1, Panel B. Naturally, realized returns are noisy while expected returns are smoother, simply because realized performance always has an element of chance. (Page 24)

We see that the alpha remains statistically significant. Column 3 then controls for volatility-adjusted TSMOM strategies in each of the four asset classes to see if EPO statically exploits a different asset allocation strategy. This is a stringent test since we are now controlling for 5 high-performance volatility-adjusted strategies that already implicitly do part of the job that we hope that an optimizer would do. (Page 25)
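An alpha test of this kind can be sketched as an OLS regression of the EPO portfolio's returns on a constant and the benchmark strategies' returns, reading off the intercept and its t-statistic. The sketch uses classical (homoskedastic) standard errors on synthetic data; the notes do not say which standard-error estimator the paper uses, so treat that choice as an assumption.

```python
import numpy as np

def alpha_tstat(y, X):
    """OLS of y on a constant and the columns of X.
    Returns (alpha, t-stat of alpha) using classical standard errors."""
    T = len(y)
    Z = np.column_stack([np.ones(T), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    dof = T - Z.shape[1]
    sigma2 = resid @ resid / dof
    cov_beta = sigma2 * np.linalg.inv(Z.T @ Z)
    return beta[0], beta[0] / np.sqrt(cov_beta[0, 0])

# Synthetic data: returns of 5 hypothetical benchmark strategies and a portfolio
# that loads on them plus a true alpha of 0.4% per period.
rng = np.random.default_rng(7)
T = 600
X = rng.normal(0.005, 0.03, size=(T, 5))
y = 0.004 + X @ np.array([0.3, 0.2, 0.1, 0.1, 0.2]) + rng.normal(0.0, 0.01, T)
alpha, t = alpha_tstat(y, X)
print(f"alpha = {alpha:.4f}, t = {t:.1f}")
```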

In all cases, we see that the out-of-sample EPO portfolio outperforms 1/N, INDMOM, and the standard MVO portfolio, often by a substantial margin. This robustness of the results is noteworthy given the range of specifications. Recall that Equity 1-3 vary the risk model from 40 days to 60 months, a broad span of risk models. Equity 4 and 5 consider different ways to scale the signals about expected returns. Equity 6 and 7 consider different implementations of the EPO method, using the anchored EPO rather than the simple EPO, while considering different anchors. (Page 27)

For Equity 2-4, the t-statistic is above 6, which is highly statistically significant. We note that the weaker risk-adjusted return of Equity 6 may arise because, in this specification, the EPO is anchored to the long-only 1/N portfolio, which creates two issues: (1) a large market loading of 0.85, and (2) a tradeoff (in the choice of the shrinkage parameter) between stabilizing the optimization and moving toward a long-only portfolio that, unlike an INDMOM portfolio, does not exploit signals about expected returns. (Page 27)

Single extra input, namely a correlation shrinkage parameter, which is chosen to maximize risk-adjusted returns in past data. EPO improves portfolio performance by accounting for noise in the investor’s estimates of risk and expected return. The method encompasses several optimization procedures in the literature – notably Black-Litterman, robust optimization, and regularization methods used in machine learning – so it demystifies, unifies, and simplifies much of this literature. (Page 28)

Despite its simplicity, EPO delivers powerful results empirically. Applying our EPO method to several realistic examples, we see surprisingly large performance improvements in optimized industry momentum and time series momentum portfolios relative to standard benchmarks and predictors in the literature. When applied to global assets, our EPO time series momentum portfolio substantially outperforms the market portfolio and the 1/N portfolio, and even relatively sophisticated benchmarks that already perform substantially better than the 1/N portfolio. Indeed, the EPO method delivers significant alpha even relative to volatility-scaled long-only and standard time series momentum portfolios. These sophisticated benchmarks already deliver high Sharpe ratios since they exploit the lowest-hanging fruit of optimization by (1) using information about expected returns, (2) controlling for volatility differences across assets and across time, (3) potentially exploiting market risk premia and risk parity effects, and (4) potentially re-adjusting asset-class weights. This is a tough benchmark to beat, yet EPO beats it. (Page 28)

Further, the performance enhancements are robust to a range of different specifications. While we focus on momentum predictors for simplicity, future research can use this approach to enhance other predictors. (Page 28)