
[WITH CODE] Portfolio: Robust covariance estimation

How to keep your portfolio balanced even when markets misbehave

๐š€๐šž๐šŠ๐š—๐š ๐™ฑ๐šŽ๐šŒ๐š”๐š–๐šŠ๐š—'s avatar
๐š€๐šž๐šŠ๐š—๐š ๐™ฑ๐šŽ๐šŒ๐š”๐š–๐šŠ๐š—
Apr 03, 2025

Table of contents:

  1. Introduction.

  2. Classical approach.

    1. Why is the classical approach called "classical"?

  3. Understanding the Mahalanobis Distance.

  4. Improving computational methods.

    1. An iterative reweighting scheme.

  5. Beyond traditional enhancements.

    1. Robust covariance estimation via iterative reweighting.

    2. Shrinkage for additional stability.


Introduction

It's Saturday night, and you're at a lively party where everyone has a story to share. Most guests follow a predictable rhythm, but then there's that one person: loud, unpredictable, and totally off-beat.

In statistics, such a guest is known as an outlier, and its presence can throw off your entire analysis. In traditional quantitative finance and risk management, the covariance matrix plays a starring role. It tells us about the relationships between different assets or variables, much like understanding who at the party pairs well with whom. However, the classical estimation methods, while elegant under ideal conditions, quickly lose their charm when faced with noisy, messy data.

Let's take a closer look and see whether we can improve on this method.

Classical approach

The concept of the covariance matrix is foundational in portfolio optimization. When dealing with multivariate data (observations that have multiple components or dimensions), understanding how different variables co-vary is critical. This is precisely what the covariance matrix encapsulates: it describes not only how each individual variable spreads out (its variance) but also how each pair of variables co-varies (their covariance).

In a typical scenario, you might have a random vector x ∈ ℝ^p. This vector has p components: for example, p different stock returns, or p measurements taken on a patient in a clinical trial. The population covariance matrix of x, denoted by Σ, is defined as:

\(\boldsymbol{\Sigma} = \mathbb{E} \left[ (\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T \right],\)

where μ = E[x] is the population mean vector. Of course, in real-world applications, the true mean μ and the true covariance Σ are almost never known. Instead, we collect a sample of observations and use them to estimate these quantities. This is where the classical approach to covariance estimation comes into play.

Suppose we have n observations of the random vector x. We label these observations x_1, x_2, …, x_n, where each x_i ∈ ℝ^p. The classical sample covariance matrix S is defined by:

\(\mathbf{S} = \frac{1}{n - 1} \sum_{i=1}^{n} (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^T,\)

with

\(\bar{\mathbf{x}} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_i\)

being the sample mean vector. This definition might look compact, but it encodes several important properties:

  1. Centering by the sample mean: We first subtract the sample mean from each observation. This step ensures that the estimated covariance matrix reflects the variability around the center of the data rather than around an arbitrary origin.

  2. Outer product structure: Each term is a p×p matrix capturing the pairwise product of deviations from the mean for all dimensions. Summing these outer products accumulates information about how variables vary and co-vary.

  3. Division by n − 1: Dividing by n − 1 (rather than n) makes S an unbiased estimator of the population covariance Σ. This subtle difference, known as Bessel's correction, compensates for the fact that we center the data with the estimated sample mean rather than the true mean, and it ensures that E[S] = Σ.

Let's generate some synthetic bivariate data and compute the classical covariance matrix:

import numpy as np

# Generate synthetic data: 100 observations in 2 dimensions
np.random.seed(42)
data = np.random.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], size=100)

# Compute the classical covariance matrix
classical_cov = np.cov(data, rowvar=False)
print("Classical Covariance Matrix:")
print(classical_cov)
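
To tie the code back to the formula, here is a minimal sketch that recomputes S by hand, reusing the `data` and `classical_cov` variables from the snippet above: center by the sample mean, accumulate the outer products, and apply Bessel's n − 1. It should agree with `np.cov` up to floating-point error.

# Recompute the sample covariance directly from the formula
n, p = data.shape
x_bar = data.mean(axis=0)        # sample mean vector (centering step)
deviations = data - x_bar        # deviations from the mean
# Sum of outer products, written as one matrix product, divided by n - 1
manual_cov = deviations.T @ deviations / (n - 1)

print("Manual formula matches np.cov:", np.allclose(manual_cov, classical_cov))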

The classical covariance estimator works beautifully when the data behaves nicely. But financial data is anything but nice.
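
A quick way to see just how fragile the classical estimator is: append a few extreme points to the same synthetic sample and recompute. This is a minimal sketch; the three outlier values below are arbitrary choices for illustration, not real market data.

# Contaminate the sample: 3 extreme points among 100 well-behaved ones
outliers = np.array([[10.0, -10.0], [12.0, -11.0], [-9.0, 10.0]])
contaminated = np.vstack([data, outliers])

contaminated_cov = np.cov(contaminated, rowvar=False)
print("Covariance after contamination:")
print(contaminated_cov)
# With this seed, the off-diagonal entry flips from roughly +0.5 to a
# negative value: three points out of 103 are enough to reverse the
# estimated correlation and inflate both variances.

This extreme sensitivity to a handful of observations is exactly what the robust methods in the rest of this post are designed to tame.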
