Trading the Breaking

[WITH CODE] Testing: Synthetic scenarios

Learn how to generate realistic financial time series that mimic market behavior while maintaining flexibility.

Mar 05, 2025

Table of contents:

  1. Introduction.

  2. Preliminaries and notation.

  3. Input data and preprocessing.

  4. Cholesky decomposition.

  5. Generation of random returns.

  6. Incorporating the original data via convex combination.

  7. Implementation of the method.



Introduction

Anyone who has been talking to me for a while knows that there is one field that I have explored quite a bit. I am referring to synthetic data. Yes, I know, the same one. Why? Imagine you’re a Hollywood director. Your mission? Shoot a sequel to The Matrix that’s faithful to the original’s vibe but with fresh chaos. Synthetic time series are your CGI: they mimic real-market dynamics while letting you control the chaos. In other words, you can create specific scenarios with laboratory conditions.

In many scientific fields, from econometrics to engineering, it is often necessary to simulate time series that replicate the statistical dependencies found in real data. Quantitative trading is no exception. Here, synthetic data is used for:

  • Stress testing.

  • Risk assessment.

  • Validating statistical models.

The method I share with you today is especially suited for generating sequences that mimic both the volatility and trends of historical data while preserving their empirical correlation structure.

The approach uses several key mathematical tools:

  • An empirical correlation matrix captures the relationships between variables.

  • Cholesky decomposition transforms independent random draws into correlated returns.

  • A multiplicative compounding process models the dynamics of time series.

  • Finally, a convex combination of the synthetic series and the original data allows for a controlled balance between historical fidelity and randomness.

Preliminaries and notation

Before discussing the method in depth, we introduce some essential notation and definitions:

  • Data matrix:
    Let

    \(D \in \mathbb{R}^{n \times m}\)

    be the matrix extracted from the assets data, where n is the number of observations (typically time steps) and m is the number of features. In our case, m is determined by the method once it has been fed the input array, which may be 1D.

  • Empirical correlation matrix:
    The correlation matrix C is computed as

    \(C = \operatorname{corr}(D)\)

    with each element

    \(c_{ij} = \frac{\operatorname{cov}(D_i, D_j)}{\sigma_{D_i} \sigma_{D_j}},\)

    where \(\sigma_{D_i}\) is the standard deviation of the i-th column and \(\operatorname{cov}(D_i, D_j)\) is the covariance between columns i and j.

  • Cholesky decomposition:
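    A correlation matrix is always symmetric and positive semidefinite. Assuming it is also positive definite (no column of D is an exact linear combination of the others), we factorize it as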

    \(C = L L^\top,\)

    where L is a lower triangular matrix with positive diagonal entries.

  • Random matrix generation:
    We generate a matrix Z with independent standard normal entries:

    \(Z_{tj} \sim \mathcal{N}(0, 1) \quad \text{independently for all } t, j.\)

    Scaling is applied via a factor σ—denoted as nivel_var in the code—to adjust volatility:

    \(\tilde{Z} = \sigma Z.\)

  • Synthetic series construction:
    Starting with an initial value \(X_0\), extracted from the original data, the series evolves as

    \(X_t = X_0 \prod_{i=1}^{t} \bigl(1 + r_i\bigr),\)

    where \(r_i\) are the generated returns.

  • Convex combination:
    Finally, the synthetic series is combined with the first column of the original data via a convex combination parameterized by α (called factor_correlacion in the code), with \(\alpha \in [0, 1]\):

    \(Y_t = \alpha \, D_{t,1} + (1-\alpha) \, X_t.\)

With these preliminaries in place, we now examine each step of the method and its interrelations.

Input data and preprocessing

The process begins with data provided in a NumPy array, referred to here as assets. The numerical values are extracted to form the matrix

\(D \in \mathbb{R}^{n \times m},\)

where each row represents a time step and each column a variable. Here, D contains asset prices.

Once the matrix D is constructed from assets, the next step is to compute the empirical correlation matrix

\(C = \operatorname{corr}(D).\)

This matrix quantifies the linear relationship between each pair of variables. It is symmetric with ones on the diagonal and, provided no column is an exact linear combination of the others, positive definite, which is what the decomposition in the next section requires.

Preserving the correlation structure is vital. In financial applications, for instance, the interdependence between asset returns can determine portfolio risk. By incorporating C into the simulation, we ensure that the synthetic data replicate these relationships.
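
As a quick illustration of this step, here is a minimal sketch that computes the empirical correlation matrix with NumPy. The small price matrix is made up for the example, and the name data_matrix is borrowed from the implementation section. It assumes that rows are time steps and columns are variables.

```python
import numpy as np

# Hypothetical price matrix: 5 time steps (rows), 2 assets (columns).
data_matrix = np.array([
    [100.0, 50.0],
    [101.0, 50.6],
    [102.5, 51.0],
    [101.8, 50.9],
    [103.0, 51.7],
])

# rowvar=False tells NumPy that each column is a variable
# and each row is an observation.
corr = np.corrcoef(data_matrix, rowvar=False)

print(corr.shape)  # one row and one column per asset
```

The result is symmetric with ones on the diagonal, which is the shape of input the Cholesky step expects.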

Cholesky decomposition

The Cholesky decomposition factorizes a symmetric positive definite matrix C into the product of a lower triangular matrix L and its transpose:

\(C = L L^\top.\)

This decomposition is central because it allows us to convert independent random variables into correlated ones. If z is a vector of independent standard normal variables, then

\(x = L z\)

will have covariance structure given by

\(\operatorname{Cov}(x) = L \operatorname{Cov}(z) L^\top = L I L^\top = L L^\top = C.\)
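
The identity above can be checked numerically. The following is a small sketch under assumed values (a 2×2 correlation matrix with ρ = 0.8 and a fixed random seed, both chosen only for illustration): it draws independent normals, applies L, and compares the sample covariance with C.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility

# Illustrative correlation matrix (an assumption for this example).
C = np.array([[1.0, 0.8],
              [0.8, 1.0]])

# np.linalg.cholesky returns the lower triangular factor L with C = L @ L.T.
L = np.linalg.cholesky(C)

# Each column of z is one draw of an independent standard normal vector.
z = rng.standard_normal(size=(2, 200_000))
x = L @ z

# np.cove treats rows as variables by default, matching the layout of x.
sample_cov = np.cov(x)
print(np.round(sample_cov, 2))
```

With this many draws, the sample covariance should sit close to C, up to sampling noise.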

But then, how do we derive the Cholesky decomposition?

Let C be an m×m symmetric positive definite matrix. The goal is to find a lower triangular matrix L such that

\(C = L L^\top.\)

Here L has the form

\(L = \begin{pmatrix} l_{11} & 0 & \cdots & 0 \\ l_{21} & l_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ l_{m1} & l_{m2} & \cdots & l_{mm} \end{pmatrix}. \)

The entries of L are computed recursively as follows:

  • Diagonal entries:
    For each i=1,2,…,m, the i-th diagonal element \(l_{ii}\) is given by:

    \(l_{ii} = \sqrt{C_{ii} - \sum_{k=1}^{i-1} l_{ik}^2}.\)

    Since C is a correlation matrix, we have \(C_{ii} = 1\) for all i. Thus, the formula simplifies to:

    \(l_{ii} = \sqrt{1 - \sum_{k=1}^{i-1} l_{ik}^2}.\)

  • Off-diagonal entries:
    For i>j, the off-diagonal element \(l_{ij}\) is computed as:

    \(l_{ij} = \frac{1}{l_{jj}} \left( C_{ij} - \sum_{k=1}^{j-1} l_{ik} l_{jk} \right).\)

    This recursive process continues for each i and j with i>j.

Because C is positive definite, every quantity under the square root is strictly positive, so all \(l_{ii}\) computed in this way are positive. With positive diagonal entries, the decomposition is unique.
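
The recursion above can be written out directly. This is a didactic sketch, not a replacement for a library routine: the function name cholesky_lower and the 3×3 test matrix are assumptions made for the example, and the result is compared against np.linalg.cholesky.

```python
import numpy as np

def cholesky_lower(C):
    """Hypothetical helper: lower triangular L with C = L @ L.T."""
    C = np.asarray(C, dtype=float)
    m = C.shape[0]
    L = np.zeros_like(C)
    for j in range(m):  # work column by column
        # Diagonal entry: l_jj = sqrt(C_jj - sum_k l_jk^2)
        radicand = C[j, j] - np.sum(L[j, :j] ** 2)
        if radicand <= 0:
            raise ValueError("matrix is not positive definite")
        L[j, j] = np.sqrt(radicand)
        # Entries below the diagonal:
        # l_ij = (C_ij - sum_k l_ik * l_jk) / l_jj
        for i in range(j + 1, m):
            L[i, j] = (C[i, j] - np.sum(L[i, :j] * L[j, :j])) / L[j, j]
    return L

# Illustrative correlation matrix (an assumption for this example).
C = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.3],
              [0.2, 0.3, 1.0]])

L_manual = cholesky_lower(C)
print(np.allclose(L_manual, np.linalg.cholesky(C)))
```

In practice np.linalg.cholesky is the sensible choice. The loop is only here to make the formulas concrete.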

At this point you may ask yourself: why do we do this? What role does the decomposition play in inducing correlations?

The Cholesky decomposition is used to impose the empirical correlation structure on a set of independent random variables. Suppose

\(z\in \mathbb{R}^m\)

is a vector of independent standard normal random variables. Then, by computing

\(x = L z\)

the resulting vector x has a covariance matrix C. This transformation is crucial for our simulation: it converts a matrix of independent noise into correlated returns that reflect the observed dependencies among the assets.
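
To make this concrete, consider the smallest non-trivial case: two variables with correlation ρ, assuming |ρ| < 1. The worked example below applies the recursion from the previous section by hand and then checks the variance and covariance of the transformed vector.

```latex
C = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix},
\qquad
l_{11} = \sqrt{1} = 1,
\quad
l_{21} = \frac{\rho}{l_{11}} = \rho,
\quad
l_{22} = \sqrt{1 - \rho^2}

x = L z
\;\Longrightarrow\;
x_1 = z_1,
\qquad
x_2 = \rho\, z_1 + \sqrt{1 - \rho^2}\, z_2

\operatorname{Var}(x_2) = \rho^2 + (1 - \rho^2) = 1,
\qquad
\operatorname{Cov}(x_1, x_2) = \rho \operatorname{Var}(z_1) = \rho
```

The second variable is a mix of the shared noise \(z_1\) and its own noise \(z_2\), with weights chosen so that its variance stays at 1 and its correlation with the first variable is exactly ρ.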

Generation of random returns

We begin by generating an n×m matrix Z with independent entries sampled from \(\mathcal{N}(0,1)\). As mentioned before, this matrix is then scaled by a factor σ to adjust the volatility (the variance scales by σ²):

\(\tilde{Z} = \sigma Z.\)

To impose the empirical correlation structure on the random data, we multiply the scaled matrix by the transpose of the Cholesky factor:

\(R = \frac{\tilde{Z}\,L^\top}{100}.\)

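The division by 100 converts the values from percentage points to decimal returns, so σ = 1 corresponds to a per-step standard deviation of about 1%. Each row of R holds the correlated returns of all m variables for a single time step.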
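
Here is a minimal sketch of this step under assumed inputs. The 3×3 correlation matrix, the seed, and the number of rows are made up for illustration, and n is deliberately large so that the sample statistics are stable. The names nivel_var, chol and sim_corr_rets follow the implementation section.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed, only for reproducibility

n, m = 100_000, 3
nivel_var = 1.0  # sigma: per-step volatility in percentage points

# Illustrative correlation matrix (an assumption for this example).
C = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.3],
              [0.2, 0.3, 1.0]])
chol = np.linalg.cholesky(C)

# Independent standard normal draws, one row per time step.
Z = rng.standard_normal(size=(n, m))

# Scale, correlate the columns, and convert from percent to decimal returns.
sim_corr_rets = (nivel_var * Z) @ chol.T / 100

print(np.round(np.corrcoef(sim_corr_rets, rowvar=False), 2))
print(np.round(sim_corr_rets.std(axis=0), 4))
```

The sample correlation of the columns should be close to C, and each column should have a standard deviation near nivel_var / 100.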

The synthetic time series is generated through a cumulative product. Starting with an initial value \(X_0\), the series evolves as:

\(X_t = X_0 \prod_{i=1}^{t} \left(1 + r_i\right).\)

This multiplicative approach mirrors the compounding of returns observed in asset price dynamics. In the implementation, the returns \(r_i\) are taken from the first column of R.
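
A tiny numeric sketch of the compounding formula, with three made-up returns and an assumed starting value of 100:

```python
import numpy as np

x0 = 100.0                                 # assumed initial value X_0
returns = np.array([0.01, -0.02, 0.005])   # illustrative decimal returns

# np.cumprod builds the running product of (1 + r_i),
# so entry t-1 of the result holds X_t.
series = x0 * np.cumprod(1 + returns)

print(series)  # approximately [101.0, 98.98, 99.4749]
```

Note that the resulting array starts at \(X_1\), not at \(X_0\): it has one value per return.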

Incorporating the original data via convex combination

While the synthetic series \(X_t\) captures the stochastic variability, it may be beneficial to retain some of the trends and patterns of the original data. A convex combination is used to blend the original series \(D_{t,1}\) with the synthetic series:

\(Y_t = \alpha\, D_{t,1} + (1 - \alpha)\, X_t,\)

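where \(\alpha \in [0, 1]\) controls the degree of adherence to the original data: α = 1 reproduces the original series and α = 0 returns the purely synthetic one.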

For α between 0 and 1, this convex combination ensures that at every time step the final value \(Y_t\) lies between the original value \(D_{t,1}\) and the synthetic value \(X_t\), effectively regularizing the simulation. It provides a mechanism to control both the variance and the mean behavior of the final output.
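
The blend is a one-liner in NumPy. The sketch below uses two short made-up series and an assumed α of 0.7, then verifies that every blended value sits between the two inputs.

```python
import numpy as np

# Illustrative series (assumptions for this example).
original = np.array([100.0, 102.0, 101.0, 105.0])   # D_{t,1}
synthetic = np.array([100.0, 99.0, 103.0, 104.0])   # X_t
alpha = 0.7                                          # factor_correlacion

# Element-wise convex combination Y_t = alpha * D_t + (1 - alpha) * X_t.
blended = alpha * original + (1 - alpha) * synthetic

lower = np.minimum(original, synthetic)
upper = np.maximum(original, synthetic)
print(blended)  # approximately [100.0, 101.1, 101.6, 104.7]
print(np.all((blended >= lower - 1e-12) & (blended <= upper + 1e-12)))
```

Moving alpha toward 1 pulls the result toward the historical path, while moving it toward 0 lets the simulated path dominate.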

And here is the gem: generating multiple synthetic series enables Monte Carlo simulation, which is used to evaluate the uncertainty and risk associated with various scenarios. By repeating the simulation process k times (k is called num_secuencias in the code), we obtain an ensemble:

\(\{Y^{(1)}_t, Y^{(2)}_t, \ldots, Y^{(k)}_t\}.\)
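
One common way to use such an ensemble is to summarize it with percentile bands at each time step. The sketch below is only an illustration under simplifying assumptions: it uses a single made-up price series (so no Cholesky factor is needed), a fixed seed, and arbitrary parameter values.

```python
import numpy as np

rng = np.random.default_rng(7)  # fixed seed, only for reproducibility

# Stand-in for a historical price series (an assumption for this example).
original = 100.0 * np.cumprod(1 + rng.normal(0.0, 0.01, size=250))

num_secuencias = 500       # k: number of synthetic paths
nivel_var = 1.0            # sigma, in percentage points
factor_correlacion = 0.5   # alpha

ensemble = np.empty((num_secuencias, original.size))
for k in range(num_secuencias):
    # Single-asset case: plain scaled noise, converted to decimal returns.
    rets = rng.standard_normal(original.size) * nivel_var / 100
    synthetic = original[0] * np.cumprod(1 + rets)
    ensemble[k] = (factor_correlacion * original
                   + (1 - factor_correlacion) * synthetic)

# Percentile bands across paths, one value per time step.
p05, p50, p95 = np.percentile(ensemble, [5, 50, 95], axis=0)
print(p05[-1], p50[-1], p95[-1])
```

The spread between the 5th and 95th percentile at the final step gives a rough picture of how much the scenarios disagree about where the series ends up.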

Implementation of the method

The heart of the function is the loop that generates each synthetic sequence. For every iteration:

  1. Random data generation: A matrix of random numbers is generated and scaled by nivel_var. These numbers are our raw, uncorrelated returns.

     rand_data = np.random.normal(size=data_matrix.shape) * nivel_var

  2. Imposing correlation: The random data is then multiplied by the transpose of the Cholesky factor, introducing the empirical correlation structure.

     sim_corr_rets = np.matmul(rand_data, chol.T) / 100

  3. Synthetic series via cumulative product: The synthetic series is computed by taking the cumulative product of (1+r), starting from the first value of the original series.

     serie_sintetica = primera_fila * np.cumprod(1 + sim_corr_rets[:, 0])

  4. Convex combination with original data: The final synthetic series is obtained by blending the original series with the synthetic series using a convex combination.

     serie_correlacionada = (factor_correlacion * data_matrix[:, 0] + (1 - factor_correlacion) * serie_sintetica)

Each generated series is then appended to our list of synthetic series.
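
Putting the four steps together, the loop might look like the sketch below. This is a reconstruction under assumptions, not the newsletter's full function: the function name generar_series_sinteticas, the seed argument, the default values, the handling of 1D input, and the demo price matrix are all hypothetical. Only the four lines inside the loop come from the snippets above, with np.random.normal swapped for a seeded Generator.

```python
import numpy as np

def generar_series_sinteticas(data_matrix, num_secuencias=10, nivel_var=1.0,
                              factor_correlacion=0.5, seed=None):
    """Hypothetical wrapper around the four steps described above."""
    rng = np.random.default_rng(seed)
    data_matrix = np.asarray(data_matrix, dtype=float)
    if data_matrix.ndim == 1:
        # Assumption: a 1D input is treated as a single column.
        data_matrix = data_matrix.reshape(-1, 1)

    if data_matrix.shape[1] == 1:
        corr = np.ones((1, 1))  # one variable: trivial correlation matrix
    else:
        corr = np.corrcoef(data_matrix, rowvar=False)
    chol = np.linalg.cholesky(corr)  # fails if corr is not positive definite

    primera_fila = data_matrix[0, 0]  # first value of the first column
    series = []
    for _ in range(num_secuencias):
        # 1. Raw, uncorrelated returns scaled by nivel_var.
        rand_data = rng.standard_normal(size=data_matrix.shape) * nivel_var
        # 2. Impose the empirical correlation, convert percent to decimals.
        sim_corr_rets = rand_data @ chol.T / 100
        # 3. Compound the first column of returns from the initial value.
        serie_sintetica = primera_fila * np.cumprod(1 + sim_corr_rets[:, 0])
        # 4. Blend with the original first column.
        serie_correlacionada = (factor_correlacion * data_matrix[:, 0]
                                + (1 - factor_correlacion) * serie_sintetica)
        series.append(serie_correlacionada)
    return np.array(series)

# Demo on made-up prices: 250 time steps, 2 assets.
rng_demo = np.random.default_rng(1)
precios = 100 * np.cumprod(1 + rng_demo.normal(0, 0.01, size=(250, 2)), axis=0)
ensemble = generar_series_sinteticas(precios, num_secuencias=5, seed=0)
print(ensemble.shape)  # one row per synthetic series
```

A quick sanity check is to set factor_correlacion to 1.0: every generated series should then coincide with the original first column.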

Okay! Time to play and test the whole method 🤓

© 2026 Quant Beckman