Trading the Breaking
[WITH CODE] Data: The criteria you need for market data
Alpha Lab

Choose data that aligns with your trading strategy and time horizon

Quant Beckman
Mar 20, 2025

Table of contents:

  1. Introduction.

  2. The market data supply chain and data quality criteria.

  3. Quantifying data accuracy.

  4. Assessing granularity.

  5. Latency optimization.

  6. Data completeness and integration.

  7. Matrix-based optimization.


Introduction

Let’s face it: most trading strategies fail not because the math was wrong, but because the data was garbage. Imagine spending weeks building a Ferrari of an algorithm, only to fuel it with yesterday’s lawnmower gas. It’ll sputter, stall, and leave you stranded on the highway of regret.

Choosing data that aligns with your trading strategy and time horizon, while meeting quality, granularity, and low-latency criteria, is crucial for optimizing algorithm performance in dynamic markets.

Let's dive into the world of data-driven algorithmic trading!

The market data supply chain and data quality criteria

The journey of market data from its genesis to its utilization can be broken down into several stages. Each stage adds value and transforms raw data into a more refined product ready for consumption by traders and algorithmic systems.

  1. Exchanges: The primary source of market data, where every tick, trade, and order book update is generated. Here, the data is raw and unprocessed—akin to unrefined ore that must be mined for gold.

  2. Hosting providers & ticker plants: These intermediaries collect data from various exchanges and perform normalization. Normalization standardizes different data formats so that downstream systems can process them uniformly. This stage reduces the engineering burden on quants, who otherwise might need to wrangle disparate data formats.

  3. Feed providers: After normalization, feed providers distribute the data via APIs. These feeds supply real-time or near-real-time market data to traders, thus enabling rapid decision-making in high-speed trading environments.

  4. OMS/EMS software providers: Finally, the data is used by Order Management Systems and Execution Management Systems to assist in executing trades and managing orders. At this stage, the data is processed, user-friendly, and often enriched with additional analytics.

The transformation process can be summarized as:

\(\text{Raw data} \rightarrow \text{Normalized data} \rightarrow \text{API feed} \rightarrow \text{Processed for trading systems}\)

This journey ensures that the data fed into trading algorithms is accurate, timely, and easily digestible.
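
To make the normalization stage concrete, here is a minimal Python sketch of a per-venue adapter that maps raw ticks onto one common schema. The venue labels and field names (`sym`, `px`, `ticker`, `last`, etc.) are illustrative assumptions, not the format of any real feed:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class NormalizedTick:
    """Common schema used downstream, regardless of the source exchange."""
    symbol: str
    price: float
    size: float
    ts: datetime  # exchange timestamp, UTC

def normalize_tick(raw: dict, source: str) -> NormalizedTick:
    """Map venue-specific field names onto the common schema.

    The per-venue field names below are hypothetical -- real feeds differ
    and usually need one adapter per venue.
    """
    if source == "venue_a":  # e.g. a JSON feed with epoch-millisecond timestamps
        return NormalizedTick(
            symbol=raw["sym"],
            price=float(raw["px"]),
            size=float(raw["qty"]),
            ts=datetime.fromtimestamp(raw["ts_ms"] / 1000, tz=timezone.utc),
        )
    if source == "venue_b":  # e.g. a feed with ISO-8601 timestamp strings
        return NormalizedTick(
            symbol=raw["ticker"],
            price=float(raw["last"]),
            size=float(raw["volume"]),
            ts=datetime.fromisoformat(raw["timestamp"]),
        )
    raise ValueError(f"Unknown source: {source}")

# Two venues, one downstream format
print(normalize_tick({"sym": "ES", "px": 5210.25, "qty": 3, "ts_ms": 1742460000000}, "venue_a"))
print(normalize_tick({"ticker": "ES", "last": 5210.50, "volume": 1,
                      "timestamp": "2025-03-20T09:20:01+00:00"}, "venue_b"))
```

Once every venue flows through an adapter like this, everything downstream (the feed API, the OMS/EMS, your own research code) can assume a single, stable schema.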

But what about the quality? Do we need some kind of criteria?

The answer is yes. Indeed, the following criteria must be met:

  • Accuracy: The data must be free of errors. Mathematically, if we let D represent the dataset and ϵ be the error term, we require

    \(\epsilon = 0 \quad \text{or at least} \quad |\epsilon| < \delta\)

    where δ is a small tolerance level.

  • Granularity: Data should provide sufficient detail for the strategy. For example, a high-frequency trading algorithm requires tick-level granularity. We can express granularity G in terms of the number of data points per unit time:

    \(G=\frac{N}{T}\)

    where N is the number of data points and T is the time period.

  • Latency: The delay between data generation and reception must be minimized. If L denotes latency, our goal is to achieve L→0.

  • Field completeness: The dataset must include all relevant fields—price, volume, bid-ask spreads, etc. We define a completeness vector c where each element represents the availability of a required field:

    \(\mathbf{c} \in \{0,1\}^k,\)

    and we require that the sum of elements equals k—the total number of fields.

  • Integration and compatibility: The data should integrate smoothly with the trading system’s architecture. This can be modeled as a function I(D,S) that measures the integration quality between data D and system S. Our target is to maximize I.
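
Here is a minimal sketch of how the first four criteria can be checked in Python. The helper functions, the sample quote record, and the tolerance δ are illustrative assumptions; in practice accuracy would be measured against a trusted reference feed:

```python
import numpy as np

def accuracy_ok(observed: np.ndarray, reference: np.ndarray, delta: float = 1e-6) -> bool:
    """Accuracy: require |epsilon| < delta against a trusted reference series."""
    eps = observed - reference
    return bool(np.max(np.abs(eps)) < delta)

def granularity(n_points: int, period_seconds: float) -> float:
    """Granularity G = N / T, i.e. data points per second."""
    return n_points / period_seconds

def latency_seconds(recv_ts: float, exch_ts: float) -> float:
    """Latency L = reception time minus exchange timestamp (epoch seconds); the goal is L -> 0."""
    return recv_ts - exch_ts

def completeness(record: dict, required_fields: list[str]) -> np.ndarray:
    """Completeness vector c in {0,1}^k; the record passes only if c sums to k."""
    return np.array([1 if record.get(f) is not None else 0 for f in required_fields])

# Toy check on a single quote record (all values are illustrative)
required = ["price", "volume", "bid", "ask"]
quote = {"price": 101.25, "volume": 500, "bid": 101.24, "ask": 101.26}

c = completeness(quote, required)
print("complete:", int(c.sum()) == len(required))                      # True
print("G:", granularity(n_points=23_400, period_seconds=6.5 * 3600))   # 1.0 point/sec
print("L:", latency_seconds(recv_ts=1742460000.120, exch_ts=1742460000.100))  # ~0.02 s
print("accurate:", accuracy_ok(np.array([101.25, 101.26]), np.array([101.25, 101.26])))
```

Integration quality I(D, S) is harder to boil down to one number; in practice it is usually assessed by how much glue code and reconciliation the data requires before your system can consume it.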

Most retail traders don't have specific criteria. They drift, settling for whatever's available and free. Even many small and medium-sized firms, at best, purchase data from a recognized provider, run a little cleaning, and off they go!

Quantifying data accuracy

This post is for paid subscribers
