Trading the Breaking


[WITH CODE] Model: RIDRA rule-based algorithms

Master drift detection and rule expansion before volatility redefines your strategy

Apr 09, 2025

Table of contents:

  1. Introduction.

  2. Incremental decision rules and RIDRA.

  3. Algorithmic foundations.

    1. Drift detection using ADWIN.

    2. Rule expansion with gain ratio.

    3. Aging mechanism.

  4. RIDRA algorithm implementation.

  5. Pitfalls when using this algorithm.




Introduction

The stock market is a wild, caffeine-fueled madman, and your trading algorithm is clinging on for dear life. One minute it’s soaring on bullish euphoria, the next it’s plunging into bearish despair, all while dodging data streams moving faster than a Reddit meme stock. In these crazy days, most algorithms panic like a rookie day trader during a margin call. But what if your system could adapt faster than a caffeinated squirrel hoarding acorns in a bull market?

A while back, in an old community I don't want to remember, I shared the most basic implementation of this incremental rule extraction model. I'll probably regret this, but oh well... here goes, it'll be fun!

Incremental decision rules and RIDRA

Decision rules are attractive for algorithmic trading because they are interpretable: traders can see exactly which if-then conditions lead to a trading signal. However, many rule-based methods are static. The RIDRA algorithm was originally proposed to address this gap by incrementally generating and updating decision rules as new data arrive. In this version, we’ll integrate an aging mechanism and an improved rule expansion step that uses the gain ratio to select the attributes that are most informative for the task.

The original RIDRA algorithm was presented in several versions in the literature. This implementation builds on the reactive version of RIDRA and adapts it to the financial domain. We use the Hoeffding bound to decide when enough evidence exists to update a rule, and bucket-based aggregation in ADWIN to improve drift detection.

If you are curious, you can find more information about the IDRA—without R—algorithm in this document.

Incremental Decision Rules Algorithm
1.09MB ∙ PDF file

Before diving into our method, let’s review some of the mathematics underlying our approach. In the context of statistical learning and drift detection, several key equations appear repeatedly. For instance, the Hoeffding bound limits the probability that the mean of n bounded random variables deviates from its expected value by more than a given amount. Fixing that probability through δ and solving for the deviation gives the threshold we use throughout this post:

\(\epsilon(n, \delta) = \sqrt{\frac{1}{2n} \ln\left(\frac{4}{\delta}\right)} \)

where n is the number of observations and δ is the confidence parameter. In our ADWIN algorithm, we use a similar concept to decide whether to drop older data from our window when a significant change is detected.
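To get a feel for how quickly this threshold tightens, here is a minimal sketch that evaluates the formula above for a few sample sizes. The function name `hoeffding_epsilon` is my own label for illustration, and this form of the bound assumes observations scaled to the range [0, 1].

```python
import math

def hoeffding_epsilon(n: int, delta: float) -> float:
    """Deviation threshold from the formula above: sqrt(ln(4/delta) / (2n))."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must be in (0, 1)")
    return math.sqrt(math.log(4.0 / delta) / (2.0 * n))

# Print the threshold for growing sample sizes at a fixed confidence parameter.
for n in (10, 100, 1000, 10000):
    print(n, round(hoeffding_epsilon(n, delta=0.002), 4))
```

Quadrupling n halves the threshold, which is why very short windows rarely produce a drift signal: the evidence bar is simply too high.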

Another important metric is the gain ratio used for rule expansion. The information gain for an attribute A is given by:

\(IG(S, A) = H(S) - \sum_{v \in \text{Values}(A)} \frac{|S_v|}{|S|} H(S_v) \)

where H(S) is the entropy of the dataset S and S_v is the subset of S for which attribute A takes the value v. The gain ratio is then computed as:

\(\text{Gain Ratio}(S, A) = \frac{IG(S, A)}{SI(A)} \)

with SI(A) being the split information defined by:

\(SI(A) = -\sum_{v \in \text{Values}(A)} \frac{|S_v|}{|S|} \log_2\left(\frac{|S_v|}{|S|}\right) \)

While these equations might sound intimidating, I promise that the final implementation is written in plain NumPy—and yes, even non-mathematicians can follow along 😁

Algorithmic foundations

In a non-stationary financial market, abrupt changes can signal regime shifts or emerging trends. ADWIN (Adaptive Windowing) detects these changes by maintaining a window of recent observations and checking whether a statistically significant difference exists between two subwindows. There are other ways to do this, but I'll leave that part for you to explore.

Drift detection using ADWIN

The idea is simple: if the mean of the first part of the window differs too much from the mean of the second part, we conclude that a drift has occurred.

In this version, we adopt a bucket-based approach. Instead of scanning every possible split (which is computationally expensive), we aggregate observations into buckets. Each bucket B_i is represented as a tuple (w_i, s_i), where w_i is the weight or count and s_i is the sum of values in that bucket. The cumulative sum up to a bucket boundary lets us compute the mean quickly:

\(\mu_L = \frac{\sum_{i=1}^{k} s_i}{\sum_{i=1}^{k} w_i}, \quad \mu_R = \frac{\sum_{i=k+1}^{m} s_i}{\sum_{i=k+1}^{m} w_i} \)

for a chosen cut between buckets k and k+1. Then, using the Hoeffding bound, we test whether:

\(|\mu_L - \mu_R| > \epsilon_L + \epsilon_R \)

with

\(\epsilon_L = \sqrt{\frac{1}{2W_L} \ln\left(\frac{4}{\delta}\right)} \quad \text{and} \quad \epsilon_R = \sqrt{\frac{1}{2W_R} \ln\left(\frac{4}{\delta}\right)} \)

where W_L and W_R are the total weights of the left and right partitions, respectively.
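The test for a single cut can be sketched in a few lines. This is only an illustration of the inequality above, not the full detector: the helper name `cut_detects_drift` and the bucket values are made up, and the values are assumed to live in [0, 1].

```python
import math

def cut_detects_drift(buckets, k, delta=0.002):
    """Test the cut after the first k buckets; buckets are (weight, sum) tuples, oldest first."""
    left, right = buckets[:k], buckets[k:]
    if not left or not right:
        raise ValueError("the cut must leave buckets on both sides")
    w_left = sum(w for w, _ in left)
    w_right = sum(w for w, _ in right)
    mu_left = sum(s for _, s in left) / w_left
    mu_right = sum(s for _, s in right) / w_right
    eps_left = math.sqrt(math.log(4.0 / delta) / (2.0 * w_left))
    eps_right = math.sqrt(math.log(4.0 / delta) / (2.0 * w_right))
    return abs(mu_left - mu_right) > eps_left + eps_right

# Hypothetical buckets: 256 old observations averaging 0.1, then 128 recent ones averaging 0.9.
shifted = [(256, 25.6), (128, 115.2)]
# Same weights, but both sides average 0.5.
stable = [(256, 128.0), (128, 64.0)]
print(cut_detects_drift(shifted, k=1))  # True
print(cut_detects_drift(stable, k=1))   # False
```

With these weights the combined threshold is roughly 0.29, so a jump of 0.8 in the mean clears it comfortably while identical means do not.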

Rule expansion with gain ratio

The second critical component of our algorithm is the expansion of decision rules. In trading, each rule represents a trading signal—for example, “if the moving average is rising and volatility is low, then buy.” The challenge is to determine which attributes to add to the rule to best predict future outcomes.

We use the gain ratio for attribute selection (when you play around with this, test Mutual Information as well). Intuitively, an attribute that reduces the uncertainty, or entropy, of the target variable is a good candidate for rule expansion. We calculate the gain ratio for each candidate attribute and pick the highest. Dividing by the split information penalizes attributes that fragment the data into many small subsets, which helps the resulting rules generalize.

The gain ratio is defined as:

\(\text{Gain Ratio}(S, A) = \frac{\text{Information Gain}(S, A)}{\text{SplitInfo}(S, A)}\)

where

\(\text{Information Gain}(S, A) = H(S) - \sum_{v \in \text{Values}(A)} \frac{|S_v|}{|S|}H(S_v)\)

and

\(\text{SplitInfo}(S, A) = -\sum_{v \in \text{Values}(A)} \frac{|S_v|}{|S|} \log_2\left(\frac{|S_v|}{|S|}\right) \)

Here,

  • H(S) is the entropy of the set S.

  • S_v is the subset of S for which attribute A takes the value v.

  • ∣S∣ is the total number of instances in S.
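The definitions above translate almost directly into code. Here is a minimal sketch on a toy dataset, assuming discrete (already binned) attributes; the feature names `trend` and `volatility` and all the values are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(values, labels):
    """Gain ratio of a discrete attribute (values) with respect to the target (labels)."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    info_gain = entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    # A single-valued attribute has zero split information; treat it as uninformative.
    return info_gain / split_info if split_info > 0 else 0.0

# Hypothetical toy data: discretised features and the next-bar direction.
labels = ["up", "up", "down", "down", "up", "down"]
trend = ["rising", "rising", "falling", "falling", "rising", "falling"]
volatility = ["low", "high", "low", "high", "low", "high"]
scores = {"trend": gain_ratio(trend, labels), "volatility": gain_ratio(volatility, labels)}
best = max(scores, key=scores.get)
print(scores, best)
```

In this toy set `trend` separates the classes perfectly, so it wins the expansion, while `volatility` barely reduces the entropy. Real market features are continuous, so you would need to bin them first.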

Aging mechanism

Financial markets are notorious for “forgetting” past conditions almost as quickly as they form. To mimic this behavior, we incorporate an aging factor in our RIDRA algorithm. Every time a new instance is processed, the support (or importance) of existing rules decays by a factor γ (with 0 < γ ≤ 1). In our experiments, we set γ = 0.99 so that recent events weigh more heavily in the model. Mathematically, if s_t is the support at time t, then at time t+1:

\(s_{t+1} = \gamma \cdot s_t + \text{(increment if instance is covered)} \)

This simple yet effective idea ensures that the algorithm “forgets” outdated rules and remains responsive to new trends.
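To see what γ = 0.99 means in practice, here is a small sketch of the update. The function names are mine, and the unit increment is an assumption, since the formula above only says that covered instances add "an increment".

```python
import math

def update_support(support: float, covered: bool, gamma: float = 0.99,
                   increment: float = 1.0) -> float:
    """One aging step: decay the old support, then add the increment if the rule covers the instance."""
    return gamma * support + (increment if covered else 0.0)

def half_life(gamma: float) -> float:
    """Number of uncovered steps after which the support has halved (needs 0 < gamma < 1)."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must be strictly between 0 and 1")
    return math.log(0.5) / math.log(gamma)

support = 10.0
for _ in range(69):  # 69 consecutive instances that the rule does not cover
    support = update_support(support, covered=False)
print(round(support, 2), round(half_life(0.99), 1))
```

With γ = 0.99 a rule that stops firing loses about half of its support in roughly 69 instances. At the other extreme, a rule that fires on every instance with a unit increment converges toward 1 / (1 − γ) = 100, so the support stays bounded.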

Okay! Let’s implement all the classes little by little. Follow me! 🎮🎮

RIDRA algorithm implementation

The full script is based on our earlier version but now includes a bucket-based ADWIN, an aging mechanism, and an improved rule expansion procedure. So let’s begin with the ADWIN method.

import numpy as np

class ADWIN:
    def __init__(self, delta=0.002, min_total_weight=10):
        self.delta = delta
        self.buckets = []  # List of buckets: each bucket is (weight, sum)
        self.change_detected_flag = False
        self.min_total_weight = min_total_weight

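    def _merge_buckets(self):
        # Walk from the newest bucket toward the oldest, merging adjacent
        # buckets that carry the same weight.
        i = len(self.buckets) - 1
        while i > 0:
            if self.buckets[i][0] == self.buckets[i-1][0]:
                weight = self.buckets[i][0] + self.buckets[i-1][0]
                summation = self.buckets[i][1] + self.buckets[i-1][1]
                self.buckets[i-1] = (weight, summation)
                del self.buckets[i]
                # A merge can create a new equal-weight pair, so restart
                # the scan from the newest bucket.
                i = len(self.buckets)
            i -= 1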

    def add_element(self, value):
        # Every new observation starts as its own bucket of weight 1.
        self.buckets.append((1, value))
        self._merge_buckets()

        # Wait until the window holds enough observations to test.
        total_weight = sum(bucket[0] for bucket in self.buckets)
        if total_weight < self.min_total_weight:
            self.change_detected_flag = False
            return

        total_sum = sum(bucket[1] for bucket in self.buckets)
        cumulative_weight = 0
        cumulative_sum = 0
        drift_found = False
        # Try each bucket boundary as a cut: older buckets on the left, newer on the right.
        for idx, (w, s) in enumerate(self.buckets):
            cumulative_weight += w
            cumulative_sum += s
            # Skip cuts whose left side is too small or whose right side is empty.
            if cumulative_weight < self.min_total_weight or cumulative_weight == total_weight:
                continue
            mean_left = cumulative_sum / cumulative_weight
            weight_right = total_weight - cumulative_weight
            sum_right = total_sum - cumulative_sum
            mean_right = sum_right / weight_right

            # Hoeffding thresholds for each side of the cut.
            eps_left = np.sqrt((1/(2*cumulative_weight)) * np.log(4/self.delta))
            eps_right = np.sqrt((1/(2*weight_right)) * np.log(4/self.delta))
            eps = eps_left + eps_right

            if np.abs(mean_left - mean_right) > eps:
                # Drift: drop the older buckets and keep only the recent side.
                self.buckets = self.buckets[idx+1:]
                drift_found = True
                break

        self.change_detected_flag = drift_found

    def detected_change(self):
        return self.change_detected_flag

In our ADWIN class, buckets group incoming values. We merge adjacent buckets with the same weight to keep the structure compact. When a new value arrives, we check for a significant difference in the means computed at bucket boundaries. If a drift is detected, we remove older buckets.
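If you want to poke at the detector, a tiny helper like the one below can feed it a stream and record where it fires. This is only a convenience sketch: `drift_positions` is a hypothetical name, and it relies solely on the two methods defined above, `add_element` and `detected_change`.

```python
from typing import Iterable, List

def drift_positions(detector, stream: Iterable[float]) -> List[int]:
    """Feed values to a detector and return the indices at which it reports a change.

    `detector` is any object exposing add_element(value) and detected_change().
    """
    positions = []
    for i, value in enumerate(stream):
        detector.add_element(value)
        if detector.detected_change():
            positions.append(i)
    return positions
```

With the class above you would call `drift_positions(ADWIN(), stream)`. Since the thresholds use the unit-range form of the Hoeffding bound, it is probably sensible to rescale inputs such as returns to [0, 1] before feeding them in.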

Now it’s time for the Rule class. Feel free to improve it; for now we'll use something basic so it's easy to follow.

class Rule:
    def __init__(self, antecedent=None, consequent=None, support=0.0):
        self.antecedent = antecedent if antecedent is not None else {}
        self.consequent = consequent if consequent is not None else {}
        self.support = support

    def __repr__(self):
        return f"Rule({self.antecedent}, support={self.support:.2f}, consequent={self.consequent})"

This simple class represents a decision rule. The antecedent is a set of conditions—e.g., “if price > moving average”—the consequent indicates the prediction—e.g., “buy” with some probability—and support tracks how often the rule is activated. Think of support as the rule’s popularity—like a trending hashtag on social media, but for trading signals!
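To make "the rule is activated" concrete, here is one way a coverage check could look. It is a sketch under a loud assumption: conditions are stored as {attribute: required_value} over discretised features. The post does not show the exact encoding at this point, and `covers` and the feature names are hypothetical.

```python
def covers(antecedent: dict, instance: dict) -> bool:
    """True when every condition in the antecedent matches the instance.

    Assumption: conditions are {attribute: required_value} pairs; an empty
    antecedent therefore covers every instance.
    """
    return all(attr in instance and instance[attr] == value
               for attr, value in antecedent.items())

# Hypothetical rule: "if the moving average is rising and volatility is low, then buy".
antecedent = {"ma_trend": "rising", "volatility": "low"}
consequent = {"buy": 0.7, "sell": 0.3}  # made-up class probabilities
instance = {"ma_trend": "rising", "volatility": "low", "volume": "high"}
print(covers(antecedent, instance))  # True
print(covers(antecedent, {"ma_trend": "falling", "volatility": "low"}))  # False
```

Extra attributes on the instance are ignored, so a rule with few conditions is more general and will tend to accumulate support faster than a very specific one.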

Now, it’s time to incorporate the RIDRA class.
