[WITH CODE] Evaluation: Metrics for your systems
Metrics will help you understand how good your trading strategy is
Table of contents:
Introduction.
Basic statistical functions.
Performance metrics.
Risk-adjusted metrics.
Statistical properties of returns.
Risk measures.
Benchmark comparison metrics.
Drawdown metrics.
Trade-based metrics.
Introduction
Returns are just vanity metrics until you dissect the risk engine that drives them. The Sharpe ratio isn't an academic trophy; it's the blood pressure gauge for volatility-adjusted returns. Maximum drawdown? That's your strategy's breaking point when markets crater like they did in 2008.
And friends, if your backtest collapses under the 10-sigma volatility spikes of March 2020 or a liquidity crunch from a hawkish Fed pivot, you're not trading, you're donating.
Quantitative analysts obsess over left-tail events hidden in the noise. Their alpha is just beta in disguise if it can't survive rate hikes, flash crashes, or a Black Swan that chokes the repo market. Remember: markets don't care about your elegant code. They break strategies that confuse backtest luck with structural advantage.
Today we begin with a basic overview of the metrics behind any trading system. Without them, you won't succeed.
Basic statistical functions
Before diving into individual metrics, we need to establish the statistical building blocks that we will use to compute averages, dispersion, and percentiles.
Mean
The arithmetic mean of a dataset {x1, x2, …, xn} is defined as:
x̄ = (x1 + x2 + … + xn) / n
The mean is the central value of the data. It gives a first-order summary and is used in nearly every metric.
def my_mean(data):
    """Calculate the mean of a list of numbers."""
    if not data:
        return 0
    return sum(data) / len(data)
Standard deviation
The population standard deviation measures the dispersion of data around the mean. Its formula is:
σ = √( (1/n) Σ (xi − x̄)² )
A higher standard deviation indicates that data points are spread out over a wider range of values, which is particularly important in risk assessment.
import math

def my_std(data):
    """Calculate the population standard deviation of a list of numbers."""
    if not data:
        return 0
    m = my_mean(data)
    variance = sum((x - m) ** 2 for x in data) / len(data)
    return math.sqrt(variance)
Percentiles
They are used to determine the relative standing of a value in a dataset. The p-th percentile is computed by sorting the data and interpolating between the nearest ranks.
For example, the 5th percentile—which is used in VaR calculations—tells us the value below which 5% of the data fall.
def my_percentile(data, percentile):
    """
    Compute the percentile of a list of numbers using linear interpolation.
    """
    if not data:
        return None
    sorted_data = sorted(data)
    n = len(sorted_data)
    pos = (percentile / 100) * (n - 1)
    lower = math.floor(pos)
    upper = math.ceil(pos)
    if lower == upper:
        return sorted_data[int(pos)]
    lower_value = sorted_data[lower]
    upper_value = sorted_data[upper]
    weight = pos - lower
    return lower_value + weight * (upper_value - lower_value)
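As a quick sanity check, here is a standalone sketch (with purely illustrative data) that reproduces the same mean, population standard deviation, and interpolated percentile using only the standard library:

```python
import statistics

# Illustrative sample, already sorted.
data = [1.0, 2.0, 3.0, 4.0, 5.0]

# Population mean and standard deviation via the standard library.
mean = statistics.fmean(data)    # 3.0
pstd = statistics.pstdev(data)   # sqrt(2), about 1.4142

# 5th percentile with linear interpolation, same scheme as my_percentile:
# position = p/100 * (n - 1), then interpolate between the two neighbours.
pos = (5 / 100) * (len(data) - 1)  # 0.2
low = int(pos)
p5 = data[low] + (pos - low) * (data[low + 1] - data[low])  # 1.2

print(mean, pstd, p5)
```

If the hand-rolled helpers disagree with these values on the same input, something is off.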
Perfect! Now that we have these building blocks in place, let's move on to more interesting things.
Oh! By the way, if you're interested in these topics, don't miss what's coming soon. It will get wild! 🔥
Performance metrics
Performance metrics help you understand how well your trading strategy grows your capital.
Cumulative return
The cumulative return is defined as:
Cumulative return = (Equityfinal / Equityinitial − 1) × 100%
This metric measures the overall growth of your portfolio over the period. If you start with $100,000 and finish with $150,000, your cumulative return is 50%.
def cumulative_return(equity_initial, equity_final):
    """Calculate the cumulative return as a percentage."""
    return ((equity_final / equity_initial) - 1) * 100
Compound annual growth rate (CAGR)
CAGR is defined as:
CAGR = (Equityfinal / Equityinitial)^(1/N) − 1
where N is the number of years.
CAGR provides an annualized growth rate that smooths out fluctuations, giving you a single number to compare across different time frames. It’s like finding the average speed of a car over a long journey, regardless of the stops and starts.
def calculate_cagr(equity_initial, equity_final, years):
    """Calculate the Compound Annual Growth Rate (CAGR) as a decimal."""
    if equity_initial <= 0 or years <= 0:
        return None
    return (equity_final / equity_initial) ** (1 / years) - 1
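To make the formula concrete, here is a minimal worked example with illustrative numbers: turning $100,000 into $150,000 over 3 years works out to roughly 14.5% per year.

```python
equity_initial = 100_000.0  # illustrative starting capital
equity_final = 150_000.0    # illustrative ending capital
years = 3.0

# CAGR = (final / initial) ** (1 / years) - 1
cagr = (equity_final / equity_initial) ** (1 / years) - 1
print(f"CAGR: {cagr:.4%}")  # about 14.47% per year
```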
Risk-adjusted metrics
These metrics incorporate both returns and the risk taken to achieve them. They help answer the question: Is the extra return worth the extra risk?
Sharpe ratio
It computes the excess return for each period, averages them, and divides by the volatility of those excess returns:
Sharpe = (E(R) − Rf) / σ
where:
E(R) is the expected return.
Rf is the risk-free rate.
σ is the standard deviation—volatility—of returns.
The Sharpe Ratio measures how much excess return you are receiving for the extra volatility that you endure for holding a riskier asset.
def sharpe_ratio(returns, risk_free_rate):
    """
    Calculate the Sharpe Ratio using excess returns.
    """
    # Compute excess returns for each period.
    excess_returns = [r - risk_free_rate for r in returns]
    # Compute the mean of the excess returns.
    mean_excess = my_mean(excess_returns)
    # Compute the standard deviation of the excess returns.
    std_excess = my_std(excess_returns)
    # Avoid division by zero.
    if std_excess == 0:
        return float('inf')
    return mean_excess / std_excess
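The ratio above is computed per period. A common convention (an assumption here, not something fixed by the formula itself) is to annualize a daily Sharpe by multiplying by √252. A standalone sketch with illustrative daily returns:

```python
import math

# Illustrative daily returns and a zero daily risk-free rate.
daily_returns = [0.001, -0.002, 0.003, 0.0015, -0.001]
rf_daily = 0.0

# Per-period Sharpe: mean of excess returns over their standard deviation.
excess = [r - rf_daily for r in daily_returns]
mean_e = sum(excess) / len(excess)
var_e = sum((x - mean_e) ** 2 for x in excess) / len(excess)
daily_sharpe = mean_e / math.sqrt(var_e)

# Annualize by sqrt(252), assuming 252 trading days per year.
annual_sharpe = daily_sharpe * math.sqrt(252)
print(daily_sharpe, annual_sharpe)
```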
Sortino ratio
In this ratio, we isolate only the periods with losses, compute their variability, and then compare the overall excess return to that downside risk:
Sortino = (E(R) − Rf) / σd
where σd is the standard deviation of the negative returns—downside deviation.
Unlike the Sharpe Ratio, which penalizes both upside and downside volatility, the Sortino Ratio focuses only on harmful volatility—the volatility caused by negative returns.
def sortino_ratio(returns, risk_free_rate):
    """
    Calculate the Sortino Ratio considering only negative deviations.
    """
    # Compute excess returns.
    excess_returns = [r - risk_free_rate for r in returns]
    # Filter for negative excess returns (downside risk).
    downside_returns = [x for x in excess_returns if x < 0]
    # With no losing periods there is no downside risk to divide by.
    if not downside_returns:
        return float('inf')
    downside_std = my_std(downside_returns)
    if downside_std == 0:
        return float('inf')
    return my_mean(excess_returns) / downside_std
Omega ratio
I like this one, and not only because of its name but for its elegance. In its simplest form, with a threshold of zero:
Omega = Σ positive returns / Σ |negative returns|
Beautiful! It compares the total gains to the total losses, giving an idea of whether your winning days are compensating for your losing days.
def omega_ratio(returns):
    """
    Calculate the Omega Ratio: ratio of sum of positive returns to sum of absolute negative returns.
    """
    sum_positive = sum(r for r in returns if r > 0)
    sum_negative = sum(abs(r) for r in returns if r < 0)
    if sum_negative == 0:
        return float('inf')
    return sum_positive / sum_negative
Statistical properties of returns
Annualized volatility
Daily volatility σdaily is scaled to an annual figure by:
σannual = σdaily × √T
where T is the number of trading days per year—typically 252.
Annualized volatility provides a standardized measure to compare the risk of different strategies regardless of the time frame.
def annualized_volatility(returns, trading_days=252):
    """
    Calculate annualized volatility based on daily returns.
    """
    daily_vol = my_std(returns)
    return daily_vol * math.sqrt(trading_days)
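A standalone sketch with an illustrative 1% daily volatility, using the 252-day convention:

```python
import math

daily_vol = 0.01  # illustrative 1% daily standard deviation
annual_vol = daily_vol * math.sqrt(252)
print(f"{annual_vol:.4f}")  # about 0.1587, i.e. roughly 15.9% annualized
```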
Skewness
Skewness quantifies the asymmetry of the return distribution:
Skew = (1/n) Σ (ri − μ)³ / σ³
where μ is the mean return and σ is the standard deviation.
A positive skew indicates a long right tail—more extreme positive returns—whereas a negative skew indicates a long left tail—more extreme losses.
def skewness(returns):
    """
    Calculate skewness of the returns.
    """
    m = my_mean(returns)
    s = my_std(returns)
    n = len(returns)
    if n == 0 or s == 0:
        return 0
    skew_val = sum((r - m) ** 3 for r in returns) / n
    return skew_val / (s ** 3)
Kurtosis
Excess kurtosis is defined as:
Kurt = (1/n) Σ (ri − μ)⁴ / σ⁴ − 3
Excess kurtosis measures the tailedness of the distribution relative to a normal distribution. Positive excess kurtosis indicates fat tails, while negative kurtosis indicates thinner tails.
def kurtosis(returns):
    """
    Calculate excess kurtosis of the returns.
    """
    m = my_mean(returns)
    s = my_std(returns)
    n = len(returns)
    if n == 0 or s == 0:
        return 0
    kurt = sum((r - m) ** 4 for r in returns) / n
    return kurt / (s ** 4) - 3
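To see both statistics at work, here is a standalone sketch on a deliberately lopsided, illustrative sample: nine flat days plus one large winner produce positive skew and positive excess kurtosis.

```python
# Illustrative sample: nine flat days and one large winner.
returns = [0.0] * 9 + [0.10]
n = len(returns)

m = sum(returns) / n                                  # 0.01
s = (sum((r - m) ** 2 for r in returns) / n) ** 0.5   # 0.03

skew = (sum((r - m) ** 3 for r in returns) / n) / s ** 3
excess_kurt = (sum((r - m) ** 4 for r in returns) / n) / s ** 4 - 3
print(skew, excess_kurt)  # both positive: long right tail, fat tails
```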
Risk measures
Risk measures such as Value at Risk (VaR) and Conditional Value at Risk (CVaR) quantify potential losses under adverse market conditions.
Value at Risk (VaR)
Although I stopped using this metric years ago, if you're just starting out, it's worth knowing about.
For a confidence level of 95%, VaR is the 5th percentile of the return distribution:
VaR95% = Percentile5(R)
VaR answers the question: What is the worst loss that will not be exceeded with 95% confidence over a specified period?
def var_metric(returns, confidence_level=0.95):
    """
    Calculate Value at Risk (VaR) at the given confidence level.
    """
    percentile_value = (1 - confidence_level) * 100
    return my_percentile(returns, percentile_value)
Conditional Value at Risk (CVaR)
CVaR (or Expected Shortfall) is defined as the average loss given that losses exceed the VaR:
CVaR = E[ R | R ≤ VaR ]
CVaR provides insight into the tail risk by averaging the worst-case losses, offering a more complete picture than VaR alone.
def cvar_metric(returns, confidence_level=0.95):
    """
    Calculate Conditional Value at Risk (CVaR).
    """
    var_val = var_metric(returns, confidence_level)
    tail_losses = [r for r in returns if r <= var_val]
    if tail_losses:
        return my_mean(tail_losses)
    else:
        return var_val
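Putting VaR and CVaR together on a small illustrative sample (the percentile is computed inline with the same linear interpolation used by my_percentile above):

```python
import math

# Illustrative daily returns, sorted for readability.
returns = [-0.05, -0.03, -0.01, 0.0, 0.01, 0.02, 0.02, 0.03, 0.04, 0.05]
n = len(returns)

# 95% VaR: the 5th percentile, with linear interpolation.
pos = (5 / 100) * (n - 1)   # 0.45
lo = math.floor(pos)
var_95 = returns[lo] + (pos - lo) * (returns[lo + 1] - returns[lo])  # -0.041

# CVaR: the average of everything at or below the VaR.
tail = [r for r in returns if r <= var_95]
cvar_95 = sum(tail) / len(tail)  # -0.05

print(var_95, cvar_95)
```

CVaR is always at least as bad as VaR, since it averages the tail beyond it.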
Benchmark comparison metrics
Now, let's talk about the interesting stuff! These metrics help distinguish between market movement and the strategy’s unique performance.
Beta
Beta is computed as:
β = Cov(Rstrategy, Rbenchmark) / Var(Rbenchmark)
Beta measures the sensitivity of a strategy’s returns to market movements. A beta of 1 means the strategy moves in lockstep with the benchmark; above 1 implies amplified movements; below 1 indicates dampened movements.
def covariance(x, y):
    """Calculate the covariance between two lists."""
    if len(x) != len(y) or not x:
        return 0
    m_x = my_mean(x)
    m_y = my_mean(y)
    return sum((xi - m_x) * (yi - m_y) for xi, yi in zip(x, y)) / len(x)

def calculate_beta(strategy_returns, benchmark_returns):
    """
    Calculate beta as covariance(strategy, benchmark) / variance(benchmark).
    """
    var_benchmark = my_std(benchmark_returns) ** 2
    if var_benchmark == 0:
        return float('nan')
    return covariance(strategy_returns, benchmark_returns) / var_benchmark
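A standalone sanity check with illustrative data: a strategy that always returns exactly twice the benchmark should have a beta of 2.

```python
# Illustrative returns: the strategy is exactly 2x the benchmark every period.
benchmark = [0.01, -0.01, 0.02, -0.02]
strategy = [2 * b for b in benchmark]

m_b = sum(benchmark) / len(benchmark)
m_s = sum(strategy) / len(strategy)

# Population covariance and variance, as in the functions above.
cov = sum((s - m_s) * (b - m_b) for s, b in zip(strategy, benchmark)) / len(benchmark)
var_b = sum((b - m_b) ** 2 for b in benchmark) / len(benchmark)

beta = cov / var_b
print(beta)  # 2.0
```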
R-squared
R-squared is defined as:
R² = Corr(Rstrategy, Rbenchmark)²
R-squared indicates the proportion of variance in the strategy’s returns that is explained by the benchmark returns. A high R2 means most of the strategy’s movement is driven by market factors.
def r_squared(strategy_returns, benchmark_returns):
    """
    Calculate R-squared from the correlation of strategy and benchmark returns.
    """
    cov = covariance(strategy_returns, benchmark_returns)
    std_strategy = my_std(strategy_returns)
    std_benchmark = my_std(benchmark_returns)
    if std_strategy == 0 or std_benchmark == 0:
        return 0
    corr = cov / (std_strategy * std_benchmark)
    return corr ** 2
Information ratio
It measures the excess return of the strategy over the benchmark relative to the volatility of that excess return:
IR = mean(Rs − Rb) / σ(Rs − Rb)
A higher ratio means the strategy consistently outperforms the benchmark with lower dispersion in its outperformance.
def information_ratio(strategy_returns, benchmark_returns):
    """
    Calculate the Information Ratio.
    """
    active_returns = [s - b for s, b in zip(strategy_returns, benchmark_returns)]
    mean_active = my_mean(active_returns)
    std_active = my_std(active_returns)
    if std_active == 0:
        return float('inf')
    return mean_active / std_active
Treynor ratio
The Treynor Ratio measures the excess return earned per unit of systematic risk—as measured by beta:
Treynor = (Rstrategy − Rf) / β
It’s especially useful when comparing strategies that are highly correlated with the market.
def treynor_ratio(strategy_return, risk_free_rate, beta):
    """
    Calculate the Treynor Ratio.
    """
    if beta == 0 or math.isnan(beta):
        return float('nan')
    return (strategy_return - risk_free_rate) / beta
Drawdown metrics
Drawdown metrics reveal how much a strategy can lose from its peak and how long it may remain below peak levels.
Maximum drawdown
This measures the worst-case loss from a peak to a trough, expressed as a percentage:
MDD = min over t of (Equityt − Peakt) / Peakt
It's crucial for understanding the potential risk of a strategy. Keep it down!
def max_drawdown(equity_curve):
    """
    Calculate the maximum drawdown from an equity curve.
    Returns the maximum drawdown as a negative decimal.
    """
    peak = equity_curve[0]
    max_dd = 0
    for equity in equity_curve:
        if equity > peak:
            peak = equity
        drawdown = (equity - peak) / peak
        if drawdown < max_dd:
            max_dd = drawdown
    return max_dd
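A standalone walk-through with an illustrative equity curve: the deepest fall is from the 120 peak down to 90, a 25% drawdown.

```python
# Illustrative equity curve with two drawdowns; the worst is 120 -> 90.
equity = [100.0, 110.0, 105.0, 120.0, 90.0, 95.0, 130.0]

peak = equity[0]
max_dd = 0.0
for e in equity:
    peak = max(peak, e)                     # running high-water mark
    max_dd = min(max_dd, (e - peak) / peak)  # most negative drawdown so far

print(max_dd)  # -0.25
```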
Longest drawdown days
This metric counts the maximum number of consecutive periods during which the portfolio remains below its previous peak.
A long drawdown period might indicate extended stress for an investor, even if the eventual loss is moderate.
def longest_drawdown_days(equity_curve):
    """
    Calculate the longest consecutive period (in days) of drawdown.
    """
    peak = equity_curve[0]
    longest = 0
    current = 0
    for equity in equity_curve:
        if equity < peak:
            current += 1
        else:
            current = 0
            peak = equity
        longest = max(longest, current)
    return longest
If the equity curve remains under its previous peak for 5 consecutive days at its worst, the longest drawdown days is 5.
Average drawdown percentage
This metric gives an average measure of all the declines experienced during drawdown periods.
def average_drawdown_percentage(equity_curve):
    """
    Calculate the average drawdown percentage during drawdown periods.
    """
    peak = equity_curve[0]
    drawdowns = []
    for equity in equity_curve:
        if equity > peak:
            peak = equity
        dd = (equity - peak) / peak
        if dd < 0:
            drawdowns.append(dd)
    return my_mean(drawdowns) * 100 if drawdowns else 0
If the equity curve has several drawdown periods with an average decline of 5%, then the average drawdown percentage is 5%.
Trade-based metrics
These metrics focus on the outcomes of individual trades, providing a granular view of strategy performance.
Profit factor
A profit factor greater than 1 indicates that total gains exceed total losses:
PF = Σ wins / Σ |losses|
It's a quick indicator of overall profitability, but barely clearing 1 is not enough. You want a PF ≈ 2 or bigger.
def profit_factor(trade_results):
    """
    Calculate the Profit Factor: ratio of total winning trade profit to total absolute losses.
    """
    sum_wins = sum(trade for trade in trade_results if trade > 0)
    sum_losses = sum(abs(trade) for trade in trade_results if trade < 0)
    if sum_losses == 0:
        return float('inf')
    return sum_wins / sum_losses
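A standalone worked example with illustrative trades: $300 of gross profit against $150 of gross losses gives a PF of exactly 2.

```python
# Illustrative per-trade P&L.
trades = [100.0, -50.0, 200.0, -100.0]

gross_profit = sum(t for t in trades if t > 0)   # 300.0
gross_loss = sum(-t for t in trades if t < 0)    # 150.0

pf = gross_profit / gross_loss
print(pf)  # 2.0
```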
Awal ratio
This ratio compares the average profit of winning trades to the average loss of losing trades:
AR = average win / |average loss|
A higher ratio means that when you win, you win by much more than you lose when you're wrong. Just like with the PF, you want an AR ≈ 2 or bigger.
def awal_ratio(trade_results):
    """
    Calculate the Awal Ratio: ratio of average win to average loss.
    """
    wins = [trade for trade in trade_results if trade > 0]
    losses = [trade for trade in trade_results if trade < 0]
    if not losses:
        return float('inf')
    avg_win = my_mean(wins) if wins else 0
    avg_loss = my_mean(losses) if losses else 0
    if avg_loss == 0:
        return float('inf')
    return avg_win / abs(avg_loss)
Expectancy
Expectancy tells you the average outcome per trade.
It combines the probability of winning with the average win and the probability of losing with the average loss:
Expectancy = Pwin × AvgWin + (1 − Pwin) × AvgLoss
def calculate_expectancy(trade_results):
    """
    Calculate the expectancy per trade.
    """
    wins = [trade for trade in trade_results if trade > 0]
    losses = [trade for trade in trade_results if trade < 0]
    win_prob = len(wins) / len(trade_results) if trade_results else 0
    avg_win = my_mean(wins) if wins else 0
    avg_loss = my_mean(losses) if losses else 0
    return win_prob * avg_win + (1 - win_prob) * avg_loss
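A standalone worked example with illustrative trades: a 50% win rate, $100 average win, and $50 average loss give an expectancy of $25 per trade.

```python
# Illustrative per-trade P&L: half winners, half losers.
trades = [100.0, -50.0, 100.0, -50.0]

wins = [t for t in trades if t > 0]
losses = [t for t in trades if t < 0]

win_prob = len(wins) / len(trades)       # 0.5
avg_win = sum(wins) / len(wins)          # 100.0
avg_loss = sum(losses) / len(losses)     # -50.0

expectancy = win_prob * avg_win + (1 - win_prob) * avg_loss
print(expectancy)  # 25.0
```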
Rina index
The Rina Index, as used here, relates the average profit per trade (expectancy) to the worst drawdown experienced:
Rina = Expectancy / |Max drawdown|
A higher Rina Index indicates that the typical trade’s profit is robust compared to the worst-case loss scenario.
def rina_index(trade_results, max_drawdown):
    """
    Calculate the Rina Index.
    """
    exp = calculate_expectancy(trade_results)
    if max_drawdown == 0:
        return float('inf')
    return exp / abs(max_drawdown)
Average trade
Simply the arithmetic mean of all trade outcomes.
It gives an overall sense of the profit—or loss—per trade.
def average_trade(trade_results):
    """Calculate the average trade result."""
    return my_mean(trade_results)
Winning percentage
This metric tells you the proportion of trades that were profitable.
Simple as that!
def winning_percentage(trade_results):
    """Calculate the percentage of winning trades."""
    wins = len([trade for trade in trade_results if trade > 0])
    total = len(trade_results)
    return (wins / total * 100) if total > 0 else 0
Time under water percentage
It indicates the proportion of time that the portfolio’s equity is below its previous peak.
def time_under_water_percentage(equity_curve):
    """
    Calculate the percentage of time the equity curve is below its historical peak.
    """
    peak = equity_curve[0]
    under_water = 0
    for equity in equity_curve:
        if equity < peak:
            under_water += 1
        else:
            peak = equity
    return (under_water / len(equity_curve)) * 100
Maximum consecutive loss
This metric counts the largest number of successive losing trades. Understanding the longest losing streak is essential for stress testing and risk management.
def max_consecutive_loss(trade_results):
    """Calculate the maximum number of consecutive losing trades."""
    max_losses = 0
    current_losses = 0
    for trade in trade_results:
        if trade < 0:
            current_losses += 1
            max_losses = max(max_losses, current_losses)
        else:
            current_losses = 0
    return max_losses
Standard deviation of trades
This measures how spread out the individual trade outcomes are relative to the average.
It gives insight into the consistency of the trading performance.
def std_dev_trades(trade_results):
    """Calculate the standard deviation of trade outcomes."""
    return my_std(trade_results)
Until next time—may your trades be as precise as a masterstroke, your signals as clear as dawn’s first light, and your returns as bold as a bull’s charge. Navigate the markets with poise, resilience, and an unshakable edge 🚀
PS: What’s your biggest challenge in learning or staying up to date on algo trading?