← Back to Skills Marketplace

Algo Builder

Name: Algo Builder
Author: tonbistudio

by tonbi · GitHub ↗ · v1.0.0 · MIT-0

cross-platform ✓ Security Clean

314

Downloads

Stars

Active Installs

Versions

Install in OpenClaw

/install algo-builder

Description

Build and test systematic daily-to-weekly trading algorithms through hypothesis, signal validation, backtesting, and strategy development steps.

README (SKILL.md)

Algo Builder Skill

Use this skill when the user wants to build a trading algorithm, develop a trading strategy, test a signal, backtest a strategy, research alpha, validate a trading hypothesis, or structure a systematic trading process.

Scope

This pipeline is optimized for daily-to-weekly frequency strategies (swing trading, factor investing, systematic macro, crypto position trading).

Does NOT apply to HFT (millisecond to minute): HFT requires order book simulation, tick data, latency modeling, and queue position analytics. If the user is building HFT, redirect them to those specialized tools.

For intraday strategies (1min to 4hr): use intraday IC horizons, model bid-ask explicitly, consider volume or dollar bars instead of time bars.

The Correct Order of Operations

Most people start at backtesting. That is wrong. Backtesting is a confirmation tool, not a discovery tool — as Marcos Lopez de Prado puts it: "Backtesting is not a research tool. Feature importance is." Starting at backtesting causes overfitting, wasted time, and false confidence.

The correct pipeline:

HYPOTHESIS → SIGNAL TEST → STRATEGY CONSTRUCTION → IN-SAMPLE BACKTEST → WALK-FORWARD → PAPER TRADE → LIVE

Never skip steps. Never go backward except intentionally.

Step 1: Hypothesis

Before any code, write the hypothesis in plain English.

Prompt the user to answer:

What market inefficiency are you exploiting?
Why does this edge exist? What is the behavioral or structural reason?
Who is on the other side of this trade and why are they consistently wrong?
What market conditions would break this edge?
What is your expected holding period?

If the user cannot answer all five, the hypothesis is not ready. Do not proceed.

Write the hypothesis into a file: algo-builder/hypotheses/\x3Cstrategy-name>.md

Template:

## Hypothesis: \x3Cname>
**Edge:** \x3Cone sentence>
**Mechanism:** \x3Cwhy the edge exists>
**Counterparty:** \x3Cwho is wrong and why>
**Regime dependency:** \x3Cwhen it breaks>
**Holding period:** \x3Ctimeframe>
**Trials so far:** 0  (increment each time you test a variation — needed for DSR in Step 5)
**Status:** [HYPOTHESIS | SIGNAL_TESTED | BACKTESTED | LIVE]

Step 2: Signal/Feature Testing

Test each signal component IN ISOLATION before combining them into a strategy. This is the most important and most skipped step.

What to measure

Information Coefficient (IC) IC = Spearman rank correlation between signal value and forward return over N periods. Use Spearman (not Pearson) — it is robust to outliers and works for non-linear rank relationships.

Calibrated thresholds (per Grinold & Kahn, academic literature, and practitioner consensus):

IC > 0.15: Exceptional — verify data quality first, suspect snooping or data error
IC 0.05–0.15: Strong — pursue aggressively; professional-grade in liquid markets
IC 0.02–0.05: Moderate — viable in ensemble; common in efficient US equity markets
IC 0.01–0.02: Weak — only viable at scale (100+ assets, high breadth)
IC \x3C 0.01: No edge — discard
IC \x3C 0 (negative): signal may be inversely useful — investigate before discarding

Note: In crypto (less efficient than equities), IC thresholds can be higher. In highly liquid large-cap equities, IC 0.03 is genuinely good. Calibrate to your asset class.

IC Information Ratio (ICIR) — equally important as IC itself ICIR = mean(IC) / std(IC) over rolling windows. Measures consistency of the signal over time.

ICIR > 1.0: Highly stable — strong independent evidence of real edge
ICIR 0.5–1.0: Acceptable stability — proceed
ICIR \x3C 0.5: Unstable — even a decent IC here is unreliable; do not proceed

A signal with IC 0.08 but ICIR 0.3 is more dangerous than a signal with IC 0.04 and ICIR 0.9. Consistency matters as much as magnitude.

IC Stability (Rolling) Plot rolling IC over time. Consistent positive IC across multiple market regimes is required. IC that only appears in one regime has conditional validity — you must then condition entry on that regime.

Decay Analysis Test IC at 1-bar, 3-bar, 5-bar, 10-bar, 20-bar forward horizons. Does the signal's predictive power decay gracefully or cliff-edge? This reveals the natural holding period.

T-stat on IC IC must be statistically significant: t-stat > 2.0 (p \x3C 0.05) over the test period.

Regime Conditioning Test IC separately in different market regimes (bull/bear, high-vol/low-vol). A signal with IC 0.06 in all regimes is far better than IC 0.12 only in bull markets. Document regime dependency clearly.

Simple regime proxies:

# Price above/below 200-day MA
def bull_bear_regime(prices):
    ma200 = prices.rolling(200).mean()
    return (prices > ma200).astype(int)  # 1=bull, 0=bear

# VIX-based volatility regime
def vol_regime(vix):
    return pd.cut(vix, bins=[0, 15, 25, 100], labels=['low', 'medium', 'high'])

Signal testing template

Ask the user for:

Signal name and definition (exact calculation)
Asset and data source
Test period
Forward return horizon to test
How many other signals have been tested so far (needed for multiple testing correction)

Write scripts/test_signal.py:

import pandas as pd
import numpy as np
from scipy import stats

def compute_ic(signal: pd.Series, forward_returns: pd.Series) -> dict:
    """Compute Information Coefficient using Spearman rank correlation."""
    combined = pd.concat([signal, forward_returns], axis=1).dropna()
    signal_col, return_col = combined.columns

    ic, pvalue = stats.spearmanr(combined[signal_col], combined[return_col])
    t_stat = ic * np.sqrt((len(combined) - 2) / (1 - ic**2))

    return {
        "IC": round(ic, 4),
        "p_value": round(pvalue, 4),
        "t_stat": round(t_stat, 4),
        "n_obs": len(combined),
        "verdict": _verdict(ic, pvalue)
    }

def _verdict(ic: float, pvalue: float) -> str:
    if pvalue > 0.05:
        return "REJECT: Not statistically significant"
    if ic \x3C 0.01:
        return "REJECT: IC too low, no edge"
    if ic > 0.15:
        return "FLAG: Exceptional IC — verify data quality before proceeding"
    if 0.01 \x3C= ic \x3C 0.02:
        return "WEAK: Viable only at high breadth/scale"
    if 0.02 \x3C= ic \x3C 0.05:
        return "MODERATE: Viable in ensemble"
    if 0.05 \x3C= ic \x3C 0.15:
        return "STRONG: Pursue this signal"
    return "STRONG: Pursue this signal"

def compute_icir(rolling_ic: pd.Series) -> dict:
    """Compute IC Information Ratio from rolling IC series."""
    icir = rolling_ic.mean() / rolling_ic.std()
    return {
        "ICIR": round(icir, 4),
        "mean_IC": round(rolling_ic.mean(), 4),
        "std_IC": round(rolling_ic.std(), 4),
        "verdict": "STABLE" if icir > 0.5 else "UNSTABLE — do not proceed even with good IC"
    }

def test_ic_decay(signal: pd.Series, prices: pd.Series, horizons=[1,3,5,10,20]) -> pd.DataFrame:
    """Test IC at multiple forward horizons to find natural holding period."""
    results = []
    for h in horizons:
        fwd_ret = prices.pct_change(h).shift(-h)
        result = compute_ic(signal, fwd_ret)
        result["horizon"] = h
        results.append(result)
    return pd.DataFrame(results).set_index("horizon")

def test_ic_stability(signal: pd.Series, forward_returns: pd.Series, window=52) -> pd.Series:
    """Rolling IC to check stability over time."""
    rolling_ic = []
    for i in range(window, len(signal)):
        s = signal.iloc[i-window:i]
        r = forward_returns.iloc[i-window:i]
        ic, _ = stats.spearmanr(s.dropna(), r.dropna())
        rolling_ic.append(ic)
    return pd.Series(rolling_ic, index=signal.index[window:])

def test_ic_by_regime(signal, forward_returns, regime_indicator, regime_labels):
    """Test IC conditional on market regime."""
    results = {}
    for regime_id, regime_name in regime_labels.items():
        mask = regime_indicator == regime_id
        s = signal[mask]
        r = forward_returns[mask]
        if len(s) > 20:
            ic, pval = stats.spearmanr(s.dropna(), r[s.notna()])
            results[regime_name] = {
                "IC": round(ic, 4),
                "p_value": round(pval, 4),
                "n_obs": len(s.dropna())
            }
    return results

Gate

A signal passes to Step 3 ONLY IF ALL of the following:

IC is statistically significant (p \x3C 0.05)
IC > 0.02 (adjust higher for less efficient markets like crypto)
ICIR > 0.5 (stability is required, not optional)
IC positive in at least 55% of rolling windows
IC holds in at least one market regime with sufficient observations

Document results in algo-builder/signals/\x3Csignal-name>.md.

Multiple testing note: If this is the Nth signal tested, flag it. The more signals tested, the higher the probability that a passing signal is a false positive. Track trial count in the hypothesis file. This feeds into DSR in Step 5.

Step 3: Strategy Construction

Combine validated signals into a complete strategy. Define each component explicitly before writing backtest code.

Required components

Entry logic

Which signals trigger entry?
How are they combined (AND, OR, weighted score)?
Minimum signal threshold to enter?
Regime condition required? (if signal only works in one regime, entry must be conditioned on that regime)

Exit logic (define ALL of these — do not leave any as "figure it out later")

Signal-based exit: when signal reverses
Time-based exit: max hold period
Stop loss: fixed % or ATR-based
Take profit: fixed target or trailing

Position sizing

Fixed fractional (% of portfolio per trade)
Kelly Criterion (if you have reliable win rate and avg win/loss)
ATR-normalized (size inversely proportional to volatility)
Never: "all in" or undefined sizing

Risk controls (separate from signal logic, always on)

Max position size per asset
Max correlated positions open simultaneously
Daily loss limit (halt trading if breached)
Max drawdown circuit breaker

Transaction Cost Budget (quantify before backtesting — not a checkbox)

Specify:

Expected gross edge (rough estimate from IC × √Breadth approximation)
Commission: __ bps
Bid-ask spread: __ bps
Market impact estimate: use 8bps × (position_size / ADV)^0.5 as a rough rule
Total TC: __ bps

Red flags:

Total TC > 30% of gross alpha: strategy likely not viable after costs
Turnover > 500%/year at daily frequency: TC will likely destroy the edge
Position > 1% of ADV: model market impact explicitly, don't use the rough rule

Capacity check

What is the average daily volume of the instruments traded?
At what AUM does market impact exceed signal value?
Small-cap and crypto strategies can have very low capacity ceilings

Write the full strategy specification into algo-builder/strategies/\x3Cname>.md before writing any backtest code.

Step 4: In-Sample Backtest

Split your data: 70% train, 30% test. Seal the test set. Do not touch it.

Run the backtest on the TRAIN set only.

Minimum acceptance criteria (in-sample)

Sharpe Ratio > 1.0 (> 1.5 preferred)
Max Drawdown within stated tolerance
Profit Factor > 1.3
Win Rate > 40% (or Expectancy clearly positive)
Minimum 30 trades (for statistical significance)

If the strategy fails these, go back to Step 3. Max 3 parameter adjustment cycles before reconsidering the hypothesis. Each adjustment cycle increments your trial count — this matters for DSR in Step 5.

Bias checklist (check all before accepting results)

No look-ahead bias: signals computed only from data available at bar close
Survivorship bias addressed: include delisted assets if testing equities
Transaction costs included with realistic estimates (not just a checkbox — see Step 3 TC budget)
Liquidity check: can you actually fill at these prices given your size?
No data-snooping: have you looked at the test set at all? (honest answer required)
For crypto: wash trading filtered from volume-based signals

Step 5: Walk-Forward Validation + Overfitting Correction

This is what separates a real strategy from a backtest artifact.

Walk-Forward Method

Define in-sample window (e.g., 6 months)
Define out-of-sample window (e.g., 1 month)
Optimize parameters on in-sample window
Test on out-of-sample window (no reoptimization)
Roll forward. Repeat across full history.

Acceptance criteria:

Out-of-sample Sharpe > 0.7 (degradation from in-sample is expected and normal)
Out-of-sample / In-sample Sharpe ratio > 0.5
Out-of-sample profitable in > 60% of windows
No cliff-edge degradation in recent windows (suggests regime change or alpha decay)

If walk-forward fails: the strategy is overfit. Return to Step 3.

Deflated Sharpe Ratio (DSR) — Multiple Testing Correction

Walk-forward gives you ONE path through history. The DSR asks: given how many variations you tested to get here, how likely is this Sharpe to be real?

After only 1,000 independent backtests, the expected maximum Sharpe is ~3.26 even if the true SR of every strategy is zero (Bailey & Lopez de Prado, 2013). This is the #1 cause of false discoveries in systematic trading.

def deflated_sharpe_ratio(sharpe_obs, n_trials, obs_per_year=252):
    """
    Adjust observed Sharpe for selection bias from multiple testing.
    Based on Bailey & Lopez de Prado (2013), Journal of Portfolio Management.

    Args:
        sharpe_obs: Observed annualized Sharpe ratio
        n_trials: Total strategy/parameter variations tested (be honest)
        obs_per_year: Trading days per year
    Returns:
        dict with DSR assessment
    """
    import scipy.stats as ss
    import numpy as np

    euler_gamma = 0.5772
    expected_max_sr = (1 - euler_gamma) * ss.norm.ppf(1 - 1/max(n_trials, 2)) + \
                      euler_gamma * ss.norm.ppf(1 - 1/(max(n_trials, 2) * np.e))
    sr_benchmark = expected_max_sr / np.sqrt(obs_per_year)

    return {
        "observed_sharpe": sharpe_obs,
        "expected_max_sr_if_all_noise": round(expected_max_sr, 3),
        "n_trials": n_trials,
        "passes_dsr": sharpe_obs > sr_benchmark,
        "verdict": "PASSES" if sharpe_obs > sr_benchmark else "FAIL: Sharpe may be a false positive given number of trials"
    }

Track n_trials from the start (in the hypothesis file). Include every parameter variation, every signal variant, and every strategy reformulation you tested.

CPCV (Optional but recommended for higher confidence)

Combinatorial Purged Cross-Validation generates a distribution of OOS paths rather than one, making overfitting detection more robust than walk-forward alone. Implement from Lopez de Prado's AFML Chapter 12 or use the mlfinlab library.

Write walk-forward + DSR results into algo-builder/results/\x3Cname>_walkforward.md.

Step 6: Paper Trade

Before live capital, run on live data with simulated fills for a minimum of:

20 completed trades, OR
4 weeks, whichever is longer

Compare paper trade metrics to backtest expectations:

Is the entry fill rate reasonable?
Is slippage within assumed bounds?
Are signals triggering as expected?

If paper trade significantly underperforms backtest: find out why before going live. Common culprits: execution assumptions, data latency, bid-ask spread not modeled, regime shift since backtest period.

Step 7: Live Deployment (small size)

Start at 10-25% of intended position size. Scale up only after:

Live performance within reasonable band of backtest expectations
At least 20 live trades completed
No unexplained drawdowns

Output Files

Maintain this folder structure:

algo-builder/
  hypotheses/        # Step 1 - one file per strategy idea (includes trial count)
  signals/           # Step 2 - IC, ICIR, decay, regime results per signal
  strategies/        # Step 3 - full strategy specs including TC budget
  results/           # Steps 4+5 - backtest + walkforward + DSR results
  scripts/           # reusable Python scripts
    test_signal.py
    backtest.py
    walkforward.py
    fetch_data.py

Working with the User

When the user describes a strategy idea:

Ask the 5 hypothesis questions before touching code
Identify all distinct signals in the strategy
Ask how many signals/strategies have already been tested (for DSR tracking)
Run signal tests FIRST, report IC AND ICIR results
Only proceed to backtest if signals pass the full gate (IC + ICIR + regime)
Always report both in-sample AND walk-forward + DSR results
Never call a strategy "good" based on in-sample only

When the user presents backtest results for review:

Ask if walk-forward was done
Ask how many variations were tested (DSR check)
Check the bias checklist
Ask for number of trades (flag if \x3C 30)
Ask what the transaction cost assumption was and if it was quantified

When the user wants to just "write the strategy code":

Redirect to hypothesis and signal testing first
Explain why: "A backtest without signal validation is just curve-fitting. We test the signals first so we know the edge is real before spending time on the full backtest."

Asset Class Notes

US Equities (daily): IC 0.03–0.05 is genuinely good in efficient large-cap markets. Breadth (number of independent bets) matters as much as IC magnitude. Include delisted stocks to avoid survivorship bias.

Crypto: Less efficient — IC thresholds can be higher (0.05–0.10 achievable). Regime instability is extreme (bull/bear cycles are fast and severe). IC by regime is essential. Filter wash trading from volume signals. Shorter valid history (2017–present for liquid multi-exchange data).

Futures/Macro: Trend-following factors have longer holding periods — IC decay analysis is especially important. Model roll yield and funding costs in TC budget.

HFT: This pipeline does not apply. See Scope section above.

Common Mistakes to Flag

"I backtested and got 300% returns": ask about max drawdown, number of trades, transaction costs, walk-forward, and how many strategies were tested before this one
"I optimized the parameters until it looked good": this is overfitting, explain the bias and check DSR
"I'll add a stop loss later": stops are part of strategy construction, not an afterthought
"IC is 0.12, that's strong": flag as exceptional — verify data quality before celebrating
"I just need to backtest this one idea": ask how many ideas came before it — that's your trial count
Using daily close prices for intraday strategies: look-ahead bias
Testing on the same data used to get the idea: data-snooping bias
Skipping ICIR: a signal with inconsistent IC is more dangerous than a weak but consistent one

Key References

Grinold & Kahn, Active Portfolio Management (1999) — IC framework, Fundamental Law
Lopez de Prado, Advances in Financial Machine Learning (2018) — CPCV, feature importance, production pipeline
Bailey & Lopez de Prado, "The Deflated Sharpe Ratio" (2013) — multiple testing correction
Almgren & Chriss (2000) — market impact modeling

Usage Guidance

This skill appears to do what it claims (a disciplined pipeline for research and backtesting) and does not request credentials or install code by itself. Before installing or using it, consider: 1) Review the full SKILL.md (the provided excerpt is truncated) to see exactly how it fetches market data and whether it references external APIs or endpoints. 2) When the agent generates scripts (fetch_data.py, backtest.py), review them before running — they may perform network calls or heavy computation. 3) Do not provide exchange or broker API keys until you review and trust the generated code; use paper-trading credentials or sandbox data first. 4) If you intend to let the agent run autonomously, restrict or monitor any automatic execution that could place live orders. 5) Run the skill in a sandboxed environment (container or VM) if you are concerned about code-writing or network activity. If you want, provide the full, untruncated SKILL.md so I can re-evaluate any data-fetching or execution details that are currently ambiguous.

Capability Analysis

Type: OpenClaw Skill Name: algo-builder Version: 1.0.0 The 'algo-builder' skill is a comprehensive framework for quantitative trading research, following established financial methodologies (e.g., Grinold & Kahn, Lopez de Prado). The instructions in SKILL.md and README.md guide the agent to enforce a rigorous pipeline (Hypothesis to Live Deployment) and include standard Python code for statistical analysis, such as calculating Information Coefficients and the Deflated Sharpe Ratio. No indicators of data exfiltration, malicious execution, or harmful prompt injection were identified.

Capability Assessment

✓ Purpose & Capability

Name/description (building and testing trading algorithms) align with the SKILL.md: it guides hypothesis writing, signal testing, strategy construction, backtesting, walk‑forward, and paper/live steps. The files it asks to create (hypotheses/, scripts/, results/) are appropriate for this purpose. There are no unexpected binaries, config paths, or unrelated credentials requested.

ℹ Instruction Scope

Instructions tell the agent to generate files and Python scripts (e.g., test_signal.py), run signal/backtest analyses, and write outputs into an algo-builder directory. This is within scope. The SKILL.md is long and the provided excerpt is truncated — parts that handle fetching market data (fetch_data.py) and any external endpoints are referenced but not fully visible; that introduces some ambiguity about exactly how network/data access is performed.

✓ Install Mechanism

No install spec and no code files are bundled beyond README.md and SKILL.md. As an instruction-only skill it does not download or install external artifacts by itself, which is the lowest-risk model for installs.

✓ Credentials

The skill declares no required environment variables, no primary credential, and no config paths. The SKILL.md asks the user to provide data source and asset choices but does not demand unrelated secrets. This is proportionate to a strategy-building tool. Note: practical use will often require market-data API keys (not declared here) — those would be expected but are not requested by the skill itself.

✓ Persistence & Privilege

always is false and model invocation is allowed (defaults). The skill instructs writing files under a local algo-builder folder (its own workspace) but does not request modifying other skills or system-wide configuration. That level of persistence (local files) is normal for this kind of tool.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install algo-builder
After installation, invoke the skill by name or use /algo-builder
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.0

Algo Builder Skill 1.0.0 - Initial release of the Algo Builder skill for systematic trading pipeline construction. - Provides a step-by-step process: Hypothesis → Signal Test → Strategy Construction → Backtest → Walk-Forward → Paper Trade → Live. - Includes detailed guidance and code templates for rigorous signal/feature testing (IC, ICIR, regime checks, decay). - Explicitly optimized for daily-to-weekly (swing/factor/position trading) strategies; not designed for HFT. - Guides users to document and validate trading hypotheses before coding or backtesting. - Supplies best practices, templates, and sample Python code to ensure robust research and avoid common overfitting pitfalls.

Metadata

Slug algo-builder

Version 1.0.0

License MIT-0

All-time Installs 1

Active Installs 1

Total Versions 1

Frequently Asked Questions

What is Algo Builder?

Build and test systematic daily-to-weekly trading algorithms through hypothesis, signal validation, backtesting, and strategy development steps. It is an AI Agent Skill for Claude Code / OpenClaw, with 314 downloads so far.

How do I install Algo Builder?

Run "/install algo-builder" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Algo Builder free?

Yes, Algo Builder is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Algo Builder support?

Algo Builder is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).

Who created Algo Builder?

It is built and maintained by tonbi (@tonbistudio); the current version is v1.0.0.

More Skills