backtest-expert
tradermonty/claude-trading-skills · updated May 21, 2026
Backtest Expert
Systematic approach to backtesting trading strategies based on professional methodology that prioritizes robustness over optimistic results.
Core Philosophy
Goal: Find strategies that "break the least", not strategies that "profit the most" on paper.
Principle: Add friction, stress test assumptions, and see what survives. If a strategy holds up under pessimistic conditions, it's more likely to work in live trading.
When to Use This Skill
Use this skill when:
- Developing or validating systematic trading strategies
- Evaluating whether a trading idea is robust enough for live implementation
- Troubleshooting why a backtest might be misleading
- Learning proper backtesting methodology
- Avoiding common pitfalls (curve-fitting, look-ahead bias, survivorship bias)
- Assessing parameter sensitivity and regime dependence
- Setting realistic expectations for slippage and execution costs
Prerequisites
- Python 3.9+ (for evaluation script)
- No API keys required
- No external data dependencies — metrics are user-provided
Workflow
1. State the Hypothesis
Define the edge in one sentence.
Example: "Stocks that gap up >3% on earnings and pull back to the previous day's close within the first hour provide a mean-reversion opportunity."
If you can't articulate the edge clearly, don't proceed to testing.
2. Codify Rules with Zero Discretion
Define with complete specificity:
- Entry: Exact conditions, timing, price type
- Exit: Stop loss, profit target, time-based exit
- Position sizing: Fixed dollar amount, % of portfolio, volatility-adjusted
- Filters: Market cap, volume, sector, volatility conditions
- Universe: What instruments are eligible
Critical: No subjective judgment allowed. Every decision must be rule-based and unambiguous.
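One way to enforce "zero discretion" is to write the rules down as an immutable data structure before any test is run. The sketch below is illustrative only: the class name `StrategyRules`, its fields, and the example values are hypothetical and are not part of this skill or its evaluation script.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class StrategyRules:
    """Hypothetical container: every field must be filled in before testing."""
    entry_condition: str       # exact, machine-checkable condition
    entry_price_type: str      # e.g. "limit" or "market"
    stop_loss_pct: float       # distance from entry, in percent
    profit_target_pct: float   # distance from entry, in percent
    max_holding_minutes: int   # time-based exit
    position_size_pct: float   # percent of portfolio per trade
    min_avg_volume: int        # liquidity filter
    universe: str              # which instruments are eligible

    def validate(self):
        """Return a list of problems; an empty list means the rules are fully specified."""
        problems = []
        if not self.entry_condition.strip():
            problems.append("entry condition is empty")
        if self.entry_price_type not in ("limit", "market"):
            problems.append("entry price type must be 'limit' or 'market'")
        if self.stop_loss_pct <= 0:
            problems.append("stop loss must be positive")
        if self.profit_target_pct <= 0:
            problems.append("profit target must be positive")
        if not 0 < self.position_size_pct <= 100:
            problems.append("position size must be in (0, 100]")
        return problems


# Example values are made up for illustration.
rules = StrategyRules(
    entry_condition="gap_up_pct > 3 and price <= prev_close within 60 min of open",
    entry_price_type="limit",
    stop_loss_pct=2.0,
    profit_target_pct=4.0,
    max_holding_minutes=240,
    position_size_pct=5.0,
    min_avg_volume=1_000_000,
    universe="US common stocks",
)
print(rules.validate())                               # []
print(replace(rules, stop_loss_pct=0.0).validate())   # ['stop loss must be positive']
```

Freezing the dataclass means parameters cannot be quietly adjusted mid-test; any variation has to be created explicitly as a new rule set.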
3. Run Initial Backtest
Test over:
- Minimum 5 years (preferably 10+)
- Multiple market regimes (bull, bear, high/low volatility)
- Realistic costs: Commissions + conservative slippage
Examine initial results for basic viability. If fundamentally broken, iterate on hypothesis.
4. Stress Test the Strategy
This is where 80% of testing time should be spent.
Parameter sensitivity:
- Test stop loss at 50%, 75%, 100%, 125%, 150% of baseline
- Test profit target at 80%, 90%, 100%, 110%, 120% of baseline
- Vary entry/exit timing by ±15-30 minutes
- Look for "plateaus" of stable performance, not narrow spikes
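The parameter variations above can be enumerated up front so that no combination is skipped. This is a minimal sketch: the multipliers and timing shifts come from the list above, while the function name `sensitivity_grid` and the baseline values (2% stop, 4% target) are assumptions made for illustration.

```python
from itertools import product

# Multipliers and shifts taken from the list above.
STOP_MULTIPLIERS = (0.50, 0.75, 1.00, 1.25, 1.50)
TARGET_MULTIPLIERS = (0.80, 0.90, 1.00, 1.10, 1.20)
TIMING_SHIFTS_MIN = (-30, -15, 0, 15, 30)


def sensitivity_grid(base_stop_pct, base_target_pct):
    """Return every (stop, target, timing shift) combination to re-test."""
    grid = []
    for s, t, shift in product(STOP_MULTIPLIERS, TARGET_MULTIPLIERS, TIMING_SHIFTS_MIN):
        grid.append(
            {
                "stop_pct": round(base_stop_pct * s, 4),
                "target_pct": round(base_target_pct * t, 4),
                "timing_shift_min": shift,
            }
        )
    return grid


# Hypothetical baseline: 2% stop loss, 4% profit target.
grid = sensitivity_grid(base_stop_pct=2.0, base_target_pct=4.0)
print(len(grid))  # 125 combinations (5 x 5 x 5)
print(grid[0])    # {'stop_pct': 1.0, 'target_pct': 3.2, 'timing_shift_min': -30}
```

Each entry would then be fed to whatever backtest runner is in use; the grid itself is platform-independent.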
Execution friction:
- Increase slippage to 1.5-2x typical estimates
- Model worst-case fills (buy at ask+1 tick, sell at bid-1 tick)
- Add realistic order rejection scenarios
- Test with pessimistic commission structures
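As a rough illustration of the worst-case fill rule above (buy at ask + 1 tick, sell at bid - 1 tick), the following sketch computes pessimistic fill prices and the resulting round-trip cost. The function names and the quote values are hypothetical; a real backtesting platform will have its own fill model.

```python
def worst_case_fill(side, bid, ask, tick_size, extra_ticks=1):
    """Fill price when crossing the spread and giving up extra ticks."""
    if side == "buy":
        return ask + extra_ticks * tick_size
    if side == "sell":
        return bid - extra_ticks * tick_size
    raise ValueError(f"unknown side: {side!r}")


def round_trip_cost_pct(bid, ask, tick_size, extra_ticks=1):
    """Cost of buying and immediately selling at worst-case fills, as % of mid price."""
    mid = (bid + ask) / 2
    buy = worst_case_fill("buy", bid, ask, tick_size, extra_ticks)
    sell = worst_case_fill("sell", bid, ask, tick_size, extra_ticks)
    return (buy - sell) / mid * 100


def stressed_slippage_pct(typical_slippage_pct, multiplier=2.0):
    """Scale a typical slippage estimate by the 1.5-2x stress multiplier."""
    return typical_slippage_pct * multiplier


# Hypothetical quote: bid 99.98, ask 100.02, tick size 0.01.
cost = round_trip_cost_pct(bid=99.98, ask=100.02, tick_size=0.01)
print(round(cost, 4))               # 0.06
print(stressed_slippage_pct(0.05))  # 0.1
```

Subtracting this round-trip cost from each trade's return before computing expectancy shows quickly whether the edge is larger than the friction.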
Time robustness:
- Analyze year-by-year performance
- Require positive expectancy in majority of years
- Ensure strategy doesn't rely on 1-2 exceptional periods
- Test in different market regimes separately
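The time-robustness checks above can be approximated with a few lines over a trade list. This sketch assumes trades are available as `(year, return_pct)` pairs; the function names and the sample data are invented for illustration and simple summed returns are used instead of compounding.

```python
from collections import defaultdict


def yearly_expectancy(trades):
    """trades: list of (year, return_pct). Returns {year: mean return per trade}."""
    by_year = defaultdict(list)
    for year, ret in trades:
        by_year[year].append(ret)
    return {y: sum(r) / len(r) for y, r in sorted(by_year.items())}


def time_robustness(trades, top_n=2):
    """Summarize how evenly results are spread across years."""
    per_year = yearly_expectancy(trades)
    positive = sum(1 for v in per_year.values() if v > 0)
    totals = defaultdict(float)
    for year, ret in trades:
        totals[year] += ret
    total = sum(totals.values())
    # Remove the best top_n years to see whether the rest still makes money.
    without_top = total - sum(sorted(totals.values(), reverse=True)[:top_n])
    return {
        "positive_year_fraction": positive / len(per_year),
        "majority_positive": positive > len(per_year) / 2,
        "total_return_pct": total,
        "total_without_top_years_pct": without_top,
    }


# Made-up trades: 2020 carries almost all of the profit.
trades = [
    (2019, 1.0), (2019, -0.5),
    (2020, 6.0), (2020, 4.0),
    (2021, -1.0), (2021, -0.5),
    (2022, 0.5), (2022, 0.25),
]
result = time_robustness(trades, top_n=1)
print(result["positive_year_fraction"])       # 0.75
print(result["total_without_top_years_pct"])  # -0.25
```

In this made-up sample, three of four years are positive, yet the strategy loses money once the single best year is removed, which is exactly the dependence on exceptional periods the list warns about.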
Sample size:
- Absolute minimum: 30 trades
- Preferred: 100+ trades
- High confidence: 200+ trades
5. Out-of-Sample Validation
Walk-forward analysis:
- Optimize on training period (e.g., Year 1-3)
- Test on validation period (Year 4)
- Roll forward and repeat
- Compare in-sample vs out-of-sample performance
Warning signs:
- Out-of-sample <50% of in-sample performance
- Need frequent parameter re-optimization
- Parameters change dramatically between periods
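The rolling train/validate scheme and the 50% warning threshold above can be sketched as follows. The function names, the default window lengths (3 training years, 1 test year, matching the example above), and the year range are assumptions for illustration; the optimization and backtest steps themselves are left to the platform in use.

```python
def walk_forward_windows(first_year, last_year, train_years=3, test_years=1):
    """Return a list of (train_years_list, test_years_list), rolling forward by the test length."""
    windows = []
    start = first_year
    while start + train_years + test_years - 1 <= last_year:
        train = list(range(start, start + train_years))
        test = list(range(start + train_years, start + train_years + test_years))
        windows.append((train, test))
        start += test_years
    return windows


def oos_warning(in_sample_perf, out_of_sample_perf, threshold=0.5):
    """True when out-of-sample performance is below threshold x in-sample performance."""
    if in_sample_perf <= 0:
        return True  # no in-sample edge to validate in the first place
    return out_of_sample_perf < threshold * in_sample_perf


windows = walk_forward_windows(2016, 2023)
for train, test in windows:
    print("optimize on", train, "-> test on", test)
# First window: optimize on [2016, 2017, 2018] -> test on [2019]
# Last window:  optimize on [2020, 2021, 2022] -> test on [2023]

print(oos_warning(in_sample_perf=1.0, out_of_sample_perf=0.4))  # True
print(oos_warning(in_sample_perf=1.0, out_of_sample_perf=0.6))  # False
```

The performance measure passed to `oos_warning` can be any consistent metric (for example, expectancy per trade), as long as the same one is used for both periods.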
6. Evaluate Results
Questions to answer:
- Does edge survive pessimistic assumptions?
- Is performance stable across parameter variations?
- Does strategy work in multiple market regimes?
- Is sample size sufficient for statistical confidence?
- Are results realistic, not "too good to be true"?
Decision criteria:
- ✅ Deploy: Survives all stress tests with acceptable performance
- 🔄 Refine: Core logic sound but needs parameter adjustment
- ❌ Abandon: Fails stress tests or relies on fragile assumptions
Use the evaluation script for a structured, quantitative assessment:
python3 skills/backtest-expert/scripts/evaluate_backtest.py \
--total-trades 150 \
--win-rate 62 \
--avg-win-pct 1.8 \
--avg-loss-pct 1.2 \
--max-drawdown-pct 15 \
--years-tested 8 \
--num-parameters 3 \
--slippage-tested \
--output-dir reports/
The script scores across 5 dimensions (Sample Size, Expectancy, Risk Management, Robustness, Execution Realism), detects red flags, and outputs a Deploy/Refine/Abandon verdict.
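For intuition about the Expectancy dimension, the inputs from the example command can be turned into a per-trade expectancy and profit factor by hand. This is a sketch using the standard textbook formulas; it is not the script's internal scoring, which is not shown here, and the function names are hypothetical.

```python
def expectancy_pct(win_rate_pct, avg_win_pct, avg_loss_pct):
    """Average return per trade in percent: p * avg_win - (1 - p) * avg_loss."""
    p = win_rate_pct / 100
    return p * avg_win_pct - (1 - p) * avg_loss_pct


def profit_factor(win_rate_pct, avg_win_pct, avg_loss_pct):
    """Gross profit divided by gross loss, per average trade."""
    p = win_rate_pct / 100
    return (p * avg_win_pct) / ((1 - p) * avg_loss_pct)


# Values from the example command: 62% win rate, 1.8% average win, 1.2% average loss.
exp = expectancy_pct(62, 1.8, 1.2)
pf = profit_factor(62, 1.8, 1.2)
print(round(exp, 2))  # 0.66
print(round(pf, 2))   # 2.45
```

With these inputs, 0.62 × 1.8 − 0.38 × 1.2 = 0.66% per trade, so any added slippage or commission approaching 0.66% per round trip would erase the edge entirely.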
Key Testing Principles
Punish the Strategy
Add friction everywhere:
- Commissions higher than reality
- Slippage 1.5-2x typical
- Worst-case fills
- Order rejections
- Partial fills
Rationale: Strategies that survive pessimistic assumptions often outperform in live trading.
Seek Plateaus, Not Peaks
Look for parameter ranges where performance is stable, not optimal values that create performance spikes.
Good: Strategy profitable with stop loss anywhere from 1.5% to 3.0%
Bad: Strategy only works with stop loss at exactly 2.13%
Stable performance indicates genuine edge; narrow optima suggest curve-fitting.
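A simple way to tell a plateau from a peak is to measure the longest run of adjacent parameter values that stay profitable. The sketch below assumes results are available as `(parameter_value, metric)` pairs sorted by parameter; the function name and the sample numbers are invented for illustration.

```python
def longest_plateau(results, min_metric=0.0):
    """Find the longest run of consecutive parameter values with metric > min_metric.

    results: list of (param_value, metric), sorted by param_value.
    Returns (start_param, end_param, run_length); run_length is 0 if nothing qualifies.
    """
    best = (None, None, 0)
    run_start = None
    run_len = 0
    for param, metric in results:
        if metric > min_metric:
            if run_len == 0:
                run_start = param
            run_len += 1
            if run_len > best[2]:
                best = (run_start, param, run_len)
        else:
            run_len = 0
    return best


# Made-up expectancy (% per trade) by stop-loss setting (%).
plateau_like = [(1.0, -0.1), (1.5, 0.4), (2.0, 0.5), (2.5, 0.45), (3.0, 0.4), (3.5, -0.2)]
spike_like = [(2.0, -0.3), (2.1, -0.1), (2.13, 0.9), (2.2, -0.2)]

print(longest_plateau(plateau_like))  # (1.5, 3.0, 4)
print(longest_plateau(spike_like))    # (2.13, 2.13, 1)
```

The first result mirrors the "good" case (profitable from 1.5% to 3.0%); the second has a higher peak metric but only a single profitable setting, which is the curve-fitting signature.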
Test All Cases, Not Cherry-Picked Examples
Wrong approach: Study hand-picked "market leaders" that worked
Right approach: Test every stock that met criteria, including those that failed
Selective examples create survivorship bias and overestimate strategy quality.
Separate Idea Generation from Validation
Intuition: Useful for generating hypotheses
Validation: Must be purely data-driven
Never let attachment to an idea influence interpretation of test results.
Common Failure Patterns
Recognize these patterns early to save time:
- Parameter sensitivity: Only works with exact parameter values
- Regime-specific: Great in some years, terrible in others
- Slippage sensitivity: Unprofitable when realistic costs added
- Small sample: Too few trades for statistical confidence
- Look-ahead bias: "Too good to be true" results
- Over-optimization: Many parameters, poor out-of-sample results
See references/failed_tests.md for detailed examples and diagnostic framework.
Output
- reports/backtest_eval_<timestamp>.json: structured evaluation with per-dimension scores, red flags, and verdict
- reports/backtest_eval_<timestamp>.md: human-readable report with dimension table, key metrics, and red flag details
Resources
Methodology Reference
File: references/methodology.md
When to read: For detailed guidance on specific testing techniques.
Contents:
- Stress testing methods
- Parameter sensitivity analysis
- Slippage and friction modeling
- Sample size requirements
- Market regime classification
- Common biases and pitfalls (survivorship, look-ahead, curve-fitting, etc.)
Failed Tests Reference
File: references/failed_tests.md
When to read: When strategy fails tests, or learning from past mistakes.
Contents:
- Why failures are valuable
- Common failure patterns with examples
- Case study documentation framework
- Red flags checklist for evaluating backtests
Critical Reminders
Time allocation: Spend 20% generating ideas, 80% trying to break them.
Context-free requirement: If strategy requires "perfect context" to work, it's not robust enough for systematic trading.
Red flag: If backtest results look too good (>90% win rate, minimal drawdowns, perfect timing), audit carefully for look-ahead bias or data issues.
Tool limitations: Understand your backtesting platform's quirks (interpolation methods, handling of low liquidity, data alignment issues).
Statistical significance: Small edges require large sample sizes to prove. 5% edge per trade needs 100+ trades to distinguish from luck.
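The link between edge size and required sample size can be made concrete with a normal approximation: to separate a mean return per trade from zero at a given z-score, roughly (z × stdev / mean)² trades are needed. The sketch below assumes independent trades with similar variability, which real trade sequences only approximate; the function names and the example numbers are hypothetical.

```python
import math
import statistics


def t_statistic(returns):
    """Mean return divided by its standard error (sample stdev / sqrt(n))."""
    n = len(returns)
    mean = statistics.fmean(returns)
    stdev = statistics.stdev(returns)
    return mean / (stdev / math.sqrt(n))


def trades_needed(mean_per_trade, stdev_per_trade, z=1.96):
    """Approximate number of trades for the mean to sit z standard errors above zero."""
    return math.ceil((z * stdev_per_trade / mean_per_trade) ** 2)


# Hypothetical strategy: 0.5% mean return per trade, 2.5% standard deviation.
print(trades_needed(0.5, 2.5))   # 97
# Halving the edge roughly quadruples the required sample.
print(trades_needed(0.25, 2.5))  # 385
```

The default z = 1.96 corresponds to a two-sided 95% level under the normal approximation; a stricter threshold raises the required trade count further.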
Discretionary vs Systematic Differences
This skill focuses on systematic/quantitative backtesting where:
- All rules are codified in advance
- No discretion or "feel" in execution
- Testing happens on all historical examples, not cherry-picked cases
- Context (news, macro) is deliberately stripped out
Discretionary traders study differently—this skill may not apply to setups requiring subjective judgment.
How to use backtest-expert on Cursor
AI-first code editor with Composer
Prerequisites
Before installing skills in Cursor, ensure your development environment meets these requirements:
- Cursor installed and configured on your development machine
- Node.js version 16.0+ with npm package manager (verify with node --version)
- Active project directory or workspace where you want to add backtest-expert
Execute installation command
Execute the skills CLI command in your project's root directory to begin installation:
The skills CLI fetches backtest-expert from GitHub repository tradermonty/claude-trading-skills and configures it for Cursor.
Select Cursor when prompted
The CLI will show a list of available agents. Use arrow keys to navigate and space to select Cursor:
Verify installation
Confirm successful installation by checking the skill directory location:
Reload or restart Cursor to activate backtest-expert. Access the skill through slash commands (e.g., /backtest-expert) or your agent's skill management interface.
Security & Verification Notice
We perform automated surface-level scans (Gen AI Scanner, Socket, Snyk) during installation. These checks detect common vulnerabilities but do not guarantee complete security. Always review skill source code and verify the publisher's reputation before production use.
Skills execute code in your development environment. Always verify the publisher's identity, review recent commits, and test in isolated environments before production deployment.