sf-ml-baseline v0.1

What it is: gradient-boosted tree ensembles that predict prediction-market outcomes from engineered microstructure features.

Published: 2026-04-19 (initial, time-capsule - see "Retrain plan" below).
Trained on: 11 days of SimpleFunctions data (market_indicator_history + marketwide_resolutions), 2026-04-08 to 2026-04-19.
License: CC-BY-4.0 with SimpleFunctions attribution (see LICENSE).
Author: SimpleFunctions - https://simplefunctions.dev
Model repo: https://huggingface.co/SimpleFunctions/sf-ml-baseline (pending upload)

Why release this

To our knowledge, no one has published a calibrated, feature-based baseline for prediction-market forecasting. Prior art (Halawi 2024, Schoenegger 2024, AIA 2025) uses LLM + news retrieval. We release this as the feature-based reference that LLM systems should ensemble with.

Brier scores (vs market-implied baseline, 95% CI):

Task                     Model                              Brier    95% CI            Δ vs baseline
V1 × T1: direction 24h   LGBM (3-seed)                      0.2295   [0.2290, 0.2299]  -0.0205 (vs coinflip 0.2500)
V1 × T1: direction 24h   XGBoost (3-seed)                   0.2296   [0.2292, 0.2301]  -0.0204
V1 × T1: direction 24h   CatBoost (3-seed)                  0.2295   [0.2290, 0.2299]  -0.0205
V1 × T1: direction 24h   Ensemble (3 models × 3 seeds = 9)  0.2294   [0.2289, 0.2299]  -0.0206
V2 × T4: resolution 24h  XGBoost (3-seed)                   0.1681   [0.1605, 0.1759]  -0.0086 (vs price/100 = 0.1767)

The V1 × T1 improvement is statistically significant (non-overlapping 95% CIs) at 246,862 test samples.
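For reference, the Brier score is the mean squared error between forecast probabilities and binary outcomes. The sketch below shows the metric and the "coinflip 0.2500" baseline from the table; the percentile bootstrap is our assumption about how the CIs were formed, not a statement about the repo's actual evaluation code.

```python
import numpy as np

def brier(p, y):
    """Mean squared error between forecast probabilities and binary outcomes."""
    p, y = np.asarray(p, dtype=float), np.asarray(y, dtype=float)
    return float(np.mean((p - y) ** 2))

def brier_ci(p, y, n_boot=1000, alpha=0.05, seed=42):
    """Percentile bootstrap CI over test samples (resampling scheme is an assumption)."""
    rng = np.random.default_rng(seed)
    p, y = np.asarray(p, dtype=float), np.asarray(y, dtype=float)
    idx = rng.integers(0, len(y), size=(n_boot, len(y)))  # resample rows with replacement
    scores = np.mean((p[idx] - y[idx]) ** 2, axis=1)
    return tuple(np.quantile(scores, [alpha / 2, 1 - alpha / 2]))

# A constant 0.5 forecast scores exactly 0.25 on any binary target,
# which is the coinflip baseline the direction models are compared against.
y = np.array([0, 1, 1, 0, 1])
print(brier(np.full(len(y), 0.5), y))  # 0.25
```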

Install

pip install lightgbm xgboost catboost numpy pandas

Use

from pathlib import Path
from sf_ml_baseline import SFBaseline

model = SFBaseline(weights_dir='sf-ml-baseline/weights')

# Direction forecast: probability that the 24h-forward price will be HIGHER than now.
# Features: current price (cents, 0-100), 24h price delta (cents),
#           implicit yield (%), calibration ratio index (unitless), calibration variability ratio.
p_up = model.predict_direction(price_cents=55, delta_cents=3, iy=12.5, cri=0.6, cvr=0.8)
print(f'P(price rises in next 24h) = {p_up:.3f}')

# Or batch-predict from a DataFrame:
import pandas as pd
df = pd.DataFrame([
    {'price_cents': 55, 'delta_cents': 3, 'iy': 12.5, 'cri': 0.6, 'cvr': 0.8},
    {'price_cents': 82, 'delta_cents': -1, 'iy': 4.5, 'cri': 0.3, 'cvr': 0.9},
])
probas = model.predict_direction_batch(df)

See predict.py for the full inference code.
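Since this is pitched as a calibration baseline, it is worth sanity-checking the output probabilities against held-out outcomes before trusting them. A minimal reliability-table sketch (the equal-width binning scheme here is ours, not taken from predict.py):

```python
import numpy as np

def calibration_bins(p, y, n_bins=10):
    """Bucket forecasts into equal-width bins and compare mean forecast
    vs observed outcome frequency. For a calibrated model the two track."""
    p, y = np.asarray(p, dtype=float), np.asarray(y, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(p, edges) - 1, 0, n_bins - 1)
    rows = []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            rows.append((float(p[mask].mean()), float(y[mask].mean()), int(mask.sum())))
    return rows  # (mean forecast, observed frequency, count) per occupied bin
```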

Architecture

  • Features (V1): price_cents, delta_cents (24h price change), iy (implicit yield), cri (calibration ratio index), cvr (calibration variability ratio). Spec: SimpleFunctions indicator documentation.
  • Models: 3 LightGBM + 3 XGBoost + 3 CatBoost, each trained with different seeds. Ensemble predictions by simple mean.
  • Split: temporal - 80% train / 24h embargo / 20% test, with a 90/10 inner train/val split for early stopping.
  • Target T1: binary sign(price(t+24h) - price(t)); no-move rows (delta == 0 at t+24h) are excluded.
  • Target T4: resolved_outcome ∈ {0, 1} for markets that resolved 22-26h after the feature capture time.
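The split and T1 label construction above can be sketched as follows; column names like captured_at and price_cents_t24 are illustrative assumptions, not the repo's actual schema:

```python
import pandas as pd

def temporal_split(df, train_frac=0.8, embargo_hours=24):
    """Chronological 80/20 split with an embargo gap so 24h-forward
    labels near the boundary cannot leak from train into test."""
    df = df.sort_values('captured_at').reset_index(drop=True)
    cut = int(len(df) * train_frac)
    train = df.iloc[:cut]
    horizon = train['captured_at'].iloc[-1] + pd.Timedelta(hours=embargo_hours)
    test = df[df['captured_at'] > horizon]
    return train, test

def t1_target(df):
    """Binary direction label: 1 if price rose over 24h; no-move rows dropped."""
    delta = df['price_cents_t24'] - df['price_cents']
    out = df.loc[delta != 0].copy()
    out['t1'] = (delta.loc[delta != 0] > 0).astype(int)
    return out
```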

Known limitations

  1. Only 11 days of training data. The full feature history table (market_indicator_history) was introduced to SF's data pipeline on 2026-04-08. A proper 30d+ / 180d+ re-train is scheduled; see "Retrain plan" below.
  2. 5 base features only. market_indicator_history holds a compact subset of the full indicator stack. The live market_indicators table has ~20 features (iyYes, iyNo, ee, las, vr, iar, rv, adjIy, cvrDelta, overround, etc.) but only for the current snapshot. Future versions will store history for the full feature set.
  3. cvr has 0% feature importance in the direction model. Investigate whether the window/computation needs tuning.
  4. V2 × T4 (rolling features + resolution label) is below the 0.01 Brier gate globally - it works well on Crypto (Δ = -0.041) and Commodities (Δ = -0.036) but not on Sports (Δ = -0.006) or Financials (Δ = +0.004, model worse than baseline).
  5. Do not use for live trading without backtesting against your own execution model. This is a calibration baseline, not a PnL strategy.
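The per-category deltas in limitation 4 compare the model's Brier score against the market-implied baseline (price/100) within each category. A sketch of that breakdown, with assumed column names:

```python
import numpy as np
import pandas as pd

def per_category_delta(df):
    """Brier delta vs the price/100 baseline, per category.
    Negative delta = model beats the market-implied baseline.
    Columns (category, price_cents, p_model, resolved) are assumptions."""
    def brier(p, y):
        return float(np.mean((np.asarray(p, dtype=float) - np.asarray(y, dtype=float)) ** 2))
    out = {}
    for cat, g in df.groupby('category'):
        model = brier(g['p_model'], g['resolved'])
        baseline = brier(g['price_cents'] / 100.0, g['resolved'])
        out[cat] = model - baseline
    return out
```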

Retrain plan

v0.1 is a time-capsule. Planned retrains:

Version  Trigger                                                       When         What changes
v0.2     R2 dump archive ≥ 30d of indicator history                    ~2026-05-20  Same architecture, more data; per-category specialist models for Crypto/Commodities/Sports
v0.3     Full indicator feature set stored in history (schema change)  TBD          V1 grows from 5 to 20 features; re-run Phase A.3 full grid
v1.0     ≥ 6 months of R2 data                                         ~2026-10     FT-Transformer + TabPFN + ensemble; formal paper submission (ICLR 2027 FinAI Workshop)

Reproduce

git clone https://github.com/spfunctions/simplefunctions-landing  # (private - OSS mirror pending)
cd simplefunctions-landing
source scripts/ml/.venv/bin/activate

# Data pull (uses DIRECT_DATABASE_URL in .env.local)
python scripts/ml/phase-a/01-pull-training-data.py

# Train
python scripts/ml/phase-a/02-train-lgbm.py
python scripts/ml/phase-a/04-bakeoff.py

# Evaluate
python scripts/ml/phase-a/03-evaluate.py

All hyperparameters are documented inline. 3-seed ensembling uses {42, 137, 2026}.
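At inference time the 3-model × 3-seed ensemble reduces to an unweighted mean over nine probability vectors. A minimal sketch (the example probabilities are hypothetical):

```python
import numpy as np

SEEDS = [42, 137, 2026]  # the three training seeds listed above

def ensemble_mean(prob_matrix):
    """prob_matrix: shape (n_models, n_samples), per-model P(up).
    The 'Ensemble' row in the results table is the simple mean."""
    return np.asarray(prob_matrix, dtype=float).mean(axis=0)

# Nine hypothetical model outputs (3 libraries x 3 seeds) for two markets:
probs = [[0.60, 0.40]] * 4 + [[0.55, 0.45]] * 5
print(ensemble_mean(probs))
```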

Citation

@software{sf_ml_baseline_v0_1,
  author       = {SimpleFunctions},
  title        = {{sf-ml-baseline}: A feature-based prediction-market forecaster},
  year         = {2026},
  version      = {0.1},
  publisher    = {SimpleFunctions},
  url          = {https://simplefunctions.dev/opensource/sf-ml-baseline},
  license      = {CC-BY-4.0 with SimpleFunctions attribution}
}

See also

  • docs/ml/phase-a-investigation.md - 10-open-question investigation (SPEC-19 §13)
  • docs/ml/phase-a-results.md - gate decision + per-category breakdown
  • .claude/specs/SPEC-19-model-deep-investigation.md - full 6-phase research plan