gtm-autoresearch // feature/finetune-pipeline // Phase 6 of 6

The Flywheel

More client work → more experiments → better training data → smarter model → more accurate answers → more client work
The Compounding Loop
                       CLIENT ENGAGEMENT
                  (GTM fixes, ad optimizations)
                               │
                    ┌──────────▼──────────┐
                    │  AUTORESEARCH LOOP  │
                    │  50-100 experiments │
                    │  per run, scored    │
                    └──────────┬──────────┘
                               │
                    ┌──────────▼──────────┐
                    │  FLYWHEEL WATCHER   │←─────────────────────────┐
                    │  counts new         │                          │
                    │  high-score records │                          │
                    └──────────┬──────────┘                          │
                               │ delta ≥ 20                         │
                    ┌──────────▼──────────┐                          │
                    │  PIPELINE EXPORT    │                          │
                    │  JSONL v{N} appended│                          │
                    │  quality gates pass │                          │
                    └──────────┬──────────┘                          │
                               │                                     │
                    ┌──────────▼──────────┐                          │
                    │  FINE-TUNE RUNNER   │                          │
                    │  Track A or B       │                          │
                    │  eval → promote     │                          │
                    └──────────┬──────────┘                          │
                               │                                     │
                    ┌──────────▼──────────┐                          │
                    │  OPENCLAW BRAIN     │                          │
                    │  new model active   │                          │
                    │  smarter responses  ├──── DRIFT CHECK ─────────┘
                    └─────────────────────┘  (telemetry → regression?)
Watcher Trigger Events
| Event | Condition | Action | Configurable |
|---|---|---|---|
| New examples gate | New high-score experiments since last export ≥ threshold | Trigger JSONL pipeline export → bump version | RETRAIN_DELTA=20 |
| Scheduled run | Cron: after each autoresearch run completes | Check new example count, trigger if gate met | CRON_SCHEDULE |
| Drift detection | Telemetry shows eval score drop > tolerance vs last version | Alert + optionally rollback to last stable version | DRIFT_TOLERANCE=0.05 |
| Account state change | AccountState minor/major version bump detected | Force retrain — old training data used stale account context | RETRAIN_ON_STATE_CHANGE=true |
| Manual trigger | pnpm flywheel run --client hre --force | Bypass delta gate, run full pipeline immediately | always available |
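The new-examples gate reduces to a small pure check over the count since the last export cursor. A minimal TypeScript sketch; the field and function names are illustrative, not the repo's actual API:

```typescript
// Delta gate: trigger a retrain only once enough new high-score
// experiments have accumulated since the last export cursor.
// Names are hypothetical; the real watcher counts rows in the
// Phase 1 SQLite store with run_id > cursor and score >= threshold.

interface GateInput {
  newHighScoreCount: number; // experiments past the cursor above the score threshold
  retrainDelta: number;      // e.g. RETRAIN_DELTA=20
  force?: boolean;           // `pnpm flywheel run --force` bypasses the gate
}

function deltaGate(input: GateInput): boolean {
  if (input.force) return true;
  // fires at exactly retrainDelta, never before
  return input.newHighScoreCount >= input.retrainDelta;
}
```

With the numbers from the transcript below, `deltaGate({ newHighScoreCount: 23, retrainDelta: 20 })` fires; at 19 it does not.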
Watcher Configuration Per Client

Retrain Delta (20): Minimum new high-score experiments before triggering a retrain. Prevents over-training on small batches.

Drift Tolerance (0.05): Maximum acceptable eval-score regression between versions. If a new version's score drops by more than this, it is rolled back to the last stable model automatically.

Max Versions Kept (5): Older model versions are pruned beyond this count to keep the registry lean. Pruned fine-tune files are deleted from OpenAI as well.
Drift Detection — Eval Score Over Versions
HRE model eval score per version. The minimum acceptable score is the previous active score minus 0.05; the regression in v4 triggers auto-rollback.

| Version | Eval score | Status |
|---|---|---|
| v1 | 0.62 | improving |
| v2 | 0.71 | improving |
| v3 | 0.84 | active, within tolerance |
| v4 | 0.72 | regression → rollback |

v4 regression: 0.72 < (0.84 - 0.05 = 0.79) → auto-rolled back to v3. Slack alert sent. v4 archived.
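The rollback rule above is a one-line comparison. A hedged sketch, assuming the guard receives the new version's eval score and the active version's score (names illustrative):

```typescript
// Regression guard: promote a new version only if its eval score does
// not fall more than drift_tolerance below the current active version.
// Sketch only; the real runner also archives the losing version.

type Verdict = "promote" | "rollback";

function regressionGuard(
  newScore: number,
  activeScore: number,
  tolerance: number, // e.g. DRIFT_TOLERANCE=0.05
): Verdict {
  const floor = activeScore - tolerance; // dashed line in the chart
  return newScore >= floor ? "promote" : "rollback";
}
```

For the v4 case: `regressionGuard(0.72, 0.84, 0.05)` yields `"rollback"`, matching the chart.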
Notification Events

Slack Notifications

✓ hre v4 promoted → active (eval: 0.91)
⚠ hre v4 regression (0.72) → rolled back to v3
◎ teleios: 50 examples reached → retrain queued
⟳ Account state v2.0.0 detected → force retrain hre
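The notification format above (emoji, client, event detail) can come from a tiny pure formatter that is easy to unit-test apart from the Slack webhook call. A sketch; the emoji mapping and message shape are assumptions based on the examples, not the repo's actual code:

```typescript
// Slack message formatter: emoji + client_id + event detail.
// Event names follow the flywheel.json "events" list; the exact
// string layout here is illustrative.

type FlywheelEvent = "promote" | "rollback" | "gate_met" | "state_change";

const EMOJI: Record<FlywheelEvent, string> = {
  promote: "✓",
  rollback: "⚠",
  gate_met: "◎",
  state_change: "⟳",
};

function formatSlackMessage(
  event: FlywheelEvent,
  clientId: string,
  detail: string,
): string {
  return `${EMOJI[event]} ${clientId}: ${detail}`;
}
```

The posting side is then a single HTTP POST of `{ "text": message }` to SLACK_WEBHOOK_URL, kept out of the formatter so tests never touch the network.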

Retrain Schedule (HRE)

Last autoresearch run: 2026-04-07
New high-score examples: 23 / 20 ✓
Retrain triggered: yes → v4 queued
Current active: v3 (0.84)
Next check: after next research run
Flywheel Watcher Config — data/clients/{id}/flywheel.json
{
  "client_id": "hre",
  "retrain_delta": 20,               // new examples needed to trigger retrain
  "drift_tolerance": 0.05,           // max acceptable eval regression
  "max_versions": 5,                 // prune older versions beyond this count
  "retrain_on_state_change": true,   // force retrain if AccountState major bumped
  "auto_promote": true,              // auto-promote if eval passes, no manual step
  "auto_rollback": true,             // auto-rollback on regression
  "notifications": {
    "slack_channel": "#organized-ai-ops",
    "events": ["promote", "rollback", "gate_met", "state_change"]
  },
  "last_export_run_id": "exp-2026-04-07-100",  // cursor for delta tracking
  "last_retrain_version": "v3"
}
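The same config expressed as a TypeScript shape with env-var fallbacks for the tunables. A sketch only; the real loader in packages/flywheel/ may differ, and `loadConfig` is a hypothetical name:

```typescript
// Mirrors data/clients/{id}/flywheel.json. Missing fields fall back to
// the documented env vars (RETRAIN_DELTA, DRIFT_TOLERANCE,
// MAX_MODEL_VERSIONS) or their defaults.

interface FlywheelConfig {
  client_id: string;
  retrain_delta: number;
  drift_tolerance: number;
  max_versions: number;
  retrain_on_state_change: boolean;
  auto_promote: boolean;
  auto_rollback: boolean;
  notifications: { slack_channel: string; events: string[] };
  last_export_run_id: string; // cursor for delta tracking
  last_retrain_version: string;
}

function loadConfig(raw: Partial<FlywheelConfig>, clientId: string): FlywheelConfig {
  return {
    client_id: clientId,
    retrain_delta: raw.retrain_delta ?? Number(process.env.RETRAIN_DELTA ?? 20),
    drift_tolerance: raw.drift_tolerance ?? Number(process.env.DRIFT_TOLERANCE ?? 0.05),
    max_versions: raw.max_versions ?? Number(process.env.MAX_MODEL_VERSIONS ?? 5),
    retrain_on_state_change: raw.retrain_on_state_change ?? true,
    auto_promote: raw.auto_promote ?? true,
    auto_rollback: raw.auto_rollback ?? true,
    notifications: raw.notifications ?? { slack_channel: "#organized-ai-ops", events: [] },
    last_export_run_id: raw.last_export_run_id ?? "",
    last_retrain_version: raw.last_retrain_version ?? "",
  };
}
```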
Complete Pipeline — All 6 Phases

gtm-autoresearch · feature/finetune-pipeline

Phase 1: Experiment Logger — SQLite ExperimentRecord schema, score normalization ✓ specced
Phase 2: Account State Collector — GTM + Google Ads + Meta MCP → AccountState JSON ✓ specced
Phase 3: JSONL Pipeline — Score filter, Chroma dedup, system prompt injection, quality gates ✓ specced
Phase 4: Fine-Tune Runner — Track A (OpenAI) + Track B (Ollama M3 Ultra), eval harness ✓ specced
Phase 5: OpenClaw Brain — ClientID middleware, ModelRouter, fallback chain, telemetry ✓ specced
Phase 6: Flywheel — Watcher, drift detection, auto-promote/rollback, Slack notifications ← building
CLI Output — pnpm flywheel start --client hre
$ pnpm flywheel start --client hre

──────────────────────────────────────────────────────
  Flywheel Watcher — HRE
──────────────────────────────────────────────────────
  Config loaded...           ✓ flywheel.json (retrain_delta: 20)
  Last export cursor:        exp-2026-04-07-100
  Checking new examples...   ✓ 23 new high-score experiments
  Delta gate (≥20):          ✓ 23 ≥ 20 — triggering pipeline

  → Running JSONL export...
    Score filter:            ✓ 23 records pass (≥0.75)
    Dedup check:             ✓ 2 removed, 21 kept
    Quality gates:           ✓ all pass
    Output:                  ✓ data/clients/hre/v4.jsonl (21 records)

  → Submitting fine-tune (Track A)...
    Upload:                  ✓ file-xyz789
    Job created:             ✓ ftjob-xyz789
    Training...              [ 15 min ]
    Succeeded:               ✓ ft:gpt-4o-mini-2024-07-18:hre-v4

  → Running eval harness...
    v4 eval score:           ✓ 0.91
    vs v3 (0.84):            ✓ +0.07 improvement
    Regression guard:        ✓ pass

  → Promoting...
    v3 → archived            
    v4 → active              
    OpenClaw reloaded:       ✓ hre → ft:gpt-4o-mini:hre-v4
    Slack notified:          ✓ #organized-ai-ops

  Cursor updated:            exp-2026-04-14-023
──────────────────────────────────────────────────────
  ⟳ Flywheel complete — HRE model v4 active (0.91)
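The transcript's control flow (export, fine-tune, eval, then promote or roll back, with the cursor advancing only on success) can be sketched with injected step functions so the sequencing itself is testable. Every name here is hypothetical; the real orchestrator shells out to the existing training-pipeline, fine-tune-runner, and openclaw packages:

```typescript
// Pipeline orchestration sketch. Steps are injected so unit tests can
// verify ordering and the cursor-advance rule without real API calls.

interface PipelineSteps {
  exportJsonl(): Promise<void>;
  submitFineTune(): Promise<void>;
  evalScore(): Promise<{ newScore: number; activeScore: number }>;
  promote(): Promise<void>;
  rollback(): Promise<void>;
  advanceCursor(): Promise<void>;
}

async function runFlywheel(
  steps: PipelineSteps,
  tolerance: number,
): Promise<"promoted" | "rolled_back"> {
  await steps.exportJsonl();
  await steps.submitFineTune();
  const { newScore, activeScore } = await steps.evalScore();
  if (newScore >= activeScore - tolerance) {
    await steps.promote();
    await steps.advanceCursor(); // cursor moves only after a successful run
    return "promoted";
  }
  await steps.rollback(); // cursor is NOT advanced on regression
  return "rolled_back";
}
```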
  
Claude Code Prompt
claude --dangerously-skip-permissions

# Phase 6: Flywheel Automation
# Branch: feature/finetune-pipeline
# Repo: github.com/Organized-AI/gtm-autoresearch

## Context
All prior phases complete. Phase 6 closes the loop — a watcher that triggers
the full pipeline automatically after each autoresearch run when enough new
high-quality examples have accumulated.

## Task
Read AGENT-HANDOFF/ and PLANNING/ first. Then:

1. Create packages/flywheel/

   FlywheelWatcher:
   - Load flywheel.json per client
   - Track cursor: last_export_run_id (last SQLite run_id processed)
   - On run: COUNT experiments WHERE run_id > cursor AND score >= threshold
   - If count >= retrain_delta: trigger full pipeline
   - Update cursor to latest run_id after trigger

   Pipeline Orchestrator (calls existing packages in sequence):
   1. training-pipeline export --client {id} --delta
   2. fine-tune-runner submit --client {id} --version {N} --track {pref}
   3. fine-tune-runner eval --client {id} --version {N}
   4. If eval passes: openclaw register + promote
   5. If eval regresses: rollback + alert

   DriftDetector:
   - After each promotion, query telemetry SQLite (from Phase 5)
   - Compare live response quality vs eval score baseline
   - If delta > drift_tolerance over 24h window: trigger alert
   - Optional: schedule periodic re-eval against holdout set

   AccountStateWatcher:
   - Watch AccountState version file for major/minor bumps
   - On major bump: force full retrain (bypass delta gate)
   - On minor bump: log + retrain at next natural cycle

   NotificationService:
   - Slack webhook to #organized-ai-ops
   - Events: promote, rollback, gate_met, regression, state_change
   - Message format: emoji + client_id + event + key metric

   VersionPruner:
   - After each promote, check version count
   - If > max_versions: delete oldest archived version
   - For Track A: call OpenAI DELETE /v1/files/{file_id}
   - For Track B: call ollama rm {client_id}-client:v{old}

2. flywheel.json schema per client (data/clients/{id}/flywheel.json):
   retrain_delta, drift_tolerance, max_versions, retrain_on_state_change,
   auto_promote, auto_rollback, notifications{slack_channel, events[]},
   last_export_run_id (cursor), last_retrain_version

3. CLI:
   pnpm flywheel start --client hre          # run once now
   pnpm flywheel watch --client hre          # watch mode (post-research hook)
   pnpm flywheel status --client hre         # show current state
   pnpm flywheel run --client hre --force    # bypass delta gate

4. Hook into autoresearch run completion:
   Add post-run hook that calls: pnpm flywheel watch --client {client_id}

5. Unit tests:
   - Delta gate: triggers at exactly retrain_delta, not before
   - Rollback: activates when eval drops below tolerance
   - Cursor: advances only after successful pipeline run
   - Pruner: removes correct version when count > max_versions
   - NotificationService: formats Slack message correctly

## Env vars:
CLIENT_DATA_DIR=./data/clients
SLACK_WEBHOOK_URL=https://hooks.slack.com/services/...
RETRAIN_DELTA=20
DRIFT_TOLERANCE=0.05
MAX_MODEL_VERSIONS=5

## This is the final phase. After completion:
## - Run full integration test: all 6 phases end-to-end with HRE
## - Update README with complete pipeline diagram
## - Tag: v1.0.0-finetune-pipeline
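The VersionPruner described in the prompt can keep its selection logic pure, separate from the OpenAI file deletion and `ollama rm` side effects, which makes the "removes correct version" unit test trivial. A sketch with hypothetical types:

```typescript
// Selection step for VersionPruner: once the registry holds more than
// max_versions entries, the oldest archived versions are chosen for
// deletion. The active version is never pruned. Names illustrative.

interface ModelVersion {
  version: number; // v1 -> 1, v2 -> 2, ...
  status: "active" | "archived";
}

function versionsToPrune(versions: ModelVersion[], maxVersions: number): ModelVersion[] {
  const sorted = [...versions].sort((a, b) => a.version - b.version);
  const excess = sorted.length - maxVersions;
  if (excess <= 0) return [];
  // oldest archived versions go first; active is excluded entirely
  return sorted.filter((v) => v.status === "archived").slice(0, excess);
}
```

The caller would then map each selected version to the right deletion call per track (OpenAI file delete for Track A, `ollama rm` for Track B).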