The Compounding Loop
                 CLIENT ENGAGEMENT
           (GTM fixes, ad optimizations)
                       │
            ┌──────────▼──────────┐
            │  AUTORESEARCH LOOP  │
            │ 50-100 experiments  │
            │   per run, scored   │
            └──────────┬──────────┘
                       │
            ┌──────────▼──────────┐
            │  FLYWHEEL WATCHER   │←─────────────────────────┐
            │     counts new      │                          │
            │ high-score records  │                          │
            └──────────┬──────────┘                          │
                       │ delta ≥ 20                          │
            ┌──────────▼──────────┐                          │
            │   PIPELINE EXPORT   │                          │
            │ JSONL v{N} appended │                          │
            │ quality gates pass  │                          │
            └──────────┬──────────┘                          │
                       │                                     │
            ┌──────────▼──────────┐                          │
            │  FINE-TUNE RUNNER   │                          │
            │    Track A or B     │                          │
            │   eval → promote    │                          │
            └──────────┬──────────┘                          │
                       │                                     │
            ┌──────────▼──────────┐                          │
            │   OPENCLAW BRAIN    │                          │
            │  new model active   │                          │
            │ smarter responses   ├──── DRIFT CHECK ─────────┘
            └─────────────────────┘  (telemetry → regression?)
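The loop above can be condensed into a single decision function. A minimal sketch, with illustrative names (`runCycle` and its arguments are not part of the specced packages); the numbers in the test mirror the HRE run described later in this document:

```typescript
type Step = "export" | "finetune" | "eval" | "promote" | "rollback";

// One turn of the compounding loop: gate on new examples, then
// export → fine-tune → eval, then promote or roll back.
function runCycle(
  newExamples: number,   // high-score experiments since last export
  retrainDelta: number,  // RETRAIN_DELTA gate
  evalScore: number,     // new version's eval score
  prevScore: number,     // currently active version's score
  driftTolerance: number // DRIFT_TOLERANCE
): Step[] {
  const steps: Step[] = [];
  if (newExamples < retrainDelta) return steps; // delta gate not met
  steps.push("export", "finetune", "eval");
  // Regression guard: new model must score within tolerance of the old one.
  if (evalScore >= prevScore - driftTolerance) steps.push("promote");
  else steps.push("rollback");
  return steps;
}
```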
Watcher Trigger Events
| Event | Condition | Action | Configurable |
|---|---|---|---|
| New examples gate | New high-score experiments since last export ≥ threshold | Trigger JSONL pipeline export → bump version | RETRAIN_DELTA=20 |
| Scheduled run | Cron: after each autoresearch run completes | Check new example count, trigger if gate met | CRON_SCHEDULE |
| Drift detection | Telemetry shows eval score drop > tolerance vs last version | Alert + optionally rollback to last stable version | DRIFT_TOLERANCE=0.05 |
| Account state change | AccountState minor/major version bump detected | Force retrain — old training data used stale account context | RETRAIN_ON_STATE_CHANGE=true |
| Manual trigger | pnpm flywheel run --client hre --force | Bypass delta gate, run full pipeline immediately | always available |
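The "new examples gate" row can be sketched as a pure function over experiment records. This assumes run IDs sort lexicographically (true for the `exp-YYYY-MM-DD-NNN` format used below); `gateMet` and `ExperimentRecord`'s shape here are illustrative:

```typescript
interface ExperimentRecord {
  runId: string; // e.g. "exp-2026-04-14-001"
  score: number; // normalized 0..1
}

// Count experiments newer than the export cursor with a passing score,
// and decide whether the retrain gate is met.
function gateMet(
  records: ExperimentRecord[],
  cursor: string,       // last_export_run_id
  minScore: number,     // score filter, e.g. 0.75
  retrainDelta: number  // RETRAIN_DELTA, e.g. 20
): boolean {
  const fresh = records.filter(
    (r) => r.runId > cursor && r.score >= minScore
  );
  return fresh.length >= retrainDelta;
}
```

In the real watcher this would be a SQL `COUNT` against the Phase 1 SQLite store rather than an in-memory filter.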
Watcher Configuration Per Client
| Setting | Default | Description |
|---|---|---|
| Retrain delta | 20 | Minimum new high-score experiments before triggering a retrain. Prevents over-training on small batches. |
| Drift tolerance | 0.05 | Maximum acceptable eval score regression between versions. Beyond this, the watcher rolls back to the last stable model automatically. |
| Max versions kept | 5 | Older model versions are pruned past this count. Keeps the registry lean; fine-tune files are deleted from OpenAI too. |
Drift Detection — Eval Score Over Versions
HRE model eval score per version. Dashed line = minimum acceptable score (previous version - 0.05). Legend: improving · within tolerance · regression → rollback.
v4 regression: 0.72 < (0.84 - 0.05 = 0.79) → auto-rolled back to v3. Slack alert sent. v4 archived.
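The rollback rule reduces to one comparison. A minimal sketch (`shouldRollback` is an illustrative name):

```typescript
// A new version is rolled back when its eval score falls more than
// driftTolerance below the previous version's score.
function shouldRollback(
  newScore: number,
  prevScore: number,
  driftTolerance: number // DRIFT_TOLERANCE=0.05
): boolean {
  const minAcceptable = prevScore - driftTolerance;
  return newScore < minAcceptable;
}
```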
Notification Events
Slack Notifications
✓ hre v4 promoted → active (eval: 0.91)
⚠ hre v4 regression (0.72) → rolled back to v3
◎ teleios: 50 examples reached → retrain queued
⟳ Account state v2.0.0 detected → force retrain hre
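The message shape above (emoji + client + event detail) can be sketched as a small formatter. The exact separator varies across the examples, so this is an assumption, not the specced format:

```typescript
type FlywheelEvent = "promote" | "rollback" | "gate_met" | "state_change";

// Emoji prefixes matching the notification examples above.
const EMOJI: Record<FlywheelEvent, string> = {
  promote: "✓",
  rollback: "⚠",
  gate_met: "◎",
  state_change: "⟳",
};

// Build a one-line Slack message: emoji, client_id, event detail.
function formatSlackMessage(
  event: FlywheelEvent,
  clientId: string,
  detail: string
): string {
  return `${EMOJI[event]} ${clientId}: ${detail}`;
}
```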
Retrain Schedule (HRE)
Last autoresearch run: 2026-04-07
New high-score examples: 23 / 20 ✓
Retrain triggered: yes → v4 queued
Current active: v3 (0.84)
Next check: after next research run
Flywheel Watcher Config — data/clients/{id}/flywheel.json
{
"client_id": "hre",
"retrain_delta": 20, // new examples needed to trigger retrain
"drift_tolerance": 0.05, // max acceptable eval regression
"max_versions": 5, // prune older versions beyond this count
"retrain_on_state_change": true, // force retrain if AccountState major bumped
"auto_promote": true, // auto-promote if eval passes, no manual step
"auto_rollback": true, // auto-rollback on regression
"notifications": {
"slack_channel": "#organized-ai-ops",
"events": ["promote", "rollback", "gate_met", "state_change"]
},
"last_export_run_id": "exp-2026-04-07-100", // cursor for delta tracking
"last_retrain_version": "v3"
}
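A typed view of this config, with defaults applied for absent fields, might look like the sketch below. Field names come from the schema above; the defaults mirror RETRAIN_DELTA, DRIFT_TOLERANCE, and MAX_MODEL_VERSIONS, and `withDefaults` is a hypothetical helper:

```typescript
interface FlywheelConfig {
  client_id: string;
  retrain_delta: number;
  drift_tolerance: number;
  max_versions: number;
  retrain_on_state_change: boolean;
  auto_promote: boolean;
  auto_rollback: boolean;
}

// Fallbacks used when flywheel.json omits a field.
const DEFAULTS: Omit<FlywheelConfig, "client_id"> = {
  retrain_delta: 20,     // RETRAIN_DELTA
  drift_tolerance: 0.05, // DRIFT_TOLERANCE
  max_versions: 5,       // MAX_MODEL_VERSIONS
  retrain_on_state_change: true,
  auto_promote: true,
  auto_rollback: true,
};

// Merge a partially-specified config with defaults; explicit keys win.
function withDefaults(
  partial: Partial<FlywheelConfig> & { client_id: string }
): FlywheelConfig {
  return { ...DEFAULTS, ...partial };
}
```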
Complete Pipeline — All 6 Phases
gtm-autoresearch · feature/finetune-pipeline
| Phase | Component | Status |
|---|---|---|
| 1 | Experiment Logger — SQLite ExperimentRecord schema, score normalization | ✓ specced |
| 2 | Account State Collector — GTM + Google Ads + Meta MCP → AccountState JSON | ✓ specced |
| 3 | JSONL Pipeline — Score filter, Chroma dedup, system prompt injection, quality gates | ✓ specced |
| 4 | Fine-Tune Runner — Track A (OpenAI) + Track B (Ollama M3 Ultra), eval harness | ✓ specced |
| 5 | OpenClaw Brain — ClientID middleware, ModelRouter, fallback chain, telemetry | ✓ specced |
| 6 | Flywheel — Watcher, drift detection, auto-promote/rollback, Slack notifications | ← building |
CLI Output — pnpm flywheel start --client hre
$ pnpm flywheel start --client hre
──────────────────────────────────────────────────────
Flywheel Watcher — HRE
──────────────────────────────────────────────────────
Config loaded... ✓ flywheel.json (retrain_delta: 20)
Last export cursor: exp-2026-04-07-100
Checking new examples... ✓ 23 new high-score experiments
Delta gate (≥20): ✓ 23 ≥ 20 — triggering pipeline
→ Running JSONL export...
Score filter: ✓ 23 records pass (≥0.75)
Dedup check: ✓ 2 removed, 21 kept
Quality gates: ✓ all pass
Output: ✓ data/clients/hre/v4.jsonl (21 records)
→ Submitting fine-tune (Track A)...
Upload: ✓ file-xyz789
Job created: ✓ ftjob-xyz789
Training... [ 15 min ]
Succeeded: ✓ ft:gpt-4o-mini-2024-07-18:hre-v4
→ Running eval harness...
v4 eval score: ✓ 0.91
vs v3 (0.84): ✓ +0.07 improvement
Regression guard: ✓ pass
→ Promoting...
v3 → archived ✓
v4 → active ✓
OpenClaw reloaded: ✓ hre → ft:gpt-4o-mini:hre-v4
Slack notified: ✓ #organized-ai-ops
Cursor updated: exp-2026-04-14-023
──────────────────────────────────────────────────────
⟳ Flywheel complete — HRE model v4 active (0.91)
Claude Code Prompt
claude --dangerously-skip-permissions
# Phase 6: Flywheel Automation
# Branch: feature/finetune-pipeline
# Repo: github.com/Organized-AI/gtm-autoresearch
## Context
All prior phases complete. Phase 6 closes the loop —
a watcher that triggers the full pipeline automatically
after each autoresearch run when enough new high-quality
examples have accumulated.
## Task
Read AGENT-HANDOFF/ and PLANNING/ first. Then:
1. Create packages/flywheel/
FlywheelWatcher:
- Load flywheel.json per client
- Track cursor: last_export_run_id (last SQLite run_id processed)
- On run: COUNT experiments WHERE run_id > cursor AND score >= threshold
- If count >= retrain_delta: trigger full pipeline
- Update cursor to latest run_id after trigger
Pipeline Orchestrator (calls existing packages in sequence):
1. training-pipeline export --client {id} --delta
2. fine-tune-runner submit --client {id} --version {N} --track {pref}
3. fine-tune-runner eval --client {id} --version {N}
4. If eval passes: openclaw register + promote
5. If eval regresses: rollback + alert
DriftDetector:
- After each promotion, query telemetry SQLite (from Phase 5)
- Compare live response quality vs eval score baseline
- If delta > drift_tolerance over 24h window: trigger alert
- Optional: schedule periodic re-eval against holdout set
AccountStateWatcher:
- Watch AccountState version file for major/minor bumps
- On major bump: force full retrain (bypass delta gate)
- On minor bump: log + retrain at next natural cycle
NotificationService:
- Slack webhook to #organized-ai-ops
- Events: promote, rollback, gate_met, regression, state_change
- Message format: emoji + client_id + event + key metric
VersionPruner:
- After each promote, check version count
- If > max_versions: delete oldest archived version
- For Track A: call OpenAI DELETE /v1/files/{file_id}
- For Track B: call ollama rm {client_id}-client:v{old}
2. flywheel.json schema per client (data/clients/{id}/flywheel.json):
retrain_delta, drift_tolerance, max_versions,
retrain_on_state_change, auto_promote, auto_rollback,
notifications{slack_channel, events[]},
last_export_run_id (cursor), last_retrain_version
3. CLI:
pnpm flywheel start --client hre # run once now
pnpm flywheel watch --client hre # watch mode (post-research hook)
pnpm flywheel status --client hre # show current state
pnpm flywheel run --client hre --force # bypass delta gate
4. Hook into autoresearch run completion:
Add post-run hook that calls:
pnpm flywheel watch --client {client_id}
5. Unit tests:
- Delta gate: triggers at exactly retrain_delta, not before
- Rollback: activates when eval drops below tolerance
- Cursor: advances only after successful pipeline run
- Pruner: removes correct version when count > max_versions
- NotificationService: formats Slack message correctly
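The VersionPruner selection rule described above can be sketched as follows. A minimal sketch assuming versions are sorted oldest-first and only archived versions are eligible for deletion (`versionsToPrune` and `ModelVersion` are illustrative names):

```typescript
interface ModelVersion {
  version: string;                // e.g. "v1"
  status: "active" | "archived";
}

// Return the oldest archived versions to delete once the registry
// exceeds maxVersions; the active version is never pruned.
function versionsToPrune(
  versions: ModelVersion[], // sorted oldest-first
  maxVersions: number       // MAX_MODEL_VERSIONS
): string[] {
  if (versions.length <= maxVersions) return [];
  const excess = versions.length - maxVersions;
  return versions
    .filter((v) => v.status === "archived")
    .slice(0, excess)
    .map((v) => v.version);
}
```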
## Env vars:
CLIENT_DATA_DIR=./data/clients
SLACK_WEBHOOK_URL=https://hooks.slack.com/services/...
RETRAIN_DELTA=20
DRIFT_TOLERANCE=0.05
MAX_MODEL_VERSIONS=5
## This is the final phase. After completion:
## - Run full integration test: all 6 phases end-to-end with HRE
## - Update README with complete pipeline diagram
## - Tag: v1.0.0-finetune-pipeline