Organized AI // AutoResearch → Fine-Tune Pipeline

Client Intelligence
Training Loop

Architecture for converting autoresearch experiment outputs into client-specialized fine-tuned LLMs by ingesting GTM, Meta, and Google Ads account state.

╔══════════════════════════════════════════════════════════════════════════╗
║                       CLIENT ACCOUNT DATA SOURCES                        ║
╠══════════════╦═══════════════════╦══════════════╦════════════════════════╣
║  GTM Export  ║  Google Ads MCP   ║   Meta MCP   ║  claude-mem            ║
║  Container   ║  Campaigns/Conv   ║  Pixel/CAPI  ║  SQLite + Chroma       ║
╚══════════════╩═════════╤═════════╩══════════════╩════════════════════════╝
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                     ACCOUNT STATE COLLECTOR                             │
│  Normalizes all client account data into a structured context object    │
│  client_id / gtm_schema / ad_structure / pixel_events / history         │
└──────────────────────────────────┬──────────────────────────────────────┘
                                   │
                    ┌──────────────┴──────────────┐
                    ▼                             ▼
┌───────────────────────────┐   ┌──────────────────────────────────────────┐
│  AUTORESEARCH LOOP        │   │  EXISTING claude-mem HISTORY             │
│  50-100 experiments/run   │   │  Past fixes, decisions, patterns         │
│  Each: problem→solution   │   │  Chroma vector store                     │
│  Score: 0.0 → 1.0         │   └─────────────────────┬────────────────────┘
└──────────────┬────────────┘                         │
               │                                      │
               ▼                                      ▼
┌──────────────────────────────────────────────────────────────────────────┐
│                     SCORING FILTER + FORMATTER                           │
│  score ≥ 0.75 → keep │ Dedup │ Inject account_state into system prompt   │
│  Output: training.jsonl  {"system":..., "user":..., "assistant":...}     │
└──────────────────────────────┬───────────────────────────────────────────┘
                               │
              ┌────────────────┴────────────────┐
              ▼                                 ▼
┌─────────────────────────┐   ┌─────────────────────────────────────────────┐
│  OPENAI FINE-TUNE API   │   │  OLLAMA LOCAL (NoClaw :11434)               │
│  gpt-4o-mini target     │   │  Llama 3.1 8B / Mistral → M3 Ultra          │
│  Fast MVP / cloud       │   │  Private / no data egress                   │
└─────────────┬───────────┘   └──────────────────┬──────────────────────────┘
              └────────────────┬─────────────────┘
                               ▼
┌──────────────────────────────────────────────────────────────────────────┐
│                    CLIENT BRAIN AGENT (OpenClaw)                         │
│  Fine-tuned model routed per client_id inside OpenClaw gateway           │
│  No 200k context dump needed — patterns baked into weights               │
│  New experiments → re-score → auto retrain trigger → model v++           │
└──────────────────────────────┬───────────────────────────────────────────┘
                               │
                               │  flywheel: more client work → better model
                               └──────────────────────────────────────────▶
    
// Phased Implementation
PHASE 01

Experiment Logger Instrumentation

Modify autoresearch loop to persist structured experiment records with score, context, and client_id. This is the foundation — everything else depends on clean experiment output.

Add ExperimentRecord schema: client_id, run_id, problem, solution, score, timestamp, account_snapshot
Write to SQLite experiments table (extend claude-mem or new DB)
Score normalization: ensure 0.0–1.0 float, not pass/fail binary
Tag each record with data sources used (GTM / Google Ads / Meta)
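A minimal sketch of the record shape and score normalizer described above. Field names follow the phase list; the exact validation behavior is an assumption, not a finalized schema:

```typescript
// Hypothetical ExperimentRecord shape; fields mirror the Phase 01 list.
interface ExperimentRecord {
  client_id: string;
  run_id: string;
  problem: string;
  solution: string;
  score: number;                              // normalized 0.0–1.0 float
  timestamp: string;                          // ISO 8601
  account_snapshot: Record<string, unknown>;  // AccountState at run time
  sources_used: Array<"gtm" | "google_ads" | "meta">;
}

// Reject pass/fail booleans and out-of-range values. Ambiguous inputs
// (e.g. 75 meaning 75%) fail loudly instead of being silently coerced.
function normalizeScore(raw: unknown): number {
  if (typeof raw !== "number" || Number.isNaN(raw)) {
    throw new Error(`score must be a float, got ${typeof raw}`);
  }
  if (raw < 0 || raw > 1) {
    throw new Error(`score ${raw} outside 0.0–1.0`);
  }
  return raw;
}
```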
PHASE 02

Account State Collector

Pull and normalize client account data from GTM, Google Ads MCP, and Pipeboard Meta MCP into a structured AccountState object stored per client. This becomes the system prompt context.

GTM: export container JSON → parse tags/triggers/variables into schema
Google Ads MCP: pull campaigns, conversion actions, account structure
Pipeboard Meta MCP: pull pixel events, CAPI config, ad account structure
Serialize to clients/{client_id}/account_state.json, versioned
Snapshot on each autoresearch run (state at time of experiment)
PHASE 03

Training Data Pipeline

Filter high-scoring experiments, inject account state into system prompts, and export clean JSONL training files per client. This is the core transformation step.

Score filter: score ≥ 0.75 (configurable threshold per client)
Deduplication: cosine similarity on solution text via Chroma
System prompt builder: inject client AccountState + role context
Export: training/{client_id}/v{N}.jsonl in OpenAI fine-tune format
Minimum 50 examples gate before fine-tune is triggered
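The filter-and-format step can be sketched as follows. The threshold and the OpenAI chat-message JSONL shape come from the list above; the cosine helper stands in for the Chroma-backed dedup (Chroma would supply the embeddings), and all names are illustrative:

```typescript
// Hypothetical experiment shape for this stage.
type Exp = { problem: string; solution: string; score: number };

const SCORE_THRESHOLD = 0.75; // configurable per client

// One JSONL line per kept experiment, in OpenAI chat fine-tune format.
function toTrainingJsonl(systemPrompt: string, experiments: Exp[]): string {
  return experiments
    .filter((e) => e.score >= SCORE_THRESHOLD)
    .map((e) =>
      JSON.stringify({
        messages: [
          { role: "system", content: systemPrompt },
          { role: "user", content: e.problem },
          { role: "assistant", content: e.solution },
        ],
      }),
    )
    .join("\n");
}

// Cosine similarity between two solution embeddings; near-duplicates
// above a chosen similarity cutoff would be dropped before export.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}
```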
PHASE 04

Fine-Tune Runner

Two-track runner: OpenAI API for fast cloud fine-tunes, Ollama/MLX for private local fine-tunes on M3 Ultra. Triggered automatically when new JSONL version is ready.

Track A — OpenAI: upload JSONL → create fine-tune job → poll status → register model ID
Track B — Ollama: convert JSONL → Modelfile → ollama create client-{id}-v{N} on M3 Ultra
Model registry: models/{client_id}.json tracking versions, eval scores, active flag
Eval harness: 10-question holdout set per client, auto-score new model before promoting
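A sketch of the registry entry and the promotion gate, assuming the `models/{client_id}.json` fields named above; the comparison rule (candidate must match or beat the active model on the holdout set) is an assumption consistent with the Phase 06 rollback check:

```typescript
// Hypothetical registry entry per models/{client_id}.json.
interface ModelVersion {
  version: number;
  model_id: string;   // OpenAI fine-tune id or Ollama tag like client-hre-v3
  eval_score: number; // mean score over the 10-question holdout set
  active: boolean;
}

// Promote a candidate only if there is no active model yet, or the
// candidate does not regress on the holdout eval.
function shouldPromote(
  active: ModelVersion | undefined,
  candidate: ModelVersion,
): boolean {
  if (!active) return true; // first model for this client
  return candidate.eval_score >= active.eval_score;
}
```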
PHASE 05

OpenClaw Client Brain Integration

Route requests through OpenClaw gateway to the correct client-specialized model. Each client_id maps to an active model version. No context dump required at inference.

OpenClaw middleware: read x-client-id header → lookup active model in registry
Route to local Ollama client model OR OpenAI fine-tuned model per flag
Fallback: if no client model ready, use full AccountState as RAG context
Telemetry: log which model version handled each request for drift tracking
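The routing decision above can be sketched as a pure function; the header name comes from the list, while the registry shape and RAG fallback encoding are illustrative assumptions, not the actual OpenClaw middleware API:

```typescript
// Hypothetical route targets for a request.
type Route =
  | { kind: "ollama"; model: string } // local client model on NoClaw :11434
  | { kind: "openai"; model: string } // fine-tuned cloud model id
  | { kind: "rag" };                  // fallback: full AccountState as context

// Map the x-client-id header to the active model in the registry,
// falling back to RAG when no client model is ready.
function resolveRoute(
  registry: Map<string, Route>,
  headers: Record<string, string>,
): Route {
  const clientId = headers["x-client-id"];
  return (clientId && registry.get(clientId)) || { kind: "rag" };
}
```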
PHASE 06

Flywheel Automation

Close the loop. New autoresearch runs automatically trigger re-scoring, JSONL delta append, and retrain when enough new high-quality examples accumulate. Model gets smarter with every client engagement.

Watcher: after each autoresearch run, check if new high-score count ≥ threshold (default 20)
Auto-append to JSONL, bump version, trigger fine-tune runner
Drift detection: compare eval scores v(N) vs v(N-1), rollback if regression
Slack/notification: post model promotion events to client channel
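The two watcher decisions above reduce to small predicates. Defaults follow the list (retrain threshold 20, rollback on any eval regression); the optional tolerance parameter is an added assumption:

```typescript
const RETRAIN_THRESHOLD = 20; // default from the watcher spec above

// Retrain when enough new high-score examples have accumulated
// since the last fine-tune.
function shouldRetrain(
  newHighScoreCount: number,
  threshold: number = RETRAIN_THRESHOLD,
): boolean {
  return newHighScoreCount >= threshold;
}

// Drift gate: roll back if v(N) regresses against v(N-1) on the
// holdout set by more than the allowed tolerance.
function shouldRollback(
  prevEval: number,
  newEval: number,
  tolerance: number = 0,
): boolean {
  return newEval < prevEval - tolerance;
}
```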
// Account State Schema Inputs

GTM Container

  • Tag names + trigger conditions
  • Variable definitions
  • dataLayer event schema
  • Container workspace state
  • Published vs draft diff

Google Ads MCP

  • Campaign structure + types
  • Conversion action IDs + names
  • Enhanced Conversions config
  • PMAX asset group IDs
  • Historical performance flags

Pipeboard Meta MCP

  • Pixel ID + event list
  • CAPI configuration
  • Custom audience IDs
  • Ad account structure
  • Stape container state
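The three input lists could be carried as one typed object inside AccountState. A sketch with illustrative field names (not a confirmed schema for any of the MCPs):

```typescript
// Hypothetical typed mirror of the three source lists above.
interface GtmContainerState {
  tags: Array<{ name: string; trigger: string }>;
  variables: string[];
  dataLayerEvents: string[];
  publishedDraftDiff?: unknown;
}

interface GoogleAdsState {
  campaigns: Array<{ id: string; type: string }>;
  conversionActions: Array<{ id: string; name: string }>;
  enhancedConversions: boolean;
  pmaxAssetGroupIds: string[];
}

interface MetaState {
  pixelId: string;
  events: string[];
  capiConfigured: boolean;
  customAudienceIds: string[];
  stapeContainer?: string;
}

interface AccountStateInputs {
  gtm: GtmContainerState;
  google_ads: GoogleAdsState;
  meta: MetaState;
}
```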
// Training Record Format
{ "messages": [ { "role": "system", "content": "You are a tracking expert for HRE. GTM: GTM-XXXXXXX | sGTM: container-XXXXXXX Conversion actions: All Purchases (tag:AW-xxx/yyy), Lead Form Submit Known issue history: PMAX $0 value → fix: sGTM cart items mapping dataLayer schema: {event, ecommerce.value, ecommerce.items[]} Meta Pixel: HRE-pixel-id | CAPI: configured via Stape" }, { "role": "user", "content": "PMAX campaign shows conversions but $0 revenue value" }, { "role": "assistant", "content": "Root cause: sGTM purchase tag is reading value from top-level ecommerce object but HRE pushes revenue inside items[0].price * quantity. Fix: update sGTM variable to map items array and sum value. Tag: 'sGTM - GA4 Purchase' → Variable: {{DLV - ecommerce.items}} → recompute value." } ], "metadata": { "client_id": "hre", "score": 0.91, "run_id": "exp-2026-04-07-047", "sources": ["gtm", "google_ads"] } }

⟳ The Compounding Moat

Every client engagement generates experiments. Every experiment above threshold extends the training set. Every retrain makes the model more accurate for that client's specific patterns. At ~500 high-quality examples per client, the model knows their full tracking architecture better than any human consultant who hasn't lived in their accounts for years. This is the retention mechanic — the model is an asset that only you and the client share.

// Claude Code Prompt
claude --dangerously-skip-permissions

# Client LLM Fine-Tune Pipeline — Phase 1 Bootstrap

## Context
Repo: github.com/Organized-AI/gtm-autoresearch
Infra: M3 Ultra (Ollama :11434 NoClaw), M4 Mini (OpenClaw :18789), claude-mem (SQLite + Chroma :37777)
Clients: bioptimizers, rtt, teleios

## Phase 1 Task: Experiment Logger Instrumentation
Read AGENT-HANDOFF/ and PLANNING/ first. Then:

1. Create packages/experiment-logger/
   - ExperimentRecord TypeScript schema: client_id, run_id, problem, solution, score (float 0-1), timestamp, account_snapshot (JSON), sources_used string[]
   - SQLite writer extending claude-mem DB at :37777
   - Score normalizer: validate 0.0–1.0 range
   - Client_id tagger on each record
2. Instrument autoresearch loop to call ExperimentLogger.save() on each scored experiment output
3. Add CLI: `pnpm experiment-logger export --client bioptimizers` outputs all records to stdout as JSONL
4. Write tests for schema validation and SQLite round-trip

Use Organized Codebase structure. Update CLAUDE.md with experiment-logger package docs. Do not build Phase 2 yet.

## Env vars needed:
CLAUDE_MEM_DB_PATH=/path/to/claude-mem.sqlite
EXPERIMENT_LOG_DIR=./data/experiments
SCORE_THRESHOLD=0.75