Organized AI // AutoResearch → Fine-Tune Pipeline

Client Intelligence
Training Loop

Architecture for converting autoresearch experiment outputs into client-specialized fine-tuned LLMs by ingesting GTM, Meta, and Google Ads account state.

╔══════════════════════════════════════════════════════════════════════════╗
║                       CLIENT ACCOUNT DATA SOURCES                        ║
╠══════════════╦═══════════════════╦══════════════╦════════════════════════╣
║  GTM Export  ║  Google Ads MCP   ║   Meta MCP   ║  claude-mem            ║
║  Container   ║  Campaigns/Conv   ║  Pixel/CAPI  ║  SQLite + Chroma       ║
╚══════════════╩═════════╤═════════╩══════════════╩════════════════════════╝
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                     ACCOUNT STATE COLLECTOR                             │
│  Normalizes all client account data into a structured context object    │
│  client_id / gtm_schema / ad_structure / pixel_events / history         │
└──────────────────────────────────┬──────────────────────────────────────┘
                                   │
                    ┌──────────────┴──────────────┐
                    ▼                             ▼
┌───────────────────────────┐   ┌──────────────────────────────────────────┐
│  AUTORESEARCH LOOP        │   │  EXISTING claude-mem HISTORY             │
│  50-100 experiments/run   │   │  Past fixes, decisions, patterns         │
│  Each: problem→solution   │   │  Chroma vector store                     │
│  Score: 0.0 → 1.0         │   └─────────────────────┬────────────────────┘
└──────────────┬────────────┘                         │
               │                                      │
               ▼                                      ▼
┌──────────────────────────────────────────────────────────────────────────┐
│                     SCORING FILTER + FORMATTER                           │
│  score ≥ 0.75 → keep │ Dedup │ Inject account_state into system prompt   │
│  Output: training.jsonl  {"system":..., "user":..., "assistant":...}     │
└──────────────────────────────┬───────────────────────────────────────────┘
                               │
              ┌────────────────┴────────────────┐
              ▼                                 ▼
┌─────────────────────────┐   ┌─────────────────────────────────────────────┐
│  OPENAI FINE-TUNE API   │   │  OLLAMA LOCAL (NoClaw :11434)               │
│  gpt-4o-mini target     │   │  Llama 3.1 8B / Mistral → M3 Ultra          │
│  Fast MVP / cloud       │   │  Private / no data egress                   │
└─────────────┬───────────┘   └──────────────────┬──────────────────────────┘
              └────────────────┬─────────────────┘
                               ▼
┌──────────────────────────────────────────────────────────────────────────┐
│                    CLIENT BRAIN AGENT (OpenClaw)                         │
│  Fine-tuned model routed per client_id inside OpenClaw gateway           │
│  No 200k context dump needed — patterns baked into weights               │
│  New experiments → re-score → auto retrain trigger → model v++           │
└──────────────────────────────┬───────────────────────────────────────────┘
                               │
                               │  flywheel: more client work → better model
                               └──────────────────────────────────────────▶
    
// Phased Implementation
PHASE 01

Experiment Logger Instrumentation

Modify autoresearch loop to persist structured experiment records with score, context, and client_id. This is the foundation — everything else depends on clean experiment output.

Add ExperimentRecord schema: client_id, run_id, problem, solution, score, timestamp, account_snapshot
Write to SQLite experiments table (extend claude-mem or new DB)
Score normalization: ensure 0.0–1.0 float, not pass/fail binary
Tag each record with data sources used (GTM / Google Ads / Meta)
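A minimal sketch of the record shape and score normalizer described above. Field names follow the phase list; the exact validation behavior is an assumption, not a finalized schema:

```typescript
// Hypothetical ExperimentRecord shape; fields mirror the Phase 01 list.
interface ExperimentRecord {
  client_id: string;
  run_id: string;
  problem: string;
  solution: string;
  score: number;                              // normalized 0.0–1.0 float
  timestamp: string;                          // ISO 8601
  account_snapshot: Record<string, unknown>;  // AccountState at run time
  sources_used: Array<"gtm" | "google_ads" | "meta">;
}

// Reject pass/fail booleans and out-of-range values. Ambiguous inputs
// (e.g. 75 meaning 75%) fail loudly instead of being silently coerced.
function normalizeScore(raw: unknown): number {
  if (typeof raw !== "number" || Number.isNaN(raw)) {
    throw new Error(`score must be a float, got ${typeof raw}`);
  }
  if (raw < 0 || raw > 1) {
    throw new Error(`score ${raw} outside 0.0–1.0`);
  }
  return raw;
}
```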
PHASE 02

Account State Collector

Pull and normalize client account data from GTM, Google Ads MCP, and Pipeboard Meta MCP into a structured AccountState object stored per client. This becomes the system prompt context.

GTM: export container JSON → parse tags/triggers/variables into schema
Google Ads MCP: pull campaigns, conversion actions, account structure
Pipeboard Meta MCP: pull pixel events, CAPI config, ad account structure
Serialize to clients/{client_id}/account_state.json, versioned
Snapshot on each autoresearch run (state at time of experiment)
PHASE 03

Training Data Pipeline

Filter high-scoring experiments, inject account state into system prompts, and export clean JSONL training files per client. This is the core transformation step.

Score filter: score ≥ 0.75 (configurable threshold per client)
Deduplication: cosine similarity on solution text via Chroma
System prompt builder: inject client AccountState + role context
Export: training/{client_id}/v{N}.jsonl in OpenAI fine-tune format
Minimum 50 examples gate before fine-tune is triggered
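The filter-and-format step can be sketched as follows. The threshold and the OpenAI chat-message JSONL shape come from the list above; the cosine helper stands in for the Chroma-backed dedup (Chroma would supply the embeddings), and all names are illustrative:

```typescript
// Hypothetical experiment shape for this stage.
type Exp = { problem: string; solution: string; score: number };

const SCORE_THRESHOLD = 0.75; // configurable per client

// One JSONL line per kept experiment, in OpenAI chat fine-tune format.
function toTrainingJsonl(systemPrompt: string, experiments: Exp[]): string {
  return experiments
    .filter((e) => e.score >= SCORE_THRESHOLD)
    .map((e) =>
      JSON.stringify({
        messages: [
          { role: "system", content: systemPrompt },
          { role: "user", content: e.problem },
          { role: "assistant", content: e.solution },
        ],
      }),
    )
    .join("\n");
}

// Cosine similarity between two solution embeddings; near-duplicates
// above a chosen similarity cutoff would be dropped before export.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}
```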
PHASE 04

Fine-Tune Runner

Two-track runner: OpenAI API for fast cloud fine-tunes, Ollama/MLX for private local fine-tunes on M3 Ultra. Triggered automatically when new JSONL version is ready.

Track A — OpenAI: upload JSONL → create fine-tune job → poll status → register model ID
Track B — Ollama: convert JSONL → Modelfile → ollama create client-{id}-v{N} on M3 Ultra
Model registry: models/{client_id}.json tracking versions, eval scores, active flag
Eval harness: 10-question holdout set per client, auto-score new model before promoting
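A sketch of the registry entry and the promotion gate, assuming the `models/{client_id}.json` fields named above; the comparison rule (candidate must match or beat the active model on the holdout set) is an assumption consistent with the Phase 06 rollback check:

```typescript
// Hypothetical registry entry per models/{client_id}.json.
interface ModelVersion {
  version: number;
  model_id: string;   // OpenAI fine-tune id or Ollama tag like client-hre-v3
  eval_score: number; // mean score over the 10-question holdout set
  active: boolean;
}

// Promote a candidate only if there is no active model yet, or the
// candidate does not regress on the holdout eval.
function shouldPromote(
  active: ModelVersion | undefined,
  candidate: ModelVersion,
): boolean {
  if (!active) return true; // first model for this client
  return candidate.eval_score >= active.eval_score;
}
```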
PHASE 05

OpenClaw Client Brain Integration

Route requests through OpenClaw gateway to the correct client-specialized model. Each client_id maps to an active model version. No context dump required at inference.

OpenClaw middleware: read x-client-id header → lookup active model in registry
Route to local Ollama client model OR OpenAI fine-tuned model per flag
Fallback: if no client model ready, use full AccountState as RAG context
Telemetry: log which model version handled each request for drift tracking
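The routing decision above can be sketched as a pure function; the header name comes from the list, while the registry shape and RAG fallback encoding are illustrative assumptions, not the actual OpenClaw middleware API:

```typescript
// Hypothetical route targets for a request.
type Route =
  | { kind: "ollama"; model: string } // local client model on NoClaw :11434
  | { kind: "openai"; model: string } // fine-tuned cloud model id
  | { kind: "rag" };                  // fallback: full AccountState as context

// Map the x-client-id header to the active model in the registry,
// falling back to RAG when no client model is ready.
function resolveRoute(
  registry: Map<string, Route>,
  headers: Record<string, string>,
): Route {
  const clientId = headers["x-client-id"];
  return (clientId && registry.get(clientId)) || { kind: "rag" };
}
```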
PHASE 06

Flywheel Automation

Close the loop. New autoresearch runs automatically trigger re-scoring, JSONL delta append, and retrain when enough new high-quality examples accumulate. Model gets smarter with every client engagement.

Watcher: after each autoresearch run, check if new high-score count ≥ threshold (default 20)
Auto-append to JSONL, bump version, trigger fine-tune runner
Drift detection: compare eval scores v(N) vs v(N-1), rollback if regression
Slack/notification: post model promotion events to client channel
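The two watcher decisions above reduce to small predicates. Defaults follow the list (retrain threshold 20, rollback on any eval regression); the optional tolerance parameter is an added assumption:

```typescript
const RETRAIN_THRESHOLD = 20; // default from the watcher spec above

// Retrain when enough new high-score examples have accumulated
// since the last fine-tune.
function shouldRetrain(
  newHighScoreCount: number,
  threshold: number = RETRAIN_THRESHOLD,
): boolean {
  return newHighScoreCount >= threshold;
}

// Drift gate: roll back if v(N) regresses against v(N-1) on the
// holdout set by more than the allowed tolerance.
function shouldRollback(
  prevEval: number,
  newEval: number,
  tolerance: number = 0,
): boolean {
  return newEval < prevEval - tolerance;
}
```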
// Account State Schema Inputs

GTM Container

  • Tag names + trigger conditions
  • Variable definitions
  • dataLayer event schema
  • Container workspace state
  • Published vs draft diff

Google Ads MCP

  • Campaign structure + types
  • Conversion action IDs + names
  • Enhanced Conversions config
  • PMAX asset group IDs
  • Historical performance flags

Pipeboard Meta MCP

  • Pixel ID + event list
  • CAPI configuration
  • Custom audience IDs
  • Ad account structure
  • Stape container state
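The three input lists could be carried as one typed object inside AccountState. A sketch with illustrative field names (not a confirmed schema for any of the MCPs):

```typescript
// Hypothetical typed mirror of the three source lists above.
interface GtmContainerState {
  tags: Array<{ name: string; trigger: string }>;
  variables: string[];
  dataLayerEvents: string[];
  publishedDraftDiff?: unknown;
}

interface GoogleAdsState {
  campaigns: Array<{ id: string; type: string }>;
  conversionActions: Array<{ id: string; name: string }>;
  enhancedConversions: boolean;
  pmaxAssetGroupIds: string[];
}

interface MetaState {
  pixelId: string;
  events: string[];
  capiConfigured: boolean;
  customAudienceIds: string[];
  stapeContainer?: string;
}

interface AccountStateInputs {
  gtm: GtmContainerState;
  google_ads: GoogleAdsState;
  meta: MetaState;
}
```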
// Training Record Format
{ "messages": [ { "role": "system", "content": "You are a tracking expert for HRE. GTM: GTM-XXXXXXX | sGTM: container-XXXXXXX Conversion actions: All Purchases (tag:AW-xxx/yyy), Lead Form Submit Known issue history: PMAX $0 value → fix: sGTM cart items mapping dataLayer schema: {event, ecommerce.value, ecommerce.items[]} Meta Pixel: HRE-pixel-id | CAPI: configured via Stape" }, { "role": "user", "content": "PMAX campaign shows conversions but $0 revenue value" }, { "role": "assistant", "content": "Root cause: sGTM purchase tag is reading value from top-level ecommerce object but HRE pushes revenue inside items[0].price * quantity. Fix: update sGTM variable to map items array and sum value. Tag: 'sGTM - GA4 Purchase' → Variable: {{DLV - ecommerce.items}} → recompute value." } ], "metadata": { "client_id": "hre", "score": 0.91, "run_id": "exp-2026-04-07-047", "sources": ["gtm", "google_ads"] } }

⟳ The Compounding Moat

Every client engagement generates experiments. Every experiment above threshold extends the training set. Every retrain makes the model more accurate for that client's specific patterns. At ~500 high-quality examples per client, the model knows their full tracking architecture better than any human consultant who hasn't lived in their accounts for years. This is the retention mechanic — the model is an asset that only you and the client share.

// Claude Code Prompt
claude --dangerously-skip-permissions

# Client LLM Fine-Tune Pipeline — Phase 1 Bootstrap

## Context
Repo: github.com/Organized-AI/gtm-autoresearch
Infra: M3 Ultra (Ollama :11434 NoClaw), M4 Mini (OpenClaw :18789), claude-mem (SQLite + Chroma :37777)
Clients: bioptimizers, rtt, teleios

## Phase 1 Task: Experiment Logger Instrumentation
Read AGENT-HANDOFF/ and PLANNING/ first. Then:

1. Create packages/experiment-logger/
   - ExperimentRecord TypeScript schema: client_id, run_id, problem, solution, score (float 0-1), timestamp, account_snapshot (JSON), sources_used string[]
   - SQLite writer extending claude-mem DB at :37777
   - Score normalizer: validate 0.0–1.0 range
   - Client_id tagger on each record
2. Instrument autoresearch loop to call ExperimentLogger.save() on each scored experiment output
3. Add CLI: `pnpm experiment-logger export --client bioptimizers` outputs all records to stdout as JSONL
4. Write tests for schema validation and SQLite round-trip

Use Organized Codebase structure. Update CLAUDE.md with experiment-logger package docs. Do not build Phase 2 yet.

## Env vars needed:
CLAUDE_MEM_DB_PATH=/path/to/claude-mem.sqlite
EXPERIMENT_LOG_DIR=./data/experiments
SCORE_THRESHOLD=0.75