System Overview
Phase 1 · Experiment Logger · ✓ complete
Phase 2 · Account State · ✓ complete
Phase 3 · JSONL Pipeline · building ← NOW
Phase 4 · Fine-Tune Runner · next
Phase 5 · OpenClaw Brain · planned
Phase 6 · Flywheel · planned
Step 1 — Score Filter
Score Distribution · typical autoresearch run of 100 experiments
✓ ~55 records kept per run · threshold configurable per client
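The threshold filter above can be sketched as follows. This is a minimal illustration, not the repo's implementation; the `ExperimentRecord` shape and field names are assumptions.

```typescript
// Hypothetical record shape — the real fields live in the SQLite experiments table.
interface ExperimentRecord {
  runId: string;
  score: number; // experiment score, 0.0–1.0
  solution: string;
}

// Keep records at or above the threshold (default 0.75), best-scoring first.
function scoreFilter(records: ExperimentRecord[], threshold = 0.75): ExperimentRecord[] {
  return records
    .filter((r) => r.score >= threshold)
    .sort((a, b) => b.score - a.score);
}
```

Note the boundary: 0.75 is included, 0.749 is not, matching the unit-test requirement in the prompt below.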
Deduplication
Cosine similarity via Chroma :37777 · threshold 0.92
Incoming experiment vs existing training set:
"PMAX shows $0 conversion value — sGTM items mapping issue"
sim: 0.97 — DROP
≈
"PMAX revenue zero — fix sGTM ecommerce.items array"
"Add to Cart firing on page load instead of button click"
sim: 0.31 — KEEP
≠
"Purchase tag misfiring on order confirmation reload"
New experiments are embedded and compared against the existing training set before write; this prevents the model from overfitting to repeated variations of the same problem.
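The similarity check can be sketched like this. In the real pipeline Chroma computes nearest neighbors server-side; this standalone version shows only the cosine math and the 0.92 cutoff, with embeddings passed in as plain arrays.

```typescript
// Cosine similarity between two equal-length embedding vectors.
function cosineSim(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Drop an incoming record if any existing embedding is >= 0.92 similar.
function isDuplicate(incoming: number[], existing: number[][], threshold = 0.92): boolean {
  return existing.some((e) => cosineSim(incoming, e) >= threshold);
}
```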
Step 2 — System Prompt Injection
GTM Context
GTM: GTM-XXXXXXX | ws_4 | 31 tags, 18 triggers, 22 variables
Key tags: GA4 Config, GA4 - Purchase, AW - All Purchases,
sGTM Bridge, Custom HTML - DataLayer Push
dataLayer: page_view, view_item, add_to_cart, purchase, generate_lead
DLV: ecommerce.value, ecommerce.items[].item_id, ecommerce.items[].price
Google Ads
Google Ads: 123-456-7890 | USD
Campaigns: PMAX - Core, Brand - HRE (SEARCH), Retargeting (DISPLAY)
Conversions: All Purchases (value: ecommerce.value), Lead Submit (count)
Enhanced Conversions: ENABLED
Meta / CAPI
Meta Pixel: HRE-pixel-id | act_12345
Events (browser+CAPI): PageView, ViewContent, AddToCart, Purchase
CAPI match rate: ~68% | Stape: cnt_abc123
Memory
Cart abandonment tracking fixed 2025-11
dataLayer conflict resolved 2025-09
PMAX value fix deployed 2026-01
Lead form dedup logic added 2026-03
Known Issues
→ PMAX $0 value: sGTM reads ecommerce.value but HRE pushes
revenue inside items[].price * quantity
→ Lead dedup: generate_lead fires on page load + form submit
Tokens: ~576 / 800 · ✓ within budget
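The token budget check can be sketched as below. The spec counts tokens with tiktoken (cl100k_base); to keep this example dependency-free it substitutes a rough ~4-characters-per-token heuristic, which is an assumption, not the pipeline's actual counter.

```typescript
// Crude token estimate (~4 chars/token) standing in for tiktoken cl100k_base.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Reject any system prompt over the 800-token ceiling.
function withinTokenBudget(systemPrompt: string, ceiling = 800): boolean {
  return estimateTokens(systemPrompt) <= ceiling;
}
```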
Step 3 — JSONL Record Schema
messages[0].role (string): Always "system" — contains rendered AccountState.system_prompt (≤800 tokens)
messages[1].role (string): Always "user" — the problem/question from the autoresearch experiment input
messages[2].role (string): Always "assistant" — the high-scoring solution output (score ≥ 0.75)
metadata.client_id (string): e.g. "hre" — used to filter and route to the correct fine-tuned model
metadata.score (float): Original experiment score, 0.0–1.0. Stored for post-training analysis and threshold tuning
metadata.run_id (string): Experiment run identifier. Links back to ExperimentRecord in SQLite for traceability
metadata.account_state_version (semver): AccountState version snapshot at time of experiment. Enables version-gated retraining after major account changes
metadata.sources (string[]): Which data sources were injected: ["gtm", "google_ads", "meta", "memory"]
metadata.chroma_embedding_id (string): Chroma vector ID for this record — enables dedup lookup on future runs
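Put together, one exported line could look like the sketch below. The type mirrors the schema above; all literal values (run id, embedding id, message text) are invented for illustration.

```typescript
// Record shape per the schema; one JSON object per line in the exported .jsonl.
interface TrainingRecord {
  messages: { role: "system" | "user" | "assistant"; content: string }[];
  metadata: {
    client_id: string;
    score: number;
    run_id: string;
    account_state_version: string;
    sources: string[];
    chroma_embedding_id: string;
  };
}

// Example values below are made up for illustration only.
const record: TrainingRecord = {
  messages: [
    { role: "system", content: "GTM: GTM-XXXXXXX | ws_4 | 31 tags, 18 triggers, 22 variables" },
    { role: "user", content: "PMAX reports $0 conversion value. Why?" },
    { role: "assistant", content: "sGTM reads ecommerce.value, but revenue is pushed inside items[].price * quantity; map the value field accordingly." },
  ],
  metadata: {
    client_id: "hre",
    score: 0.91,
    run_id: "run_example_001",
    account_state_version: "1.2.0",
    sources: ["gtm", "google_ads"],
    chroma_embedding_id: "emb_example_001",
  },
};

// JSONL means exactly one serialized record per line.
const line = JSON.stringify(record);
```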
Step 4 — Quality Gates Before Export
⚖️ Min Examples (50): Won't export JSONL until minimum batch size reached
📏 Token Guard (800): Rejects records where system prompt exceeds token ceiling
🔁 Dedup Rate (40%): Flags run if >40% of new experiments are duplicates
🧪 Holdout Split (10%): 10% withheld per version for post-fine-tune eval scoring
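The four gates can be sketched as one pre-export check. Thresholds match the section above; the input/result shapes are assumptions for the sketch, not the repo's types.

```typescript
// Assumed batch summary fed into the gate check.
interface GateInput {
  recordCount: number;     // records surviving filter + dedup
  dedupRate: number;       // duplicates removed / candidates, 0–1
  maxPromptTokens: number; // largest system prompt in the batch
}

// Evaluate all gates and report every failure, not just the first.
function checkGates(g: GateInput): { pass: boolean; failures: string[] } {
  const failures: string[] = [];
  if (g.recordCount < 50) failures.push("min examples (50) not met");
  if (g.maxPromptTokens > 800) failures.push("system prompt over 800-token ceiling");
  if (g.dedupRate > 0.4) failures.push("dedup rate above 40%");
  return { pass: failures.length === 0, failures };
}

// Holdout: withhold ~10% (at least 1 record) for the eval set.
function holdoutCount(n: number, split = 0.1): number {
  return Math.max(1, Math.round(n * split));
}
```

With the numbers from the CLI transcript below (124 records, 10.1% dedup, 576-token prompt), all gates pass and 12 records go to the eval set.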
CLI Output — pnpm pipeline export --client hre
$ pnpm pipeline export --client hre --threshold 0.75
─────────────────────────────────────────────
JSONL Training Pipeline — HRE
─────────────────────────────────────────────
Loading experiments from SQLite... ✓ 247 records found
Applying score filter (≥0.75)... ✓ 138 records pass
Loading AccountState v1.2.0... ✓ system_prompt: 576 tokens
Injecting system prompts... ✓ 138 records injected
Running dedup check via Chroma... ⚠ 14 near-duplicates removed (sim ≥0.92)
Remaining after dedup: ✓ 124 records
Checking quality gates:
Min examples (50): ✓ 124 ≥ 50
Token guard (800): ✓ all records within budget
Dedup rate (<40%): ✓ 10.1% dedup rate
Holdout split (10%): ✓ 12 records withheld → eval set
─────────────────────────────────────────────
Output: data/clients/hre/v3.jsonl 112 training records
Eval: data/clients/hre/v3_eval.jsonl 12 eval records
Version: v3 (prev: v2 · delta: +112 new records)
─────────────────────────────────────────────
✓ Ready for Phase 4 fine-tune runner
Next: pnpm fine-tune submit --client hre --version 3
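The `v3 (prev: v2)` line implies version auto-increment from existing export files. A minimal sketch, with the directory listing stubbed as a string array rather than read from disk:

```typescript
// Pick the next v{N} by scanning existing export filenames like "v2.jsonl".
// Eval files ("v2_eval.jsonl") intentionally do not match the pattern.
function nextVersion(existingFiles: string[]): number {
  let max = 0;
  for (const f of existingFiles) {
    const m = /^v(\d+)\.jsonl$/.exec(f);
    if (m) max = Math.max(max, parseInt(m[1], 10));
  }
  return max + 1;
}
```

In the real exporter this listing would come from `data/clients/{client_id}/`.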
Claude Code Prompt
claude --dangerously-skip-permissions
# Phase 3: JSONL Training Data Pipeline
# Branch: feature/finetune-pipeline
# Repo: github.com/Organized-AI/gtm-autoresearch
## Context
Phase 1 (ExperimentLogger) and Phase 2 (AccountStateCollector) are complete.
Phase 3 builds the pipeline that transforms stored experiments into
clean, fine-tune-ready JSONL files. Primary client: HRE.
## Task
Read AGENT-HANDOFF/ and PLANNING/ first. Then:
1. Create packages/training-pipeline/
ScoreFilter:
- Query SQLite experiments table: SELECT * FROM experiments WHERE score >= :threshold
- Default threshold: 0.75 (env: SCORE_THRESHOLD)
- Return ExperimentRecord[] sorted by score DESC
DedupChecker (Chroma :37777):
- Embed each experiment's solution text via Chroma embedding
- Query: find existing records with cosine_sim >= 0.92
- If match found: skip record, log as duplicate
- If new: write embedding to Chroma collection "training_{client_id}"
SystemPromptInjector:
- Load AccountState from data/clients/{client_id}/account_state_v{N}.json
- For each passing record: build messages[0] = {role:"system", content: account_state.system_prompt}
- Token guard: count tokens (use tiktoken cl100k_base), reject if > 800
- Build full message array: [system, user, assistant]
QualityGates (assert before export):
- MIN_EXAMPLES = 50 (configurable)
- TOKEN_CEILING = 800
- MAX_DEDUP_RATE = 0.40 (warn + halt if exceeded)
- HOLDOUT_SPLIT = 0.10
JSONLExporter:
- Write to: data/clients/{client_id}/v{N}.jsonl
- Holdout: data/clients/{client_id}/v{N}_eval.jsonl
- Each line: {messages:[...], metadata:{client_id, score, run_id,
account_state_version, sources, chroma_embedding_id}}
- Versioning: auto-increment v{N} by checking existing files
- Delta mode: --delta flag to only export records since last version
2. CLI: pnpm pipeline export --client hre [--threshold 0.75] [--delta]
- Print progress as shown in spec
- Exit 0 on success, 1 on gate failure
3. Unit tests:
- Score filter boundary conditions (0.749 excluded, 0.75 included)
- Token guard rejects > 800 token system prompts
- Dedup correctly identifies near-duplicate solutions
- QualityGate halts export when MIN_EXAMPLES not met
- JSONL output is valid JSON per line (json-lines format)
4. Update CLAUDE.md with training-pipeline package docs
## Env vars
SCORE_THRESHOLD=0.75
CHROMA_URL=http://localhost:37777
CLIENT_DATA_DIR=./data/clients
MIN_TRAINING_EXAMPLES=50
TOKEN_CEILING=800
HOLDOUT_SPLIT=0.10
## Do NOT build Phase 4 (fine-tune runner) yet.