gtm-autoresearch // feature/finetune-pipeline // Phase 1 of 6

Experiment Logger

01 — Schema · SQLite · CLI · Eval Suite
v1.0.0 — Built
System Overview

Phase 1 — Experiment Logger — ← NOW · 25 tests pass
Phase 2 — Account State — next
Phase 3 — JSONL Pipeline — planned
Phase 4 — Fine-Tune Runner — planned
Phase 5 — OpenClaw Brain — planned
Phase 6 — Flywheel — planned
What Phase 1 Does

The Problem

autoresearch experiments are ephemeral

The autoresearch loop generates GTM experiments with scores, but they only exist in transient run logs. There's no structured store to query, filter, or export them for downstream fine-tuning. Every experiment is a potential training example — but only if it's captured with full context.

The Solution

persistent, queryable experiment store

A Zod-validated schema + local SQLite database that captures every experiment with its score, client ID, sources used, and account snapshot. Idempotent writes (INSERT OR IGNORE), indexed queries, and JSONL export for feeding into the Phase 3 training pipeline.

ExperimentRecord Schema

| Field | Type | Description |
|---|---|---|
| id | UUID | Unique record identifier. Primary key in SQLite. Generated via uuid v4. |
| client_id | string | Client identifier: "hre", "teleios", "rtt", etc. Indexed with run_id for fast client-scoped queries. |
| run_id | string | Experiment run identifier, e.g. "exp-2026-04-07-047". Links to the autoresearch run manifest. |
| problem | string | The experiment input — the GTM problem or question being investigated. |
| solution | string | The experiment output — the generated recommendation or fix. |
| score | float 0.0–1.0 | Quality score. Validated via Zod min(0)/max(1). Logger clamps out-of-range values with a warning. |
| timestamp | ISO 8601 | When the experiment was run. Enables time-range queries via the since filter. |
| account_snapshot | object → JSON | Account state at experiment time. Stored as JSON text in SQLite, parsed back on read. |
| sources_used | string[] → JSON | Data sources, e.g. ["gtm", "google_ads", "meta"]. Stored as JSON array text. |
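The record shape and its validation rules can be sketched as follows. This is a dependency-free approximation for illustration; the actual src/types/experiment.ts defines the schema with Zod, and the guard function here is hypothetical:

```typescript
// Dependency-free approximation of ExperimentRecord (the real schema in
// src/types/experiment.ts uses Zod; this mirrors the same rules).
interface ExperimentRecord {
  id: string;               // UUID v4
  client_id: string;        // e.g. "hre"
  run_id: string;           // e.g. "exp-2026-04-07-047"
  problem: string;
  solution: string;
  score: number;            // 0.0–1.0
  timestamp: string;        // ISO 8601
  account_snapshot: Record<string, unknown>;
  sources_used: string[];
}

// Hypothetical guard: true only when every field is present and score is in range.
function isValidRecord(r: Partial<ExperimentRecord>): boolean {
  return (
    typeof r.id === "string" &&
    typeof r.client_id === "string" &&
    typeof r.run_id === "string" &&
    typeof r.problem === "string" &&
    typeof r.solution === "string" &&
    typeof r.score === "number" && r.score >= 0 && r.score <= 1 &&
    typeof r.timestamp === "string" &&
    typeof r.account_snapshot === "object" && r.account_snapshot !== null &&
    Array.isArray(r.sources_used)
  );
}
```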
SQLite Database — data/experiments.sqlite

TABLE experiments · WAL mode · INSERT OR IGNORE · INDEX idx_experiments_client_run (client_id, run_id)

| Column | Type | Notes |
|---|---|---|
| id | TEXT PK | UUID v4 — INSERT OR IGNORE prevents duplicates |
| client_id | TEXT NOT NULL | Indexed — fast lookups per client |
| run_id | TEXT NOT NULL | Indexed with client_id — compound queries |
| problem | TEXT NOT NULL | Becomes messages[1] (user role) in Phase 3 JSONL |
| solution | TEXT NOT NULL | Becomes messages[2] (assistant role) in Phase 3 JSONL |
| score | REAL NOT NULL | 0.0–1.0 — Phase 3 filters at the ≥ 0.75 threshold |
| timestamp | TEXT NOT NULL | ISO 8601 — enables time-range queries |
| account_snapshot | TEXT (JSON) | Serialized object — tags, triggers, conversion values at experiment time |
| sources_used | TEXT (JSON[]) | Array of source identifiers — parsed back to string[] on read |
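Reconstructed from the table above, the statements that db.ts passes to better-sqlite3 might look like this sketch (an assumption, not the actual source):

```typescript
// Sketch of the SQL that src/experiment-logger/db.ts might run via
// better-sqlite3's db.exec()/db.prepare() after `PRAGMA journal_mode = WAL`.
// Reconstructed from the column table above; the real file may differ.
const DDL = `
CREATE TABLE IF NOT EXISTS experiments (
  id               TEXT PRIMARY KEY,   -- UUID v4
  client_id        TEXT NOT NULL,
  run_id           TEXT NOT NULL,
  problem          TEXT NOT NULL,
  solution         TEXT NOT NULL,
  score            REAL NOT NULL,      -- 0.0 to 1.0
  timestamp        TEXT NOT NULL,      -- ISO 8601
  account_snapshot TEXT,               -- JSON object text
  sources_used     TEXT                -- JSON array text
);
CREATE INDEX IF NOT EXISTS idx_experiments_client_run
  ON experiments (client_id, run_id);
`;

// Idempotent write: rows whose id already exists are silently skipped.
const INSERT_SQL = `
INSERT OR IGNORE INTO experiments
  (id, client_id, run_id, problem, solution, score, timestamp, account_snapshot, sources_used)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
`;
```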
Score Normalization

Validation Rules

Zod schema + logger clamp
| Input | Result |
|---|---|
| 0.85 | pass |
| 0.75 | pass |
| 1.1 | clamped → 1.0 |
| -0.5 | clamped → 0.0 |

Out-of-range scores are clamped with a [Phase1] console warning. Because Zod rejects values outside 0.0–1.0 at the schema level, the logger clamps before validation so writes always succeed.
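The clamp-then-validate behavior reduces to a small helper, sketched here with a hypothetical name (the actual logger code may structure this differently):

```typescript
// Sketch of the logger's score normalization (hypothetical helper name).
// Clamps into [0, 1] before schema validation so the write always succeeds,
// warning when the input was out of range.
function clampScore(score: number): number {
  if (score < 0 || score > 1) {
    console.warn(`[Phase1] score ${score} out of range, clamping to [0, 1]`);
    return Math.min(1, Math.max(0, score));
  }
  return score;
}
```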

Idempotency Guarantee

INSERT OR IGNORE on UUID primary key

Re-running the same experiment through the logger never creates duplicates. The SQLite INSERT OR IGNORE silently skips any record whose UUID already exists.

save(record) — inserted (new UUID)
save(record) — ignored (duplicate UUID)
save(record) — ignored (duplicate UUID)
count() → 1 row
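The guarantee reduces to primary-key semantics, illustrated here with an in-memory stand-in (not the actual SQLite code):

```typescript
// In-memory illustration of INSERT OR IGNORE semantics on the UUID key.
// A duplicate id is silently skipped, so re-saving never adds rows.
const rows = new Map<string, { id: string }>();

function save(record: { id: string }): "inserted" | "ignored" {
  if (rows.has(record.id)) return "ignored"; // duplicate UUID → skipped
  rows.set(record.id, record);
  return "inserted";
}
```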
ExperimentLogger Class API
save()
Save One
Validate, clamp score, insert single record
ExperimentRecord
saveBatch()
Batch Insert
Transaction-wrapped multi-row insert
ExperimentRecord[]
query()
Filter Query
client_id, min_score, since filters
ExperimentRecord[]
export()
JSONL Export
One JSON object per line to stdout
string (JSONL)
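A dependency-free sketch of that API surface follows, using an in-memory Map in place of SQLite. The filter semantics shown are assumptions based on the table above; the real class in src/experiment-logger/logger.ts is backed by the database:

```typescript
// In-memory sketch of the ExperimentLogger API surface (assumed semantics;
// the real class is backed by SQLite and better-sqlite3 transactions).
interface Rec {
  id: string;
  client_id: string;
  score: number;
  timestamp: string; // ISO 8601
  [key: string]: unknown;
}

interface Filter {
  client_id?: string;
  min_score?: number;
  since?: string; // ISO 8601 lower bound
}

class ExperimentLoggerSketch {
  private rows = new Map<string, Rec>();

  save(r: Rec): void {
    if (!this.rows.has(r.id)) this.rows.set(r.id, r); // INSERT OR IGNORE
  }

  saveBatch(rs: Rec[]): void {
    for (const r of rs) this.save(r); // real version wraps this in a transaction
  }

  query(f: Filter = {}): Rec[] {
    return [...this.rows.values()].filter(
      (r) =>
        (f.client_id === undefined || r.client_id === f.client_id) &&
        (f.min_score === undefined || r.score >= f.min_score) &&
        (f.since === undefined || r.timestamp >= f.since) // ISO strings sort lexically
    );
  }

  export(f: Filter = {}): string {
    return this.query(f).map((r) => JSON.stringify(r)).join("\n"); // JSONL
  }
}
```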
Project Structure
src/ types/ experiment.ts — Zod schema + TypeScript type experiment-logger/ db.ts — SQLite open, insert, query, count logger.ts — ExperimentLogger class index.ts — barrel export scripts/ experiment-logger.ts — CLI: export | count | import evals/ eval_experiment_logger.ts — 25 tests across 7 suites data/ experiments.sqlite — auto-created on first write (gitignored)
CLI Commands
────────────────────────────────────────────────────────
  Experiment Logger CLI — Phase 1
────────────────────────────────────────────────────────

# Export all experiments for a client as JSONL
$ npx tsx scripts/experiment-logger.ts export --client hre
{"id":"a1b2c3...","client_id":"hre","problem":"...","solution":"...","score":0.92,...}
{"id":"d4e5f6...","client_id":"hre","problem":"...","solution":"...","score":0.88,...}

# Count records for a client
$ npx tsx scripts/experiment-logger.ts count --client hre
[Phase1] 247 records for client "hre"

# Bulk import from JSON array file
$ npx tsx scripts/experiment-logger.ts import --file experiments.json --client hre
[Phase1] Saved batch of 50 experiments
[Phase1] Imported 50 records for client "hre"

# Re-import same file — no duplicates (idempotent)
$ npx tsx scripts/experiment-logger.ts import --file experiments.json --client hre
[Phase1] Saved batch of 50 experiments
[Phase1] Imported 50 records for client "hre"
$ npx tsx scripts/experiment-logger.ts count --client hre
[Phase1] 297 records for client "hre"  ← same count, no dupes
    
Eval Suite — 25 Tests
$ npx tsx evals/eval_experiment_logger.ts — 25 passed · 0 failed

Schema Validation
- Valid record passes schema
- Missing fields fail schema
- Score > 1.0 fails schema validation
- Score < 0.0 fails schema validation

Score Normalization
- Score 0.75 saved as-is
- Score 1.1 clamped to 1.0
- Score -0.5 clamped to 0.0

SQLite Round-Trip
- Query returns 1 record
- ID matches
- client_id matches
- problem matches
- solution matches
- score matches
- account_snapshot is object
- account_snapshot.tags matches
- sources_used round-trips as array
- sources_used contains 'meta'

JSONL Export
- Export produces 2 lines
- Each JSONL line is valid JSON

Idempotency
- Saving same record 3 times produces 1 row

Client Filtering
- Query client 'alpha' returns 2 records
- Query client 'beta' returns 1 record
- Query with no filter returns all 3 records
- min_score filter returns only records >= 0.9

Batch Save
- Batch save of 50 records works
Dependencies Added

Runtime (production dependencies)
- better-sqlite3 — synchronous SQLite3 binding (native, WAL mode)
- uuid — RFC 4122 v4 UUID generation
- zod — already present (schema validation)

Dev (type definitions)
- @types/better-sqlite3
- @types/uuid
- tsx, typescript, @types/node — already present
How This Feeds the Pipeline

Data Flow: Phase 1 → Phase 3

experiment records become fine-tune training data
1. Phase 1 saves — ExperimentRecord with problem, solution, score, account_snapshot
2. Phase 2 enriches — AccountStateCollector builds full system prompts per client
3. Phase 3 transforms — filter by score ≥ 0.75, dedup, inject system prompt → JSONL training files
4. Phase 4 trains — feed JSONL to OpenAI fine-tune API or Ollama local
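The Phase 1 → Phase 3 hand-off can be sketched as a single transform. Phase 3 is not built yet, so the function name and exact shape here are hypothetical, but the filter threshold, dedup step, and messages layout follow the description above:

```typescript
// Hypothetical sketch of the Phase 3 transform (Phase 3 is not built yet):
// filter by score, dedup identical pairs, inject the system prompt, and
// emit one chat-format JSON object per line (JSONL).
interface Exp {
  problem: string;
  solution: string;
  score: number;
}

function toTrainingJsonl(records: Exp[], systemPrompt: string): string[] {
  const seen = new Set<string>();
  return records
    .filter((r) => r.score >= 0.75)                   // quality threshold
    .filter((r) => {
      const key = `${r.problem}\u0000${r.solution}`;  // dedup identical pairs
      if (seen.has(key)) return false;
      seen.add(key);
      return true;
    })
    .map((r) =>
      JSON.stringify({
        messages: [
          { role: "system", content: systemPrompt },  // messages[0]
          { role: "user", content: r.problem },       // messages[1]
          { role: "assistant", content: r.solution }, // messages[2]
        ],
      })
    );
}
```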
Quick Start
# Clone and setup
$ git clone https://github.com/Organized-AI/gtm-autoresearch.git
$ cd gtm-autoresearch && git checkout feature/finetune-pipeline
$ npm install

# Run the eval suite to verify everything works
$ npx tsx evals/eval_experiment_logger.ts
  === Results: 25 passed, 0 failed ===

# Type check
$ npx tsc --noEmit
  (no errors)

# Import some experiments
$ npx tsx scripts/experiment-logger.ts import --file my-experiments.json --client hre

# Check the count
$ npx tsx scripts/experiment-logger.ts count --client hre

# Export as JSONL (pipe to file or next phase)
$ npx tsx scripts/experiment-logger.ts export --client hre > hre-experiments.jsonl
    
Claude Code Prompt
claude --dangerously-skip-permissions

# Phase 1: Experiment Logger
# Branch: feature/finetune-pipeline
# Repo: github.com/Organized-AI/gtm-autoresearch

## Context
This is Phase 1 of 6 in the fine-tune pipeline. It instruments the autoresearch loop to persist structured experiment records with scores, enabling downstream training data generation.

## Task
Read CLAUDE.md and AGENT-HANDOFF/CURRENT-STATE.md first. Then:

1. Create ExperimentRecord schema (src/types/experiment.ts)
   - Zod-validated: id (UUID), client_id, run_id, problem, solution, score (0.0–1.0), timestamp (ISO 8601), account_snapshot (object), sources_used (string[])
2. SQLite writer (src/experiment-logger/db.ts)
   - better-sqlite3, WAL mode, data/experiments.sqlite
   - INSERT OR IGNORE on id (idempotent)
   - Index on (client_id, run_id)
3. ExperimentLogger class (src/experiment-logger/logger.ts)
   - save(), saveBatch(), query(), export(), count()
   - Score clamping 0.0–1.0 with [Phase1] warning
4. CLI script (scripts/experiment-logger.ts)
   - export --client, count --client, import --file --client
5. Eval suite (evals/eval_experiment_logger.ts)
   - Schema validation, score normalization, SQLite round-trip, JSONL export, idempotency, client filtering, batch save
6. Update CLAUDE.md + AGENT-HANDOFF/CURRENT-STATE.md

## Do NOT build Phase 2 (Account State Collector) yet.
← All Docs · gtm-autoresearch-docs.pages.dev · Phase 2: Account State →