The autoresearch loop generates GTM experiments with scores, but they only exist in transient run logs. There's no structured store to query, filter, or export them for downstream fine-tuning. Every experiment is a potential training example — but only if it's captured with full context.
A Zod-validated schema + local SQLite database that captures every experiment with its score, client ID, sources used, and account snapshot. Idempotent writes (INSERT OR IGNORE), indexed queries, and JSONL export for feeding into the Phase 3 training pipeline.
| Field | Type | Description |
|---|---|---|
| id | UUID | Unique record identifier. Primary key in SQLite. Generated via uuid v4 |
| client_id | string | "hre", "teleios", "rtt", etc. Indexed with run_id for fast client-scoped queries |
| run_id | string | Experiment run identifier, e.g. "exp-2026-04-07-047". Links to autoresearch run manifest |
| problem | string | The experiment input — the GTM problem or question being investigated |
| solution | string | The experiment output — the generated recommendation or fix |
| score | float 0.0–1.0 | Quality score. Validated via Zod min(0)/max(1). Logger clamps out-of-range with warning |
| timestamp | ISO 8601 | When the experiment was run. Enables time-range queries via the since filter |
| account_snapshot | object → JSON | Account state at experiment time. Stored as JSON text in SQLite, parsed back on read |
| sources_used | string[] → JSON | Data sources: ["gtm", "google_ads", "meta"]. Stored as JSON array text |
Out-of-range scores are clamped into 0.0–1.0 and reported with a [Phase1] console warning. Zod rejects values outside 0.0–1.0 at the schema level, so the logger clamps before validation to ensure writes always succeed.
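The clamp-before-validate behavior can be sketched as follows. The helper name `clampScore` is illustrative, not necessarily the project's actual function:

```typescript
// Illustrative clamp: normalize scores into [0, 1] before Zod validation,
// warning with the [Phase1] prefix when a value had to be adjusted.
function clampScore(score: number): number {
  const clamped = Math.min(1, Math.max(0, score));
  if (clamped !== score) {
    console.warn(`[Phase1] score ${score} out of range, clamped to ${clamped}`);
  }
  return clamped;
}

console.log(clampScore(1.7));  // 1
console.log(clampScore(-0.3)); // 0
console.log(clampScore(0.92)); // 0.92 (in-range values pass through untouched)
```

Clamping rather than rejecting trades strictness for durability: a logging path should never drop an experiment over a slightly miscalibrated score.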
Re-running the same experiment through the logger never creates duplicates. The SQLite INSERT OR IGNORE silently skips any record whose UUID already exists.
────────────────────────────────────────────────────────
 Experiment Logger CLI — Phase 1
────────────────────────────────────────────────────────

```shell
# Export all experiments for a client as JSONL
$ npx tsx scripts/experiment-logger.ts export --client hre
{"id":"a1b2c3...","client_id":"hre","problem":"...","solution":"...","score":0.92,...}
{"id":"d4e5f6...","client_id":"hre","problem":"...","solution":"...","score":0.88,...}

# Count records for a client
$ npx tsx scripts/experiment-logger.ts count --client hre
[Phase1] 247 records for client "hre"

# Bulk import from JSON array file
$ npx tsx scripts/experiment-logger.ts import --file experiments.json --client hre
[Phase1] Saved batch of 50 experiments
[Phase1] Imported 50 records for client "hre"

# Re-import same file — no duplicates (idempotent)
$ npx tsx scripts/experiment-logger.ts import --file experiments.json --client hre
[Phase1] Saved batch of 50 experiments
[Phase1] Imported 50 records for client "hre"

$ npx tsx scripts/experiment-logger.ts count --client hre
[Phase1] 297 records for client "hre"   ← same count, no dupes
```
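The export format above is JSONL: one JSON-serialized record per line, which streams cleanly into the Phase 3 training pipeline. A dependency-free sketch of the serialization, with the record shape abbreviated for illustration:

```typescript
// Abbreviated record shape for the sketch; the real export carries every field.
interface ExperimentRow {
  id: string;
  client_id: string;
  score: number;
}

// JSONL = one JSON object per line, no enclosing array.
function toJsonl(rows: ExperimentRow[]): string {
  return rows.map((row) => JSON.stringify(row)).join("\n");
}

const jsonl = toJsonl([
  { id: "a1b2c3", client_id: "hre", score: 0.92 },
  { id: "d4e5f6", client_id: "hre", score: 0.88 },
]);
console.log(jsonl);
// {"id":"a1b2c3","client_id":"hre","score":0.92}
// {"id":"d4e5f6","client_id":"hre","score":0.88}
```

Unlike a JSON array, JSONL can be consumed line-by-line without parsing the whole file, which matters once exports grow past a few thousand records.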
```shell
# Clone and setup
$ git clone https://github.com/Organized-AI/gtm-autoresearch.git
$ cd gtm-autoresearch && git checkout feature/finetune-pipeline
$ npm install

# Run the eval suite to verify everything works
$ npx tsx evals/eval_experiment_logger.ts
=== Results: 25 passed, 0 failed ===

# Type check
$ npx tsc --noEmit
(no errors)

# Import some experiments
$ npx tsx scripts/experiment-logger.ts import --file my-experiments.json --client hre

# Check the count
$ npx tsx scripts/experiment-logger.ts count --client hre

# Export as JSONL (pipe to file or next phase)
$ npx tsx scripts/experiment-logger.ts export --client hre > hre-experiments.jsonl
```
claude --dangerously-skip-permissions
# Phase 1: Experiment Logger
# Branch: feature/finetune-pipeline
# Repo: github.com/Organized-AI/gtm-autoresearch
## Context
This is Phase 1 of 6 in the fine-tune pipeline. It instruments
the autoresearch loop to persist structured experiment records
with scores, enabling downstream training data generation.
## Task
Read CLAUDE.md and AGENT-HANDOFF/CURRENT-STATE.md first. Then:
1. Create ExperimentRecord schema (src/types/experiment.ts)
- Zod-validated: id (UUID), client_id, run_id, problem,
solution, score (0.0–1.0), timestamp (ISO 8601),
account_snapshot (object), sources_used (string[])
2. SQLite writer (src/experiment-logger/db.ts)
- better-sqlite3, WAL mode, data/experiments.sqlite
- INSERT OR IGNORE on id (idempotent)
- Index on (client_id, run_id)
3. ExperimentLogger class (src/experiment-logger/logger.ts)
- save(), saveBatch(), query(), export(), count()
- Score clamping 0.0–1.0 with [Phase1] warning
4. CLI script (scripts/experiment-logger.ts)
- export --client, count --client, import --file --client
5. Eval suite (evals/eval_experiment_logger.ts)
- Schema validation, score normalization, SQLite round-trip,
JSONL export, idempotency, client filtering, batch save
6. Update CLAUDE.md + AGENT-HANDOFF/CURRENT-STATE.md
## Do NOT build Phase 2 (Account State Collector) yet.