The autoresearch loop generates GTM experiments with scores, but they only exist in transient run logs. There's no structured store to query, filter, or export them for downstream fine-tuning. Every experiment is a potential training example — but only if it's captured with full context.
A Zod-validated schema + local SQLite database that captures every experiment with its score, client ID, sources used, and account snapshot. Idempotent writes (INSERT OR IGNORE), indexed queries, and JSONL export for feeding into the Phase 3 training pipeline.
| Field | Type | Description |
|---|---|---|
| id | UUID | Unique record identifier. Primary key in SQLite. Generated via uuid v4 |
| client_id | string | "hre", "teleios", "rtt", etc. Indexed with run_id for fast client-scoped queries |
| run_id | string | Experiment run identifier, e.g. "exp-2026-04-07-047". Links to autoresearch run manifest |
| problem | string | The experiment input — the GTM problem or question being investigated |
| solution | string | The experiment output — the generated recommendation or fix |
| score | float 0.0–1.0 | Quality score. Validated via Zod min(0)/max(1). Logger clamps out-of-range with warning |
| timestamp | ISO 8601 | When the experiment was run. Enables time-range queries via the since filter |
| account_snapshot | object → JSON | Account state at experiment time. Stored as JSON text in SQLite, parsed back on read |
| sources_used | string[] → JSON | Data sources: ["gtm", "google_ads", "meta"]. Stored as JSON array text |
Out-of-range scores are clamped into 0.0–1.0 and reported with a [Phase1] console warning. Zod rejects values outside 0.0–1.0 at the schema level, so the logger clamps before validation to ensure writes always succeed.
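The clamp-before-validate behavior can be sketched as follows. The helper name `clampScore` is illustrative, not necessarily the project's actual function:

```typescript
// Illustrative clamp: normalize scores into [0, 1] before Zod validation,
// warning with the [Phase1] prefix when a value had to be adjusted.
function clampScore(score: number): number {
  const clamped = Math.min(1, Math.max(0, score));
  if (clamped !== score) {
    console.warn(`[Phase1] score ${score} out of range, clamped to ${clamped}`);
  }
  return clamped;
}

console.log(clampScore(1.7));  // 1
console.log(clampScore(-0.3)); // 0
console.log(clampScore(0.92)); // 0.92 (in-range values pass through untouched)
```

Clamping rather than rejecting trades strictness for durability: a logging path should never drop an experiment over a slightly miscalibrated score.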
Re-running the same experiment through the logger never creates duplicates. The SQLite INSERT OR IGNORE silently skips any record whose UUID already exists.
────────────────────────────────────────────────────────
 Experiment Logger CLI — Phase 1
────────────────────────────────────────────────────────

```shell
# Export all experiments for a client as JSONL
$ npx tsx scripts/experiment-logger.ts export --client hre
{"id":"a1b2c3...","client_id":"hre","problem":"...","solution":"...","score":0.92,...}
{"id":"d4e5f6...","client_id":"hre","problem":"...","solution":"...","score":0.88,...}

# Count records for a client
$ npx tsx scripts/experiment-logger.ts count --client hre
[Phase1] 247 records for client "hre"

# Bulk import from JSON array file
$ npx tsx scripts/experiment-logger.ts import --file experiments.json --client hre
[Phase1] Saved batch of 50 experiments
[Phase1] Imported 50 records for client "hre"

# Re-import same file — no duplicates (idempotent)
$ npx tsx scripts/experiment-logger.ts import --file experiments.json --client hre
[Phase1] Saved batch of 50 experiments
[Phase1] Imported 50 records for client "hre"

$ npx tsx scripts/experiment-logger.ts count --client hre
[Phase1] 297 records for client "hre"   ← same count, no dupes
```
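The export format above is JSONL: one JSON-serialized record per line, which streams cleanly into the Phase 3 training pipeline. A dependency-free sketch of the serialization, with the record shape abbreviated for illustration:

```typescript
// Abbreviated record shape for the sketch; the real export carries every field.
interface ExperimentRow {
  id: string;
  client_id: string;
  score: number;
}

// JSONL = one JSON object per line, no enclosing array.
function toJsonl(rows: ExperimentRow[]): string {
  return rows.map((row) => JSON.stringify(row)).join("\n");
}

const jsonl = toJsonl([
  { id: "a1b2c3", client_id: "hre", score: 0.92 },
  { id: "d4e5f6", client_id: "hre", score: 0.88 },
]);
console.log(jsonl);
// {"id":"a1b2c3","client_id":"hre","score":0.92}
// {"id":"d4e5f6","client_id":"hre","score":0.88}
```

Unlike a JSON array, JSONL can be consumed line-by-line without parsing the whole file, which matters once exports grow past a few thousand records.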
```shell
# Clone and setup
$ git clone https://github.com/Organized-AI/gtm-autoresearch.git
$ cd gtm-autoresearch && git checkout feature/finetune-pipeline
$ npm install

# Run the eval suite to verify everything works
$ npx tsx evals/eval_experiment_logger.ts
=== Results: 25 passed, 0 failed ===

# Type check
$ npx tsc --noEmit
(no errors)

# Import some experiments
$ npx tsx scripts/experiment-logger.ts import --file my-experiments.json --client hre

# Check the count
$ npx tsx scripts/experiment-logger.ts count --client hre

# Export as JSONL (pipe to file or next phase)
$ npx tsx scripts/experiment-logger.ts export --client hre > hre-experiments.jsonl
```
claude --dangerously-skip-permissions
# Phase 1: Experiment Logger
# Branch: feature/finetune-pipeline
# Repo: github.com/Organized-AI/gtm-autoresearch
## Context
This is Phase 1 of 6 in the fine-tune pipeline. It instruments
the autoresearch loop to persist structured experiment records
with scores, enabling downstream training data generation.
## Task
Read CLAUDE.md and AGENT-HANDOFF/CURRENT-STATE.md first. Then:
1. Create ExperimentRecord schema (src/types/experiment.ts)
- Zod-validated: id (UUID), client_id, run_id, problem,
solution, score (0.0–1.0), timestamp (ISO 8601),
account_snapshot (object), sources_used (string[])
2. SQLite writer (src/experiment-logger/db.ts)
- better-sqlite3, WAL mode, data/experiments.sqlite
- INSERT OR IGNORE on id (idempotent)
- Index on (client_id, run_id)
3. ExperimentLogger class (src/experiment-logger/logger.ts)
- save(), saveBatch(), query(), export(), count()
- Score clamping 0.0–1.0 with [Phase1] warning
4. CLI script (scripts/experiment-logger.ts)
- export --client, count --client, import --file --client
5. Eval suite (evals/eval_experiment_logger.ts)
- Schema validation, score normalization, SQLite round-trip,
JSONL export, idempotency, client filtering, batch save
6. Update CLAUDE.md + AGENT-HANDOFF/CURRENT-STATE.md
## Do NOT build Phase 2 (Account State Collector) yet.