Pulls live client account data from the GTM, Google Ads, and Pipeboard Meta MCP servers and normalizes it into a versioned AccountState JSON object. The object's pre-rendered system_prompt string is injected as system prompt context into every training record.
AccountState {
  client_id: string                  // "bioptimizers" | "teleios" | "rtt"
  snapshot_at: ISO 8601 timestamp    // when this state was captured
  version: semver string             // "1.0.0" → bump on meaningful change
  gtm: {
    container_id: string             // "GTM-XXXXXXX"
    workspace_id: string
    tags: [{ name, type, firing_triggers[], paused, notes }]
    triggers: [{ name, type, filter_conditions[], custom_event_filter }]
    variables: [{ name, type, parameter (dataLayer key, JS var, etc.) }]
    datalayer_schema: string[]       // inferred event names from triggers
    published_vs_draft_diff: string  // summary of unpublished changes
  }
  google_ads: {
    account_id: string               // "123-456-7890"
    campaigns: [{ id, name, type (PMAX|SEARCH|DISPLAY|VIDEO), status, budget }]
    conversion_actions: [{ id, name, tag_snippet, category, value_settings, enhanced_conversions }]
    linked_gtm_containers: string[]
    performance_flags: string[]      // known issues like "$0 value on PMAX"
  }
  meta: {
    pixel_id: string
    ad_account_id: string
    events_fired: [{ event_name, source (pixel|capi|both), last_received }]
    capi_config: { enabled, access_token_set, stape_container_id, match_rate }
    custom_audiences: [{ id, name, size_range, source }]
  }
  memory_summary: string             // top 5 past fixes from claude-mem Chroma
  known_issues: string[]             // manually flagged or auto-detected
  system_prompt: string              // pre-rendered, ready to inject into JSONL
}
| Source | MCP Tool | Output Field | Notes |
|---|---|---|---|
| GTM | export_container | gtm.tags, gtm.triggers, gtm.variables | Parse JSON → normalize tag schema |
| GTM | list_workspaces | gtm.workspace_id | Always grab latest workspace |
| Google Ads | GAQL: SELECT campaign.id, campaign.name, campaign.advertising_channel_type | google_ads.campaigns | Filter: status != REMOVED |
| Google Ads | GAQL: SELECT conversion_action.id, conversion_action.name, conversion_action.tag_snippets | google_ads.conversion_actions | Include value_settings object |
| Meta | get_pixels | meta.pixel_id, meta.events_fired | Pull last 7d event diagnostics |
| Meta | get_custom_audiences | meta.custom_audiences | Size + source only (no PII) |
| claude-mem | Chroma similarity query | memory_summary | Top 5 by relevance to client_id |
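The two Google Ads rows list only SELECT fields, so the sketch below spells out what complete GAQL strings might look like. The FROM and WHERE clauses, the constant names, and the `toCampaignType` helper are assumptions added for illustration. The `value_settings` sub-fields are left out because the table does not name them. `PERFORMANCE_MAX` is the Google Ads channel-type enum value that corresponds to the `PMAX` label used in the schema.

```typescript
// Hypothetical query constants; only the SELECT fields come from the table above.
const CAMPAIGNS_GAQL = [
  "SELECT campaign.id, campaign.name, campaign.advertising_channel_type, campaign.status",
  "FROM campaign",
  "WHERE campaign.status != 'REMOVED'",
].join(" ");

const CONVERSION_ACTIONS_GAQL = [
  "SELECT conversion_action.id, conversion_action.name, conversion_action.tag_snippets",
  "FROM conversion_action",
].join(" ");

// Short labels used by AccountState.google_ads.campaigns[].type.
// "OTHER" is an assumed fallback; the schema itself lists only four types.
type CampaignType = "PMAX" | "SEARCH" | "DISPLAY" | "VIDEO" | "OTHER";

// Map the API's advertising_channel_type enum value to the short label.
function toCampaignType(channelType: string): CampaignType {
  switch (channelType) {
    case "PERFORMANCE_MAX":
      return "PMAX";
    case "SEARCH":
      return "SEARCH";
    case "DISPLAY":
      return "DISPLAY";
    case "VIDEO":
      return "VIDEO";
    default:
      return "OTHER";
  }
}
```

Filtering on `campaign.status` in the query itself keeps removed campaigns out of the snapshot without a second pass in the collector.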
Call GTM MCP export_container(containerId, workspaceId). The raw export is a deeply nested Google format that identifies tag, trigger, and variable types by terse type codes. Save it as raw_gtm_{timestamp}.json for versioning.
Map terse tag.type codes → human labels (e.g. ua→"Universal Analytics", awct→"Google Ads Conversion Tracking", gclidw→"Conversion Linker", html→"Custom HTML"). Extract firing trigger names by joining each tag's firingTriggerId[] against the trigger list.
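A minimal sketch of this normalization step, assuming the export exposes `type`, `firingTriggerId`, `paused`, and `notes` on each tag and `triggerId` on each trigger. The interfaces are a trimmed, assumed subset of the export, and the label table is illustrative rather than complete.

```typescript
// Assumed subset of the raw GTM export shapes.
interface RawTag {
  name: string;
  type: string;
  firingTriggerId?: string[];
  paused?: boolean;
  notes?: string;
}
interface RawTrigger {
  triggerId: string;
  name: string;
}
interface NormalizedTag {
  name: string;
  type: string;
  firing_triggers: string[];
  paused: boolean;
  notes: string;
}

// Illustrative label table; extend it as new type codes show up in client containers.
const TAG_TYPE_LABELS = new Map<string, string>([
  ["ua", "Universal Analytics"],
  ["awct", "Google Ads Conversion Tracking"],
  ["gclidw", "Conversion Linker"],
  ["html", "Custom HTML"],
]);

function normalizeTags(tags: RawTag[], triggers: RawTrigger[]): NormalizedTag[] {
  // Index triggers by ID so each tag lookup is O(1).
  const triggerNames = new Map<string, string>();
  for (const t of triggers) triggerNames.set(t.triggerId, t.name);

  return tags.map((tag) => ({
    name: tag.name,
    // Keep the raw code visible when no label is known.
    type: TAG_TYPE_LABELS.get(tag.type) ?? `Unknown (${tag.type})`,
    // IDs missing from the trigger list (built-in triggers, for example) are kept as markers.
    firing_triggers: (tag.firingTriggerId ?? []).map(
      (id) => triggerNames.get(id) ?? `unresolved:${id}`,
    ),
    paused: tag.paused ?? false,
    notes: tag.notes ?? "",
  }));
}
```

Unknown type codes and unresolved trigger IDs are surfaced rather than dropped, so gaps in the label table show up in the snapshot instead of silently disappearing.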
Parse filter[] conditions on each trigger. For Custom Event triggers, extract the dataLayer event name from customEventFilter[].parameter[] (the value of the arg1 entry; arg0 holds the {{_event}} variable). These names become the datalayer_schema[] array.
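The extraction could look roughly like the sketch below. It assumes each `customEventFilter` condition carries a `parameter` array in which the `arg1` entry holds the event name, and that Custom Event triggers are typed `CUSTOM_EVENT` (or `customEvent`). The interface and function names are hypothetical.

```typescript
// Assumed subset of the raw GTM export shapes.
interface GtmParameter {
  type: string;
  key?: string;
  value?: string;
}
interface GtmCondition {
  type: string;
  parameter?: GtmParameter[];
}
interface RawTrigger {
  name: string;
  type: string;
  customEventFilter?: GtmCondition[];
}

function extractDatalayerSchema(triggers: RawTrigger[]): string[] {
  const events = new Set<string>();
  for (const trigger of triggers) {
    // Accept both "CUSTOM_EVENT" and "customEvent" spellings of the trigger type.
    if (trigger.type.replace(/_/g, "").toLowerCase() !== "customevent") continue;
    for (const condition of trigger.customEventFilter ?? []) {
      for (const param of condition.parameter ?? []) {
        // arg1 is the right-hand side of the "{{_event}} equals ..." condition.
        if (param.key === "arg1" && param.value) events.add(param.value);
      }
    }
  }
  // Sorted and de-duplicated so repeated snapshots produce a stable array.
  return Array.from(events).sort();
}
```

When a trigger matches by regex, the extracted value is the pattern itself rather than a concrete event name, so those entries may need manual review before they go into datalayer_schema[].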
For each variable: extract type (v for Data Layer Variable, jsm for Custom JavaScript, k for first-party cookie, etc.) and parameter (the actual dataLayer key path like ecommerce.value). This tells the fine-tuned model exactly where data lives in the client's dataLayer schema.
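As a rough sketch of the variable pass: the mapping from variable type to the parameter key that carries the useful value (`name` for Data Layer Variables, `javascript` for Custom JavaScript, and so on) is an assumption about the export format and should be checked against a real container export. All names below are hypothetical.

```typescript
// Assumed subset of the raw GTM export shapes.
interface GtmParameter {
  type: string;
  key?: string;
  value?: string;
}
interface RawVariable {
  name: string;
  type: string;
  parameter?: GtmParameter[];
}
interface NormalizedVariable {
  name: string;
  type: string;
  parameter: string;
}

// Assumed: which parameter key holds the value worth keeping, per variable type.
const VALUE_KEY_BY_TYPE = new Map<string, string>([
  ["v", "name"], // Data Layer Variable: dataLayer key path
  ["jsm", "javascript"], // Custom JavaScript: function source
  ["k", "name"], // first-party cookie: cookie name
  ["c", "value"], // constant: literal value
]);

function normalizeVariable(variable: RawVariable): NormalizedVariable {
  const wantedKey = VALUE_KEY_BY_TYPE.get(variable.type);
  const hit =
    wantedKey === undefined
      ? undefined
      : (variable.parameter ?? []).find((p) => p.key === wantedKey);
  return { name: variable.name, type: variable.type, parameter: hit?.value ?? "" };
}

// Key paths of all Data Layer Variables, e.g. for the "DLV mappings" prompt line.
function dlvMappings(variables: RawVariable[]): string[] {
  return variables
    .filter((v) => v.type === "v")
    .map((v) => normalizeVariable(v).parameter)
    .filter((path) => path.length > 0);
}
```

Variables of an unmapped type come back with an empty `parameter` string, which keeps them in the count without guessing at their contents.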
Compare the live (published) container export against the current workspace. Any tags or triggers added in the workspace but not yet published are flagged in published_vs_draft_diff. This prevents training on configs that aren't live yet.
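One way to produce the diff summary is sketched below. It matches entities by name, which is an assumption: a renamed tag would show up as an addition, and edits to existing tags are not detected. The `ContainerSnapshot` shape and `draftDiff` name are hypothetical.

```typescript
interface NamedEntity {
  name: string;
}
// Assumed minimal view of a container: just the named tags and triggers.
interface ContainerSnapshot {
  tags: NamedEntity[];
  triggers: NamedEntity[];
}

// Names present in the workspace but absent from the published version.
function addedNames(published: NamedEntity[], workspace: NamedEntity[]): string[] {
  const live = new Set(published.map((e) => e.name));
  return workspace.filter((e) => !live.has(e.name)).map((e) => e.name);
}

// Render a one-line summary suitable for published_vs_draft_diff.
function draftDiff(published: ContainerSnapshot, workspace: ContainerSnapshot): string {
  const parts: string[] = [];
  const newTags = addedNames(published.tags, workspace.tags);
  if (newTags.length > 0) parts.push(`unpublished tags: ${newTags.join(", ")}`);
  const newTriggers = addedNames(published.triggers, workspace.triggers);
  if (newTriggers.length > 0) parts.push(`unpublished triggers: ${newTriggers.join(", ")}`);
  return parts.length > 0 ? parts.join("; ") : "none";
}
```

Returning the literal string "none" when nothing differs keeps the field non-empty, which makes an in-sync container distinguishable from a diff that was never computed.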
Flatten the full AccountState into a compact string suitable for the system field in training JSONL. Target: under 800 tokens. Abbreviate large arrays (e.g. "27 tags including: GA4 Config, PMAX Conversion, [+25 more]").
You are a conversion tracking expert for HRE.
GTM: GTM-XXXXXXX | Workspace: ws_4 | 31 tags, 18 triggers, 22 variables
Key tags: GA4 Config, GA4 - Purchase, AW - All Purchases (PMAX),
sGTM Bridge - Ecommerce, Custom HTML - DataLayer Push
Key triggers: Page View, Purchase (dataLayer: 'purchase'),
Lead Submit (dataLayer: 'generate_lead')
dataLayer schema: page_view, view_item, add_to_cart,
begin_checkout, purchase, generate_lead
DLV mappings: ecommerce.value, ecommerce.currency,
ecommerce.items[].item_id, ecommerce.items[].price
Google Ads: 123-456-7890 | Currency: USD
Campaigns: PMAX - Core (PMAX), Brand - HRE (SEARCH),
Retargeting - Site Visitors (DISPLAY)
Conversions: All Purchases (AW-xxx/yyy, value: ecommerce.value),
Lead Form Submit (AW-xxx/zzz, count-only)
Enhanced Conversions: ENABLED
Meta Pixel: HRE-pixel-id | Ad Account: act_12345
Events (browser+CAPI): PageView, ViewContent, AddToCart,
InitiateCheckout, Purchase
CAPI match rate: ~68% | Stape container: cnt_abc123
Known issues: PMAX conversion shows $0 value — root cause:
sGTM purchase tag reads top-level ecommerce.value but
HRE pushes revenue inside items[].price * quantity.
Past fix: updated sGTM variable to sum items array.
Memory: Cart abandonment tracking fixed 2025-11,
dataLayer conflict resolved 2025-09,
PMAX value fix deployed 2026-01
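The flattening step (abbreviating large arrays and holding the rendered prompt to the 800-token target) might be sketched as follows. The token count here is a crude characters-divided-by-four heuristic, not a real tokenizer, so treat the guard as approximate. The function names are hypothetical.

```typescript
// Render "27 tags including: X, Y, [+25 more]" style summaries.
function abbreviate(label: string, items: string[], keep: number = 2): string {
  if (items.length === 0) return `0 ${label}`;
  if (items.length <= keep) return `${items.length} ${label}: ${items.join(", ")}`;
  const shown = items.slice(0, keep).join(", ");
  return `${items.length} ${label} including: ${shown}, [+${items.length - keep} more]`;
}

// Heuristic only: roughly 4 characters per token for English-like text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Throw if the rendered prompt appears to exceed the budget; otherwise pass it through.
function assertWithinBudget(prompt: string, maxTokens: number = 800): string {
  const estimated = estimateTokens(prompt);
  if (estimated > maxTokens) {
    throw new Error(`system_prompt is ~${estimated} tokens, over the ${maxTokens}-token budget`);
  }
  return prompt;
}
```

A renderer could build each line with `abbreviate`, join the lines, and pass the result through `assertWithinBudget` before writing it to the system_prompt field. Swapping `estimateTokens` for the tokenizer of the model being fine-tuned would make the guard exact.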
AccountState is versioned with semver. Patch bump (1.0.x): campaign names changed, new tag added. Minor bump (1.x.0): dataLayer schema changed, conversion action added. Major bump (x.0.0): platform migrated, GTM container rebuilt. Each experiment record stores the AccountState version it was generated under. If you need to retrain from scratch after a major account restructure, you can filter to only post-migration experiments by version range.
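The bump rules above could be encoded along these lines. The `SnapshotChanges` flags are a hypothetical summary of a snapshot comparison; how each flag gets detected is outside this sketch. `isPostMigration` illustrates the version-range filter for experiment records.

```typescript
type Bump = "major" | "minor" | "patch" | "none";

// Hypothetical result of comparing a new snapshot with the previous one.
interface SnapshotChanges {
  platformMigrated: boolean;
  containerRebuilt: boolean;
  datalayerSchemaChanged: boolean;
  conversionActionAdded: boolean;
  campaignRenamed: boolean;
  tagAdded: boolean;
}

// The most significant change wins.
function classifyBump(c: SnapshotChanges): Bump {
  if (c.platformMigrated || c.containerRebuilt) return "major";
  if (c.datalayerSchemaChanged || c.conversionActionAdded) return "minor";
  if (c.campaignRenamed || c.tagAdded) return "patch";
  return "none";
}

function parseVersion(version: string): [number, number, number] {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (!m) throw new Error(`not a plain MAJOR.MINOR.PATCH version: ${version}`);
  return [Number(m[1]), Number(m[2]), Number(m[3])];
}

function bumpVersion(version: string, bump: Bump): string {
  const [major, minor, patch] = parseVersion(version);
  switch (bump) {
    case "major":
      return `${major + 1}.0.0`;
    case "minor":
      return `${major}.${minor + 1}.0`;
    case "patch":
      return `${major}.${minor}.${patch + 1}`;
    default:
      return version;
  }
}

// True when a record's AccountState version is at or past the given major version.
function isPostMigration(recordVersion: string, migrationMajor: number): boolean {
  return parseVersion(recordVersion)[0] >= migrationMajor;
}
```

For example, after a container rebuild takes a client from 1.4.2 to 2.0.0, keeping only records where `isPostMigration(record.version, 2)` holds would select the post-migration experiments.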
claude --dangerously-skip-permissions
# Phase 2: Account State Collector
# Branch: feature/finetune-pipeline
# Repo: github.com/Organized-AI/gtm-autoresearch
## Context
Phase 1 (ExperimentLogger) is complete. Now building the
Account State Collector that pulls live client data from
GTM, Google Ads, and Meta, normalizes it into a versioned
AccountState object, and renders a compact system prompt string.
## Task
Read AGENT-HANDOFF/ and PLANNING/ first. Then:
1. Create packages/account-state-collector/
AccountState TypeScript interface (full schema):
- client_id, snapshot_at, version (semver)
- gtm: { container_id, workspace_id, tags[], triggers[],
variables[], datalayer_schema[], published_vs_draft_diff }
- google_ads: { account_id, campaigns[], conversion_actions[],
enhanced_conversions, performance_flags[] }
- meta: { pixel_id, ad_account_id, events_fired[],
capi_config{}, custom_audiences[] }
- memory_summary (string from Chroma top-5 query)
- known_issues (string[])
- system_prompt (pre-rendered, target ≤800 tokens)
2. Collectors (one file per source):
- gtm-collector.ts → calls GTM MCP (gtm-mcp.stape.ai/mcp)
export_container + list_workspaces
Normalize: map tag type codes → labels, extract
datalayer_schema from customEventFilter values
- google-ads-collector.ts → GAQL via TrueClicks MCP
Queries: campaigns + conversion_actions (see PLANNING/)
- meta-collector.ts → Pipeboard Meta MCP
get_pixels + get_custom_audiences
- memory-collector.ts → query Chroma :37777 by client_id,
return top 5 relevant memories as summary string
3. system-prompt-renderer.ts
Flatten AccountState → compact string ≤800 tokens
Abbreviate large arrays: "27 tags including: X, Y, [+25 more]"
Include known_issues and memory_summary at bottom
4. Versioning:
- Compare new snapshot vs previous stored version
- Semver bump logic: dataLayer schema change or new conversion action = minor,
platform migration or container rebuild = major, small changes (renamed campaign, new tag) = patch
- Store at: data/clients/{client_id}/account_state_v{N}.json
5. CLI: `pnpm account-state collect --client bioptimizers`
Runs all collectors, renders system_prompt, saves versioned file
6. Unit tests for:
- GTM tag normalization (type code → label)
- datalayer_schema extraction from trigger filters
- system_prompt token count guard (assert ≤800)
- semver bump logic
## Env vars:
GTM_MCP_URL=https://gtm-mcp.stape.ai/mcp
GOOGLE_ADS_MCP_URL=https://mcp.gaql.app/sse/google-ads/...
META_MCP_URL=https://mcp.pipeboard.co/meta-ads-mcp
CHROMA_URL=http://localhost:37777
CLIENT_DATA_DIR=./data/clients
## Clients to test against:
- hre (GTM-XXXXXXX, primary repo client)
- teleios (GTM-WM5S3WSG)
Do NOT build Phase 3 (training data pipeline) yet.