
GA4, GTM, and UTM tracking on Astro + EmDash: the setup that survives

For agencies and marketing leads tired of analytics that work in October and break by January — the exact GTM container shape, UTM cookie pattern, and CSP rules that don't rot.


This is the part of a WordPress migration most teams underspend on. Then six months later they wonder why their cost-per-lead numbers don’t match between Google Ads and HubSpot. The answer is almost always one of three things: UTMs vanish on the first internal link, GTM and GA4 are double-counting because of a duplicate tag, or the form’s hidden-field capture script broke when someone deployed a CSP update.

Below is the setup we ship on every migration. It works on Astro, plays well with EmDash, and survives the kind of small drift that breaks WordPress analytics every quarter.

Who this is for

In-house marketing leads who own GA4 and want it to actually report what they think it’s reporting. Agency analytics owners running tracking across 20+ client sites who need a setup that doesn’t drift. If you’re a developer wiring this from scratch, the code below is copy-pasteable.

If you’ve never heard of GTM or are looking for “is GA4 OK to use” content, this isn’t that post. Pre-reading: GA4 official docs and the GTM beginner guide.

The two principles

Two rules carry the whole setup:

1. GTM is the only tag the site loads. Everything else lives inside GTM.

GA4, Hotjar, ad pixels (LinkedIn Insight, Meta, X), Microsoft Clarity, conversion APIs — all of them load as tags inside the GTM container, not as separate <script> tags on the page. This means:

  • One CSP allowlist entry, not seven.
  • Marketing can add a new pixel by editing the GTM container, not opening a developer ticket.
  • One choke point for consent management when GDPR comes for you.
  • Page-load weight stays bounded.

2. UTMs are captured to a first-party cookie before any link click.

UTM parameters live only in the landing-page URL, so they disappear on the first internal click unless something stores them. That’s why your “where did this lead come from” data is wrong. Capture the UTMs once on landing, persist them to a first-party cookie, and read the cookie at form-submit time. The whole script is under 30 lines.

If you do those two things and nothing else, your analytics will be more accurate than 80% of WordPress sites we audit.

The implementation, in five parts

1. The GTM script tag

One inline script in the site’s <head>. On Astro, this lives in your base layout:

---
// src/components/layout/Analytics.astro
import { PUBLIC_GTM_ID } from 'astro:env/client';
const enabled =
  PUBLIC_GTM_ID && PUBLIC_GTM_ID !== 'GTM-XXXXXXX' && PUBLIC_GTM_ID.startsWith('GTM-');
---
{enabled && (
  <script is:inline define:vars={{ id: PUBLIC_GTM_ID }}>
    (function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start':new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0],j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src='https://www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f);})(window,document,'script','dataLayer',id);
  </script>
)}

Two things worth highlighting:

  • The placeholder guard (!== 'GTM-XXXXXXX'). In dev, preview, and CI we run with the placeholder ID, which means GTM never actually loads. This keeps Lighthouse runs clean and stops dev traffic from contaminating prod analytics. The placeholder has the same GTM-[A-Z0-9]+ shape as a real ID and passes the startsWith('GTM-') check, so the explicit comparison is the one line of safety that keeps it out.
  • astro:env/client is the typed env import. The schema lives in astro.config.mjs with a sensible placeholder default. Quang sets the real ID in the Cloudflare Pages env vars; the rest of the team doesn’t need to touch it.

The <noscript> iframe (the second half of the official GTM snippet) goes immediately after the opening <body> tag, wrapped in the same conditional. Without it, GTM misses the ~5% of requests that come from users who block JavaScript or from pre-render fetches that don’t execute scripts.
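The guard and the noscript URL can also be written as two small functions, which makes them easy to unit-test outside the Astro component. This is a minimal sketch: isRealGtmId and gtmNoscriptSrc are hypothetical helper names, and the regex is a stricter assumption than the startsWith check in Analytics.astro. The ns.html endpoint is the one the official noscript snippet points at.

```javascript
// Hypothetical helpers; the logic mirrors the `enabled` check in Analytics.astro.
const PLACEHOLDER_ID = 'GTM-XXXXXXX';

// True only for a non-placeholder ID with the GTM-[A-Z0-9]+ shape.
function isRealGtmId(id) {
  return typeof id === 'string' && id !== PLACEHOLDER_ID && /^GTM-[A-Z0-9]+$/.test(id);
}

// URL for the <noscript> iframe, or null when GTM should not load at all.
function gtmNoscriptSrc(id) {
  return isRealGtmId(id)
    ? 'https://www.googletagmanager.com/ns.html?id=' + encodeURIComponent(id)
    : null;
}
```

Rendering the iframe only when gtmNoscriptSrc returns a string keeps the loader and the fallback behind the same switch.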

2. The CSP allowlist

CSP is what breaks analytics six months later when someone tightens security headers. Get this right once.

# public/_headers
/*
  Content-Security-Policy: default-src 'self'; script-src 'self' 'unsafe-inline' https://www.googletagmanager.com https://www.google-analytics.com; img-src 'self' data: https:; connect-src 'self' https://www.google-analytics.com https://www.googletagmanager.com; frame-src https://www.googletagmanager.com

The bits that matter for analytics:

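  • script-src lists googletagmanager.com (serves gtm.js and the gtag.js library that GTM injects for GA4) and google-analytics.com (the Analytics domain, in case a tag loads script from it).
  • connect-src lists both hosts because the tags send their beacons there with fetch, XHR, or sendBeacon, all of which connect-src governs. If the network tab shows blocked requests to other Google hosts (for example a regional google-analytics.com subdomain), add those as well.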
  • img-src https: is needed for tracking pixels, which load as image requests.
  • frame-src googletagmanager.com is needed for the noscript iframe and for GTM’s debug mode.

Add other tracker domains here as you add tracker tags inside GTM. LinkedIn Insight needs px.ads.linkedin.com. Meta Pixel needs connect.facebook.net. The CSP is the only place in the codebase where these surface; you don’t add <script> tags.
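If the header is generated at build time rather than hand-edited, adding a tracker becomes a one-line change. The sketch below is one way to do that. The directive values are copied from the _headers example above. The buildCsp and addSources helpers are hypothetical, and putting the LinkedIn host under connect-src is an assumption to verify in the network tab.

```javascript
// Sources copied from the _headers example; the helpers are illustrative only.
const directives = {
  'default-src': ["'self'"],
  'script-src': ["'self'", "'unsafe-inline'", 'https://www.googletagmanager.com', 'https://www.google-analytics.com'],
  'img-src': ["'self'", 'data:', 'https:'],
  'connect-src': ["'self'", 'https://www.google-analytics.com', 'https://www.googletagmanager.com'],
  'frame-src': ['https://www.googletagmanager.com'],
};

// Returns a copy of `base` with extra sources appended to one directive (duplicates dropped).
function addSources(base, directive, sources) {
  const merged = new Set([...(base[directive] || []), ...sources]);
  return { ...base, [directive]: [...merged] };
}

// Serializes the map into a single Content-Security-Policy header value.
function buildCsp(d) {
  return Object.entries(d)
    .map(([name, sources]) => `${name} ${sources.join(' ')}`)
    .join('; ');
}

// Assumption: the LinkedIn beacon host belongs under connect-src.
const withLinkedIn = addSources(directives, 'connect-src', ['https://px.ads.linkedin.com']);
console.log(buildCsp(withLinkedIn));
```

Keeping the policy as data also means a code review shows exactly which host was added to which directive.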

3. The UTM persistence script

Everything is downstream of this small script:

// public/utm.js — included via <script> in the head, no module
(function () {
  const KEYS = ['utm_source', 'utm_medium', 'utm_campaign', 'utm_term', 'utm_content'];
  const COOKIE = 'emdash_utm';
  const TTL = 60 * 60 * 24 * 30; // 30 days

  const params = new URLSearchParams(location.search);
  const fromUrl = {};
  KEYS.forEach((k) => {
    const v = params.get(k);
    if (v) fromUrl[k] = v;
  });

  // First-touch: write the cookie only when the URL carries UTMs and no cookie exists yet.
  const hasCookie = document.cookie.split('; ').some((c) => c.startsWith(COOKIE + '='));
  if (Object.keys(fromUrl).length && !hasCookie) {
    const value = encodeURIComponent(JSON.stringify(fromUrl));
    document.cookie = `${COOKIE}=${value};max-age=${TTL};path=/;SameSite=Lax`;
  }
})();

Three properties of this script that matter:

  • First-party cookie. Set on your own domain, no third party involved. That makes it GDPR-friendlier than a third-party cookie, though whether it needs consent depends on jurisdiction (see the FAQ).
  • First-touch attribution. The hasCookie check means we capture the first UTMs the visitor lands with and don’t overwrite them on later visits inside the window. The lead came from the LinkedIn ad three weeks ago, even if today they’re coming directly to /pricing.
  • 30-day window. Long enough for a B2B sales cycle, short enough that stale attribution doesn’t drift into “this lead came from a LinkedIn campaign that ended six months ago.”

If you want last-touch instead of first-touch, drop the !hasCookie condition so every UTM-tagged landing overwrites the cookie. Keep the Object.keys(fromUrl).length guard either way; without it, every page view with no UTMs in the URL would replace the cookie with an empty object. Which model to use is a debate for marketing; we default to first-touch because it matches how every B2B team we’ve worked with thinks about leads.
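The first-touch versus last-touch decision can be isolated into a pure function, which makes it testable without touching document.cookie. This is a minimal sketch; utmFromSearch, resolveUtm, and the mode argument are hypothetical names, not part of utm.js.

```javascript
const UTM_KEYS = ['utm_source', 'utm_medium', 'utm_campaign', 'utm_term', 'utm_content'];

// Extracts only the known UTM keys from a query string such as "?utm_source=linkedin".
function utmFromSearch(search) {
  const params = new URLSearchParams(search);
  const found = {};
  for (const key of UTM_KEYS) {
    const value = params.get(key);
    if (value) found[key] = value;
  }
  return found;
}

// Returns the object to write to the cookie, or null when the cookie should be left alone.
// `existing` is the parsed cookie value, or null if no cookie is set yet.
function resolveUtm(existing, search, mode = 'first-touch') {
  const incoming = utmFromSearch(search);
  if (Object.keys(incoming).length === 0) return null; // no UTMs in the URL
  if (mode === 'first-touch' && existing) return null; // keep the original attribution
  return incoming; // first visit, or last-touch overwrite
}
```

In the browser, the caller would parse the current cookie, pass location.search, and write the cookie only when the result is not null.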

4. The hidden-field capture for forms

When a form submits, you want the UTMs to land in the same row as the lead in your CRM (HubSpot, Salesforce, whatever). The cookie above gets read into a hidden field at submit time:

// src/components/blocks/ContactForm.astro — inside the submit handler
function readUtmCookie() {
  const match = document.cookie.match(/(?:^|;\s*)emdash_utm=([^;]+)/);
  if (!match) return {};
  try {
    return JSON.parse(decodeURIComponent(match[1]));
  } catch {
    return {};
  }
}

form.addEventListener('submit', async (e) => {
  e.preventDefault();
  const fd = new FormData(form);
  const utm = readUtmCookie();
  await fetch(formEndpoint, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      name: fd.get('name'),
      email: fd.get('email'),
      message: fd.get('message'),
      turnstileToken: fd.get('cf-turnstile-response'),
      utm,
    }),
  });
});

The Worker on the receiving side puts utm.utm_source, etc., into the email body and the D1 row. From there it’s the CRM’s job to surface it. HubSpot, for example, can read the utm keys directly off the form payload if you map them in the form integration.
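Because the utm object arrives from the browser, the receiving side should treat it as untrusted input before writing it anywhere. The sketch below shows one way to whitelist and trim the keys ahead of the email body and database row. normalizeUtm, utmLines, and the 200-character cap are assumptions for illustration, not the Worker’s actual code.

```javascript
const ALLOWED_KEYS = ['utm_source', 'utm_medium', 'utm_campaign', 'utm_term', 'utm_content'];
const MAX_LEN = 200; // hypothetical cap to keep oversized values out of the row

// Returns an object with exactly the five UTM columns; missing or invalid values become null.
function normalizeUtm(utm) {
  const row = {};
  for (const key of ALLOWED_KEYS) {
    const value = utm && typeof utm === 'object' ? utm[key] : undefined;
    row[key] = typeof value === 'string' && value.trim()
      ? value.trim().slice(0, MAX_LEN)
      : null;
  }
  return row;
}

// Formats the non-empty values as "key: value" lines for a plain-text email body.
function utmLines(row) {
  return Object.entries(row)
    .filter(([, value]) => value !== null)
    .map(([key, value]) => `${key}: ${value}`)
    .join('\n');
}
```

A fixed set of keys means the database insert can bind the same five columns on every request, whatever the client sent.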

5. The events that should fire

Inside GTM, configure these tags. Each one is a “tag” in GTM that fires on a “trigger.” We won’t walk through the GTM UI here; the tags themselves are obvious — the triggers are where teams get this wrong.

Event | Tag | Trigger
Page view | GA4 page_view (auto) | All Pages
Scroll depth | GA4 scroll | Scroll Depth (25/50/75/90%)
Outbound click | GA4 click | Click - Just Links, where URL doesn’t contain your hostname
Form submit | GA4 generate_lead | Form Submit (gated to your contact form ID)
CTA click | GA4 select_content | Click - All Elements, where data-cta attribute is non-empty
Demo video play | GA4 video_start | Video Progress (0%) on Loom/YouTube embeds

The CTA-click pattern is the one nobody runs and everyone should. We add a data-cta attribute to every button and link we want to track:

<a class="btn btn--primary" href="/contact" data-cta="cta-band-primary">
  Book intro call
</a>

The trigger fires on any element with a non-empty data-cta attribute. The tag passes the attribute value as the GA4 item_id. After two weeks of data, you can answer “which CTA copy converts best” with a query, not a hunch.
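Some teams prefer to push CTA clicks to the data layer from site code instead of relying on a click trigger, for example when the button contains nested icons or spans. The sketch below is an optional alternative, not the setup described above. The cta_click event name and cta_id key are hypothetical and would need a matching Custom Event trigger in GTM.

```javascript
// Builds the dataLayer payload for a click, or null if no ancestor carries data-cta.
// closest() walks up from the clicked node, so a click on a child element still counts.
function ctaPayload(target) {
  const el = target && typeof target.closest === 'function'
    ? target.closest('[data-cta]')
    : null;
  const id = el ? el.getAttribute('data-cta') : '';
  return id ? { event: 'cta_click', cta_id: id } : null;
}

// Browser-only wiring: one delegated listener covers every current and future CTA.
if (typeof document !== 'undefined') {
  window.dataLayer = window.dataLayer || [];
  document.addEventListener('click', (e) => {
    const payload = ctaPayload(e.target);
    if (payload) window.dataLayer.push(payload);
  });
}
```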

What changes when you migrate from WordPress

If you’re migrating an existing WordPress site to Astro + EmDash, three things change about analytics:

You delete every analytics plugin. Yoast, Site Kit, MonsterInsights, Independent Analytics — all of them go. Replaced by the GTM container alone.

You stop using the WordPress page slug as a “category.” GA4 reports already segment by URL path; you don’t need a custom dimension for “blog vs page.” If you had something like “Pillar / Cluster” categorization, encode it as data attributes on the body tag and read them via a GTM variable.

You probably gain accuracy. Most WordPress analytics setups we audit are double-counting because two plugins both load GA4. Or they’re under-counting because a caching plugin minified the GTM snippet and broke the data layer. Removing the plugins cleans the data.

We’ve yet to migrate a site where post-cutover GA4 numbers matched pre-cutover within 5%. They’re always cleaner after.

The plugin we ship

We bundle the four scripts above (GTM loader, noscript fallback, UTM script, hidden-field helper) into a small EmDash plugin. It runs in a sandboxed Worker isolate with declared permissions and injects the right script tags into every page. Configure your GTM ID in the EmDash admin, ship.

For non-EmDash sites — Astro on Netlify, hand-rolled HTML, even WordPress — we ship the same logic as a copy-paste snippet. Same code, different distribution.

If you want it set up for you instead of DIYing, that’s the analytics setup engagement — typically a week, $750 if you’re already on EmDash, $1,500 standalone for a WordPress audit-and-rebuild.

Common mistakes

The five we see most often, in audits:

Loading both GA4 and GTM directly. Some teams load both gtag.js and gtm.js on the page, so every hit is counted twice. Pick one. GTM is the right answer because it can host GA4 and every other tag. Remove gtag.js from your codebase.

Using dataLayer before the GTM script loads. If an inline script calls dataLayer.push before anything has defined the array, the call throws and the event is lost. Define window.dataLayer = window.dataLayer || [] at the top of your site, before any code that pushes events; GTM processes whatever is already queued when it loads.
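A small wrapper makes the initialization impossible to forget, because every push goes through it. This is a sketch under assumptions: ensureDataLayer and track are hypothetical names, and root stands in for window so the functions can be exercised outside a browser.

```javascript
// Creates the dataLayer array on first use and reuses it afterwards.
// `root` is `window` in the browser; it is a parameter so the sketch is testable.
function ensureDataLayer(root) {
  root.dataLayer = root.dataLayer || [];
  return root.dataLayer;
}

// Queues an event. Safe to call before GTM has loaded: the entry waits in the array.
function track(root, event, params = {}) {
  ensureDataLayer(root).push({ event, ...params });
}
```

In site code the call would look like track(window, 'generate_lead', { form_id: 'contact' }), where form_id is an illustrative parameter name.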

Forgetting CSP connect-src. Tags load fine but beacons fail silently. Check the browser’s network tab for blocked POSTs to google-analytics.com after you deploy a CSP change.
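Browsers also raise a securitypolicyviolation event for each blocked request, which turns a silent failure into a console warning during QA. The sketch below assumes you only want a log line; describeViolation is a hypothetical helper name.

```javascript
// Formats the fields of a SecurityPolicyViolationEvent into one readable line.
function describeViolation(e) {
  return `CSP blocked ${e.blockedURI} (${e.effectiveDirective})`;
}

// Browser-only: log every blocked request so a missing connect-src entry is visible.
if (typeof document !== 'undefined') {
  document.addEventListener('securitypolicyviolation', (e) => {
    console.warn(describeViolation(e));
  });
}
```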

Duplicate event names across plugins. When you keep WordPress’s MonsterInsights firing while migrating, both it and the new GTM container fire page_view. Pick one.

Not testing in GTM debug mode before publishing. Every GTM container has a “Preview” mode that shows you exactly what fires on each page. Use it for at least 15 minutes before clicking publish.

FAQ

Do I really need GTM, or can I just load gtag.js directly?

GTM is the right call. You get one CSP allowlist entry instead of seven, marketers can add LinkedIn or Meta pixels without opening a developer ticket, and consent management has a single choke point. Loading gtag.js alongside gtm.js is the most common audit finding, and it double-counts every page view (Google Tag Manager docs, 2026).

How long does GA4 data take to show up after the migration?

Real-time reports populate within minutes, but the standard reports (Acquisition, Engagement, Monetization) typically take 24–48 hours to backfill after a fresh property starts receiving traffic. Plan your post-cutover analytics review for day three at the earliest, and don’t trust week-one numbers for trend comparisons against the pre-migration baseline (GA4 data freshness docs, 2026).

Can I migrate Universal Analytics historical data into GA4?

No. Standard Universal Analytics properties stopped processing new hits on July 1, 2023, and Google shut down access to UA properties on July 1, 2024. GA4 is a structurally different property: different schema, different event model, different identity stitching. If you exported UA reports to BigQuery or CSV before the shutdown, keep them as a read-only archive and treat GA4 as a fresh baseline. There is no supported import path (Google Analytics UA sunset docs, 2026).

Does the UTM cookie need a consent banner?

The emdash_utm cookie is first-party, set on your own domain with SameSite=Lax, and stores only marketing parameters from the inbound URL. For most jurisdictions that puts it in the “functional” or “analytics-with-consent” bucket rather than the strict-tracking category. EEA traffic still needs a consent banner gating GA4 itself; consult counsel for your specific case (MDN cookies docs, 2026).

Will my CSP block ad pixels like LinkedIn or Meta?

Yes, unless you allowlist them. Add px.ads.linkedin.com for LinkedIn Insight, connect.facebook.net for Meta Pixel, and the matching connect-src and img-src entries for each. The pattern is the same as the GA4 allowlist above — every new tag domain inside GTM needs a corresponding header entry, or beacons fail silently (MDN CSP docs, 2026).

What this saves

A typical WordPress site’s analytics setup is 4–8 plugins, each with a license, each requiring updates, each occasionally breaking. The Astro + EmDash equivalent is one GTM container plus four files of code that don’t change. We’ve migrated clients whose annual analytics-plugin renewals were $300–800 — that’s now $0, and the numbers are more accurate.

The next post — Vibecoding a marketing site — covers the AI-first developer workflow on top of this stack. If you’d rather skip the read and just have us run this, the analytics-setup engagement is below.

Need help applying any of this?

We do this for clients every week. 30 minutes, no obligation.