Next up we're adding our spam filter. This page talks you through that process.

Spam Filter

Where is this in the Pipeline Interface?

What are you setting up here?

How do I do it?

Example 1: Flagging traffic from specific referrers as spam

Example 2: Excluding traffic from specific locations as spam

How do I know which traffic is spam?

No screen resolution

# Pipeline GA4 Tables - Event Count Table

- **Table name:** `ga4_mrt_event_count`
- **Scoping**: Event level
- **Table Contains**: All standard dimensions with event counts for every event in your account, pivoted wide as separate columns
- **Each Row Presents**: One unique combination of all dimensions at event level
- **Optional Config Required**: Custom events will automatically appear as additional columns

This table takes all events from your GA4 property and pivots them into separate count columns. It's a great place to debug events or for custom reporting based on individual custom event setups.

## Dynamic Columns

Here's what makes this table special: **columns are added dynamically based on your events**. Every event that gets fired in your GA4 property will appear as its own count column. This means your table structure grows as you add new events to your tracking setup.

If you add new events to your GA4 property, they will automatically be added to this table as new columns.

## How Event Counts Work

Each session gets a row, and every event that occurred during that session gets its own count column. For example:
- `met__event_count_page_view` - How many page views in this session
- `met__event_count_purchase` - How many purchases in this session 
- `met__event_count_add_to_cart` - How many add to cart events in this session

Plus any custom events you've set up will appear as `met__event_count_{your_custom_event_name}`.
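
As a rough illustration of the pivot described above, here is a minimal Python sketch. The input shape (a list of `(session_id, event_name)` pairs) and the function name `pivot_event_counts` are assumptions made for the example; only the `met__event_count_` column prefix comes from this page.

```python
from collections import Counter, defaultdict

# Hypothetical raw events as (session_id, event_name) pairs.
events = [
    ("s1", "page_view"),
    ("s1", "page_view"),
    ("s1", "add_to_cart"),
    ("s2", "page_view"),
    ("s2", "purchase"),
    ("s2", "newsletter_signup"),  # a custom event
]


def pivot_event_counts(events):
    """Return one row per session with a met__event_count_* column per event."""
    counts = defaultdict(Counter)
    for session_id, event_name in events:
        counts[session_id][event_name] += 1

    # Every event name seen anywhere becomes a column, so a newly fired
    # event widens the table for all rows.
    all_events = sorted({name for _, name in events})
    return {
        session_id: {
            f"met__event_count_{name}": per_session[name] for name in all_events
        }
        for session_id, per_session in counts.items()
    }


rows = pivot_event_counts(events)
print(rows["s1"])
```

Session `s1` never fired `purchase`, so its `met__event_count_purchase` column is 0 rather than missing. The custom event `newsletter_signup` gets a column simply because it appears in the data.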

## Table Fields by Category

<TabbedTableFilterable 
 csvPath="/data/GA4 - Table Names & Description - Updated ga4_mrt_event_count.csv" 
 defaultTab="Full list of all fields"
 tabs={[
 { name: "Full list of all fields", description: "Complete list of all available fields in the event count table." },
 { name: "Keys", preFilter: { column: "Category", value: "Key" }, description: "Keys are unique IDs that help track and organise data. In this table, there are keys for specific sessions and client IDs." },
 { name: "Dates", preFilter: { column: "Category", value: "Date" }, description: "These date fields come in different formats - both timestamp and date. If you are unsure of which column to use, use <code>date__session_start_dt</code> as this is the partition column. Unsure what a partition column is? <a href=\"/documentation/accessing-your-data/faqs/pipeline-ga4-faqs\" target=\"_blank\" rel=\"noopener noreferrer\">See this section.</a>" },
 { name: "Dimensions", preFilter: { column: "Category", value: "Dimensions" }, description: "These dimensions can be used to segment the data and view event counts across different user segments and traffic sources." },
 { name: "Metrics", preFilter: { column: "Category", value: "Metric" }, description: "This is the list of event count metrics. Remember, you'll only see columns for events that have been fired in your GA4 property." }
 ]}
/>

The Event Count Table pivots event counts wide, showing all events as separate columns. Let's jump in.


# Pipeline GA4 Tables - Attribution Table

- **Table name:** `ga4_attribution`
- **Scoping**: Dependent on the Dimension selection.
- **Table Contains**: Metrics calculated with different types of attribution.
- **Each Row Presents**: One conversion with different attribution metrics.
- **Optional Config Required**: Key Events (previously known as Goals) which may be form submissions, purchases, or newsletter signups.

This table has one row per session with a conversion.

![attribution_table](/images/resources/documentation/ga4_tables/attribution_table/attribution_table.png)

There are numerous ways of attributing which channel(s) are responsible for a conversion, so this table has different conversion metrics depending on how you want to attribute the conversion to different channels/sources.

Each conversion is then broken down by the different attribution models.

- First Click
- Last Click
- Linear
- Time Decay

The attribution window length is, by default, 90 days, but this can be adjusted.
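
To make the four models concrete, here is a minimal Python sketch that splits the credit for a single conversion across its touchpoints. It is an illustration, not the pipeline's implementation: the function name `attribute`, the input shape, and the 7-day half-life used for time decay are all assumptions. Only the model names and the 90-day default window come from this page.

```python
from datetime import datetime, timedelta


def attribute(touchpoints, conversion_time, window_days=90, half_life_days=7):
    """Split one conversion's credit across channels under four models.

    touchpoints: list of (channel, datetime) pairs in any order.
    half_life_days is an assumed parameter for the time decay model.
    """
    window_start = conversion_time - timedelta(days=window_days)
    # Keep only touchpoints inside the attribution window, oldest first.
    path = sorted(
        (tp for tp in touchpoints if window_start <= tp[1] <= conversion_time),
        key=lambda tp: tp[1],
    )
    if not path:
        return {}

    def credit(weights):
        # Normalise the weights so each model hands out exactly 1 conversion.
        total = sum(weights)
        shares = {}
        for (channel, _), weight in zip(path, weights):
            shares[channel] = shares.get(channel, 0.0) + weight / total
        return shares

    n = len(path)
    # Time decay: a touchpoint's weight halves for every half-life of age.
    decay = [
        0.5 ** ((conversion_time - ts).total_seconds() / 86400 / half_life_days)
        for _, ts in path
    ]
    return {
        "first_click": credit([1.0] + [0.0] * (n - 1)),
        "last_click": credit([0.0] * (n - 1) + [1.0]),
        "linear": credit([1.0] * n),
        "time_decay": credit(decay),
    }


conversion = datetime(2024, 3, 15, 12, 0)
result = attribute(
    [
        ("Social", conversion - timedelta(days=120)),  # outside the 90-day window
        ("Organic Search", conversion - timedelta(days=14)),
        ("Email", conversion - timedelta(days=7)),
        ("Paid Search", conversion),
    ],
    conversion,
)
print(result["linear"])
```

In this example the Social touchpoint falls outside the window and receives no credit under any model. Linear gives each of the three remaining channels a third of the conversion, while time decay gives the most recent channel (Paid Search) the largest share.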

<TableOfContents toc={toc}/>


## Table Fields by Category

<TabbedTableFilterable 
 csvPath="/data/GA4 - Table Names & Description - Updated ga4_attribution.csv" 
 defaultTab="Full list of all fields"
 tabs={[
 { name: "Full list of all fields", description: "Complete list of all available fields in the attribution table." },
 { name: "Keys", preFilter: { column: "Category", value: "Key" }, description: "Keys are unique IDs that help track and organise data. In this table, there are keys for specific sessions and client IDs." },
 { name: "Dates", preFilter: { column: "Category", value: "Date" }, description: "These date fields come in different formats - both timestamp and date. If you are unsure of which column to use, use <code>date__session_start_dt</code> as this is the partition column. Unsure what a partition column is? <a href=\"/documentation/accessing-your-data/faqs/pipeline-ga4-faqs\" target=\"_blank\" rel=\"noopener noreferrer\">See this section.</a>" },
 { name: "Dimensions", preFilter: { column: "Category", value: "Dimensions" }, description: "These dimensions can be used to segment the data and compare the different types of attribution across those segments." },
 { name: "Metrics", preFilter: { column: "Category", value: "Metric" }, description: "This is the list of metrics and different ways of attributing conversions for key events." }
 ]}
/>

The Client Table is at Client (device-browser) level Scoping. Let's jump in.


# Pipeline GA4 Tables - User ID Table

- **Table name:** `ga4_user_ids`
- **Scoping**: User ID Level
- **Table Contains**: User level dimensions and metrics.
- **Each Row Presents**: One row per User ID.
- **Optional Config Required**: Creating a unique ID for each user, which is typically linked to logged in users.

![user_id_one_row_per](/images/resources/documentation/ga4_tables/user_id_table/user_id_one_row_per.png)

This setup means you are able to track user behaviour across different sessions, devices and platforms, unlike Client IDs, which are confined to unique device-browser pairings.

Still confused? See <OutboundLink href="/documentation/accessing-your-data/faqs/pipeline-ga4-faqs" children="this FAQ page" /> which explains the difference.

<TableOfContents toc={toc}/>

## Table Fields by Category

<TabbedTableFilterable 
 csvPath="/data/GA4 - Table Names & Description - Updated ga4_user_ids.csv" 
 defaultTab="Full list of all fields"
 tabs={[
 { name: "Full list of all fields", description: "Complete list of all available fields in the user ID table." },
 { name: "Keys", preFilter: { column: "Category", value: "Key" }, description: "Keys are unique IDs that help track and organise data. They allow GA4 to accurately capture and link user activities across different sessions, devices, and platforms. In this table there are unique identifiers for:<ul><li>User ID</li><li>User IDs combined with Client IDs</li><li>The stream.</li></ul>" },
 { name: "Dates", preFilter: { column: "Category", value: "Date" }, description: "The table contains several date fields based on when the user visited the site. These date fields come in different formats - both timestamp and date. If you are unsure of which column to use, use <code>date__pt_dt</code> (Pt Dt) as this is the partition column. Unsure what a partition column is? <a href=\"/documentation/accessing-your-data/faqs/pipeline-ga4-faqs\" target=\"_blank\" rel=\"noopener noreferrer\">See this section.</a>" },
 { name: "First Session Attribution", preFilter: { column: "Category", value: "Dimension - First Session Attribution" }, description: "This section provides information on the user's first session on the site/app. First user dimensions are the most logical way of interacting with user metrics as this is the best way to understand how they came to the site initially. Example: By looking at the <code>dim__first_source_session</code> we can see the source for the first session for this specific user." },
 { name: "Last Non Direct Attribution", preFilter: { column: "Category", value: "Dimension - Last Non Direct Attribution" }, description: "This section provides information on the first user interaction with the site using the Last Non-Direct Click (LNDC) attribution method. This is the GA4 default attribution method. This method attributes the traffic acquisition to the last channel the user clicked through to the site over the past 90 day period (though you can customise this yourself), excluding direct visits. Example: By looking at the <code>dim__first_source_lndc</code> we can see the source for the first session for this specific user based on the last interaction that was not direct." },
 { name: "Device", preFilter: { column: "Category", value: "Dimension - Device" }, description: "This section provides information on the device which the user used to interact with the site/app on the first user session." },
 { name: "Landing Page", preFilter: { column: "Category", value: "Dimension - Landing Page" }, description: "This section provides information about the first Landing Page the user came across, or in other words, the first page that the user came to on the app/website to start the session." },
 { name: "Location Information", preFilter: { column: "Category", value: "Dimension - Location Information" }, description: "This section provides information on the location of the first user session based on the IP address." },
 { name: "Other", preFilter: { column: "Category", value: "Dimension - Other" }, description: "This is a combination of other miscellaneous dimensions." },
 { name: "Metrics", preFilter: { column: "Category", value: "Metric" }, description: "These are all the metrics which can be combined with the dimensions listed above. There are some metrics such as Users, Active Users and Returning Users, which will need to be calculated using formulas - as these need to be calculated at the final stage of data visualisation. Why is that? See this FAQ." }
 ]}
/>

The User ID Table is at User level Scoping. Let's jump in.


Complete reference for all Pipeline GA4 tables - both default and optional - with unified navigation and filtering.

Pipeline GA4 Tables - All Tables

All Tables

# Pipeline GA4 Tables - Client Table

- **Table name:** `ga4_mrt_clients`
- **Scoping**: Client Level
- **Table Contains**: Client level dimensions & metrics.
- **Each Row Presents**: One row per client.


Each row of this table represents someone using your site on one day and information about what they did.

It's not a "true" user, however; it's a unique browser + device. So if you access from a phone and laptop, that would be two rows.

We call this browser + device pairing a client. In GA4 this is a **user**, if you've done no additional set-up.

![client_id_each_row](/images/resources/documentation/ga4_tables/client_table/client_id_each_row.png)

Why don't we call it a user? 

<OutboundLink href="/documentation/accessing-your-data/faqs/pipeline-ga4-faqs" children="Please see here to see the difference between Client ID and User ID." />

## Table Fields by Category

<TabbedTableFilterable 
 csvPath="/data/GA4 - Table Names & Description - Updated ga4_client_keys.csv" 
 defaultTab="Full list of all fields"
 tabs={[
 { name: "Full list of all fields", description: "Complete list of all available fields in the client table." },
 { name: "Keys", preFilter: { column: "Category", value: "Key" }, description: "Keys are unique IDs that help track and organise data. In this table there are unique identifiers for clients and for the stream." },
 { name: "Dates", preFilter: { column: "Category", value: "Date" }, description: "The table contains several date fields based on when the user visited the site. These date fields come in different formats - both timestamp and date. If you are unsure of which column to use, use <code>date__pt_dt</code> (Pt Dt) as this is the partition column. Unsure what a partition column is? <a href=\"/documentation/accessing-your-data/faqs/pipeline-ga4-faqs\" target=\"_blank\" rel=\"noopener noreferrer\">See this section.</a>" },
 { name: "First Session Attribution", preFilter: { column: "Category", value: "Dimension - First Session Attribution" }, description: "These dimensions provide information about the user's first session on the site/app. First user dimensions are the most logical way of interacting with user (client) metrics as this is the best way to understand how they came to the site initially. Example: By looking at the <code>dim__first_source_session</code> we can see the source for the first session for this specific user." },
 { name: "LNDC Attribution", preFilter: { column: "Category", value: "Dimension - LNDC Attribution" }, description: "These dimensions relate to the first user interaction with the site using the Last Non-Direct Click (LNDC) attribution method. This is the GA4 default attribution method. This method attributes the traffic acquisition to the last channel the user clicked through to the site over the past 90 day period (though you can customise this yourself), excluding direct visits. Example: By looking at the <code>dim__first_source_lndc</code> we can see the source for the first session for this specific user based on the last interaction that was not direct." },
 { name: "Traffic Source All Time", preFilter: { column: "Category", value: "Dimension - Traffic Source All Time" }, description: "These dimensions are provided directly by GA4 and provide information about the traffic source that first acquired the user." },
 { name: "Device", preFilter: { column: "Category", value: "Dimension - Device" }, description: "These dimensions provide information about the device which the user used to interact with the site/app on their first user session." },
 { name: "Landing Page", preFilter: { column: "Category", value: "Dimension - Landing Page" }, description: "These dimensions relate to the first Landing Page the user came across, or in other words, the first page that the user came to on the app/website to start the session." },
 { name: "Location Information", preFilter: { column: "Category", value: "Dimension - Location Information" }, description: "These dimensions relate to the location of the first user session based on the IP address." },
 { name: "Paid Dimensions", preFilter: { column: "Category", value: "Paid Dimensions" }, description: "These dimensions relate to the paid traffic sources. These can be broken down into: <ul><li>First Daily - These are the first paid interactions with the site by a user on a given day.</li><li>Last Non Direct - These are the last paid interactions with the site by a user on a given day.</li></ul>" },
 { name: "Other", preFilter: { column: "Category", value: "Dimension - Other" }, description: "This is a combination of other miscellaneous dimensions." },
 { name: "Metrics", preFilter: { column: "Category", value: "Metric" }, description: "These are all the metrics which can be combined with the dimensions listed above. There are some metrics such as Users, Active Users and Returning Users, which will need to be calculated using formulas - as these need to be calculated at the final stage of data visualisation. Why is that? <a href=\"/documentation/accessing-your-data/the-fundamentals/accessing-your-data/\" target=\"_blank\" rel=\"noopener noreferrer\">See the FAQ on this page</a>" }
 ]}
/>
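
The Metrics tab notes that Users, Active Users and Returning Users have to be calculated with formulas at the visualisation stage. One likely reason, sketched below, is that these are distinct counts: if the same client appears on more than one row (for example once per day), adding up rows counts that client several times. The row shape and the `client_id` and `date` names are assumptions for the example, not the table's real field names.

```python
# Hypothetical rows where the same client shows up on several days.
rows = [
    {"date": "2024-03-01", "client_id": "a"},
    {"date": "2024-03-01", "client_id": "b"},
    {"date": "2024-03-02", "client_id": "a"},
    {"date": "2024-03-03", "client_id": "a"},
]

# Counting rows treats every appearance as a separate user.
row_count = len(rows)

# Counting distinct client IDs over the whole period gives the number of users.
distinct_users = len({row["client_id"] for row in rows})

print(row_count, distinct_users)
```

Here counting rows gives 4 while the distinct count gives 2. The distinct count also depends on the date range being reported, which is why it cannot be stored once in the table.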


# Pipeline GA4 Tables - Items Table

- **Table name:** `ga4_mrt_items`
- **Scoping**: Item Level (This is only for eCommerce websites).
- **Table Contains**: Item level dimensions and metrics.
- **Each Row Presents**: One row per item

When eCommerce actions take place, such as purchases or product detail views, GA4 creates "events".

Within these eCommerce events, there are specific pieces of information at Item level. This includes information such as item name, category, price and quantity.

Each row of this table represents a different item, and the details associated with it.

![items_table_each_row](/images/resources/documentation/ga4_tables/items_table/items_table_each_row.png)
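
As a rough sketch of how item-level rows relate to an eCommerce event, the Python below flattens one event's items into one row per item. The event shape, the function name `item_rows` and the derived `item_revenue` value are assumptions for the example; the item details (name, category, price, quantity) are the ones mentioned above.

```python
# Hypothetical purchase event carrying two items.
event = {
    "event_name": "purchase",
    "session_id": "s1",
    "items": [
        {"item_name": "Trail Shoes", "item_category": "Footwear", "price": 80.0, "quantity": 1},
        {"item_name": "Running Socks", "item_category": "Accessories", "price": 6.5, "quantity": 3},
    ],
}


def item_rows(event):
    """Return one row per item, repeating the event-level details on each row."""
    return [
        {
            "event_name": event["event_name"],
            "session_id": event["session_id"],
            **item,
            # Illustrative derived value, not a documented field.
            "item_revenue": item["price"] * item["quantity"],
        }
        for item in event["items"]
    ]


rows = item_rows(event)
for row in rows:
    print(row["item_name"], row["quantity"], row["item_revenue"])
```

One purchase event with two items becomes two rows, each carrying its own name, category, price and quantity.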

<TableOfContents toc={toc}/>

## Table Fields by Category

<TabbedTableFilterable 
 csvPath="/data/GA4 - Table Names & Description - Updated ga4_items.csv" 
 defaultTab="Full list of all fields"
 tabs={[
 { name: "Full list of all fields", description: "Complete list of all available fields in the items table." },
 { name: "Keys", preFilter: { column: "Category", value: "Key" }, description: "Keys are unique IDs that help track and organise data. In this table there are identifiers for different sessions, clients and streams which can allow you to dig into specific sessions or clients etc." },
 { name: "Dates", preFilter: { column: "Category", value: "Date" }, description: "These date fields come in different formats - both timestamp and date. If you are unsure of which column to use, use <code>date__session_start_dt</code> as this is the partition column. Unsure what a partition column is? <a href=\"/documentation/accessing-your-data/faqs/pipeline-ga4-faqs\" target=\"_blank\" rel=\"noopener noreferrer\">Please see here to see the FAQ on this section.</a>" },
 { name: "Session Level", preFilter: { column: "Category", value: "Dimension - Session Level" }, description: "These dimensions relate to the source of session traffic, i.e. how users came to start their session." },
 { name: "Session Information", preFilter: { column: "Category", value: "Dimensions - Session Information" }, description: "These dimensions provide boolean (true or false) values depending on whether the session matches certain criteria." },
 { name: "Last Non Direct Click", preFilter: { column: "Category", value: "Dimension - Last Non Direct Click" }, description: "These dimensions provide information on the session using the Last Non-Direct Click (LNDC) attribution method. This is the GA4 default attribution method. This method attributes the traffic acquisition to the last channel the user clicked through to the site over the past 90 day period (though this can be customised), excluding direct visits." },
 { name: "Traffic Source All Time", preFilter: { column: "Category", value: "Dimension - Traffic Source All Time" }, description: "These dimensions are provided directly by GA4 and provide information about the traffic source that first acquired the user." },
 { name: "Item Level", preFilter: { column: "Category", value: "Dimension - Item" }, description: "These dimensions provide information at the item level which are sent as parameters with an event. Note: The item level parameters will need to be set up - these are not automatically generated by GA4." },
 { name: "Device", preFilter: { column: "Category", value: "Dimension - Device" }, description: "These dimensions provide information about the device which the user used to interact with the site/app." },
 { name: "Landing Page", preFilter: { column: "Category", value: "Dimension - Landing Page" }, description: "These dimensions relate to the first Landing Page the user came across, or in other words, the first page that the user came to on the app/website to start the session." },
 { name: "LNDC", preFilter: { column: "Category", value: "Dimension - LNDC" }, description: "These dimensions relate to the last non direct click (LNDC) attribution method." },
 { name: "Location Information", preFilter: { column: "Category", value: "Dimension - Location Information" }, description: "These dimensions relate to the location of the user session based on the IP address." },
 { name: "Other", preFilter: { column: "Category", value: "Dimension - Other" }, description: "This section provides information on other miscellaneous dimensions." },
 { name: "Session Level Traffic Source", preFilter: { column: "Category", value: "Dimension - Session Level Traffic Source" }, description: "These dimensions provide information about the traffic source that started the session." },
 { name: "Paid Dimensions", preFilter: { column: "Category", value: "Paid Dimensions" }, description: "These dimensions relate to the paid traffic sources and are at session level." },
 { name: "Metrics", description: "This section provides all the metrics which can be combined with the dimensions listed above." }
 ]}
/>

The Items Table is item level Scoping. Let's jump in.


# Pipeline GA4 Tables - Pageviews Table

- **Table names:** `ga4_pages_daily`, `ga4_pages_hourly`
- **Scoping**: Event Level
- **Table Contains**: Event level dimensions & metrics.
- **Each Row Presents**: One row per Pageview which has matching dimensions (e.g. device, location, browser).

There are two tables that are broken down over different time periods - either daily or hourly.

Each row of this table represents a pageview where there are matching dimensions.

For example, where the device, location and other information is the same.

![pageviews_table_each_row](/images/resources/documentation/ga4_tables/page_views_table/pageviews_table_each_row.png)
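
One way to read "matching dimensions" is that pageviews sharing the same dimension values are grouped into a single row with a count. The Python sketch below shows that reading. The tuple layout `(date, page_path, device, country)` is an assumption for the example and is not the table's real column list.

```python
from collections import Counter

# Hypothetical pageview events: (date, page_path, device, country).
pageviews = [
    ("2024-03-01", "/pricing", "mobile", "GB"),
    ("2024-03-01", "/pricing", "mobile", "GB"),
    ("2024-03-01", "/pricing", "desktop", "GB"),
    ("2024-03-01", "/blog", "mobile", "US"),
]

# Identical dimension combinations collapse into one row with a count.
daily_rows = Counter(pageviews)

for dimensions, views in daily_rows.items():
    print(dimensions, views)
```

The two mobile views of `/pricing` from GB share every dimension, so they form one row with a count of 2. The desktop view differs in one dimension and gets its own row.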

<TableOfContents toc={toc}/>

## Table Fields by Category

<TabbedTableFilterable 
 csvPath="/data/GA4 - Table Names & Description - Updated ga4_pages_daily.csv" 
 defaultTab="Full list of all fields"
 tabs={[
 { name: "Full list of all fields", description: "Complete list of all available fields in the pageviews table." },
 { name: "Keys", preFilter: { column: "Category", value: "Key" }, description: "Keys are unique IDs that help track and organise data. In this table there are unique identifiers for pages, sessions, sessions linked with pages, clients and streams." },
 { name: "Dates", preFilter: { column: "Category", value: "Date" }, description: "The table contains a single date field, <code>date__pt_dt</code> (Pt Dt), which is the date that the pageview took place. This is also the partition column. Unsure what a partition column is? <a href=\"/documentation/accessing-your-data/faqs/pipeline-ga4-faqs\" target=\"_blank\" rel=\"noopener noreferrer\">See this section.</a>" },
 { name: "Last Non Direct Attribution", preFilter: { column: "Category", value: "Dimension - LNDC" }, description: "These dimensions look at the traffic source of the pageview using the Last Non-Direct Click (LNDC) attribution method. This is the GA4 default attribution method. This method attributes the traffic acquisition to the last channel the user clicked through to the site over the past 90 day period (though this can be customised), excluding direct visits." },
 { name: "Device", preFilter: { column: "Category", value: "Dimension - Device" }, description: "These dimensions provide information on the device used for the pageview." },
 { name: "Location Information", preFilter: { column: "Category", value: "Dimension - Location Information" }, description: "These dimensions provide information on the location of the pageviews based on the IP address." },
 { name: "Page Information", preFilter: { column: "Category", value: "Dimension - Page Information" }, description: "These dimensions provide information on the page that was viewed." },
 { name: "Other", preFilter: { column: "Category", value: "Dimension - Other" }, description: "This is a combination of other miscellaneous dimensions." },
 { name: "Metrics", preFilter: { column: "Category", value: "Metric" }, description: "These are all the metrics which can be combined with the dimensions listed above." }
 ]}
/>

The Pageviews Tables are at Event level Scoping. Let's jump in.


# Pipeline GA4 Tables - Session Table

- **Table name:** `ga4_mrt_sessions`
- **Scoping**: Session Level
- **Table Contains**: Session level dimensions and metrics.
- **Each Row Presents**: One row per session.

This table contains session data.

Each row is one session and the information about it, such as when it started, the first landing page and how many pages were viewed.

![session_table_each_row](/images/resources/documentation/ga4_tables/session_table/session_table_each_row.png)

<TableOfContents toc={toc}/>

## Table Fields by Category

<TabbedTableFilterable 
 csvPath="/data/GA4 - Table Names & Description - Updated ga4_sessions.csv" 
 defaultTab="Full list of all fields"
 tabs={[
 { name: "Full list of all fields", description: "Complete list of all available fields in the session table." },
 { name: "Keys", preFilter: { column: "Category", value: "Key" }, description: "Keys are unique IDs that help track and organise data. In this table there are unique identifiers for specific sessions, clients (users) and streams." },
 { name: "Dates", preFilter: { column: "Category", value: "Date" }, description: "These date fields come in different formats - both timestamp and date. If you are unsure of which column to use, use <code>date__session_start_dt</code> as this is the partition column. Unsure what a partition column is? <a href=\"/documentation/accessing-your-data/faqs/pipeline-ga4-faqs\" target=\"_blank\" rel=\"noopener noreferrer\">See this section.</a>" },
 { name: "Session Level Traffic Source", preFilter: { column: "Category", value: "Dimension - Session Level Traffic Source" }, description: "These dimensions relate to the source of session traffic, i.e. how users came to start their session." },
 { name: "Last Non Direct Click", preFilter: { column: "Category", value: "Dimension - Last Non Direct Click" }, description: "These dimensions provide information on the session using the Last Non-Direct Click (LNDC) attribution method. This is the GA4 default attribution method. This method attributes the traffic acquisition to the last channel the user clicked through to the site over the past 90 day period (though this can be customised), excluding direct visits." },
 { name: "Traffic Source All Time", preFilter: { column: "Category", value: "Dimension - Traffic Source All Time" }, description: "These dimensions are provided directly by GA4 and provide information about the traffic source that first acquired the user." },
 { name: "Session Information", preFilter: { column: "Category", value: "Dimension - Session Information" }, description: "These dimensions provide boolean (true or false) values depending on whether the session matches certain criteria." },
 { name: "Device", preFilter: { column: "Category", value: "Dimension - Device" }, description: "These dimensions provide information on the device used for the session." },
 { name: "Landing Page", preFilter: { column: "Category", value: "Dimension - Landing Page" }, description: "These dimensions provide information on the Landing Page, or in other words, the first page that the user came to on the app/website to start the session." },
 { name: "Location Information", preFilter: { column: "Category", value: "Dimension - Location Information" }, description: "These dimensions relate to the location of the session based on the IP address." },
 { name: "Paid Dimensions", preFilter: { column: "Category", value: "Paid Dimensions" }, description: "These can be broken down into:<ul><li>Session Level - This is the traffic source information related to the start of the session.</li><li>Last Non Direct - These are the last paid interactions with the site for the session.</li></ul>" },
 { name: "Other", preFilter: { column: "Category", value: "Dimension - Other" }, description: "This is a combination of other miscellaneous dimensions." },
 { name: "Metrics", preFilter: { column: "Category", value: "Metric" }, description: "These are all the metrics which can be combined with the dimensions listed above. There are some metrics such as Sessions and Engaged Sessions, which will need to be calculated using formulas - as these need to be calculated at the final stage of data visualisation. Why is that? <a href=\"/documentation/accessing-your-data/the-fundamentals/accessing-your-data/\" target=\"_blank\" rel=\"noopener noreferrer\">See the FAQ on this page</a>" }
 ]}
/>

The Sessions Table is at Session level Scoping. Let's jump in.


# Pipeline GA4 - FAQs

<Accordion header="What is a session?">
When someone visits a site, the actions they take during that visit are grouped into a “session”.
This session includes everything they do, such as viewing different pages, making a purchase or clicking on specific links.

A session starts when someone either views a page on your site or opens your app.

A session ends after 30 minutes of inactivity, though this timeout period can be adjusted if desired. There is no limit to how long a session can last.
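
The inactivity rule can be sketched in a few lines of Python. This is a simplified illustration rather than GA4's actual logic: the function name `split_into_sessions` is hypothetical, and treating a gap of exactly 30 minutes as the same session is an assumption.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # default mentioned above; adjustable


def split_into_sessions(event_times, timeout=SESSION_TIMEOUT):
    """Group one visitor's event timestamps into sessions by inactivity gap."""
    sessions = []
    for ts in sorted(event_times):
        # Continue the current session unless the gap since the previous
        # event is longer than the timeout.
        if sessions and ts - sessions[-1][-1] <= timeout:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


visits = [
    datetime(2024, 3, 1, 9, 0),
    datetime(2024, 3, 1, 9, 10),
    datetime(2024, 3, 1, 9, 45),  # 35 minutes after the previous event
    datetime(2024, 3, 1, 10, 0),
]
sessions = split_into_sessions(visits)
print(len(sessions))
```

The 35-minute gap starts a second session, so the four events form two sessions of two events each. Because only the gap between events matters, a session with steady activity has no maximum length.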
</Accordion>


<Accordion header="What is a pageview?">
A pageview is when someone looks at a page on a website. Technically, an "event" fires every time a person visits a page.

A pageview is not the same as a session. A pageview happens each time someone looks at a page. For example, one session could have 4 pageviews if the person looks at 4 different pages.


![pageviews_explainer](/images/resources/documentation/faqs/pageviews_explainer.png)

</Accordion>


<Accordion header="What is a client ID (pseudo_user_id)?">
Anyone visiting a site is given a “Client ID”, which is created automatically by GA4 and described as a “User” in the GA4 interface.

It’s not a “true” user, however; it’s a unique browser + device. So if you access from a phone and laptop, that would be two separate Client IDs.

We call this browser + device pairing a client.

In GA4 this is a user, if you’ve done no additional set-up.

</Accordion>

<Accordion header="What is a User ID (user_id)?">
It’s possible to set up User IDs and link them to GA4.

User IDs are unique numbers or names given to users when they log into your website. Linking these IDs to GA4 helps track users across different devices and browsers.

This means we can follow a user's actions on a site, even if they switch devices or browsers. It helps us understand how users behave across different sessions and platforms.
</Accordion>


<Accordion header="What is the difference between a Client ID (pseudo_user_id) and a User ID (user_id)?">
Client IDs are automatically generated by GA4 based on a unique browser-device pair. These are described as Users in the GA4 interface.

These are not strictly users, they are unique visits to the site from a browser-device pair.

User IDs, though, are unique identifiers for a user, which are usually obtained from a login system on a site and require specific configuration. These can be sent to GA4 - and allow for a more comprehensive view of how users are interacting across different devices, browsers and sessions.
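
A small Python sketch of the difference, using made-up identifiers: the same logged-in person on two devices counts as two clients but one user.

```python
# Hypothetical session rows: (client_id, user_id). user_id is None when the
# visitor was not logged in.
session_rows = [
    ("phone-chrome-1", "user-42"),
    ("laptop-safari-7", "user-42"),
    ("laptop-safari-7", "user-42"),
    ("tablet-firefox-3", None),
]

client_ids = {client_id for client_id, _ in session_rows}
user_ids = {user_id for _, user_id in session_rows if user_id is not None}

print(len(client_ids), len(user_ids))
```

There are three distinct Client IDs here but only one User ID, because `user-42` used both a phone and a laptop. The tablet visitor never logged in, so they have a Client ID and no User ID.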
</Accordion>


<Accordion header="What are UTM parameters?">
UTM tracking codes (also known as UTM parameters) are tags added to the end of a URL to help track where traffic is coming from for specific campaigns, and to find out more about how the user reached the site.

![utm_parameters_1](/images/resources/documentation/faqs/utm_parameters_1.png)

These parameters require custom setup, for example using a [URL Builder](https://ga-dev-tools.google/ga4/campaign-url-builder/play/), and allow you to provide additional information such as the type of content, the campaign or even the source/medium.

These UTM parameters populate fields such as:

- **`dim__term_session`** - The paid search term that drove the user to the site.
- **`dim__campaign_session`** - The campaign that brought the user to your site.
- **`dim__content_session`** - The type of content that brought the user to your site.

See some examples below:

![utm_parameters_2](/images/resources/documentation/faqs/utm_parameters_2.png)
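
As a rough illustration of how these tags sit on a URL, here is a minimal Python sketch that pulls the `utm_*` parameters out of a landing page URL. The URL and its values are hypothetical, and this is not how Pipeline itself parses them.

```python
from urllib.parse import urlparse, parse_qs


def extract_utm_parameters(url: str) -> dict[str, str]:
    """Return the utm_* query parameters found on a URL."""
    query = parse_qs(urlparse(url).query)
    # parse_qs returns a list per parameter; keep the first value of each.
    return {
        name: values[0]
        for name, values in query.items()
        if name.startswith("utm_")
    }


# Hypothetical landing page URL tagged for a campaign.
url = (
    "https://www.example.com/pricing"
    "?utm_source=newsletter&utm_medium=email"
    "&utm_campaign=spring_sale&utm_content=header_link"
)
utm = extract_utm_parameters(url)
print(utm)
```

Here `utm_campaign` and `utm_content` are the values that would end up in the campaign and content fields listed above.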

</Accordion>

<Accordion header="What are the different types of attribution?">
There are numerous ways of attributing which channel(s) are responsible for a conversion. The `ga4_attribution` table has different conversion metrics depending on how you want to attribute the conversion to different channels/sources.

Each conversion (Key event) is attributed using different attribution methods.

**First Click Attribution**

This attribution method gives full weight of the conversion to the first touchpoint on the conversion path.

![first_click_attribution](/images/resources/documentation/faqs/first_click_attribution.png)

**Last Click Attribution**

This attribution method gives full weight of the conversion to the last touchpoint on the conversion path.

![last_click_attribution](/images/resources/documentation/faqs/last_click_attribution.png)

**Linear Attribution**

This attribution method splits the weight of the conversion equally across every touchpoint on the conversion path.

![utm_parameters_2](/images/resources/documentation/faqs/linear_attribution.png)

**Time Decay Attribution**

This attribution method gives more weight to touchpoints that happened closer in time to the conversion, so the most recent touchpoints receive the most credit.

![utm_parameters_2](/images/resources/documentation/faqs/time_decay.png)
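
To make the four methods concrete, here is a minimal Python sketch of how the credit for one conversion could be split across its touchpoints. The function names are illustrative, and the 7 day half-life used for time decay is an assumption for the example, not a documented Pipeline setting.

```python
def first_click(touchpoints: int) -> list[float]:
    """All credit goes to the first touchpoint."""
    return [1.0] + [0.0] * (touchpoints - 1)


def last_click(touchpoints: int) -> list[float]:
    """All credit goes to the last touchpoint."""
    return [0.0] * (touchpoints - 1) + [1.0]


def linear(touchpoints: int) -> list[float]:
    """Credit is shared equally between all touchpoints."""
    return [1.0 / touchpoints] * touchpoints


def time_decay(days_before_conversion: list[float],
               half_life_days: float = 7.0) -> list[float]:
    """Credit halves for every half-life between touchpoint and conversion.

    half_life_days is an assumed value chosen for illustration only.
    """
    raw = [0.5 ** (days / half_life_days) for days in days_before_conversion]
    total = sum(raw)
    return [weight / total for weight in raw]


# Three touchpoints: 14 days, 7 days and 0 days before the conversion.
print(first_click(3))
print(last_click(3))
print(linear(3))
print(time_decay([14, 7, 0]))
```

In every case the weights add up to 1, so a single conversion is never counted more than once across channels.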

</Accordion>

<Accordion header="What is a valuable session?">
A valuable session is a session where a user takes desired actions that are important for your business.

This could be viewing key pages (such as product or solutions pages), interacting with 3 or more pages on your website or viewing specific content. These sessions are tracked based on what matters most to you.

These valuable sessions are totally customisable depending on what is important to your business.
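
As a sketch of what such a rule might look like, the Python below flags a session as valuable if it viewed a key page or viewed 3 or more pages. The page prefixes and the threshold are hypothetical examples, since the real definition depends on what you configure.

```python
# Hypothetical rule settings; your own definition will differ.
KEY_PAGE_PREFIXES = ("/product", "/solutions")
MIN_PAGES = 3


def is_valuable_session(page_paths: list[str]) -> bool:
    """Return True if the session viewed a key page or enough distinct pages."""
    viewed_key_page = any(path.startswith(KEY_PAGE_PREFIXES) for path in page_paths)
    viewed_enough_pages = len(set(page_paths)) >= MIN_PAGES
    return viewed_key_page or viewed_enough_pages


print(is_valuable_session(["/", "/blog"]))
print(is_valuable_session(["/", "/product/widget"]))
print(is_valuable_session(["/", "/blog", "/about"]))
```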

</Accordion>

<Accordion header="What are partition columns?">
You will hear us refer to partition columns across the different Pipeline GA4 tables. 

A partitioned table is split into smaller parts, or “partitions”, which makes managing and querying the data faster (and more affordable!).

In the Pipeline tables, the partitioned column is often date (`date__pt_dt`).

This is the best date column to use in the majority of cases.

![partition_column](/images/resources/documentation/faqs/partition_column.png)
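
In practice, using the partition column means filtering on it in your queries so BigQuery only scans the partitions it needs. Here is a minimal Python sketch that builds such a query. The project and dataset names are hypothetical, and it assumes `date__pt_dt` is a `DATE` column.

```python
from datetime import date


def build_partition_filtered_query(table: str, start: date, end: date) -> str:
    """Build a BigQuery query that filters on the partition column.

    Assumes date__pt_dt is a DATE column on the given table.
    """
    return (
        "SELECT COUNT(*) AS row_count\n"
        f"FROM `{table}`\n"
        f"WHERE date__pt_dt BETWEEN DATE '{start.isoformat()}' "
        f"AND DATE '{end.isoformat()}'"
    )


# Hypothetical project and dataset; ga4_mrt_sessions is the sessions output table.
query = build_partition_filtered_query(
    "my-project.my_dataset.ga4_mrt_sessions",
    date(2024, 1, 1),
    date(2024, 1, 31),
)
print(query)
```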

</Accordion>

Let's answer some of the common FAQs around Pipeline GA4

Pipeline GA4 - FAQs

# Piped Out Specific Metrics

We try to stay as close to standard GA4 naming conventions as possible. Occasionally, however, some of our names are slightly different.

This section will explain any differences we have with the default set-up.

<Accordion header="Keys">
Keys are used to represent uniqueness.

**Example key fields:**

- `key__session`

- `key__client`

The **`key__session`** field, for example, is a single value that uniquely identifies a session.

Similarly, the **`key__client`** field represents a unique Client ID (i.e. a unique device-browser pairing).

This doesn’t apply to pages, where `key__page_daily` represents a set of unique dimensions rather than a single pageview.
</Accordion>


<Accordion header="LNDC vs Sessions">
There are a handful of dimensions that you might not immediately recognise.

**`{traffic_source}_lndc`** e.g. `source_lndc`

**`{traffic_source}_session`** e.g. `source_session`

These summarise the different types of attribution.

Typically, you will want to use LNDC, which stands for **Last Non Direct Click attribution**.

LNDC is the default in GA4. It uses the source of the session unless that source is direct, in which case it looks back up to 90 days for a “non-direct” source. This lookback period can be adjusted in <OutboundLink href="/src/content/documentation/setup-instruction/pipeline-ga4-setup/advanced-configuration" children="this section" />.

The other attribution is **Session level attribution**. This has no lookback and takes only the first provided source.
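
The difference between the two can be sketched in a few lines of Python. This is a simplified illustration rather than Pipeline's actual implementation: the session data is made up, and `(direct)` is assumed to be the value used for direct traffic.

```python
from datetime import datetime, timedelta

DIRECT = "(direct)"  # assumed label for direct traffic


def lndc_source(sessions: list[tuple[datetime, str]], index: int,
                lookback_days: int = 90) -> str:
    """Return the Last Non Direct Click source for sessions[index].

    sessions holds (start_time, session_source) for one client, oldest first.
    """
    start, source = sessions[index]
    if source != DIRECT:
        return source
    cutoff = start - timedelta(days=lookback_days)
    # Walk back through earlier sessions, most recent first.
    for earlier_start, earlier_source in reversed(sessions[:index]):
        if earlier_start < cutoff:
            break  # outside the lookback window
        if earlier_source != DIRECT:
            return earlier_source
    return source


sessions = [
    (datetime(2024, 1, 1), "google"),
    (datetime(2024, 1, 10), DIRECT),
    (datetime(2024, 6, 1), DIRECT),
]
for i, (_, session_source) in enumerate(sessions):
    print(session_source, "->", lndc_source(sessions, i))
```

The second session is direct at session level but is credited to `google` under LNDC. The third stays direct because the last non-direct session is more than 90 days old.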
</Accordion>

<Accordion header="Entrances & Exits">
**There are several metrics here:**

- `met__entrances`
- `met__entrances_calc`
- `met__exits_calc`

**So what is the difference between `met__entrances` and `met__entrances_calc`?**

The calc values calculate the entrance and exit pages for a session, because Google didn’t originally calculate them for you.

Google now provides its own entrances metric, which appears as `met__entrances`, and our calculated metric has been renamed to `met__entrances_calc`.
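
As a rough sketch of the idea behind the calc values, the Python below treats the first page of each session as its entrance and the last page as its exit. This is an assumption about the general approach, not the exact logic Pipeline runs, and the session data is made up.

```python
from collections import Counter


def entrances_and_exits(sessions: dict[str, list[str]]) -> tuple[Counter, Counter]:
    """Count entrances and exits per page.

    sessions maps a session key to its pages in the order they were viewed.
    """
    entrances: Counter = Counter()
    exits: Counter = Counter()
    for pages in sessions.values():
        if pages:
            entrances[pages[0]] += 1   # first page viewed in the session
            exits[pages[-1]] += 1      # last page viewed in the session
    return entrances, exits


sessions = {
    "session_1": ["/", "/pricing"],
    "session_2": ["/blog", "/"],
    "session_3": ["/"],
}
entrances, exits = entrances_and_exits(sessions)
print(entrances)
print(exits)
```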

**Which should I use?**

We have found some accounts seem to have bugs with Google's default entrances metric and our calc values are closer to UI numbers, so we'd recommend using those.

As always, if you have questions about any metric and want more detail than we've covered here, please reach out to support.
</Accordion>

This page talks through a few of the specific Piped Out metrics. Let's walk through them.

Piped Out Specific Metrics

# Understanding Field Names

There are several ways to access the GA4 data:

1. On BigQuery, where you are able to query the data directly
2. Via the Pipeline GA4 Looker Studio Connector.

## **Field names are different depending on where you access the GA4 data.**

In BigQuery the field names contain prefixes describing the type of underlying data they are.

In Looker Studio you get the more nicely formatted, less over-explained version.

They both have their pros and cons!

## Explaining field names

### BigQuery Field Names

When accessing the data via BigQuery directly the field names start with different characters depending on what they are:

- **`dim__`** - This field is a dimension (e.g. `dim__landing_page__full_domain`)
- **`met__`** - This field is a metric (e.g. `met__sessions`)
- **`met_ce__`** - This field is a conversion metric (e.g. `met_ce__visits_pricing_sum`)
- **`key__`** - This field is a key (e.g. `key__session`)
- **`dim_sp__`** - This field is a custom session dimension.
- **`met_sp__`** - This field is a custom session metric.
- **`dim_up__`** - This field is a custom user dimension.
- **`met_up__`** - This field is a custom user metric.

The last four are the session and user properties you’ve defined in the interface.
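
If you are working with the BigQuery tables in code, the prefixes make it easy to tell what a field is. Here is a minimal Python sketch that classifies a field name by its prefix. The helper is illustrative and not part of Pipeline.

```python
# Prefixes taken from the list above.
FIELD_PREFIXES = {
    "dim__": "dimension",
    "met__": "metric",
    "met_ce__": "conversion metric",
    "key__": "key",
    "dim_sp__": "custom session dimension",
    "met_sp__": "custom session metric",
    "dim_up__": "custom user dimension",
    "met_up__": "custom user metric",
}


def describe_field(field_name: str) -> str:
    """Return the type of a BigQuery field based on its prefix."""
    for prefix, kind in FIELD_PREFIXES.items():
        if field_name.startswith(prefix):
            return kind
    return "unknown"


for name in ["dim__landing_page__full_domain", "met_ce__visits_pricing_sum", "key__session"]:
    print(name, "->", describe_field(name))
```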

### Looker Studio Field Names

In Looker Studio the fields are the exact same as in BigQuery, but with more user friendly names. For example, `met__event_count` is simply `Event Count`.

This is to save you having to rename all the fields.

Here's an example of the BigQuery Name and the Looker Studio Names side by side. You can see the full lists of the schema in each table in the Default Tables section.


<MetricsTable 
 csvPath="/data/GA4 - Table Names & Description - Updated ga4_client_keys.csv" 
 maxHeight="max-h-96"
 preFilter = {{column: "Category", value: "Key"}}
/>

The Pipeline GA4 data has a lot of fields. Let's walk through them.

Understanding Field Names

# Understanding Tables
## Types of Tables
We have several types of GA4 tables, organized into three key categories:
### Output Tables
- These are the primary tables you’ll use for analysis and reporting.
- These tables begin with **`ga4_mrt`**.
### Lookup Tables
- These tables let us map one value to another (e.g., `Campaign1` → `campaign_1`).
- These tables begin with **`ga4_lookup`**.
### Staging Tables
- These are intermediary tables used to build the output tables.
- You’ll likely never need to interact with these directly.
- These tables begin with **`ga4_base`** or **`ga4_stg`**.

## Which table do I choose?

This depends on the scope of what you want to look at.

If you want to look at sessions use the session table. 

If you want to look at Users, then use the Client/ User table. 

Mixing dimensions & metrics from different scopes can cause double counting so it’s worth checking the <OutboundLink href="/documentation/accessing-your-data/the-fundamentals/scoping-which-table-to-pick" children="GA4 scoping" /> page to make sure you’re not doing that. 

The most obvious thing to avoid is using session dimensions with user metrics.

## The full list of tables can be found below:

### Default Output Tables

These tables are included by default as they are automatically generated by GA4.

These tables are produced at the different GA4 scoping levels.

| **Table Name** | **Type** | **Default or Optional** | **What does it do?** |
| --- | --- | --- | --- |
| `ga4_mrt_clients` | output | Default | Client level data (i.e. device + browser) |
| `ga4_mrt_items` | output | Default | Ecommerce item level data |
| `ga4_mrt_pages_daily` | output | Default | Page level data (daily) |
| `ga4_mrt_pages_hourly` | output | Default | Page level data (hourly) |
| `ga4_mrt_sessions` | output | Default | Session level data. |
| `ga4_mrt_attribution` | output | Optional | Attribution models for conversions |
| `ga4_mrt_users` | output | Optional | User level data. |

### Lookup Tables

| **Table Name** | **Type** | **What does it do?** |
| --- | --- | --- |
| `ga4_lookup_base_country` | lookup | A lookup file for country labels. |
| `ga4_lookup_seed_source_categories` | lookup | A lookup file for channel definitions |

### Staging Tables

| **Table Name** | **Type** | **What does it do?** |
| --- | --- | --- |
| `ga4_stg_base_events` | staging | Contains events and information about them. |
| `ga4_stg_event_purchase` | staging | Contains the information about purchase events. |
| `ga4_stg_events` | staging | Contains events and information about them. |
| `ga4_stg_page_engaged_time` | staging | Contains engagement time information for unique pages/ sessions combinations. |
| `ga4_stg_page_entrance_exits` | staging | Contains the keys for unique entrances and exits. |
| `ga4_stg_processed_sessions` | staging | Contains unique Session Keys. |
| `ga4_stg_sessions_dimensions` | staging | Contains dimensions for each unique Session. |
| `ga4_stg_sessions_dimensions_lndc` | staging | Contains the Last Non Direct traffic source information for a session. |
| `ga4_stg_user_lndc_session_source` | staging | Contains the Last Non Direct traffic source information for a User. |
| `ga4_stg_user_properties_store` | staging | Contains the user level property information (i.e whether they match any of the user properties). |


## FAQs

<Accordion header="What are partition columns?">
You will hear us refer to partition columns across the different Pipeline GA4 tables. 

A partitioned table is split into smaller parts, or “partitions”, which makes managing and querying the data faster (and more affordable).

In the Pipeline tables, the partitioned column is often date (`date__pt_dt`).

This is the best date column to use in the majority of cases.


![partition_columns](/images/resources/documentation/understanding_tables/partition_columns.png)
</Accordion>

This page talks through the different Pipeline GA4 tables. Let's walk through them.

Understanding Tables

# How do I access my GA4 data?

Once you have set up your GA4 pipeline, there are several ways of accessing the data.

1. Looker Studio Connector
2. BigQuery Tables
3. The default GA4 Looker Studio Report

### Which one should I pick to access my data?

The data behind both the Looker Studio Connector & BigQuery is the same (see the diagram below).

- If you’re a beginner, Looker Studio is easier to use. 
- If you like your SQL or you’re looking to do more complex stuff, BigQuery is probably better.


![how_does_ga4_work](/images/resources/documentation/accessing_your_data/how_does_ga4_work.png)

### You should pick the Pipeline Looker Studio connector when:

1. You want friendly names: The field names in Looker Studio are more user-friendly. Instead of seeing a field name like **`dim__device_category`**, you can see it as **`Device Category`**.
2. You want to do less customisation/work: Looker Studio has **pre-calculated fields** not found in our raw tables. Many metrics (e.g. Session Count) can only be calculated at the final stage (e.g. in Looker Studio or another BI tool) if they’re going to be accurate.
We do provide the formulas if you’re looking to add them to a BI tool yourself, but it will involve more manual work. 

### You should pick the BigQuery tables when:


1. You want to do more customisation: The key thing is that you’ve got a raw table to work with. There are no intermediate steps happening that you can’t see, and you can do whatever you’d like with it.
2. Build in a BI tool outside of Looker Studio: If you want to build in a BI tool outside of Looker Studio e.g. Looker, Tableau, etc. you’ll have to use this option.

To give some examples of point 1, the most common things we see people doing with this are:

1. Changing the date logic (e.g. Adding pre-filtered time periods like quarters)
2. Combining the GA4 tables with other data sources (e.g. a CRM, Google Ads, Search Console etc).
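
As an example of the first point, here is a minimal Python sketch that turns a date into a quarter label you could group by. The label format is an arbitrary choice for illustration.

```python
from datetime import date


def quarter_label(d: date) -> str:
    """Return a label such as '2024-Q1' for the calendar quarter of a date."""
    quarter = (d.month - 1) // 3 + 1
    return f"{d.year}-Q{quarter}"


print(quarter_label(date(2024, 1, 15)))
print(quarter_label(date(2024, 4, 1)))
print(quarter_label(date(2024, 12, 31)))
```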

You can still generate the same metrics that are in the Looker Studio connector. 

We provide the formulas in the tables which can be found in the navigation.

## FAQs - Accessing Your Data

<Accordion header="Is the data provided in the “Pipeline GA4 Looker Studio Connector” the same as “Google's default Looker Studio GA4 connector”?">
In short, no.

The Pipeline Looker Studio connector is built off your GA4 BigQuery data that has been processed.

You’re getting access to all of our processed data and the features that come with it.

The default connector uses the API. It will have a similar set of restrictions to the actual GA4 UI: sampling, storage limits, limited flexibility etc.
</Accordion>


<Accordion header="Why can some metrics only be calculated at the final stage?">
Metrics such as "Sessions Count" in the Sessions Table and "User Count" in the Users Table can only be calculated at the final stage because they rely on filtering and re-aggregating data. 

These calculations are only accurate after the filtering and re-aggregation is complete.

**Example:** 

Let’s say that we want to count the number of unique users who visited the site over a 3 day period.

This is how our users visited the site.

![user_example](/images/resources/documentation/accessing_your_data/user_example.png)

Depending on how we count users we can get different numbers.

- Unique users over the whole period: 4
- Unique users by day:
  - Day 1: 2
  - Day 2: 1
  - Day 3: 2

If we add up the daily counts, we get 5, which is one more than the 4 unique users over the whole period. We double counted a user.

When we pre-group/filter data we can end up with incorrect data because of this type of scenario. In order to avoid this we have to calculate these metrics in the BI tool.
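
Here is the same effect as a minimal Python sketch. The visit data is hypothetical, chosen to reproduce the numbers above.

```python
# Hypothetical visits: (day, user). user_a visits on day 1 and again on day 3.
visits = [
    ("day 1", "user_a"),
    ("day 1", "user_b"),
    ("day 2", "user_c"),
    ("day 3", "user_a"),
    ("day 3", "user_d"),
]

# Correct: count unique users over the whole period in one go.
users_whole_period = len({user for _, user in visits})

# Incorrect: count unique users per day, then add the daily counts together.
users_by_day: dict[str, set[str]] = {}
for day, user in visits:
    users_by_day.setdefault(day, set()).add(user)
sum_of_daily_counts = sum(len(users) for users in users_by_day.values())

print(users_whole_period)
print(sum_of_daily_counts)
```

The gap between the two numbers is exactly the user who came back on a second day.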

</Accordion>

So you've done the setup, let's talk about accessing your data. Let's walk through it.

Accessing your data


# Using the Looker Studio Connector

The Looker Studio Connector is the easiest way to access your Pipeline BigQuery GA4 data.

If you've come here from Looker Studio directly and don't know what Pipeline is, you'll probably want to start here:

- [Introduction to Pipeline](/documentation/getting-started/intro/)

A quick reminder: it only works with GA4 BigQuery data processed through Pipeline.

If you already know all that and want to signup/login you can do it <OutboundLink href="https://line.pipedout.com" children="here" />, where you'll also find our <OutboundLink href="https://line.pipedout.com/legal/terms/" children="terms" />, <OutboundLink href="https://line.pipedout.com/legal/privacy/" children="privacy policy" /> and all the legalese.

## Where does the data from the Pipeline GA4 Connector come from?

All the data comes from the tables we build in the pipeline.

The Looker Studio Connector is doing two things:

1. It takes the data from the pre-processed tables. 
2. It adds some formulas to calculate metrics that can only be calculated in a BI tool.

## How do I use it?
Let’s break this down step by step.

Firstly - you will need to have set up your Pipeline.

If you’ve not done that, head to <OutboundLink href="/documentation/setup-instructions/creating-your-pipeline/connecting-ga4-to-bigquery/" children=" this section to start creating your Pipeline." />

### 1. Create a data source in Looker Studio

First step is creating a data source in Looker Studio.

You can use this link to create a data source: <OutboundLink href="https://datastudio.google.com/datasources/create?connectorId=AKfycbwtQR9revXcgO-AOydiUGPlhnsEmaWaarPJEDrCVIlaYt4AGlJ9DvmMnNPSDQk7Z-wtYA" children=" Pipeline Looker Studio Connector" />

### 2. Select your Dataset & Project

Select your BigQuery project. 

This will then show you the datasets for any pipelines you have set up.

<Callout type="info">

It will only show datasets where the full GA4 set-up has been completed.
If you’re missing a dataset and you’ve completed the run please contact support.

</Callout>

![select_dataset_and_project](/images/resources/documentation/using_looker_studio_connector/select_dataset_and_project.png)

### 3. Select the Table

You will then have the option to select which table you want to produce.

<Callout type="info">

**How do I know which table to pick?**

Pick the scope you want to work at. 

Do you want to know:

- Where do your users come from? > User Scope
- Where do your sessions come from? > Session Scope
- Which page gets the most visits from a country? > Hit scope

**Important Note**: Not all metrics are compatible with all dimensions. 

We discuss this more on the GA4 scoping page.

</Callout>

![select_table](/images/resources/documentation/using_looker_studio_connector/select_table.png)

The list will be restricted to the tables that actually exist.

If you do not have User ID set up, then the Users table will not be available.

<Callout type="info">
**Note**: If you change table when you edit this you might need to reselect it.
</Callout>

For more about the different tables, see this section.

![select_from_these_tables](/images/resources/documentation/using_looker_studio_connector/select_from_these_tables.png)

### Select the Primary Schema

There is only one option - and you need to select it (or the Connector will not work).

This is a bit of a quirk of how Looker Studio Connectors & BigQuery interact.

![select_primary_schema](/images/resources/documentation/using_looker_studio_connector/select_primary_schema.png)

### Select the Time Period

You can select a single period. Pick this 90% of the time.

You’ll be able to create graphs and use the default Looker Studio period comparison. 

If you want access to the individual period comparison fields so you can do more complex % calculations you’ll need to use period comparison.

![select_time_period](/images/resources/documentation/using_looker_studio_connector/select_time_period.png)

### [Optional] Select the Metrics & Dimension Parameters

This is optional. 

If you are accessing the Looker Studio dashboard for the first time, you can skip this and when you’re familiar with the data, you can come back to this. 

These parameters are linked to dimensions which can be used in graphs and tables.

<Callout type="info">
**Important note**: You will need to tick the checkboxes for these metrics & dimensions if you want to use the Looker Studio Templates.
</Callout>

![select_metric_and_dimension_parameters](/images/resources/documentation/using_looker_studio_connector/select_metric_and_dimension_parameters.png)

### Connect the data source

From there you will be able to select “Connect” or “Reconnect” in the top right hand corner.

![reconnect](/images/resources/documentation/using_looker_studio_connector/reconnect.png)

## Using the Looker Studio Connector

You’ve set up your Looker Studio Connector. Let’s start using it. 

There are a couple of things to bear in mind when in the Looker Studio interface.

### *Date Range Dimensions*

You’ll see a number of different date range dimensions.

Where Looker Studio asks for a date field (e.g. in a Time Series Graph), you should use “Date.”

![use_this_for_date_range](/images/resources/documentation/using_looker_studio_connector/use_this_for_date_range.png)

The others do more specific things and should be used just as dimensions.

### *There are additional fields in Looker Studio that are not in BigQuery*

Some fields can only be calculated at the final stage of data visualisation. 

This is because filters or aggregations can change these numbers and cause unexpected results.

**Example:** Let’s say that we’re filtering to the number of users over a 3 day period.

The user table contains a single row per user.

You have to look over the whole of that 3 day period and count the number of unique users. 

If you tried counting the number of users per day and adding that up, you’d be double counting a lot of users.

As a result, some fields are only available through the Looker Studio connector.

This includes:

- User Counts in the Client Table.
- Sessions Count in the Session Table.
- Conversion rates and Average Order Value for eCommerce.

See the below example.

![user_visits_over_a_3_day_period](/images/resources/documentation/using_looker_studio_connector/user_visits_over_a_3_day_period.png)
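
The same reasoning applies to ratio metrics such as Average Order Value. Here is a minimal Python sketch, using made-up daily totals, of why averaging a pre-calculated daily value gives a different answer from calculating the ratio over the whole period.

```python
# Hypothetical daily eCommerce totals.
daily = [
    {"date": "2024-01-01", "revenue": 100.0, "orders": 1},
    {"date": "2024-01-02", "revenue": 300.0, "orders": 6},
]

# Incorrect: average the pre-calculated daily AOV values.
daily_aov = [row["revenue"] / row["orders"] for row in daily]
average_of_daily_aov = sum(daily_aov) / len(daily_aov)

# Correct: sum revenue and orders first, then divide once at the end.
total_revenue = sum(row["revenue"] for row in daily)
total_orders = sum(row["orders"] for row in daily)
aov_for_period = total_revenue / total_orders

print(average_of_daily_aov)
print(round(aov_for_period, 2))
```

Because the correct value depends on which rows are included, it has to be calculated after any filtering, which is why it lives in the connector rather than in the table.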

<Accordion header="Which users metrics are we referring to in the different tables?">

The eagle-eyed among you may notice that there are actually two potential "users": 

1. Pseudo Users / Non-logged In Users
2. User ID / Logged in Users

Within the documentation, we refer to these as:

- Non-logged-in users: **Clients**
- Logged in Users: **User IDs**

Both of these are described in the GA4 interface as users. 

So how do I know whether we’re referring to Clients or User IDs across the different tables?

By default, we use Clients as users across all tables except the User ID table. 

If you’ve set up User IDs then they are your users in the User ID table. 

If you want to set up User ID as users across the other tables, then you will need to add in/update your own custom fields.

To see the formulas for users, see the field names for the different tables. 

These can be found here: 

- <OutboundLink href="/documentation/accessing-your-data/default-tables/client-table/" children="Client Table" />
- <OutboundLink href="/documentation/accessing-your-data/default-tables/items-table/" children="Items Table" />
- <OutboundLink href="/documentation/accessing-your-data/default-tables/session-table/" children="Session Table" />
- <OutboundLink href="/documentation/accessing-your-data/default-tables/pageviews-tables/" children="Pageviews Table" />

- <OutboundLink href="/documentation/accessing-your-data/optional-tables/attribution-table/" children="Attribution Table" />
- <OutboundLink href="/documentation/accessing-your-data/optional-tables/user-id-table/" children="User ID Table" />

</Accordion>

<Accordion header="So how are the different types of users tracked?">

Let’s start with Clients…

Clients aren't actual users, but are **unique device-browser pairings** which are used to act as a proxy for users. This is your default “Users.”

So, if someone visits your site from their phone and later from their laptop, that counts as two different "Users," even though it’s the same person behind both devices.

So what about User IDs?

User ID offers a more accurate way to track individual users.

An ID is assigned to each user, usually through a login, and follows their activity across numerous sessions, devices, and browsers.

So if a logged in user accessed the site on their laptop in the morning and then on their phone in the afternoon, this is one single `user_id`.
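
The laptop and phone example can be sketched in Python like this. The identifiers are made up, and the field names follow the GA4 `user_pseudo_id` and `user_id` naming.

```python
# One logged-in person visiting from two devices (hypothetical identifiers).
events = [
    {"user_pseudo_id": "client_laptop", "user_id": "user_42"},
    {"user_pseudo_id": "client_phone", "user_id": "user_42"},
]

# Clients: one per unique device-browser pairing.
client_count = len({event["user_pseudo_id"] for event in events})

# User IDs: one per logged-in user, whatever device they use.
user_id_count = len({event["user_id"] for event in events if event["user_id"]})

print(client_count)
print(user_id_count)
```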

</Accordion>

The Pipeline GA4 Looker Studio Connector is an easy way to access your data. Let's walk through it.

Using the Looker Studio Connector


# Accessing data via BigQuery

Once you are all setup, your processed data will live in BigQuery.

The primary tables you’ll be interacting with are the **Output Tables**.

### A number of GA4 tables are generated. They broadly group into:

- **Lookup tables** - These let us map one value to another
- **Staging tables** - These are tables we use to build the output tables.
- **Output tables** - These are the primary tables you’ll use for analysis and reporting.

### How do I see these tables?

**Step 1**: Go to your Pipeline, select the specific Pipeline you want to view

Navigate to <OutboundLink href="https://line.pipedout.com/pipelines" children="List Pipelines" /> and from there you can select your Pipeline. 

![select_pipeline](/images/resources/documentation/accessing_data_via_bigquery/select_pipeline.png)

Then scroll down to Pipeline Actions and select "View in BigQuery".

![select_view_in_BigQuery](/images/resources/documentation/accessing_data_via_bigquery/select_view_in_BigQuery.png)

Alternatively, you can go directly to <OutboundLink href="https://console.cloud.google.com/bigquery?" children="BigQuery Console" />.

**Step 2**: Go to Explorer on the left hand side and select the appropriate folder path.

![Explorer](/images/resources/documentation/accessing_data_via_bigquery/Explorer.png)


**Step 3**: Search for the different output tables, which all start with `ga4_mrt`.

![ga4_mrt](/images/resources/documentation/accessing_data_via_bigquery/ga4_mrt.png)

## Example Tables

**The key tables are:**
- **`ga4_mrt_clients`** – Client-level data (e.g., device + browser)
- **`ga4_mrt_sessions`** – Session-level data
- **`ga4_mrt_pages_daily`** – Page-level data 

**Optionally:**

- **`ga4_mrt_items`** – Ecommerce item-level data
- **`ga4_mrt_users`** – User-level data (only populated if User ID is set up)
- **`ga4_mrt_attribution`** – Attribution models for conversions

So you've done the setup, let's talk about accessing your data via BigQuery. Let's walk through it.

Accessing data via BigQuery

Using BigQuery

# Which table should you pick?

Pipeline will create a minimum of 3 tables:

- Sessions 
- Clients
- Pageviews

Which should you use?

**The 5 second answer is:**

Use the sessions table. 

You probably want this one. Particularly if you were used to Universal Analytics.

**The longer answer is:**

Pick the scope you want to work with and pick the table for that scope. 

We'll spend the rest of this post explaining that.

**Why does it matter?**

You can't put *any* metric with *any* dimension. 

This isn't just a Pipeline thing, it's an analytics thing, it's a GA4 thing. Many people get this wrong, report on the wrong numbers and never even realise.

Let's get into it. 

<TableOfContents toc={toc} />

## How did we decide to work with scope?

Well with Pipeline GA4, we made one table per scope. 

So we have: 

- **User table** - User ID scope.
- **Client table** - Pseudo user/client scope.
- **Session table** - Session scope.
- **Pageview Table** - Hit scope.
- **Items Table** - Item scope.

The basics of selecting the right table are pretty simple.

## How do you pick a table?

Pick the question you're interested in:

Do you want to know:

- Where do your users come from? > User Scope
- Where do your sessions come from? > Session Scope
- Which page gets the most visits from a country? > Hit scope

It then gets a bit more fiddly when you want to combine both in one table. For example, suppose you want sessions & users in one table.

That's usually where the problems start. 

90% of the mistakes we see come from putting user metrics with session dimensions.

Here's our cheat sheet for what metrics you can pair with dimensions:

![ga4_scoping_compatibility_overview](/images/resources/documentation/ga4_scoping/ga4_scoping_compatibility_overview.png)

Let's explore that a bit more.

## What is scope?

Scope in GA4 is the level of grouping we apply to our analytics e.g.

- Session
- User
- Hit

It’s easier to understand by thinking about the goal we're trying to achieve. 

When you open up your analytics you might look for different things:

- I want to know how my users are behaving
- I want to know how each individual session is going
- I care about each individual page-view

The choice you’re making is scope.

## GA4 has 4 main levels of scoping

These each represent different levels of aggregation.

![ga4_scoping_hierarchy](/images/resources/documentation/ga4_scoping/ga4_scoping_hierarchy.png)

- **User** (`user_id`): Data tied to a specific logged in user.
- **Client** (`user_pseudo_id`): Data tied to a specific browser-device pair.
- **Session**: Data tied to a session or journey.
- **Event**: Data tied to a specific action (e.g., clicking a button).

If you work in eCommerce, there’s also Item Level Scoping.

- **Item**: Data tied to specific items, like products in an e-commerce transaction.

To report on the right numbers, you need to pick the right scope.

### The fundamental rule when choosing scopes

> If you are using a dimension at a particular scope, you must *always* use a metric that is either at that scoping level or at a lower level than that scope in the hierarchy.
> 
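
The rule can be written down as a small check. Here is a minimal Python sketch, where the numeric levels are just an illustrative way of encoding the hierarchy above.

```python
# Higher number = higher in the scoping hierarchy.
SCOPE_LEVELS = {"user": 4, "client": 3, "session": 2, "event": 1}


def is_compatible(dimension_scope: str, metric_scope: str) -> bool:
    """A metric is safe if it sits at or below the dimension's scope."""
    return SCOPE_LEVELS[metric_scope] <= SCOPE_LEVELS[dimension_scope]


print(is_compatible("user", "session"))    # user dimension, session metric
print(is_compatible("session", "user"))    # session dimension, user metric
print(is_compatible("event", "event"))     # event dimension, event metric
```
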

Let’s break that down…

If you are using a **User Scoped Dimension** (`user_id`), you can use metrics from any scope as User Level scoping is the highest scoping level.

![user_level_dimension_compatibility](/images/resources/ga4_scoping/user_dimension_scoping_compatibility.png)

If you are working with a Client Scoped Dimension (`user_pseudo_id`), you must work with metrics that are at the same scoping level or less.

This means you should not use User scoped metrics (such as Users). You can, though, use Client Level, Session Level & Event Level scoped metrics.

![client_level_dimension_compatibility](/images/resources/ga4_scoping/client_dimension_scoping_compatibility.png)

If you are working with a **Session Scoped Dimension** (let’s say session medium), you can only work with Session Scoped or Event Scoped metrics.

This means that you can use Session or Event level metrics, but you couldn’t work with Client (`user_pseudo_id`) or User (`user_id`) Scoped metrics.

![session_level_dimension_compatibility](/images/resources/ga4_scoping/session_dimension_scoping_compatibility.png)

If you are working with an Event Scoped Dimension, you can only work with event scoped metrics. You cannot pair it with metrics from any other scope.

![event_level_dimension_compatibility](/images/resources/ga4_scoping/event_dimension_scoping_compatibility.png)

### Why does using the wrong scope break your numbers?

In the example below, we can see how a **single user** interacts with the site over the course of a day.

- **User 1**
 - Session 1 - organic visit
 - Session 2 - paid visit

![user_example_1](/images/resources/documentation/ga4_scoping/user_example_1.png)

In this example, we can see there has been a single user, with 2 sessions.

With no dimensions, we have no issues.

![example_metric_count](/images/resources/documentation/ga4_scoping/example_metric_count.png)

That was nice and easy. Let’s make a problem.

We’ll add a session level dimension to our table with a user metric.

Now we’ve got double counting.

![example_metric_dimension_count](/images/resources/documentation/ga4_scoping/example_metric_dimension_count.png)

We double count our user because that user has 2 sessions, 1 via organic and 1 via cpc and we have a session level dimension.

There is technically 1 user in each of those buckets, but it’s the same one.

The error we’ve made is mixing a session level dimension with a client/user metric.
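
Here is that double count as a minimal Python sketch, using the single user and two sessions from the example. The field names are illustrative.

```python
# One user with two sessions, each from a different medium.
sessions = [
    {"client": "user_1", "session_medium": "organic"},
    {"client": "user_1", "session_medium": "cpc"},
]

# Users with no dimension: correct.
total_users = len({s["client"] for s in sessions})

# Users broken down by a session level dimension.
users_by_medium: dict[str, set[str]] = {}
for s in sessions:
    users_by_medium.setdefault(s["session_medium"], set()).add(s["client"])
users_per_medium = {medium: len(users) for medium, users in users_by_medium.items()}

print(total_users)
print(users_per_medium)
print(sum(users_per_medium.values()))
```

Each row of the breakdown is true on its own, but the rows add up to 2 users when there is only 1.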

If we want to break down a user based metric (e.g. Users) by medium, then we need to use a user based dimension rather than session medium, as this is higher up the scoping hierarchy.

![session_first_user_example](/images/resources/documentation/ga4_scoping/session_first_user_example.png)

But then we miss out on exactly where each session came from. There is no right answer, just different metrics for different moments.

### Doesn’t GA4 mix together incompatible dimensions & metrics though?

In short, yes.

This is particularly easy to do in Explorations, where you can just select whatever metrics and dimensions you want.

Take these numbers below. We’re double counting all the users who have had multiple sessions.

![session_dimension_with_a_user_metric](/images/resources/documentation/ga4_scoping/session_dimension_with_a_user_metric.png)

There are more guard rails on the GA4 interface such as separating the Session & First User Acquisition tables, but it’s still very possible.

If you used to use UA and you’re wondering about the old User Acquisition report: yep, it was wrong. We were double counting there and no one noticed.

### What do you need to be aware of when using our tables?

**Are you just working with metrics?**

If you are only generating metrics with no breakdowns, you can safely pick any of the tables that contain the metrics you want without any problems.

**Do you want dimension breakdowns?**

Pick one level of scope that you want dimensions for and pick that level of table.

Do you want session level dimensions? Pick the session table.

Do you want user level dimensions? Pick the user table.

Then don’t try to generate metrics from a scope that sits above the one you picked.

See the table below for a summary.

![ga4_scoping_compatibility_overview](/images/resources/documentation/ga4_scoping/ga4_scoping_compatibility_overview.png)

<Callout type="info">
Important note: It is technically possible to mix non-compatible dimensions & metrics. For example, we provide `client_key` in the session table that you can use to generate a user count though this would cause double counting.
</Callout>


# How to set up your Pipeline Looker Studio GA4 Sessions Template

So you’ve got your data loaded into Pipeline - GA4. Good job. 

The next step is creating your first Looker Studio dashboard. Well, luckily for you, we have a GA4 sessions Looker Studio dashboard template which you can easily copy.

This is built off a single data source, the GA4 Pipeline sessions table.

![dashboard_screenshot](/images/resources/setting-up-sessions-dashboard/dashboard-screenshot.png)

## What does the GA4 Pipeline Sessions template do?

The dashboard is a single page dashboard built off a single data source - the Pipeline GA4 Sessions table.

<Callout type="info">
<OutboundLink href="https://lookerstudio.google.com/u/0/reporting/77b6d89d-6eb6-45a6-bf9e-5b9b18ef20de/page/p_7g0r7vtzid/edit" children="The template can be found here." />
</Callout>

This contains: 

- Scorecards summarising performance compared to the previous period & year.
- How the sessions are broken down by different session mediums.
- Session breakdowns over different segments such as:
 - Default Channel Grouping
 - Country
 - Day of the week
- An interactive comparison of how different metrics compare over time.
- An interactive comparison of how different metrics look broken down over different dimensions.
- The top performing pages by sessions.

## How to make the GA4 Pipeline Sessions dashboard.

This can be broken down into 4 steps: 

1. Get your GA4 data running through Pipeline GA4. 
2. Create your Pipeline GA4 Sessions table as a data source.
3. Make a copy of the Looker Studio template. 
4. Connect your Sessions table as the data source. 

### 1. Get your GA4 data running through Pipeline GA4.

We assume you’ve already done this - or why would you be looking at this blog?

But if not, you can visit the <OutboundLink href="https://www.pipedout.com/documentation/" children="documentation" /> for Pipeline and follow the steps there. 

This will mean your **processed** GA4 data will be loaded into different tables in BigQuery. 

The different output tables produced are:

| **Table Name** | **Type** | **Default or Optional** | **What does it do?** |
| --- | --- | --- | --- |
| `ga4_mrt_clients` | output | Default | Client level data (i.e. device + browser) |
| `ga4_mrt_items` | output | Default | Ecommerce item level data |
| `ga4_mrt_pages_daily` | output | Default | Page level data (daily) |
| `ga4_mrt_pages_hourly` | output | Default | Page level data (hourly) |
| `ga4_mrt_sessions` | output | Default | Session level data. |
| `ga4_mrt_attribution` | output | Optional | Attribution models for conversions |
| `ga4_mrt_users` | output | Optional | User level data. |

For this we’ll be using the `ga4_mrt_sessions` table as we’ll just be looking at sessions.

### 2. You will need to create your Pipeline GA4 Sessions table first

The first step is to go to <OutboundLink href="https://lookerstudio.google.com/datasources/create?connectorId=AKfycbwmJ61PQUIim709LqiS27ekmiQuWEj2Tk9eQu4uxZY0KRE3PdH8fSmFjBg7XAc913rwkw" children="this link." />

This link allows you to create a dataset, in this case the GA4 sessions table. 

Let’s break this down: 

1. **Select your BigQuery project.** 

This will then show you the datasets for any pipelines you have set up.


<Callout type="info">

It will only show datasets where the full GA4 set-up has been completed.
If you’re missing a dataset and you’ve completed the run please contact support.

</Callout>

![setup_looker_studio_1](/images/resources/setting-up-sessions-dashboard/setup_looker_studio_1.png)

2. **Select the Table**

For this, you will want to select Sessions.

![setup_looker_studio_2](/images/resources/setting-up-sessions-dashboard/setup_looker_studio_2.png)

3. **You will then need to select Primary Schema.**

There is only one option - and you need to select it (or the Connector will not work).

This is a bit of a quirk of how Looker Studio Connectors & BigQuery interact.

![setup_looker_studio_3](/images/resources/setting-up-sessions-dashboard/setup_looker_studio_3.png)

4. **Select the Time Period.**

You can select a single period.

![setup_looker_studio_4](/images/resources/setting-up-sessions-dashboard/setup_looker_studio_4.png)

5. **Select the Metrics & Dimension Parameters**

For this you can select the most common metrics and dimensions you use.

These will be the defaults when viewing data in your dashboard.

<Callout type="info">
**Important**: 

Ensure that you tick all the checkboxes or parts of the Looker Studio dashboard will break.
</Callout>


![setup_looker_studio_5](/images/resources/setting-up-sessions-dashboard/setup_looker_studio_5.png)

6. From then you will be able to Connect or “Reconnect” in the top right hand corner.

![setup_looker_studio_6](/images/resources/setting-up-sessions-dashboard/hit_connect_in_corner.png)

So at this point, you now have your GA4 sessions table.

### 3. Make a copy of the Looker Studio template.

This is broken up into several stages:

1. The first step is visiting the <OutboundLink href="https://lookerstudio.google.com/u/0/reporting/77b6d89d-6eb6-45a6-bf9e-5b9b18ef20de/page/p_7g0r7vtzid/edit" children="single page sessions dashboard template." />
2. You’ll then want to go to the three dots in the top right > Make a copy

![new_make_a_copy](/images/resources/setting-up-sessions-dashboard/new_make_a_copy.png)

### 4. Connect your Sessions table as the data source.

The next option which automatically appears is which data source you want to connect to. 

For this dashboard that will be your sessions table.

![setup_looker_studio_7](/images/resources/setting-up-sessions-dashboard/setup_looker_studio_7.png)


<Callout type="info">
**Note:** If you cannot see the data source that you've just created, you may need to refresh the page and try again.
</Callout>

### 5. When you’re done hit copy report.

![setup_looker_studio_9](/images/resources/setting-up-sessions-dashboard/setup_looker_studio_9.png)

Then hey, you have your template.

<Accordion header="Why is the Country Chart not working?">
When making a copy of a Looker Studio template, even though the data sources contain the same fields, they can occasionally still throw errors.

A fairly consistent one is the Country geographical map.

![broken_country_map](/images/resources/setting-up-sessions-dashboard/broken_country_map.png)

Though, there is a pretty simple fix.

![broken_country_map_2](/images/resources/setting-up-sessions-dashboard/broken_country_map_2.png)

</Accordion>


# What is the Pipeline - GA4?

We take your raw GA4 BigQuery data and transform it into simple to use tables, giving you the power of the raw data without the complexity.

These tables can then be accessed through our Looker Studio connector or directly pulled from BigQuery into any other BI platform.

<TableOfContents toc={toc} ignoreH1={true} />

If you're not familiar with this at all, the quick pitch is:

1. **Google lets you export raw GA4 data** - Google allows you to export all of your raw event data from your GA4 properties into BigQuery (Google’s cloud data warehouse solution).
2. **It's super valuable** - Having access to this raw GA4 data is super valuable. It fixes most of the common weaknesses of GA4 and allows you to do a lot of really valuable pieces of analysis.
3. **But it's complex to use** - This data is stored in a fairly complex manner. You usually need a data engineer or a BI team to make it useful.
4. **Pipeline does it all for you** - Pipeline takes that raw data and transforms it into simple to use tables, doing all the complex transformations for you and letting you configure it all through a simple UI.
5. **Easy access to the data** - You can then access the tables directly in BigQuery or with our own Looker Studio connector.

![pipeline_introduction](/images/resources/documentation/introduction/pipeline_introduction_2.png)


## Why use BigQuery and why use Pipeline?

These are the two most common questions we get:
- Why use GA4 BigQuery over the GA4 interface?
- And then why use Pipeline rather than building something yourself?

### Why use BigQuery?

**No more sampling and API Limits**

No more sampling, cardinality or data limits you find in the GA4 interface. 

**You can keep your data forever** 

The 14 month limit on raw data is one of the biggest restrictions businesses face.
With BigQuery you can keep your raw GA4 data stored forever.

**Raw data is incredibly powerful**

Want to create an interesting new dimension? Maybe, last page viewed before conversion or number of blog posts read over the past year?
With raw data, there are essentially no limits: you can build pretty much whatever you want. No more struggling through the interface or the Looker Studio connector to get something done.

### Why use Pipeline?

**No data engineer required**

You don’t need to know any SQL, or have a data engineer to make the most of it.

Want a complex segment? Or a new user property? Fill in a form and you’ve got it.

**A better Looker Studio connector**

The default Looker Studio connector is ok. But it has all the restrictions of the GA4 UI plus some of its own.

Because our Looker Studio connector builds off your raw BigQuery GA4 data, we can help you get around all of the limits of the default connector.

You’ll be able to:

- Filter out spam
- Pull out custom dimensions and metrics
 - Plot the number of users who converted after visiting a blog.
 - Find the last content page viewed before conversion.
 - etc.
- Get access to separate columns for period comparison data.
- Accurately count users for segments
- …and all of the other standard benefits from using our tables.

**Friendly UI**

Configuring GA4 BigQuery Export can be a bit fiddly. There are a lot of options and a lot of decisions. 

We know that, and we’ve built our interface with it in mind. We’ll walk you through all the hard decisions to make configuration easy.

**Reduce BigQuery costs**

Using GA4 with BigQuery is practically free for smaller sites, but if you have a large site it’s quite easy to start racking up costs, particularly when working with more complex segments like user dimensions.

Our team has a huge amount of experience running larger models, and we’ve seen people who moved from copy-pasted home builds to Pipeline reduce their data costs by as much as 100x.

E.g. we saw someone go from raw tables which were ~1GB a day to our sessions table which was 40MB a day. (That’s 4% of the size.)

**Let someone else handle your data problems**

We’ve got a huge amount of experience dealing with GA4 BigQuery. If you do run into issues you’ll have access to our experienced team ready to jump in and support.

## Let’s get started 
We can think of this in two parts: 
1. **[Setting up your GA4 Pipeline](/documentation/setup-instructions/creating-your-pipeline/connecting-ga4-to-bigquery/)**
2. **[Accessing your GA4 data](/documentation/accessing-your-data/the-fundamentals/accessing-your-data/)**

Use the navigation to move through these steps.


# Setup: Connecting GA4 to BigQuery

Are you already exporting GA4 to BigQuery?

If you are already exporting your GA4 data to BigQuery then go to **Step 3.**

Pipeline will use the permissions that your email has so those need to be correct! 

_(If you need a service account please get in touch.)_

If you aren't exporting to BigQuery yet, start at **Step 1.**

<TableOfContents toc={toc} ignoreH1={true} />

## This can be broken up into a few stages:

1. Setting up Google Cloud
2. Setup the GA4 -> BigQuery connection
3. Getting the correct permissions from Google Cloud

Once this is set up, you can then move onto setting up Pipeline.

## Step 1: Setting up Google Cloud

In order to allow GA4 data to be pumped into BigQuery, you need to have a Google Cloud account set up.

In the process of setting up this cloud account, you'll set up a project. This is where everything will live!

Google does a pretty good job of walking you through these basic parts.

<OutboundLink href="https://cloud.google.com/docs/get-started" children="You can get started on this page." />

![get_started_with_google_cloud](/images/resources/documentation/connecting_ga4_to_bigquery/get_started_with_google_cloud.png)

Once you are set up, you will need to have a <OutboundLink href="https://cloud.google.com/billing/docs/how-to/create-billing-account" children="self-serve Cloud Billing account." />


This allows you to automatically pay for your Google Cloud usage costs.

**How much does this cost you?**

You might’ve heard terrible things about cloud bills, but don’t worry, BigQuery won’t cause you any issues.

For context, a site the size of Piped Out, which gets a couple of thousand visitors a month, costs us approximately $0.03 a month to run. And the first $6.25 every month is free!

**Are you running a large property?**

If you’re notably larger, then please take a look at the <OutboundLink href="https://cloud.google.com/bigquery/pricing" children="BigQuery costs" /> page. The most important figure is the cost to query data, which is currently $6.25 per TiB.
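
To get a rough feel for what that means for your property, here is a back-of-the-envelope sketch in Python. It only uses the two figures quoted above ($6.25 per TiB queried and $6.25 free each month). The function name and the example volumes are made up, it ignores storage costs, and prices change, so treat the output as an estimate rather than a bill.

```python
PRICE_PER_TIB_USD = 6.25   # on-demand query price quoted above; check the pricing page
FREE_USD_PER_MONTH = 6.25  # free monthly allowance quoted above
BYTES_PER_TIB = 1024 ** 4


def estimate_monthly_query_cost(bytes_scanned_per_day: float, days: int = 30) -> float:
    """Estimate the monthly query cost in USD after the free allowance."""
    tib_scanned = bytes_scanned_per_day * days / BYTES_PER_TIB
    gross_cost = tib_scanned * PRICE_PER_TIB_USD
    return max(0.0, gross_cost - FREE_USD_PER_MONTH)


# Hypothetical volumes: scanning 10 GiB a day stays inside the free allowance...
small_site = estimate_monthly_query_cost(10 * 1024 ** 3)
# ...while scanning 1 TiB a day does not.
large_site = estimate_monthly_query_cost(1024 ** 4)

print(small_site)  # 0.0
print(large_site)  # 181.25
```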

It’s also worth noting we’ve helped a lot of customers massively reduce their data costs (sometimes 100x, depends a lot on your current maturity) with Pipeline.

If you are finding yourself spending a lot of money on BigQuery please reach out. There is a good chance we can help!

**Back to setting up your billing account**

Billing accounts can be set-up and updated on <OutboundLink href="https://console.cloud.google.com/billing" children="this page in Google Cloud" />


<Callout type="info">
Important: Without billing details on your Google Cloud account, you’ll use the BigQuery Sandbox, a free, limited environment where data automatically expires every 60 days. Expiring data is bad, as hopefully goes without saying.
</Callout>


## Step 2: Setting up Google Analytics 4 to BigQuery Connection.

For 80% of people we can walk you through how to set this up relatively simply.

If you do run into issues then Google has a <OutboundLink href="https://support.google.com/analytics/answer/9823238" children="page covering all the possible scenarios" />


**Let’s get started.**

Go to <OutboundLink href="https://support.google.com/analytics/answer/6132368?sjid=689850879498832827-EU" children="Admin on GA4" />

<Callout type="info">

You need to have Editor or above permission at the GA4 property level to set up this connection.

</Callout>

You will then need to click BigQuery links.

![click_bigquery_links](/images/resources/documentation/connecting_ga4_to_bigquery/click_bigquery_links.png)

Then select Link.

![bigquery_links_link](/images/resources/documentation/connecting_ga4_to_bigquery/bigquery_links_link.png)

You will then need to select a BigQuery Project you manage and pick a location.

**Which project do I pick?**

You can either select an existing project or create one yourself. See previous section on setting up Google Cloud. 

**What do I put for Location?**

If you already have a BigQuery account and existing datasets, then put it in the same region as them. 

If you don’t, then just pick the region that your business is based in.

![bigquery_setup_choose_bigquery_project](/images/resources/documentation/connecting_ga4_to_bigquery/bigquery_setup_choose_bigquery_project.png)

![data_steams_selection](/images/resources/documentation/connecting_ga4_to_bigquery/data_steams_selection.png)

**What should I do for data streams and events?**

You can configure the data streams and events to exclude specific events if you want.

We recommend just picking the main stream and not excluding anything. There are some edge cases where you might want to do something different. We cover that in this blog post.

**What export type should I pick?**

The main decision is between the **Daily Export** & **Streaming Export**.

What’s the difference?

- **Daily Export** - This exports the GA4 data once a day. There is a limit of 1 million events daily.
- **Streaming Export** - This exports the GA4 data continuously. There is no limit on this.

GA4 will basically tell you how close you are to the daily limit. 

If you’re getting to 800 - 900k daily events and still growing:
- Turn on streaming
- And turn on daily.

If you’re lower than that:
- Just turn on daily.
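
The rule of thumb above can be written down as a small Python function. This is only a sketch of the guideline: the function name is made up, the threshold uses the lower end of the 800 - 900k range, and treating a property that is already at the limit as needing streaming is our assumption.

```python
from typing import List

DAILY_EXPORT_LIMIT = 1_000_000  # daily export limit quoted above
STREAMING_THRESHOLD = 800_000   # lower end of the 800 - 900k guideline above


def recommend_exports(daily_events: int, still_growing: bool) -> List[str]:
    """Return the export types to enable for a property (illustrative only)."""
    # Assumption: a property already at the limit needs streaming regardless of growth.
    over_limit = daily_events >= DAILY_EXPORT_LIMIT
    approaching_limit = daily_events >= STREAMING_THRESHOLD and still_growing
    if over_limit or approaching_limit:
        return ["streaming", "daily"]
    return ["daily"]


print(recommend_exports(50_000, still_growing=True))   # ['daily']
print(recommend_exports(850_000, still_growing=True))  # ['streaming', 'daily']
```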

<Callout type="warning">
There is currently a nasty bug with the streaming export where sometimes it will not export all of your events. 

There is no brilliant solution for this at the moment. Please check out our blog post for more information.
</Callout>



## Step 3: Sharing permissions in Google Cloud

For Pipeline to run, it needs to have the right level of permissions.

You can check that on our <OutboundLink href="https://line.pipedout.com/permissions/google-api/register" children="permissions page" />.

### How to check this

1. Open the link above. 
2. Share permissions with Pipeline.
3. Select the project you're going to use in the drop down and hit [Check Permissions]
4. The table will then tell you if you have the correct permissions.

![check bigquery permissions](/images/resources/documentation/connecting_ga4_to_bigquery/checking_permissions.png)

If you don't, you're going to need to go and get them. The easiest way to do this is to add roles to the user using Pipeline.

The roles are shown in the table, but we'll also go through them here. We're going to talk through two different permissions set-ups. The tradeoff is ease of set-up vs how locked down your permissions are.

### Simple: The easiest permissions set-up

Give the email you use with Pipeline the following roles at project level:

- BigQuery Job User
- Service Usage Consumer
 - _Why do we need this?_
 - In order to list datasets available the API will try to access the project first to list the resources and that requires this permission. (It baffled us too at first!)
- BigQuery Data Owner

### Complex: Lock down your datasets

We can choose here to lock down our datasets. In this case rather than the user having access to all of your datasets we're going to only provide permissions to specific datasets.

In this case you'll have to manually do some of the BigQuery set-up that Pipeline does for you. Specifically you'll need to create datasets.

You'll need to create:

1. A dataset where we'll put the Pipeline.
2. A dataset where we'll store the Pipeline raw data. 

They need to have names in the following format:

- `{dataset_name}`
- `{dataset_name}_pl_raw`

And be in the same region. 

Then for each of those datasets you'll need to give your user the role:
- BigQuery Data Editor

Or they won't show in the interface.
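
If you are creating these datasets by hand, the naming format is easy to get wrong. Here is a small Python sketch of the convention: the `_pl_raw` suffix comes from the format above, while the helper names and the example base name are made up.

```python
from typing import Tuple

RAW_SUFFIX = "_pl_raw"  # suffix from the naming format above


def pipeline_dataset_names(dataset_name: str) -> Tuple[str, str]:
    """Return (pipeline dataset, raw data dataset) for a chosen base name."""
    return dataset_name, f"{dataset_name}{RAW_SUFFIX}"


def regions_match(pipeline_region: str, raw_region: str) -> bool:
    """Both datasets need to be in the same region; compare case-insensitively."""
    return pipeline_region.strip().lower() == raw_region.strip().lower()


print(pipeline_dataset_names("my_ecommerce_store_com"))
# ('my_ecommerce_store_com', 'my_ecommerce_store_com_pl_raw')
print(regions_match("EU", "eu"))  # True
```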


# How to create a pipeline

Time to get started. We need to make a pipeline.

<TableOfContents toc={toc} ignoreH1={true} />

## What is a pipeline?

A pipeline is a process that runs every day and prepares our data. 

For GA4 it takes the raw GA4 event data and turns it into a set of pre-constructed tables which are simple to use.

![diagram_pipeline_process](/images/resources/documentation/creating_a_pipeline/diagram_pipeline_process.png)

## Step by Step

### 1. Open the app

Start off by opening the app: <OutboundLink href="https://line.pipedout.com/" children="https://line.pipedout.com/" />

### 2. Choose a Plan

If you've opened the app for the first time, you'll be presented with a checklist of things to do.

The first of these is to Choose a Plan.

![choose_a_plan](/images/resources/documentation/creating_a_pipeline/choose_a_plan.png)

Here, you'll (unsurprisingly) need to select the best plan for you.

### 3. Share Permissions

Now we need to share the necessary permissions. 

If this is one of your first times visiting, the checklist will appear.

![share_persmission_checklist](/images/resources/documentation/creating_a_pipeline/share_persmission_checklist.png)

If not, you'll need to follow these steps. 

![share_permissions](/images/resources/documentation/creating_a_pipeline/share_permissions.png)

You'll be able to follow the steps here to make this work.

### 4. Create a new pipeline

You'll be able to see this in the checklist.

![create_pipeline_checklist](/images/resources/documentation/creating_a_pipeline/create_pipeline_checklist.png)

Or you can use the interface to head to the create pipeline page.

![create_pipeline_button](/images/resources/documentation/creating_a_pipeline/create_pipeline_button.png)

### 5. Choosing your pipeline's name

Typically you’ll want a single pipeline per site.

A good name is often the name of the website you're reporting on.

This pipeline could include multiple GA4 properties if they're all for the same site and should be reported on together. (e.g. if you have one for the blog, marketing subdomain & main site).

If you have a group of sites you might want to report on together, then the name of that group is also a good choice.

**Example Names**:

_Names can only contain letters, numbers and underscores._

- `my_ecommerce_store_com`
- `emea_microsites`
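
If you want to check a name before typing it in, the rule can be sketched as a regular expression in Python. This is our reading of "letters, numbers and underscores": whether the app also restricts case or length is not covered here, so treat the pattern as an assumption.

```python
import re

# Assumed rule: one or more letters, digits or underscores, nothing else.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def is_valid_pipeline_name(name: str) -> bool:
    """Check a candidate pipeline name against the assumed naming rule."""
    return NAME_PATTERN.fullmatch(name) is not None


print(is_valid_pipeline_name("my_ecommerce_store_com"))  # True
print(is_valid_pipeline_name("my-ecommerce-store.com"))  # False
```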

![enter_name](/images/resources/documentation/creating_a_pipeline/enter_name.png)

### 6. Set the Google Project you’ll build the pipeline in

 Select the Google Cloud Project ID where you want to save your data.

<Callout type="info">

 **Note**: This doesn't have to be the project where GA4 is exporting to. This is where you want it to end up!

If you can’t see any projects during this stage, you will need to recheck permissions.

</Callout>

![google_cloud_project](/images/resources/documentation/creating_a_pipeline/google_cloud_project.png)

### 7. Choose the dataset you’ll build the pipeline in

Select a BigQuery Dataset where you want to store your processed data.

You can create a new dataset using the link in the description.

<Callout type="info">

The BigQuery dataset we save to will need to be in the same region as your GA4 export. 

Example: If your raw GA4 BigQuery export is saved in `eu` then your output dataset must be in the `eu`. If you don’t have one you’ll need to create it.

If you don’t know where it’s exporting you can check that <OutboundLink href="https://line.pipedout.com//ga4-bigquery-links" children="here with our BigQuery export links tool." /> This will show you the GA4 export location in BigQuery.

</Callout>

![bigquery_data_set](/images/resources/documentation/creating_a_pipeline/bigquery_data_set.png)

The region will automatically populate based on the dataset.

![google_cloud_region](/images/resources/documentation/creating_a_pipeline/google_cloud_region.png)

### 8. Choose when your pipeline runs

You’ll need to select your timezone and the hour of the day you want the pipeline to run at. 

By default something like 6:00AM in your timezone is sensible.

![timezone_run_hour](/images/resources/documentation/creating_a_pipeline/timezone_run_hour.png)

Then you’re good to create!

![create](/images/resources/documentation/creating_a_pipeline/create.png)


# Advanced Configuration - GA4 BigQuery Pipeline.

There are several advanced settings for your GA4 BigQuery pipeline.

## Where is this in the Pipeline Interface?

![where_to_find_advanced_settings_bigquery](/images/resources/documentation/advanced_config_bigquery/where_to_find_advanced_settings_bigquery.png)

## What are you setting up here?

These settings allow you to fine-tune how your GA4 data is processed and stored.

1. Converting GA Session ID to a string
2. Adjusting the rolling recalculation window
3. Adding first open events to a session

This is what it looks like: 

![what_the_options_look_like](/images/resources/documentation/advanced_config_bigquery/what_the_options_look_like.png)

Let's break them down.

### 1. Converting GA Session ID to a string

GA Session ID is a number by default. Depending on how you ingest the data, though, it may end up stored as a string.

This setting allows you to coerce all GA Session IDs to a string.
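
To see why mixed types are a problem, here is a tiny Python illustration with a made-up session ID. It is not Pipeline's implementation, just the general idea that a number and a string with the same digits do not match until you coerce them to one type.

```python
# Hypothetical values: the same session ID ingested once as a number and once as a string.
session_ids = [1700000000, "1700000000"]

# Mixed types are treated as two different IDs.
distinct_raw = len(set(session_ids))

# Coercing everything to a string makes them match.
distinct_as_string = len({str(value) for value in session_ids})

print(distinct_raw)        # 2
print(distinct_as_string)  # 1
```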

### 2. Adjusting the rolling recalculation window

It can take up to 72 hours for raw GA4 data to fully finalize.

Pipeline automatically recalculates GA4 data for the most recent 3 days to ensure the data is complete and accurate.

If you’re managing a very large site, you can shorten this recalculation window to help reduce your GA4 BigQuery costs.
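
As a rough illustration of what a rolling window means, the Python sketch below lists the dates that would be recalculated on a given run. The function name is made up, and we assume "the most recent 3 days" means the three days before the run date. The exact boundaries Pipeline uses may differ.

```python
from datetime import date, timedelta
from typing import List


def dates_to_recalculate(run_date: date, window_days: int = 3) -> List[date]:
    """List the dates covered by the rolling window, oldest first.

    Assumption: the window covers the days before the run date, not the run date itself.
    """
    return [run_date - timedelta(days=offset) for offset in range(window_days, 0, -1)]


window = dates_to_recalculate(date(2024, 1, 10))
print([d.isoformat() for d in window])  # ['2024-01-07', '2024-01-08', '2024-01-09']
```

A shorter window means fewer days are reprocessed on every run, which is where the cost saving comes from.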

### 3. Adding first open events to a session

Occasionally the GA4 raw data can include a `first_open` event in isolation without a session. 

What we can do is stitch them back into a session when this is possible to do. 







# Advanced Configuration

There are several advanced configuration options.

## Where is this in the Pipeline Interface?

![where_are_advanced_config_options](/images/resources/documentation/advanced_config/where_are_advanced_config_options.png)

## What are you setting up here?

Here we’re setting up advanced GA4 data settings. 
This consists of:

1. Setting the GA4 attribution window.
2. Enabling `user_id` (if you have User ID set up!)
3. Forcing medium UTM to be lowercase.
4. Splitting Session on New Source.
5. Filtering Events from Source Splitting.

This is what it looks like on the interface.

![advanced_config_overview](/images/resources/documentation/advanced_config/advanced_config_overview.png)

## What do these options do?

Let’s take them one by one. 

### 1. Setting the attribution window

What are we changing with the attribution window?

By default Google (and Pipeline) use an attribution method called **Last Non Direct Attribution**. 

This means that when a user visits the site, the traffic source is set to the last traffic source that isn’t (direct).

**Example: Let’s take the example of a user who has 3 sessions.** 

In this basic example every session has a medium and source so **Last Non Direct Attribution** does nothing. 

The session source & medium don’t change.

![ldnc_1](/images/resources/documentation/advanced_config/ldnc_1.png)

But what if the last session was direct?

![lndc_2_example](/images/resources/documentation/advanced_config/lndc_2_example.png)

Using **Last Non Direct Attribution**, the traffic source is taken from the previous session.

We look for:

- The last non direct traffic source

And we do that by going backwards through all previous sessions:

- In this case the previous session was from Google so we attribute to that.

![ldnc_3_example](/images/resources/documentation/advanced_config/ldnc_3_example.png)

By default, Google (and Pipeline) use a **Lookback Period** of 90 days for the attribution window.

This setting allows you to adjust the number of days to look back to suit your specific business.

**Example**:

If your client base has a long decision-making cycle (say, customers typically take several weeks or months to convert after first engaging with your site), you could extend this beyond the 90 days.

On the other hand, if your customers convert quickly, a shorter lookback window (like 30 days) could give you more insight into how they convert in the short term.
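
To make the lookback logic concrete, here is a minimal Python sketch of last non direct attribution with a configurable window. It is an illustration of the idea rather than Pipeline's actual implementation: the function name, the field names and the example sessions are all made up.

```python
from datetime import date, timedelta
from typing import Dict, List

DIRECT = "(direct)"


def attribute_sources(sessions: List[Dict], lookback_days: int = 90) -> List[str]:
    """Return the attributed source for each session (sessions sorted oldest first)."""
    attributed = []
    for index, session in enumerate(sessions):
        source = session["source"]
        if source == DIRECT:
            earliest = session["date"] - timedelta(days=lookback_days)
            # Walk backwards through previous sessions, newest first.
            for previous in reversed(sessions[:index]):
                if previous["date"] < earliest:
                    break  # outside the lookback window, stay direct
                if previous["source"] != DIRECT:
                    source = previous["source"]
                    break
        attributed.append(source)
    return attributed


sessions = [
    {"date": date(2024, 1, 1), "source": "newsletter"},
    {"date": date(2024, 1, 10), "source": "google"},
    {"date": date(2024, 1, 20), "source": DIRECT},
]

print(attribute_sources(sessions))                   # ['newsletter', 'google', 'google']
print(attribute_sources(sessions, lookback_days=5))  # ['newsletter', 'google', '(direct)']
```

With the default 90 days the direct session inherits `google`. With a 5 day window the previous session is too old, so it stays direct.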


### 2. Enabling `user_id` (if you have User ID set up!)

What are we changing with the User ID setting?

In GA4, a “User” can mean different things. 

On a brand new GA4 account (without Google Signals) a user is:

- A device + browser

But if you have done some custom setup, you can create a unique ID for logged in users called a User ID.

This allows you to track a user across different devices and browsers. If you have done this custom setup to generate User IDs then you can enable this and we’ll support your user ID.

We will:
- Add the user ID into existing tables.
- Associate any client IDs used in the same session with that user ID.
- Create a new scope table specifically for user ID, so you can explore those users.

![user_id_explainer](/images/resources/documentation/advanced_config/user_id_explainer.png)


### 3. Forcing medium UTM to be lowercase

In general UTM tags are lowercase. 

If yours have been set up in uppercase or mixed case, they could be misread.

In that case, it may be well worth ticking this box to make these uniform.
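
Here is a quick Python illustration of the problem, using made-up medium values. Without normalising case, the same medium ends up in several rows.

```python
from collections import Counter

# Hypothetical medium values as they might arrive from differently tagged links.
mediums = ["Email", "email", "EMAIL", "cpc"]

raw_counts = Counter(mediums)
lowercase_counts = Counter(medium.lower() for medium in mediums)

print(len(raw_counts))         # 4
print(dict(lowercase_counts))  # {'email': 3, 'cpc': 1}
```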


### 4. Splitting Session on New Source

The way that sessions are attributed changed from UA to GA4.

This means that some channels aren't receiving the credit for conversions even if they were the last channel before converting.

In simple terms:

**GA4 Sessions**:

When someone visits a site, the actions they take during that visit are grouped into a “session”.

A new session starts after 30 minutes of inactivity (though the <OutboundLink href="https://support.google.com/analytics/answer/9191807#zippy=%2Cadjust-session-timeout" children="session timeout period" /> can be adjusted in GA4).

**UA Sessions**:

In the same way as GA4, a new Universal Analytics session starts after 30 minutes of inactivity.

But there are two other key ways that new sessions are triggered.

1. After midnight an interaction would be considered a new session.
2. If a user arrives via one campaign (e.g. via Organic) and then comes back via a different route (e.g. via Paid).

The first one is kind of annoying and it’s probably good it went. *(Although if you had a lot of over midnight traffic for your site, this was probably irritating).*

It’s this second one that’s super important.

**How are our metrics different between these two?**

If we compare them, there are two key differences:

1. The number of sessions will be different. 
2. The traffic source for conversions can be different. 

**Let’s run through that previous example again.**

Using GA4 sessions, this would be a single session.

![ga4_ua_sessions_1](/images/resources/documentation/advanced_config/ga4_ua_sessions_1.png)

This is how it would look in a table:

![ga4_ua_sessions_2](/images/resources/documentation/advanced_config/ga4_ua_sessions_2.png)

But for UA sessions, this translates to 3 sessions, with the newsletter taking all the credit for the conversion.

![ga4_ua_sessions_3](/images/resources/documentation/advanced_config/ga4_ua_sessions_3.png)

This is the table view.

![ga4_ua_sessions_4](/images/resources/documentation/advanced_config/ga4_ua_sessions_4.png)

So this gives us some good points and some bad:

- The conversion is attributed differently.
- All channels get a session which gives them all some credit.
- We overcount sessions.

**Should I tick this box?**

Ticking this box will create an additional table, `ga4_mrt_sessions_mtouch`.

This table will show your "split" sessions.
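
If it helps to see the idea in SQL, here is a rough sketch of how one visit could be split each time the channel changes. It is an illustration only and not how Pipeline builds `ga4_mrt_sessions_mtouch`; the `session_id`, `event_order` and `channel` columns and the sample rows are assumptions.

<CodeMirrorDisplay>
```sql
-- Hypothetical events for a single GA4 session.
WITH events AS (
  SELECT * FROM UNNEST([
    STRUCT('s1' AS session_id, 1 AS event_order, 'organic' AS channel),
    ('s1', 2, 'organic'),
    ('s1', 3, 'paid'),
    ('s1', 4, 'newsletter')
  ])
),
flagged AS (
  SELECT
    *,
    -- 1 when the channel differs from the previous event, else 0.
    -- The first event has no previous row, so it is flagged 0.
    IF(
      channel != LAG(channel) OVER (PARTITION BY session_id ORDER BY event_order),
      1,
      0
    ) AS is_new_source
  FROM events
)
SELECT
  session_id,
  event_order,
  channel,
  -- Running count of channel changes gives each split its own number.
  1 + SUM(is_new_source) OVER (
    PARTITION BY session_id ORDER BY event_order
  ) AS split_session_number
FROM flagged
ORDER BY event_order
```
</CodeMirrorDisplay>

In this sample the single GA4 session becomes three split sessions: organic, paid and newsletter.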

### 5. Filtering Events from Source Splitting.

This is only if you are splitting sessions.

The main difference with split sessions is that every time you come back to the site from a different source, it counts as a new session.

This also means that if you were redirected to Stripe, Mastercard, PayPal etc. to pay and then came back, this would be considered a new session.

To override this, you can enter the sources that you're not interested in counting as "new" sessions.

See the below for an example. 

![filter_events_source_splitting](/images/resources/documentation/advanced_config/filter_events_source_splitting.png)
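
As a rough sketch of the logic (not Pipeline's actual implementation), the query below ignores payment sources when deciding whether a new split session has started. The column names, sample rows and the ignore pattern are all assumptions for illustration.

<CodeMirrorDisplay>
```sql
-- Hypothetical events for one visit that includes a payment redirect.
WITH events AS (
  SELECT * FROM UNNEST([
    STRUCT('s1' AS session_id, 1 AS event_order, 'google' AS source),
    ('s1', 2, 'google'),
    ('s1', 3, 'paypal.com'),  -- coming back from the payment provider
    ('s1', 4, 'newsletter')
  ])
),
flagged AS (
  SELECT
    *,
    -- Illustrative ignore list.
    REGEXP_CONTAINS(source, r'stripe|mastercard|paypal') AS is_ignored_source
  FROM events
),
with_previous AS (
  SELECT
    *,
    -- Most recent earlier source that is not on the ignore list.
    LAST_VALUE(IF(is_ignored_source, NULL, source) IGNORE NULLS) OVER (
      PARTITION BY session_id
      ORDER BY event_order
      ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
    ) AS previous_source
  FROM flagged
)
SELECT
  session_id,
  event_order,
  source,
  -- Only count a change when the source is not ignored
  -- and differs from the last non-ignored source.
  1 + COUNTIF(NOT is_ignored_source AND source != previous_source) OVER (
    PARTITION BY session_id ORDER BY event_order
  ) AS split_session_number
FROM with_previous
ORDER BY event_order
```
</CodeMirrorDisplay>

Here the PayPal redirect stays in split session 1, and only the newsletter visit starts split session 2.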


# Channel Groupings

We provide a custom channel grouping to go alongside the default one. Rather than attempt to map closely to the GA4 numbers we've added some additional channel groupings.

Please customise to whatever works for your site!

<TableOfContents toc={toc} />

## Where is this in the Pipeline Interface?

![channel_groupings_nav](/images/resources/documentation/channel_groupings/channel_groupings_nav.png)

## What are you setting up here?

This page allows you to customise an extra channel grouping using a case statement. 

## How does this work?

Channel groupings are created by looking at traffic source fields and applying a set of rules. 

This categorisation is based on the:

- Source
- Medium
- Campaign
- Source Category

Our default channel grouping attempts to mimic the default channel grouping in the GA4 interface. (So your numbers match the GA4 interface.)

Our extra channel grouping is an improved version of this. The default channel grouping makes some strange decisions/has some issues and so we have an improved version.

Once you're comfortable that your data is correct, we strongly recommend altering our extra channel grouping for your website and using that.

## How can I alter the extra channel grouping?

On the channel grouping config you'll be able to see our current rules.

![channel_groupings_screenshot_1](/images/resources/documentation/channel_groupings/channel_groupings_screenshot_1.png)

We use a case statement to categorise traffic.

You might recognise these from Looker Studio or perhaps you already know what one is. 

If you've never seen one before we have a starter guide on <OutboundLink href="/documentation/setup-instructions/useful-concepts/case-statement-editor/" children="case statements here" />.

Let’s go through some examples of how to adjust it.

### Example 1: Adding tiktok to organic social

Let’s adjust the Organic Social categorisation to match more specific sources. 

The current conditions are as follows:

<CodeMirrorDisplay>

```sql
-- Traffic is "Organic Social" when:
WHEN
  -- If source matches one of the following values
  regexp_contains(
    dim_source,
    r"^(facebook|instagram|pinterest|reddit|twitter|linkedin)"
  ) = true

  -- Or if medium matches one of the following values
  OR regexp_contains(
    dim_medium,
    r"^(social|social-network|social-media|sm|social network|social media)"
  ) = true

  -- Or if the source category is SOURCE_CATEGORY_SOCIAL
  OR dim_source_category = 'SOURCE_CATEGORY_SOCIAL'
THEN 'Organic Social'
```
</CodeMirrorDisplay>

How can we add TikTok to this? In this case, we can change the first condition. 

It’s using `regexp_contains` to match source against a list of values.

You don’t need to know exactly what this does, you just need to extend the pattern.

We can adjust as follows:

<CodeMirrorDisplay>

```sql
-- Traffic is "Organic Social" when:
WHEN
  -- If source matches one of the following values
  regexp_contains(
    dim_source,
    -- CHANGE ON THE LINE BELOW
    -- Here we've added tiktok onto the end.
    r"^(facebook|instagram|pinterest|reddit|twitter|linkedin|tiktok)"
  ) = true

  -- Or if medium matches one of the following values
  OR regexp_contains(
    dim_medium,
    r"^(social|social-network|social-media|sm|social network|social media)"
  ) = true

  -- Or if the source category is SOURCE_CATEGORY_SOCIAL
  OR dim_source_category = 'SOURCE_CATEGORY_SOCIAL'
THEN 'Organic Social'
```
</CodeMirrorDisplay>

Remember you can always reach out to support if you're struggling to update this.

## What are all the fields available for categorisation?

These are all the fields. The first 3 are relatively self-explanatory.

- dim__source
- dim__medium
- dim__campaign

Then we move on to these 4. What are they, and why are they useful?

- dim__source_category
- dim__campaign_source
- dim__platform_click_id_type
- dim__cross_channel_default_channel_group

### dim__source_category

Google has a big list of websites. They've bucketed these into groups and that's what source category is.

So something like this:

- when dim_source_category = 'SOURCE_CATEGORY_SOCIAL' then "social"

This says: when you get traffic from a website, check what bucket Google has put that website into. If it's SOURCE_CATEGORY_SOCIAL then return the channel grouping "social".

In the future you'll be able to customise this as well, but that isn't possible at the moment.

### dim__campaign_source && dim__cross_channel_default_channel_group

We've listed these together because they're both trying to do the same thing. 

If you don't fancy the explanation, the TL;DR is:
- Use dim__cross_channel_default_channel_group to see what Google thinks the channel grouping is for ad edge cases.
- Probably don't use dim__campaign_source beyond how it is already used; it is harder to work with.
- Hopefully you won't have to use either, because we've already used them for you to fix the paid side.

The history is as follows:
- Google sometimes generates its channel grouping based on other sources (e.g. AdWords).
- They didn't originally export this to BigQuery, and only later started exporting some of this data.
- For example, this is how the GA4 interface shows you PPC campaign names, which BigQuery couldn't historically do.

## Why would I want to use the extra channel grouping?

GA4 (and Pipeline) use rules in order to categorise traffic into different default channel groupings.

These groupings, though, can be misleading or sometimes, just wrong. 

**Example:** in GA4 traffic coming from gmail is classified as “Organic Search.” 🤷‍♂️ 

Broadly speaking, we think we have some more sensible defaults.

## Should you personalise ours/the default channel grouping?

In general, we would say yes. 

You know your traffic much better than the engineers at Google, who have had to make channel groupings that are generic for everyone. 

We've tried to improve the default channel grouping, but it's still unlikely to perfectly match your unique traffic.

**With the GA4 categorisation, in many cases:**

- The domain classification is wrong.
- An extra channel grouping could improve how your traffic is categorised (e.g. a dedicated channel for AI).
- You could categorise your referral traffic more specifically.
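
As an example of the AI channel idea, here is a minimal sketch of an extra rule placed ahead of the social rule. The source patterns and sample rows are illustrative assumptions, and the `dim_source` field name follows the Organic Social example earlier on this page, so check it against what your editor shows.

<CodeMirrorDisplay>
```sql
-- Hypothetical sample rows; swap in the sources you actually see.
WITH traffic AS (
  SELECT dim_source
  FROM UNNEST(['chatgpt.com', 'perplexity.ai', 'facebook.com', 'google']) AS dim_source
)
SELECT
  dim_source,
  CASE
    -- Illustrative AI rule, listed first so it is checked first
    WHEN regexp_contains(dim_source, r"(chatgpt|perplexity)") = true
      THEN 'AI'
    WHEN regexp_contains(
      dim_source,
      r"^(facebook|instagram|pinterest|reddit|twitter|linkedin)"
    ) = true
      THEN 'Organic Social'
    ELSE 'Other'
  END AS channel_grouping
FROM traffic
```
</CodeMirrorDisplay>

In the case statement editor you would only add the `WHEN ... THEN 'AI'` branch; the surrounding query is just there so the sketch runs by itself.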

## Do you have a recommended traffic grouping?

Yes, the extra channel grouping described above, though do customise it. 

## How does Google's default channel grouping work?

Pretty much exactly as we've described above and you can see in the case statement.

It runs through a series of rules to categorise traffic.
- The first one which matches will be used.
- The rules are largely based on the medium & source.

For example in order for traffic to be grouped as **Organic Social** it needs to be:

- Medium is one of (“social”, “social-network”, “social-media”, “sm”, “social network”, “social media”) OR
- The source matches a defined list of sites such as [facebook.com](http://facebook.com) etc.

The full list of rules is based on the rules found in this <OutboundLink href="https://support.google.com/analytics/answer/9756891?hl=en" children="Google Documentation" />
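
Because the first matching rule wins, the order of the rules matters. The sketch below uses deliberately simplified rules (they are assumptions, not Google's or Pipeline's real patterns) to show the same traffic landing in different channels depending on which rule comes first.

<CodeMirrorDisplay>
```sql
-- One hypothetical row: paid traffic from Facebook.
WITH traffic AS (
  SELECT 'facebook' AS dim_source, 'cpc' AS dim_medium
)
SELECT
  -- Broad rule first: the source-only rule matches before the paid rule is checked.
  CASE
    WHEN regexp_contains(dim_source, r"^(facebook|instagram)") THEN 'Organic Social'
    WHEN regexp_contains(dim_medium, r"^(cpc|ppc|paid)") THEN 'Paid Social'
  END AS broad_rule_first,      -- 'Organic Social'
  -- Specific rule first: paid social is caught before the broader rule.
  CASE
    WHEN regexp_contains(dim_source, r"^(facebook|instagram)")
      AND regexp_contains(dim_medium, r"^(cpc|ppc|paid)") THEN 'Paid Social'
    WHEN regexp_contains(dim_source, r"^(facebook|instagram)") THEN 'Organic Social'
  END AS specific_rule_first    -- 'Paid Social'
FROM traffic
```
</CodeMirrorDisplay>

As a rule of thumb, put the most specific rules near the top and the broad catch-alls near the bottom.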

## Channel Grouping FAQs

<Accordion header="What are Channel Groups on GA4?">
There are several Channel Groups on GA4 including:

 - Organic Search
 - Audio
 - Organic Social
 - Paid Search
 - Paid Video

These categories are rule-based.

This means that the traffic medium or source would need to match specific rules.

![channel_groupings_nav](/images/resources/documentation/channel_groupings/channel_groupings_sources_example.png)

</Accordion>

<Accordion header="How do I know what to update?">
The easiest way to see what you should update is by looking at your data and looking at the default channel groupings. You can then break down by medium & source to see which ones are being attributed to the wrong channels.
</Accordion>

<Accordion header="What if my own rules are classifying things incorrectly?">
This is most likely due to the order of the case statement.

Case statements are read from top to bottom, so make sure the most important channels are listed first.

If something matches twice, the first rule to match will be selected.

![channel_groupings_nav](/images/resources/documentation/channel_groupings/case_statement_logic.png)
</Accordion>


# Conversion Events

The next step is creating your conversion events. 

<Callout type="info">
If you’re an e-commerce site and checked that box, all the default e-commerce conversions have already been set up for you!
</Callout>

<TableOfContents toc={toc} />

## Where is this in the Pipeline Interface?

![here_set_up_conversions](/images/resources/documentation/conversion_events/here_set_up_conversions.png)

## What are you setting up here?

This page allows you to create the conversion events for your site.

These let you measure valuable actions your users take.

For example this could be:

- Making a purchase
- Filling out a form
- Signing up to a newsletter.

_Currently we don't automatically pull through any key events you've set up in GA4, so you'll need to re-add them here._

## How do I do it?

1. Start building a Conversion Event, either manually or using one of the presets.
2. Customise the Conversion Event. 

Let’s break it down.

### 1. Add a conversion event

You can decide to either: 

- Manually set up a Conversion Event.
- Select (and adapt) one of the pre-existing Conversion Events.

![different_conversion_types](/images/resources/documentation/conversion_events/different_conversion_types.png)

### 2. Customise the conversion event

Then we want to customise it. We’ve annotated the form below, but you might also need to look at the examples below to help understand all the options!

![creating_conversion_events_diagram](/images/resources/documentation/conversion_events/creating_conversion_events_diagram.png)

### There are two key concepts to understand when creating a conversion:

1. You need to flag individual events as conversions.
2. You then need to choose how these events are rolled up across a session in case they happen more than once.

![creating_conversion_events_diagram](/images/resources/documentation/conversion_events/logic_creating_conversion_event_form.png)

## Examples

### Example 1: Purchase event for extended warranty purchases

Let’s say we’re creating a separate conversion for when a user purchases with extended warranty.

The two key conditions here are: 

- `event_name = purchase`
- `dim__add_ons = extended_warranty`

What is dim__add_ons?

This would be a custom event parameter that your developers have added to the purchase event. (We automatically add the dim__ to the start of it.)

![conversion_event_example_1](/images/resources/documentation/conversion_events/conversion_event_example_1.png)
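
Written as SQL, the two conditions amount to something like the sketch below. It is only an illustration of the logic: the sample rows and the `is_extended_warranty_purchase` column name are made up, and Pipeline builds the real query for you from the form.

<CodeMirrorDisplay>
```sql
-- Hypothetical events; dim__add_ons is the custom parameter from this example.
WITH events AS (
  SELECT * FROM UNNEST([
    STRUCT('s1' AS session_id, 'purchase' AS event_name, 'extended_warranty' AS dim__add_ons),
    ('s1', 'page_view', CAST(NULL AS STRING)),
    ('s2', 'purchase', 'gift_wrap')
  ])
)
SELECT
  session_id,
  event_name,
  dim__add_ons,
  -- Flag the event only when BOTH conditions are true.
  IF(
    event_name = 'purchase' AND dim__add_ons = 'extended_warranty',
    1,
    0
  ) AS is_extended_warranty_purchase
FROM events
```
</CodeMirrorDisplay>

Only the first row is flagged: the other purchase has a different add-on, and the page view isn't a purchase at all.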

### Example 2: Email signups

We want to track our email signups. But if someone accidentally hits form submit twice we only want to record it once.

The conditions here are: 

- `event_name = form_submit`

In this hypothetical example we have only one form submit on our website (which is unlikely), but ignore that for this example!

When this is true we do the following:

- We set the conversion value to 1.
- Then we set our aggregation to `MAX`. 

This means that in a session if the user submits the form twice, we take the max value of each conversion. 

And the max value of 1 is still 1!

![conversion_event_example_2](/images/resources/documentation/conversion_events/conversion_event_example_2.png)
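
To see why the aggregation matters, here is a small sketch comparing `MAX` with `SUM` for a session where the form was submitted twice. The sample rows and output column names are assumptions for illustration.

<CodeMirrorDisplay>
```sql
-- Hypothetical events: session s1 submits the form twice by accident.
WITH events AS (
  SELECT * FROM UNNEST([
    STRUCT('s1' AS session_id, 'form_submit' AS event_name),
    ('s1', 'form_submit'),
    ('s2', 'page_view')
  ])
)
SELECT
  session_id,
  -- MAX: 1 if the session had at least one signup, however many times it fired.
  MAX(IF(event_name = 'form_submit', 1, 0)) AS email_signup_max,
  -- SUM: counts every submit, so the double submit is counted twice.
  SUM(IF(event_name = 'form_submit', 1, 0)) AS email_signup_sum
FROM events
GROUP BY session_id
ORDER BY session_id
```
</CodeMirrorDisplay>

For `s1`, `MAX` gives 1 while `SUM` gives 2, which is why `MAX` is the safer choice when you only want one conversion per session.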

## **Concept: What are Conversion Events in Pipeline?**

**Conversion Events are an enhanced version of GA4 Key Events**

In the GA4 interface, you’ll be familiar with “Key Events” (Previously known as Conversions).

Key events are basically “regular” events which have been tagged as “key.”

E.g. you could flag any of these as “key” events:

- `purchase`
- `form_submit`
- `in_app_purchase`

We make conversion events slightly differently and this gives two major advantages:

1. They can be backdated (to when you started collecting BigQuery data).
2. The criteria can be a lot more detailed.

## Conversion Events - FAQs

<Accordion header="Are conversion events applied retroactively?">
 Yes, as long as you have data and the event was set up at the time. 

If, for example, you only set up a `purchase` event from October 2023, then it would only be applied back to that point.
</Accordion>

<Accordion header="If someone renamed an event can we still track it?">
 Yes, you can track the old or new event name. 

What you’ll need to do is set up a rule to match both of the events.
![conversion_event_example_3](/images/resources/documentation/conversion_events/conversion_event_example_3.png)
</Accordion>

<Accordion header="Can I set up events based on multiple event conditions? ">
Yes - you can add events based on conditions that apply across multiple events.
For example, you could set up one Conversion Event based on a user who sent a `form_submit` event on a specific page (e.g. a solution page).
![conversion_event_example_4](/images/resources/documentation/conversion_events/conversion_event_example_4.png)
</Accordion>

<Accordion header="Can I pull in specific pieces of information related to an event?">
You can do this in the <OutboundLink href="/documentation/setup-instruction/pipeline-ga4-setup/extracting-extra-data-for-ga4" children="Extract Extra Data section." />
What you are technically doing is unnesting specific event parameters.
For example, extracting the `link_url` from a `click` event, which can then be used here.
![conversion_event_example_5](/images/resources/documentation/conversion_events/conversion_event_example_5.png)
</Accordion>

<Accordion header="Do I need to pull all the ecommerce events?">
No - you can do this in the <OutboundLink href="/documentation/setup-instruction/pipeline-ga4-setup/ecommerce.mdx" children="ecommerce section." />
</Accordion>


# Ecommerce

If you’re an e-commerce site, then GA4 has a lot of tracking specifically for it.

- Purchase events
- Add to cart events
- Begin checkout events

We’ve got some nice and easy settings to help set up GA4 for it.

## Where is this in the Pipeline Interface?

![are_you_ecom](/images/resources/documentation/ecommerce/are_you_ecom.png)

## What are you setting up on this page?

This page allows you to configure e-commerce settings for your pipeline.

We do most of the e-commerce settings automatically so there isn’t a huge amount!

![ecom_checkbox](/images/resources/documentation/ecommerce/ecom_checkbox.png)


If you check the box to say you’re an e-commerce site then two things happen: 

- Pipeline will automatically pull e-commerce events into your GA4 output tables.
- It will generate an additional item scope level table to analyse your products.

## How do I do it?

### Ecommerce checkbox

If you are an e-commerce site then tick the box. Not much to this one.

### Item properties

You can then also choose any extra item properties you want incorporated into your items table.

This is very similar to what we did in <OutboundLink href="/documentation/setup-instructions/ga4-optional-setup/extracting-extra-data-for-ga4/" children="extracting extra data for GA4" />. But just for items.

<Callout type="info">

What we are technically doing here is unnesting the values in item_properties so that we can use them in recording e-commerce events.

</Callout>

Again, similar to extracting extra data for GA4, we also provide a form below which lists every custom item property, with two example values for each. 

These are taken from the last day of GA4 data.

## E-Commerce FAQs

<Accordion header="What if I have my own custom e-commerce events I want in a funnel?">
By default we only include the standard e-commerce events.

You'll need to add any additional funnel steps you want as conversions. Jump to the <OutboundLink href="/documentation/setup-instruction/pipeline-ga4-setup/conversion-events" children="Conversion Events" /> section to add these in. 
</Accordion>

<Accordion header="What if I don’t have e-commerce events set up yet, but I will do?">
Check the box and these will be automatically populated once they are set up.
</Accordion>

<Accordion header="What if I’ve migrated from custom e-commerce events to the standard ones?">
In order to get a joined up view, you’ll need to create them as <OutboundLink href="/documentation/setup-instructions/ga4-optional-setup/conversion-events/" children="custom conversions in this section." />

You can then make a conversion which matches either `x_conversion` event or `y_conversion` event. There's an example on the conversion events page.
</Accordion>


# Extracting Extra Data

The next step is extracting any extra information we want from our GA4 events. 

This is optional. 

<TableOfContents toc={toc} />

## Where is this in the Pipeline Interface?

![where_to_extract_ga4_data](/images/resources/documentation/extracting_data/where_to_extract_data.png)

## What are you setting up here?

This page allows you to pull out extra pieces of information from events to use in the pipeline. 

GA4 sends lots of different types of events with different properties.

We do most of this for you automatically, but lots of sites have custom events with custom properties and we can help you handle it here.

<Callout type="info">
 What we are technically doing here is unnesting `event_params` so that we can use them easily in the future, whether that be for Conversion Events, User/Session Properties etc.
</Callout>

**Which ones do we add automatically?**

The short answer is: pretty much every default property we use or process in some way. 

We usually find people only tend to add custom properties, because the rest we're already extracting and letting you use.

<Callout type="info">
Interested in the full list?

1. Skip this screen and build the pipeline. In later steps you’ll be able to see all the parameters you’ve got access to.
2. Look at the table below the form.
 - It pulls data from your most recent day of GA4 data that’s exported to show a sample of all events and properties.
 - It won’t show any properties that are already being used!
</Callout>


**Example:** 

If we have a `click` event which gets sent with the HTML ID as the property `click_html_id` we could select that as a piece of information we want to use later.
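
If you're curious what "unnesting" looks like, here is a minimal BigQuery sketch for the `click_html_id` example. The sample event is made up and its `event_params` is a simplified version of the GA4 export structure (only `string_value` and `int_value` are included); the output name assumes the `dim__` prefix described elsewhere in these docs. Pipeline does this step for you.

<CodeMirrorDisplay>
```sql
-- One hypothetical click event with a simplified event_params array.
WITH events AS (
  SELECT
    'click' AS event_name,
    [
      STRUCT(
        'click_html_id' AS key,
        STRUCT('signup-button' AS string_value, CAST(NULL AS INT64) AS int_value) AS value
      ),
      STRUCT(
        'ga_session_id' AS key,
        STRUCT(CAST(NULL AS STRING) AS string_value, 1234567890 AS int_value) AS value
      )
    ] AS event_params
)
SELECT
  event_name,
  -- Pull one parameter out of the nested array into its own column.
  (
    SELECT ep.value.string_value
    FROM UNNEST(event_params) AS ep
    WHERE ep.key = 'click_html_id'
  ) AS dim__click_html_id
FROM events
```
</CodeMirrorDisplay>

Once a parameter sits in its own column like this, it can be used in conditions for conversion events or session properties.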

## What if I don’t need any extra information or I’m not sure what I need?

In that case you can ignore this screen and hit continue setup runs to unlock the rest of the setup options.

You can always come back!

![setup_run_extract](/images/resources/documentation/extracting_data/setup_run_extract.png)

## I do have custom properties I want to extract

### Option 1: You could fill in the form manually if you have the information ready

If you know the properties you can manually fill them in, but below the form we download a sample of events. 

### Option 2: We pull a table of example event properties you can use to fill in this form.

The data is from the most recent day of GA4 data and includes: 

- Every parameter for every event
- Two example values for each of those.

We then filter out all of the `event_properties` we’re already using and preparing for you, so you won’t see items such as `ga_session_id`, for example. We’re already making that available to you.

### Typically we’d recommend:

1. Look through the example values
2. Pick any parameters you are interested in
3. Review the form and hit save.

## Example 1: Adding extra data for a B2B form.

Let’s say you have a B2B website. 

You have tracked your own `custom_form_submission` event with a destination parameter.

What we can do is unnest the `form_type` to extract whether the form submitted includes a telephone number.

### 1. Look through the example values for our parameter

Here it is!

![custom_form_submission](/images/resources/documentation/extracting_data/custom_form_submission.png)

### 2. Add the parameter

We’d just need to select `Add`. 

We only need to add the parameter once even if it appears in multiple events.

![custom_form_submission](/images/resources/documentation/extracting_data/custom_form_submission.png)

### 3. Review the form and save

This is what is pulled through to the form at the top.

![form_details](/images/resources/documentation/extracting_data/form_details.png)

## Concept: How does GA4 work with extra data?

GA4 fires events to track people on your site. 

Every event comes with lots of default event properties as well as the ability to add your own properties. This gives a huge amount of flexibility.

Most of the _add whatever you want_ happens in what GA4 calls `event_params`.

This is a field which allows you to save up to 25 different properties per event.

GA4 uses some of these by default. For example it sends some standard properties with most events. e.g.:

- `ga_session_id`
- `page_location`

Then some are specific to the event. e.g. the following are properties for video events like `video_progress`:

- `video_percent`
- `video_duration`

But an event could be literally anything so GA4 also says: add whatever you like. Let's have an example.

## Concept: Why might I want to extract additional data for some events?

We have FAQs on our documentation pages. Suppose we start sending click events if you click on an accordion. We want to track the title of the accordion that was clicked.

We might get our developers to send an event like this:

<CodeMirrorDisplay>
```json
{
 "event": "accordion_click",
 "event_params": {
 "accordion_title": "What is Event?",
 "ga_session_id": "1234567890",
 "page_location": "https://www.example.com/faq/what-is-event"
 }
}
```
</CodeMirrorDisplay>

If we extracted this we could use this to make an extra session dimension of "reader expertise level". Someone who reads the “what-is-event” accordion is probably a beginner and this could help us segment our audience.

## Extract Data FAQs

<Accordion header="What if I’m not sure what I need to extract?">
 Do not worry. You can always come back to this and set it up.

This will be applied retrospectively as well.
</Accordion>


# Session Properties

This is one of the most powerful features. It's building custom dimensions and metrics for sessions and it lets you do some pretty cool things.

## Where is this in the Pipeline Interface?

![where_is_session_properties](/images/resources/documentation/session_properties/where_is_session_properties.png)

## What does this setting do?

This allows you to create your own custom session properties. 

These properties allow you to add additional columns to your data, which could be: 

- A number or metric (e.g. the number of click events in the session).
- A dimension (e.g. the last page viewed before conversion in the session).

![example_session_dimension](/images/resources/documentation/session_properties/example_session_dimension.png)

## How do I do it?

1. Select a preset session property or add your own manually.
2. Do any custom configuration of the session property.
3. Hit Save

Let’s dive into it.

### 1. Select a preset Session Property or add your own manually.

The presets at the top of the page are there as examples to build from.

Alternatively, you can create your own custom session property from scratch.

![preset_or_custom_session_property](/images/resources/documentation/session_properties/preset_or_custom_session_property.png)

### 2. Do any custom configuration of the session property.

Click on any of the properties at the top of the page to edit them.

![session_properties_stored](/images/resources/documentation/session_properties/session_properties_stored.png)

### 3. Hit save when you’re done customising.

Otherwise we’ll lose all your changes!

## This is kind of like conversion events!

It is very similar to Conversion Events (if you’ve already set that up).

The big difference is that rather than flagging a single event, instead we look at multiple events across the session and create a property based off it.

The example below shows the decision flow for creating a session property to return a true/false if the session has had 5 or more pageviews.

![example_session_property](/images/resources/documentation/session_properties/example_session_property.png)

## Example:

### Example 1: Getting the last content page viewed in a session

The example here is looking at the last content (i.e. blog page) viewed in the session.

![session_property_example_2](/images/resources/documentation/session_properties/session_property_example_2.png)

**The conditions set here are:** 

- `dim__page_location REGEX CONTAINS /blog/`
- `Pick a column is dim__page_location`
- `Aggregation is last_value`

So when these conditions are met, the `last_content_page` column then contains the URL that matches these conditions. 

![last_page_viewed_in_content_example](/images/resources/documentation/session_properties/last_page_viewed_in_content_example.png)
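
For the curious, the three settings above behave roughly like the query below. This is a sketch on made-up rows, not the SQL Pipeline generates; the `event_order` column is an assumption standing in for the event timestamp.

<CodeMirrorDisplay>
```sql
-- Hypothetical page views in one session, in the order they happened.
WITH events AS (
  SELECT * FROM UNNEST([
    STRUCT(
      's1' AS session_id,
      1 AS event_order,
      'https://www.example.com/blog/first-post' AS dim__page_location
    ),
    ('s1', 2, 'https://www.example.com/blog/second-post'),
    ('s1', 3, 'https://www.example.com/pricing')
  ])
)
SELECT
  session_id,
  -- Keep only URLs containing /blog/, then take the most recent one.
  ARRAY_AGG(
    IF(REGEXP_CONTAINS(dim__page_location, r'/blog/'), dim__page_location, NULL)
    IGNORE NULLS
    ORDER BY event_order DESC
    LIMIT 1
  )[SAFE_OFFSET(0)] AS last_content_page
FROM events
GROUP BY session_id
```
</CodeMirrorDisplay>

The pricing page is the last page overall, but because it doesn't match `/blog/`, the second blog post is what ends up in `last_content_page`.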

### Example 2: Flagging sessions with more than 5 pageviews

This example returns a true or false depending on whether the session has had more than 5 pageviews. 

This Session Property adds an additional column to the session table with either:

1 - If the session contained more than 5 pageviews. 

0 - If the session contained 5 or fewer pageviews.

Let’s see the setup in the interface

![session_property_example_3](/images/resources/documentation/session_properties/session_property_example_3.png)

**The conditions set here are:** 

- `event_name regex contains page_view`
- `Static Value = 1`
- `Aggregation is count`
- `The compare value is Greater than 5.`

So the `session_pageviews_count_gt_5` column returns a 1 if the conditions are true, else a 0.

![session_prop_5_or_more_pageviews](/images/resources/documentation/session_properties/session_prop_5_or_more_pageviews.png)
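
The same logic, sketched in SQL on made-up data (this is an illustration, not the query Pipeline generates): one session has 6 pageviews and the other has exactly 5.

<CodeMirrorDisplay>
```sql
-- Hypothetical events: session_a has 6 page views, session_b has 5 plus some scrolls.
WITH events AS (
  SELECT 'session_a' AS session_id, 'page_view' AS event_name
  FROM UNNEST(GENERATE_ARRAY(1, 6))
  UNION ALL
  SELECT 'session_b', 'page_view' FROM UNNEST(GENERATE_ARRAY(1, 5))
  UNION ALL
  SELECT 'session_b', 'scroll' FROM UNNEST(GENERATE_ARRAY(1, 3))
)
SELECT
  session_id,
  -- Count only the events that match the condition.
  COUNTIF(REGEXP_CONTAINS(event_name, r'page_view')) AS pageviews,
  -- Compare the count against the threshold: strictly greater than 5.
  IF(COUNTIF(REGEXP_CONTAINS(event_name, r'page_view')) > 5, 1, 0)
    AS session_pageviews_count_gt_5
FROM events
GROUP BY session_id
ORDER BY session_id
```
</CodeMirrorDisplay>

`session_a` gets a 1 and `session_b` gets a 0, because exactly 5 pageviews is not greater than 5.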


## **Concept: What are Session Properties?**

Session properties are additional columns you can add to the GA4 output tables.

These are based on the type of actions or behaviours that have taken place during the session.

**Examples:**

- **“High Intent Sessions”** - e.g. Sessions that include actions such as adding items to the cart, viewing pricing pages, or starting the checkout process.
- **“Last page visited before conversion”** - i.e. Which page the user visited before converting.
- **“External link clicks”** - i.e. How many links to external sites were clicked.
- **“Number of blog posts visited”** - i.e. How many blog posts were visited throughout the session.
- **“Pricing page visits”** - i.e. How many visits to pricing pages there have been.

These can be tailored to what matters most to your business.

![example_session_dimension](/images/resources/documentation/session_properties/example_session_dimension.png)


## FAQs - Session Properties

<Accordion header="Do custom session properties add additional fields to your data?">
 Yes - custom session properties add an additional column to your tables, which can be a metric or a dimension. 

What we’re **not** doing is filtering the data. That can be done in Looker etc.

If you want to filter to sessions that have fewer than 5 pageviews (based on a session property), you can filter on the column related to that session property in Looker Studio.
</Accordion>

<Accordion header="Do custom properties need to be dimensions?">
 No - these can be metrics or dimensions or boolean values (i.e. true and false).

This means that you can pull out specific dimensions such as Last Page Clicked.

You are also able to count other pieces of information such as:

- The number of blog pages visited throughout a session.
- The number of micro conversions throughout a session.
- Specific metrics such as average scroll depth, number of pages with x amount seconds interaction.
</Accordion>


# Spam Filter

The next step is filtering out spam traffic.

<Callout type="info">
 This step is optional. 
 
 By default, your traffic will not be filtered at all, but if you'd like to categorise any traffic as spam you can do it here.

 If you are unsure what case statements are you can read all about them in <OutboundLink href="/documentation/setup-instructions/useful-concepts/case-statement-editor/" children="this section." />

</Callout>

<TableOfContents toc={toc} />

## Where is this in the Pipeline Interface?

![where_can_I_access_spam_filter](/images/resources/documentation/spam_filter/where_can_I_access_spam_filter.png)

## What are you setting up here?

This page allows you to filter out spam traffic.

This works by filtering out traffic related to specific dimensions.

**Examples include:** 

- Referral traffic from specific domains.
- Traffic from specific locations (i.e. locations you know are sending bot traffic).
- Traffic from specific devices.

## How do I do it?

We use a case statement to categorise spam traffic as true.

![spam_filter_overview](/images/resources/documentation/spam_filter/spam_filter_overview.png)

You might recognise case statements from Looker Studio or perhaps you already know what one is.

Again, if you don't know what a case statement is, you can read more about them in <OutboundLink href="/documentation/setup-instructions/useful-concepts/case-statement-editor/" children="this section." />

Let’s go through some examples of how to adjust it. 

## Example 1: Flagging traffic from specific referrers as spam

In this example, we are categorising traffic from two specific domains as spam.

<CodeMirrorDisplay>
```sql
case
 when regexp_contains(dim__page_referrer, ".*linkjuice.com.*|.*spamdomain.com.*")
 then true
 else false
end
```
</CodeMirrorDisplay>

This formula works as follows:

- Check if the referrer matches a regex pattern.
- The regex pattern is `.*linkjuice.com.*|.*spamdomain.com.*`
 - The pipe character `|` means "or".
 - The `.*` means "match any sequence of characters (including none)".
 - So we're essentially saying match any referrer which contains `linkjuice.com` or `spamdomain.com`.
- If it matches, then we set the value to true and we classify this traffic as spam.
- If it doesn't match, then we set the value to false and we classify this traffic as not spam.
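One refinement worth considering, offered as a sketch rather than a required change: in BigQuery, `regexp_contains` already returns true on a partial match, so the leading and trailing `.*` are optional. An unescaped `.` also matches any single character, so `linkjuice.com` would match `linkjuiceXcom` too. Escaping the dot inside a raw string (`r"..."`) keeps the match literal:

<CodeMirrorDisplay>
```sql
case
 -- r"..." is a raw string, so \. reaches the regex engine as a literal dot
 when regexp_contains(dim__page_referrer, r"linkjuice\.com|spamdomain\.com")
 then true
 else false
end
```
</CodeMirrorDisplay>

Both versions flag the same two domains; the escaped version is just less likely to catch look-alike strings by accident.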

## Example 2: Excluding traffic from specific locations as spam

In this example, we are categorising traffic from specific cities and regions as spam. 

<CodeMirrorDisplay>
```sql
case
 when regexp_contains(dim__geo_city, ".*(London|Sydney).*")
 then true
 when regexp_contains(dim__geo_region, ".*(Texas|Quebec).*")
 then true
 else false
end
```
</CodeMirrorDisplay>

We can, of course, combine these different rules on top of each other so we’d get a case statement like this:

<CodeMirrorDisplay>
```sql
case
 when regexp_contains(dim__page_referrer, ".*linkjuice.com.*|.*spamdomain.com.*")
 then true
 when regexp_contains(dim__geo_city, ".*(London|Sydney).*")
 then true
 when regexp_contains(dim__geo_region, ".*(Texas|Quebec).*")
 then true
 else false
end
```
</CodeMirrorDisplay>
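If you'd like to sanity-check a rule set before saving it, one option is to run it against a few made-up rows in the BigQuery console. The sketch below is self-contained: the sample rows and the `is_spam` column name are invented for illustration, and only the `dim__` column names come from the examples above.

<CodeMirrorDisplay>
```sql
-- Hypothetical sample rows, not real Pipeline data.
with sample_sessions as (
  select * from unnest([
    struct('https://linkjuice.com/offer' as dim__page_referrer, 'Paris' as dim__geo_city, 'Ile-de-France' as dim__geo_region),
    struct('https://www.google.com/', 'London', 'England'),
    struct('https://www.google.com/', 'Austin', 'Texas'),
    struct('https://www.google.com/', 'Berlin', 'Berlin')
  ])
)
select
  *,
  case
    when regexp_contains(dim__page_referrer, ".*linkjuice.com.*|.*spamdomain.com.*") then true
    when regexp_contains(dim__geo_city, ".*(London|Sydney).*") then true
    when regexp_contains(dim__geo_region, ".*(Texas|Quebec).*") then true
    else false
  end as is_spam  -- hypothetical output column name
from sample_sessions
```
</CodeMirrorDisplay>

The first three rows each match one rule and come back as `true`; the Berlin row matches nothing and falls through to `false`.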

## How do I know which traffic is spam?

The easiest way of spotting spam traffic is looking at unusual spikes in your traffic.

Once you’ve seen those spikes in traffic, you can then break down the traffic by different dimensions to look at the trends and see what is unusual.

In our experience common things to check are:

- **Location** 
 - e.g. A tonne of traffic from an unexpected country, city or town.
 - e.g. Location not set, where country is "(not set)".
- **Browser Version** - e.g. An old version of Chrome. Bots are often on older versions of browsers.
- **No Source** - i.e. Traffic with no referrer. You can't use this by itself, but with other rules it can help.
- **Referrer** - Classic referrer spam tries to fill your referring domains with spam domains, e.g. [buylinks.com](http://buylinks.com)

Once you get to the bottom of this, you can add in your rules to the case statement.
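As a rough illustration of that kind of breakdown, the query below counts sessions per day, country and browser so that unusual combinations float to the top. Everything here is an assumption made for the sketch: the sample rows, and the column names `session_date`, `country` and `browser`, are placeholders rather than Pipeline's actual schema.

<CodeMirrorDisplay>
```sql
-- Hypothetical sample rows; swap in your own table and column names.
with sample_sessions as (
  select * from unnest([
    struct(date '2024-05-01' as session_date, 'United Kingdom' as country, 'Chrome 124' as browser),
    struct(date '2024-05-01', '(not set)', 'Chrome 79'),
    struct(date '2024-05-01', '(not set)', 'Chrome 79'),
    struct(date '2024-05-01', '(not set)', 'Chrome 79'),
    struct(date '2024-05-01', 'France', 'Safari 17')
  ])
)
select
  session_date,
  country,
  browser,
  count(*) as sessions
from sample_sessions
group by session_date, country, browser
order by sessions desc
```
</CodeMirrorDisplay>

Here the "(not set)" / "Chrome 79" combination stands out, which is the sort of pattern you could then turn into a rule.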

## No screen resolution

Unfortunately, the GA4 BigQuery export doesn't include screen resolution.

Hopefully it will in the future. For now, if you spotted the spam using screen resolution elsewhere, your best bet is to break that traffic down by other dimensions and check which of them match.

# User Properties

Next up, you are able to create custom dimensions and metrics for users.

This is very similar logic to Session Properties, but for users.

## Where is this in the Pipeline Interface?

![where_is_user_properties](/images/resources/documentation/user_properties/where_is_user_properties.png)

## What does this setting do?

This allows you to create your own custom user properties. 

These properties allow you to add additional columns to your data, which could be: 

- A number or metric (e.g. the number of click events by the user).
- A dimension (e.g. the last page viewed by the user before converting).

![custom_user_dimension](/images/resources/documentation/user_properties/custom_user_dimension.png)

## How do I do it?

1. Select a preset user property or add your own manually.
2. Do any custom configuration of the user property.
3. Hit Save

### 1. Select a preset User Property or add your own manually.

The presets at the top of the page are there as examples to build from.

Alternatively, you can create your own custom user property from scratch.

![preset_or_custom](/images/resources/documentation/user_properties/preset_or_custom.png)

### 2. Do any custom configuration of the user property.

Click on any of the properties at the top of the page to edit them. 

### 3. Hit save when you’re done customising.

Otherwise we’ll lose all your changes!

## This is kind of like conversion events!

It is very similar to Conversion Events (if you’ve already set that up).

The big difference is that rather than flagging a single event, we look at multiple events across the user and create a property based on them.

The example below shows the decision flow for creating a user property for customers (true for customer, false for non-customer) based on whether the user has made a purchase in the past 90 days.

![user_property_decision_tree](/images/resources/documentation/user_properties/example_user_property.png)

Remember to hit Save when you’re done.
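To make the customer example concrete, here is a minimal sketch of the same decision written as a query. It is an illustration of the logic only, not how Pipeline builds the property internally: the sample rows, the column names `user_id`, `event_name` and `event_date`, and the fixed reference date are all assumptions.

<CodeMirrorDisplay>
```sql
-- Hypothetical event rows, not real Pipeline data.
with sample_events as (
  select * from unnest([
    struct('user_a' as user_id, 'purchase' as event_name, date '2024-09-20' as event_date),
    struct('user_a', 'page_view', date '2024-09-28'),
    struct('user_b', 'purchase', date '2024-03-01'),
    struct('user_c', 'page_view', date '2024-09-30')
  ])
)
select
  user_id,
  -- true if the user has at least one purchase in the 90 days up to the reference date
  countif(
    event_name = 'purchase'
    and event_date between date_sub(date '2024-10-01', interval 90 day) and date '2024-10-01'
  ) > 0 as is_customer
from sample_events
group by user_id
```
</CodeMirrorDisplay>

With these rows, `user_a` is a customer, `user_b` purchased too long ago to count, and `user_c` never purchased.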

## Example: Let’s run through a few more examples.

### Example 1: Creating a user property for a “heavy user”

![heavy_user_example](/images/resources/documentation/user_properties/heavy_user_example.png)

**The conditions set here are:** 

- `Static Value = 1`
- `Aggregation is sum`
- `Window = 30`
- `The compare value is Greater than 8.`
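One possible reading of these settings, written out as a query: every matching event contributes a static value of 1, the values are summed over a 30-day window, and the user is flagged when the total is greater than 8. This is an assumption about what the settings mean, with invented sample data, column names and reference date.

<CodeMirrorDisplay>
```sql
-- Hypothetical events: user_a has 10 in the window, user_b has 3.
with sample_events as (
  select 'user_a' as user_id, event_date
  from unnest(generate_date_array(date '2024-09-10', date '2024-09-19')) as event_date
  union all
  select 'user_b', event_date
  from unnest(generate_date_array(date '2024-09-10', date '2024-09-12')) as event_date
)
select
  user_id,
  sum(1) as events_in_window,   -- static value of 1, aggregated with sum
  sum(1) > 8 as is_heavy_user   -- compare value: greater than 8
from sample_events
where event_date > date_sub(date '2024-10-01', interval 30 day)  -- 30-day window
group by user_id
```
</CodeMirrorDisplay>

Under that reading, `user_a` is a heavy user and `user_b` is not.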

![when_change_user_property](/images/resources/documentation/user_properties/when_change_user_property.png)




### Concept: What are User Properties?

User properties are additional columns you can add to the GA4 output tables.

These are based on the type of actions or behaviours users have taken throughout a single or multiple sessions. 

**Examples:**

- **“Total Revenue Amount”** - i.e. The total revenue of all purchases made by the user.
- **“Average Session Duration”** - i.e. The average time the user spends on the website across all sessions.
- **“Total Number of Purchases”** - i.e. The total count of `purchase` events by the user.
- **“Subscription Status”** - i.e. Whether the user is currently a subscriber or not (this could be based on a visit to logged in URLs, for example).

![custom_user_dimension](/images/resources/documentation/user_properties/custom_user_dimension.png)

### Custom properties add additional fields to your data.

When you create a custom property, we’re adding a column to your data.

What we’re **not** doing is filtering the data. That can be done in Looker Studio, etc.

### Custom properties can be both dimensions & metrics.

This means that you can pull out specific dimensions or write rules to create columns.

You are also able to count other pieces of information such as:

- The number of blog pages visited by the user.
- The number of sessions that user had.
- The total spend of the user.


# Adding GA4 to your pipeline

Now that we’ve created a pipeline, we need to add GA4 to it.

<TableOfContents toc={toc} />

## Where is this in the Pipeline Interface?

At this stage, you should have already set up your Pipeline. 

The next step is adding GA4 to your pipeline. 

If you still have the checklist, then you can select this option:

![add_a_ga4_property](/images/resources/documentation/adding_ga4_to_your_pipeline/add_a_ga4_property.png)

Alternatively, you need to go to the Side Navigation > List Pipelines > Select the pipeline you want.

![select_a_pipeline](/images/resources/documentation/adding_ga4_to_your_pipeline/select_a_pipeline.png)

Then select the option to add a new GA4 property.

![add_ga4_to_your_pipeline](/images/resources/documentation/adding_ga4_to_your_pipeline/add_ga4_to_your_pipeline.png)

This is also where you can edit the pipeline.

## How to run the basic GA4 set-up

### 1. Select the Google Cloud Project where GA4 is exporting to.

This will be wherever you’ve saved your raw GA4 data.

<Callout type="info">
 Tip: If you’re not sure where this is, you can go to the <OutboundLink href="https://line.pipedout.com/ga4-bigquery-links" children="BigQuery Links" /> page and select a property to see where it’s exporting to.
</Callout>

![g_cloud_project_id](/images/resources/documentation/adding_ga4_to_your_pipeline/g_cloud_project_id.png)

### 2. Select the Property ID(s)

Pipeline automatically scans for Property IDs which are exporting to BigQuery. 

Pick the GA4 property you want to add to the pipeline.

You can pick multiple properties here. If you do, we’ll roll them all up together into one big table. This won’t magically cause users to be tracked across multiple domains; it just means we’re putting all of these properties' data in the same place to make for easier

<Callout type="info">

If you can’t find the properties you’re looking for here, this will likely be one of two issues: 

1. The data either is not exporting. (Check for it <OutboundLink href="https://line.pipedout.com/ga4-bigquery-links" children="here" />).
2. You’re missing the permissions you need. (Check that <OutboundLink href="https://line.pipedout.com/permissions/google-api/register" children="here" />).

If you can see a property ID, but no name it means you have permission for BigQuery, but not for the GA4 property.

- The pipeline will work in this scenario, but we would recommend getting permission for the project in GA4 to be able to use all our helper tools.

If none of those steps work, time to chat with support!

</Callout>

![property_ids](/images/resources/documentation/adding_ga4_to_your_pipeline/property_ids.png)

### 3. Select the timezone your GA4 data is stored in.

Select the timezone of the GA4 property you’ve selected.

<Callout type="info">

Note: If you have multiple GA4 properties in different timezones in the same pipeline there is no way for all of them to be correct. Consider making separate pipelines.

</Callout>

![timezone](/images/resources/documentation/adding_ga4_to_your_pipeline/timezone.png)

### 4. When setting up your pipeline you might want to only run a small period at first

The final option lets you run the pipeline for a small period of time rather than for all time.

![run_for_period](/images/resources/documentation/adding_ga4_to_your_pipeline/run_for_period.png)

It will make the pipeline run notably faster, so if you’re experimenting with conversions, session segments etc., it is often a good idea.

When you are first setting up your Pipeline, we’d recommend setting this to a week. 

It’s the best way to iterate and make sure all your conversions and traffic channels are working as expected before running on all your GA4 data.

Then, when you're happy, come back, turn this setting off and re-run the pipeline to get full data.

<Callout type="warning">

**Warning:** GA4 uses last non-direct attribution with a 90-day window. This means if you’re trying to compare BigQuery against the UI traffic sources, this will only be correct if you have processed 90 days' worth of data in BigQuery before the date period you’re comparing. 

So if you wanted to compare 1st October 2024, you would need to have built from 3rd July 2024 until 1st October 2024.

If that's what you're trying to compare, you'll need 90+ days of data.

</Callout>
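If you want to work out the start date for a different comparison date, the date arithmetic from the warning can be reproduced in BigQuery. This is just a helper sketch; the output column names are made up.

<CodeMirrorDisplay>
```sql
select
  date '2024-10-01' as comparison_date,
  -- 90 days before the date you want to compare
  date_sub(date '2024-10-01', interval 90 day) as earliest_date_needed  -- 2024-07-03
```
</CodeMirrorDisplay>

Swap in your own comparison date to get the earliest date your pipeline run needs to cover.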



### 5. You’re all done. Hit save!

Please remember to hit save.

![last_step_save](/images/resources/documentation/adding_ga4_to_your_pipeline/last_step_save.png)


# How does the GA4 pipeline get set up?

Once you’ve created your pipeline, the data doesn't appear immediately. 

There are two steps:

1. You need to run all the pipeline set-up steps.
2. (Optional) You can add any additional configuration you'd like.



<TableOfContents toc={toc} />

## What are set-up runs?

In order to provide feedback as you're setting up your pipeline (and avoid hitting errors further down the line), the GA4 set-up doesn't run in one big chunk. 

**The really basic version is**:
- Keep hitting "Start Set-up Run" until you get to the GA4 status: "Complete"

**The slightly more complex version is**:

- Each time you click "Start Set-up Run," the next stage of the pipeline will execute.
- Once the process reaches "Complete," your pipeline is fully configured and ready to use.
- Additional set-up steps will become available as your pipeline progresses through the set-up stages.
- If you make changes, such as adding a new conversion, the interface will prompt you to re-run specific parts of the set-up to reflect the updated data.

## Step 1: Back to our basic set-up

Once you have added GA4 to your pipeline, you need to do the setup run.

Hit Save (if you didn't do it yet) and then “Start Setup Run”

This first set-up run copies the data to the correct region.

<Callout type="info">
**How long does it take?**

Normally this will only take a few minutes.

But if you have a large historical set of data, e.g. over a year, this can take some time (30-60 minutes).
</Callout>

You can check in on progress in the top right hand side of the screen.

![setup_run_1](/images/resources/documentation/how_does_ga4_pipeline_get_setup/setup_run_1.png)

## Step 2 - Now we've unlocked all the optional setup!
When the first set-up run is completed, you'll be able to do additional configuration.

You can see all that in the GA4 Optional Setup over on the left hand side.

![setup_run_2](/images/resources/documentation/how_does_ga4_pipeline_get_setup/config_unlocked.png)

## Step 3a - Continue configuring your pipeline

If you want to continue configuring your GA4 pipeline, then you can look through the optional setup options on the left.

You can also continue reading the optional setup docs, which begin with the [Extracting extra data for GA4](/documentation/setup-instructions/ga4-optional-setup/extracting-extra-data-for-ga4) page.

## Step 3b - Get started with your data

If you don't care about that and just want to get started, then hit Save and then “Start Setup Run”.

You can always come back and change options later!

Then head to [accessing your data](/documentation/accessing-your-data/the-fundamentals/accessing-your-data/) to get started.


# Case Statement Editor

The Pipeline Interface has a specific editor for categorisation using a Case Statement.

When working with GA4 data, we’re often going to want to classify things into buckets. 

For example: 

- Spam vs Not Spam
- Organic Search vs Paid Search vs Email etc.

The categorisation editor makes this easy and also very flexible depending on your needs.

## We classify things into buckets using Case Statements

You may have come across these in Looker Studio but just not known what they are called.

Case statements let you sort data into different categories based on certain conditions. 

You set the rules, and the case statement checks the data to see which rule it fits, then assigns it to the right category. 

### Let’s show a basic example

In this example, we’re categorising people based on their ages into either: 

- **Child** - If they are less than 13 years old.
- **Teenager** - If they are between 13 and 19 years old.
- **Adult** - If they are between 20 and 64 years old.
- **Senior** - If they are greater than or equal to 65 years old.
- **Unknown** - If they match none of these.

The data in this example looks like this: 

| name | age |
| --- | --- |
| john | 12 |
| sally | 4 |
| mo | 78 |
| lenny | 21 |
| noel | 3 |

The column that we’re using for this categorisation is the age column.

<CodeMirrorDisplay>
```sql
CASE
 WHEN age < 13 THEN 'Child'
 WHEN age BETWEEN 13 AND 19 THEN 'Teenager'
 WHEN age BETWEEN 20 AND 64 THEN 'Adult'
 WHEN age >= 65 THEN 'Senior'
 ELSE 'Unknown'
END AS age_group
```
</CodeMirrorDisplay>

After the categorisation, this is how the table looks.


| name | age | age_group |
| --- | --- | --- |
| john | 12 | Child |
| sally | 4 | Child |
| mo | 78 | Senior |
| lenny | 21 | Adult |
| noel | 3 | Child |


## How do case statements work?

Case statements read from top to bottom. 

They check each condition one by one to see if it matches the rule. 

It goes through the conditions in order, stopping once it finds a match.

If no match is found, it falls back to a default option (which in our example above would be ‘Unknown’).
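Because evaluation stops at the first match, the order of the rules changes the result. The sketch below uses made-up scores and category names to show the same two rules in both orders:

<CodeMirrorDisplay>
```sql
SELECT
  score,
  CASE
    WHEN score >= 50 THEN 'Pass'
    WHEN score >= 80 THEN 'Distinction'  -- never reached: 80+ has already matched 'Pass'
    ELSE 'Fail'
  END AS broad_rule_first,
  CASE
    WHEN score >= 80 THEN 'Distinction'  -- the more specific rule goes first
    WHEN score >= 50 THEN 'Pass'
    ELSE 'Fail'
  END AS specific_rule_first
FROM UNNEST([45, 65, 90]) AS score
```
</CodeMirrorDisplay>

For a score of 90, `broad_rule_first` returns 'Pass' while `specific_rule_first` returns 'Distinction'.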

### The rules can be based on different criteria:

**1. Greater than (`>`) and Less than (`<`)**:
You can use operators such as `>` or `<` to compare values. 

Example: This Case Statement checks if `age` is greater than 18 to categorise as an "Adult," or 18 and below as a "Minor."

<CodeMirrorDisplay>
```sql
CASE
 WHEN age > 18 THEN 'Adult'
 WHEN age <= 18 THEN 'Minor'
END
```
</CodeMirrorDisplay>
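One thing to be aware of with a statement like this: when there is no `ELSE`, any row that matches none of the conditions gets `NULL`. That includes rows where `age` itself is missing, because comparisons against `NULL` are never true. A small sketch with made-up values:

<CodeMirrorDisplay>
```sql
SELECT
  age,
  CASE
    WHEN age > 18 THEN 'Adult'
    WHEN age <= 18 THEN 'Minor'
  END AS without_else,   -- NULL when age is NULL
  CASE
    WHEN age > 18 THEN 'Adult'
    WHEN age <= 18 THEN 'Minor'
    ELSE 'Unknown'
  END AS with_else       -- 'Unknown' when age is NULL
FROM UNNEST([25, 12, CAST(NULL AS INT64)]) AS age
```
</CodeMirrorDisplay>

Adding an explicit `ELSE` makes the fallback visible instead of leaving blanks in your data.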

**2. Equal to (`=`)**:
You can use `=` to check if something is exactly equal to a specific value.

Example: This Case Statement checks if the `birth_month` matches a specific month and then categorises based on that. 

<CodeMirrorDisplay>
```sql
CASE
 WHEN birth_month = 'January' THEN 'Born in Winter'
 WHEN birth_month = 'February' THEN 'Born in Winter'
 WHEN birth_month = 'December' THEN 'Born in Winter'
 WHEN birth_month = 'June' THEN 'Born in Summer'
 WHEN birth_month = 'July' THEN 'Born in Summer'
 WHEN birth_month = 'August' THEN 'Born in Summer'
 ELSE 'Born in Other Season'
END AS season
```
</CodeMirrorDisplay>

**3. Other Comparisons**: 
You can also use other operators like `>=` (greater than or equal to), `<=` (less than or equal to), and `<>` (not equal to).

Example: This Case Statement checks whether the score is greater than or equal to 50.

<CodeMirrorDisplay>
```sql
CASE
 WHEN score >= 50 THEN 'Pass'
 WHEN score < 50 THEN 'Fail'
END
```
</CodeMirrorDisplay>

**4. Using OR in case statements:**

The `OR` operator allows you to check if any of multiple conditions are true. This is useful when you want to match several different values or ranges.

Example: This Case Statement checks if `day` is either "Saturday" or "Sunday". If it is, it returns "Weekend"; otherwise it defaults to "Weekday".

<CodeMirrorDisplay>
```sql
CASE
 WHEN day = 'Saturday' OR day = 'Sunday' THEN 'Weekend'
 ELSE 'Weekday'
END
```
</CodeMirrorDisplay>

**5. Using Regex (Regular Expressions):**

You can use regular expressions (regex) to match patterns in a string. 

Example: This Case Statement categorises traffic sources into broad source categories. 

<CodeMirrorDisplay>
```sql
CASE
 WHEN REGEXP_CONTAINS(traffic_source, r"facebook|instagram|twitter") THEN 'Social Media'
 WHEN REGEXP_CONTAINS(traffic_source, r"google|bing") THEN 'Search Engine'
 WHEN REGEXP_CONTAINS(traffic_source, r"email") THEN 'Email'
 ELSE 'Other'
END AS source_category
```
</CodeMirrorDisplay>

## How does Pipeline’s categorisation editor work?

When you’re editing a case statement we’ll show you a window which looks like this.

There are two important chunks of it.

![case_statement_1](/images/resources/documentation/case_statement_editor/screenshot_case_statement.png)

Then when you make changes you can:
- Validate the case statement
- Format the case statement

Validate makes sure you can't save anything which will break the case statement. Format just makes it easier to read.

![case_statement_avaliable_columns](/images/resources/documentation/case_statement_editor/case_statement_avaliable_columns.png)

### Will validate spot bad categorisations?

No.

It will tell you if the case statement is invalid but it won't tell you if you've made a mistake in your rules (e.g. classifying Tiktok as Organic Search).

![case_statement_validate_sql](/images/resources/documentation/case_statement_editor/case_statement_validate_sql.png)

<Accordion header="What if my own rules are classifying things incorrectly?">
 This is likely due to: 

a. The order of the case statement or 

b. Errors in the regex/ matching rules.

So let's take them one by one...

a. Case statements read from top to bottom. 
Ensure that your most important categorisations are listed at the top and then work down.
This way, they take priority over the others lower down.

![case_statement_logic](/images/resources/documentation/case_statement_editor/case_statement_logic.png)

b. It’s very easy to use a greater than rather than a less than, or to have typos in your regex matching. Go back through the rules you’ve set for the things being miscategorised and check those. 

</Accordion>

The case statement editor is used for categorisation. Let's break down how they work.