Spam Filter

The next step is filtering out spam traffic.

This step is optional.

By default, your traffic will not be filtered at all, but if you'd like to categorise any traffic as spam you can do it here.

If you are unsure what case statements are, you can read all about them in this section.

Where is this in the Pipeline Interface?

[Screenshot: where to access the spam filter]

What are you setting up here?

This page allows you to filter out spam traffic.

This works by filtering out traffic related to specific dimensions.

Examples include:

  • Referral traffic from specific domains.
  • Traffic from specific locations (i.e. locations you know are sending bot traffic).
  • Traffic from specific devices.

How do I do it?

We use a case statement to categorise spam traffic as true.

[Screenshot: spam filter overview]

You might recognise case statements from Looker Studio or perhaps you already know what one is.

Again, if you don't know what a case statement is, you can read more about them in this section.

Let’s go through some examples of how to adjust it.

Example 1: Flagging traffic from specific referrers as spam

In this example, we are categorising traffic from two specific domains as spam.

This formula works as follows:

  • Check if the referrer matches a regex pattern.
  • The regex pattern is .*linkjuice.com.*|.*spamdomain.com.*
    • The pipe character | means "or".
    • The .* means "match any sequence of characters, including none".
    • So we're essentially saying: match any referrer that contains linkjuice.com or spamdomain.com.
  • If it matches, then we set the value to true and we classify this traffic as spam.
  • If it doesn't match, then we set the value to false and we classify this traffic as not spam.
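
The steps above can be sketched in Python to see how the pattern behaves. This is an illustration only, not the pipeline's own case statement: the function name is_spam_referrer and the sample referrers are hypothetical, and Python's re module stands in for whatever regex engine the pipeline uses. Because the pattern starts and ends with .*, a whole-string match behaves like a "contains" check.

```python
import re

# Pattern from the example above. The dots in "linkjuice.com" are
# unescaped, so each one matches any single character, not only ".".
SPAM_REFERRER_PATTERN = re.compile(r".*linkjuice.com.*|.*spamdomain.com.*")


def is_spam_referrer(referrer: str) -> bool:
    """Return True when the referrer matches the spam pattern, else False."""
    # fullmatch requires the whole string to match; the leading and
    # trailing .* are what turn this into a "contains" check.
    return SPAM_REFERRER_PATTERN.fullmatch(referrer or "") is not None


# Hypothetical sample referrers, for illustration only.
samples = [
    "https://linkjuice.com/offer",
    "http://www.spamdomain.com/",
    "https://www.example.com/blog",
]
results = {referrer: is_spam_referrer(referrer) for referrer in samples}
```

If you only want to match literal dots, writing the domains as linkjuice\.com and spamdomain\.com is stricter.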

Example 2: Flagging traffic from specific locations as spam

In this example, we are categorising traffic from specific cities and regions as spam.

We can, of course, combine these different rules on top of each other so we’d get a case statement like this:
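
As a rough illustration of how stacked rules evaluate, the logic can be sketched in Python. This is not the pipeline's case statement syntax: the field names (referrer, city, region) and the placeholder locations are assumptions made for the sketch. A case statement returns the value of the first branch that matches, so traffic is flagged as spam if any rule matches and as not spam otherwise.

```python
import re

# Referrer pattern from Example 1.
SPAM_REFERRER_PATTERN = re.compile(r".*linkjuice.com.*|.*spamdomain.com.*")

# Placeholder locations (hypothetical). Swap in the cities and regions
# you have actually identified as sources of spam.
SPAM_CITIES = {"Example City"}
SPAM_REGIONS = {"Example Region"}


def is_spam(session: dict) -> bool:
    """Mimic a case statement: the first matching rule wins, default False."""
    referrer = session.get("referrer") or ""
    if SPAM_REFERRER_PATTERN.fullmatch(referrer):
        return True  # rule 1: referrer matches a spam domain
    if session.get("city") in SPAM_CITIES:
        return True  # rule 2: city is on the spam list
    if session.get("region") in SPAM_REGIONS:
        return True  # rule 3: region is on the spam list
    return False  # no rule matched: not spam
```

Keeping each rule on its own branch makes it easy to add or remove one later without touching the others.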

How do I know which traffic is spam?

The easiest way to spot spam traffic is to look for unusual spikes in your traffic.

Once you’ve seen those spikes in traffic, you can then break down the traffic by different dimensions to look at the trends and see what is unusual.
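
The breakdown described above can be sketched in Python. This is a minimal illustration under stated assumptions, not part of the pipeline: sessions are represented as plain dictionaries, the helper names are hypothetical, and the 0.2 threshold is an arbitrary starting point. It compares each value's share of traffic in a normal period with its share during the spike.

```python
from collections import Counter


def share_by_dimension(sessions: list[dict], dimension: str) -> dict[str, float]:
    """Return each value's share of sessions for one dimension."""
    # Missing or empty values are grouped under "(not set)".
    counts = Counter(s.get(dimension) or "(not set)" for s in sessions)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: count / total for value, count in counts.most_common()}


def unusual_values(
    baseline: list[dict],
    spike: list[dict],
    dimension: str,
    min_increase: float = 0.2,  # assumed threshold, tune to your data
) -> list[str]:
    """List values whose share of traffic grew by at least min_increase."""
    before = share_by_dimension(baseline, dimension)
    during = share_by_dimension(spike, dimension)
    return [
        value
        for value, share in during.items()
        if share - before.get(value, 0.0) >= min_increase
    ]
```

Running unusual_values once per dimension (country, browser version, referrer and so on) gives a shortlist of values worth inspecting by hand before you turn any of them into a rule.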

In our experience, common things to check are:

  • Location
    • e.g. a tonne of traffic from an unexpected country, city or town.
    • e.g. location not set, such as country set to "(not set)".
  • Browser Version - e.g. an old version of Chrome. Bots are often on older versions of browsers.
  • No Source - i.e. traffic with no referrer. You can't use this by itself, but with other rules it can help.
  • Referrer - classic referrer spam tries to fill your referring domains with spam domains, e.g. buylinks.com.

Once you get to the bottom of this, you can add in your rules to the case statement.

No screen resolution

Unfortunately, the GA4 BigQuery export doesn't include screen resolution.

Hopefully it will in the future. For now, if you've spotted spam traffic by its screen resolution, your best bet is to add other dimensions and check which of them match the same traffic.