How can we reduce Search Console sampling and get the maximum possible data?

Search Console makes it really hard to get all your data out.

The interface gives you a tiny amount, Looker Studio gives you a bit more, and the API a bit more on top of that.

But it samples a lot. That’s a real problem if you’ve got a big site and you want to extract the largest amount of data from it.

How do we get the most data from our site?

  1. Are you small? Looker Studio is by far the best way.
  2. Do you hit sampling in Looker Studio? Time for some clever API usage.

We'll talk you through how to do it.

Are you a video person?

If so, we've got a video version of this post you can watch here.

What is the sampling limit?

The limit is 50k rows of data per property, per day, per search type (e.g. web, image etc.), sorted by clicks, i.e. you'll get the most-clicked keywords first.

Everything past that is gone.

Rarely searched keywords (which could identify someone personally) are also hidden, to prevent revealing identities.

Does this apply to you?

Are you reading this post and running a small site with a couple of thousand clicks a month? This is almost certainly not a problem.

It can be hard to get all of the data out of the interface, but there's a nice easy fix.

You can just use Looker Studio (either in the interface or just make a table and export to Excel) and that will give you all of your data, no questions asked.

What does all of the data look like?

If you run a large site, however, the difference can be pretty substantial. Here are some numbers of unique queries we exported for a client in Jan 2024.

We end up with 4-5x the amount of data that you get from Looker Studio or using the API in the normal fashion.

we extract 4x the number of queries in a single month

That’s volume of queries, but we also care about filling in as much data as possible.

This graph shows the total clicks registered in Search Console per day. The amount in green is the amount we get queries for.

But we’d also really like to get more information on the orange chunk.

we also want to reduce the cardinality for other dimensions which aren't query

We can't get queries for this; it's all part of the long, long tail, and the bigger your website, the bigger this chunk is.

But we can still get more accuracy for the other dimensions. By carefully downloading data we can get a more accurate clicks number for an individual page.

we get about 1.4x clicks registered to a URL by carefully downloading data

This is really important for two reasons:

More dimensions: We can get country/device/page for it, and if you're looking to join Search Console to GA4 (and even AdWords), this is crucial for accuracy.

We want our numbers to line up with the UI: Nothing ruins people's trust like looking at a number and realising it's 40% below what you can see in the UI, even though it is actually correct.

How do you check if you have missing data?

1. Register a subfolder of your main domain in Search Console

You'll be adding both this subfolder and your main domain to Looker Studio.

2. Make a Looker Studio report and add both of them as sources

adding both profiles as sources in looker studio

3. Make two tables and count the distinct number of queries in each

For your main site profile, filter it to the same subfolder as the one we’ve registered.

comparison showing a different number of queries

As we can see, in this case our main site property can see 202 queries in the subfolder, while our subfolder property can see ~15,000 queries.
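If you'd rather script that check than build a Looker Studio report, here's a minimal sketch of the same comparison using the official google-api-python-client (the property URLs, subfolder, date range and credentials file are all placeholders): count the distinct queries the main property reports for the subfolder, then count what the subfolder property itself reports.

```python
# Minimal sketch: compare distinct query counts for the same subfolder as
# seen by the main property vs the dedicated subfolder property.
# Property URLs, subfolder, dates and credentials path are placeholders.
from google.oauth2 import service_account
from googleapiclient.discovery import build

creds = service_account.Credentials.from_service_account_file(
    "service-account.json",
    scopes=["https://www.googleapis.com/auth/webmasters.readonly"])
service = build("searchconsole", "v1", credentials=creds)

def count_queries(site_url, page_contains=None,
                  start="2024-01-01", end="2024-01-31"):
    """Page through the Search Analytics API and count distinct queries."""
    queries, start_row = set(), 0
    while True:
        body = {
            "startDate": start,
            "endDate": end,
            "dimensions": ["query"],
            "rowLimit": 25000,   # API maximum per request
            "startRow": start_row,
        }
        if page_contains:
            body["dimensionFilterGroups"] = [{"filters": [{
                "dimension": "page",
                "operator": "contains",
                "expression": page_contains,
            }]}]
        rows = service.searchanalytics().query(
            siteUrl=site_url, body=body).execute().get("rows", [])
        if not rows:
            return len(queries)
        queries.update(row["keys"][0] for row in rows)
        start_row += len(rows)

# Main property filtered to the subfolder vs the subfolder property itself.
print(count_queries("https://www.example.com/", page_contains="/products/"))
print(count_queries("https://www.example.com/products/"))
```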

How do you get maximum data from Search Console?

What are the big components?

These days there are essentially three pieces:

  1. Register as many subfolders as you can.
  2. Download data day by day
  3. Tie together the different folders

And an optional 4th one (that we’d highly recommend):

  4. Explicitly fill in cardinality

1. Register all of the subfolders for high traffic sections of your site

Hopefully you've already got some of these, but if you don't, you're going to want to register all the subfolders you can for the high-traffic sections of your site.

For our hypothetical e-commerce site that might mean a property for each big section: the category pages, the product pages, the blog and so on.

You only need to do this for subfolders which actually get traffic you care about; for most sites, for example, you don't need to worry about registering your internal user account folder.

These only start collecting from whenever you’ve registered them, so the sooner the better.
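To make that concrete, the set of properties you end up with might look something like this (all names here are hypothetical, for our imaginary e-commerce site):

```python
# Hypothetical list of Search Console properties after registering subfolders.
# The top-level property still sees the whole site; each URL-prefix property
# gets its own 50k-rows-per-day allowance for its slice of it.
PROPERTIES = [
    "sc-domain:example.com",                # top-level domain property
    "https://www.example.com/categories/",  # high-traffic subfolders...
    "https://www.example.com/products/",
    "https://www.example.com/blog/",
]
```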

2. Download the data for each folder day by day.

Time to use the API to pull the data down day by day.

The sampling limit applies per day, so fetching day by day gets you the most data; it also usually minimizes API timeouts and retries.

This isn't a full technical walkthrough, so at this point we'll point you over to this library for the Search Console API in Python.
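If you want to see the shape of it without the library, here's a minimal sketch using the official google-api-python-client directly (the property URL, date range and credentials are placeholders): loop over each day, and page through the rows 25k at a time with startRow.

```python
# Minimal sketch: download query-level rows one day at a time for one property.
# Property URL, date range and credentials path are placeholders.
from datetime import date, timedelta
from google.oauth2 import service_account
from googleapiclient.discovery import build

creds = service_account.Credentials.from_service_account_file(
    "service-account.json",
    scopes=["https://www.googleapis.com/auth/webmasters.readonly"])
service = build("searchconsole", "v1", credentials=creds)

def fetch_day(site_url, day,
              dimensions=("query", "page", "country", "device")):
    """Pull every row the API will give us for a single day."""
    rows, start_row = [], 0
    while True:
        response = service.searchanalytics().query(siteUrl=site_url, body={
            "startDate": day.isoformat(),
            "endDate": day.isoformat(),
            "dimensions": list(dimensions),
            "rowLimit": 25000,     # API maximum per request, so we paginate
            "startRow": start_row,
        }).execute()
        batch = response.get("rows", [])
        if not batch:
            return rows
        rows.extend(batch)
        start_row += len(batch)

start, end = date(2024, 1, 1), date(2024, 1, 31)
all_rows = []
for offset in range((end - start).days + 1):
    day = start + timedelta(days=offset)
    all_rows.extend(fetch_day("https://www.example.com/products/", day))
```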

If you're not technical then at this point you're probably going to want to pull in your tech team or freelancer.

3. Tie the subfolders together.

Your top-level property will also contain the traffic for the subfolders you've registered, so you'll need to filter out those segments of traffic and stitch in your subfolder properties instead.

Also remember those subfolders only count from the day they’ve been registered, so you can’t backdate a new subfolder!
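As a rough sketch of what that stitching looks like (assuming you've loaded each property's download into a pandas DataFrame with a page column; the prefixes and column names are illustrative): drop the rows of the top-level property that a subfolder property covers, then append the subfolder downloads in their place.

```python
# Sketch: stitch the top-level property and subfolder properties together.
# Assumes each download is a pandas DataFrame with a "page" column; the
# subfolder prefixes and column names are illustrative.
import pandas as pd

SUBFOLDER_PREFIXES = (
    "https://www.example.com/categories/",
    "https://www.example.com/products/",
    "https://www.example.com/blog/",
)

def stitch(top_level: pd.DataFrame,
           subfolder_frames: list[pd.DataFrame]) -> pd.DataFrame:
    # Remove the slices of the top-level data that a subfolder property covers
    # (the subfolder download is the richer version of that traffic)...
    covered = top_level["page"].str.startswith(SUBFOLDER_PREFIXES)
    remainder = top_level[~covered]
    # ...and append the subfolder downloads in their place. In practice you'd
    # only do this for dates after each subfolder property was registered.
    return pd.concat([remainder, *subfolder_frames], ignore_index=True)
```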

4. Filling in cardinality

But even doing all that, we might still have a gap of missing traffic.

Can we do any better than this?

We can’t get any additional queries. If that’s all you care about then we’ve maxed that out.

But if you do want to:

  • Join SC with GA4 to get revenue/conversions per query.
    • (And tie in with AdWords).
  • Have your downloaded data match the interface
    • So internal teams stumbling into the interface don’t start panicking

We can do a bit more. We can fill in this cardinality.

how can we fill in the cardinality.

Here’s the top level example:

  • The top level number of clicks we have for device, page, country + query is 1000.
  • The top level number of clicks we have for country + device is 2000.

We can then see our top level missing data is 1000 clicks.

We go and fetch different combinations of dimensions, e.g. page, and country + device. Where we don't have a value, we just set it to "(other)".
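Here's a small sketch of that fill-in step for a single day of a single property, using the example numbers above (the dimension values are made up): work out the gap between the coarse and detailed pulls, then book it against "(other)" rows so everything still sums to the interface total.

```python
# Sketch: fill in missing cardinality for one day of one property.
# Compare a detailed pull (page + country + device + query, 1,000 clicks)
# against a coarse pull (country + device, 2,000 clicks) and book the gap
# against "(other)" rows. All values below are illustrative.
import pandas as pd

detailed = pd.DataFrame({
    "page":    ["https://www.example.com/products/a",
                "https://www.example.com/products/b"],
    "query":   ["red shoes", "blue shoes"],
    "country": ["gbr", "usa"],
    "device":  ["MOBILE", "DESKTOP"],
    "clicks":  [600, 400],
})
coarse = pd.DataFrame({
    "country": ["gbr", "usa"],
    "device":  ["MOBILE", "DESKTOP"],
    "clicks":  [1300, 700],
})

# How many clicks in each country/device slice have no query-level rows?
known = detailed.groupby(["country", "device"], as_index=False)["clicks"].sum()
gap = coarse.merge(known, on=["country", "device"], how="left",
                   suffixes=("_total", "_known"))
gap["clicks"] = gap["clicks_total"] - gap["clicks_known"].fillna(0)

# Book the unattributed clicks against "(other)" so totals match the UI.
filler = (gap.loc[gap["clicks"] > 0, ["country", "device", "clicks"]]
             .assign(page="(other)", query="(other)"))
filled = pd.concat([detailed, filler], ignore_index=True)
print(int(filled["clicks"].sum()))  # 2000 -- the missing 1,000 is now "(other)"
```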

We repeat this process for each subfolder, stitch it all together and this gives us numbers that both:

  • Add up to the correct totals
  • Have as much detail as possible.

5. Bonus: set up and join onto the BigQuery export

We can then join this onto the BigQuery export. The export doesn't backfill, and we'll have a post on setting it up soon, but it's a great thing to do going forward to keep getting the maximum data.

That’s all for today

Hopefully that's given you a top level overview of how to get the largest possible chunk of data from your Search Console.

At the very least, register your subfolders; you can always think about the other points in the future.

If you're a bit more ambitious, set up the BigQuery export.

And if you'd like to have us do this for you, please get in touch. We can help set you up with a marketing/SEO data warehouse with all the possible Search Console data backfilled, as well as joining it with other sources like GA4 and AdWords.

By Dominic Woodman -- I'm the founder of Piped Out and have never loved 3rd person bios.
