If you are like me, you probably see unnatural traffic all of the time sneaking into your Google Analytics data. It is a constant battle, and you sometimes feel like you will never get the clean data you deserve.
Not all bot traffic is created equal, though, and the measures needed to keep it out of your Google Analytics data going forward are not always straightforward. Recently we were reminded of this by an influx of bot traffic to some of the sites we manage, all of it arriving with a hostname of (not set).
Unfortunately, in these cases we were not able to filter out these sessions by simply setting up a hostname filter. For whatever reason, Google Analytics treats a null hostname as having no value to test, so the hostname filter is never applied to those sessions. Setting a filter to only allow traffic from a specific domain did keep out bad traffic from spoofed domains, but the traffic with a hostname of (not set) was still present.
What we needed was a way to filter out sessions with a null hostname, something the include-only hostname filter could not do on its own.
Filtering Out (not set) Hostname Traffic in Google Analytics – The Solution
After some research, we found a way to use Google Analytics filters to set a variable that other filters can then read. The idea is that we read in the hostname value and save it to a custom field using one filter. In a separate filter, we access that value and check it against the real hostname. In this case the field is set to an empty value rather than null, so the check against this new hostname field works where the check against the null hostname did not.
To set this up, you will need to create two filters: the first copies the hostname into a custom field, and the second filters out anything that does not match your hostname.
Creating a Hostname Identifier Field
- Create a new filter in Google Analytics. Name it “Create Hostname Identifier Field” or something similar.
- Choose “Custom” Filter Type.
- Select “Advanced” as the filter type option.
- In Field A → Extract A choose “Hostname” as the option, and put “(.*)” as the value for the field.
- Leave Field B → Extract B blank.
- In Output To → Constructor choose “Custom Field 1” as the option, and put “$A1” as the value for the field.
- Make sure both “Field A Required” and “Override Output Field” are checked. The other two options can be left unchecked.
- Save Your Filter. We will use the value saved in “Custom Field 1” as our check against the real hostname.
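To make the effect of this first filter easier to picture, here is a rough Python sketch of what it does to each hit. It is only a model, not how Google Analytics works internally: the hit dictionaries, the `custom_field_1` key, and the function name are hypothetical, and it assumes (as described above) that a missing hostname ends up as an empty value in the custom field.

```python
import re

def copy_hostname_to_custom_field(hit):
    """Model of the advanced filter: Field A = Hostname, pattern (.*), output $A1."""
    # Assumption: a hit with no hostname is represented as None, and the copied
    # value becomes an empty string rather than staying null.
    hostname = hit.get("hostname") or ""
    match = re.match(r"(.*)", hostname)  # (.*) captures the whole hostname as $A1
    updated = dict(hit)                  # leave the original hit untouched
    updated["custom_field_1"] = match.group(1)  # mirrors "Override Output Field"
    return updated

# Hypothetical sample hits: one real visit, one with a missing hostname.
hits = [
    {"hostname": "www.websiteaddress.com", "page": "/"},
    {"hostname": None, "page": "/spam"},
]
tagged = [copy_hostname_to_custom_field(h) for h in hits]
```

After this step every hit carries a `custom_field_1` value that a later filter can test, including the hits that had no hostname at all.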
Creating the Valid Hostname Filter
- Create a new filter in Google Analytics. Name it “Valid Hostname Filter” or something similar.
- Choose “Custom” Filter Type.
- Select “Include” as the filter type option.
- Under “Filter Field” select “Custom Field 1” as the option.
- In the “Filter Pattern” field, enter a regex that matches the hostname or hostnames used for this profile.
- The most basic filter pattern would be “websiteaddress\.com”. Be sure to include the “\” character so that the “.” is escaped properly.
- Save Your Filter. This will serve as your new hostname filter check.
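If you want to sanity-check a filter pattern before saving it, a quick Python sketch like the one below can help. It assumes the pattern is matched anywhere in the field (unanchored), uses Python's `re` module as a stand-in for the Google Analytics regex engine, and the function name and sample hostnames are made up for illustration.

```python
import re

# The pattern from the step above, with the "." escaped.
VALID_HOSTNAME = re.compile(r"websiteaddress\.com")

def passes_valid_hostname_filter(custom_field_1):
    """Model of the include filter: keep the hit only if the pattern matches."""
    value = custom_field_1 or ""  # an empty field can never match the pattern
    return VALID_HOSTNAME.search(value) is not None

# Why the escape matters: an unescaped "." matches any character.
UNESCAPED = re.compile(r"websiteaddress.com")
lookalike = "websiteaddress-com.example.net"  # hypothetical spoofed hostname

keeps_real_site = passes_valid_hostname_filter("www.websiteaddress.com")
keeps_empty = passes_valid_hostname_filter("")
keeps_lookalike = passes_valid_hostname_filter(lookalike)
unescaped_keeps_lookalike = UNESCAPED.search(lookalike) is not None
```

In this sketch the real hostname passes, the empty value and the lookalike are rejected, and the unescaped pattern would have let the lookalike through.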
Once this is all set up, you will no longer see traffic with a hostname of (not set), that is, a null hostname. You should only see traffic that originated on your domain (hostname).