Using Google Analytics Settings to Properly Identify Pages

This year, I’ve been involved in many Google Analytics implementations and audits, and one theme keeps coming up: misunderstood GA Configuration Settings, mostly around how a page is identified. For instance, one recent client of mine had a 350-page site. But because of missed configuration settings, those 350 pages were showing up as literally 28,000 URIs! Can you imagine pulling a report on any given page of that site? So, to clear the air and hopefully save some GA users out there from future headaches, here are three quick ways to use GA Configuration to properly identify your pages:

1. DEFAULT PAGE DOES NOT MEAN “MY DEFAULTIEST PAGE”

The default page setting is used whenever a page URI ends in a trailing slash without specifying a file name. For instance, if you used this setting to specify that “index.html” is your default file name, “example.com/” and “example.com/index.html” would merge into just “example.com/index.html” in your content report, making analysis of that single page much easier.
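To make that behavior concrete, here is a minimal Python sketch that models the Default Page setting as a simple string rule: if a request URI ends in “/”, append the configured file name. The function name apply_default_page is hypothetical, and the model assumes URIs without query strings; it illustrates the effect described above, not Google Analytics’ actual code.

```python
def apply_default_page(request_uri: str, default_page: str) -> str:
    """Simplified model of the Default Page setting (not GA's real code).

    Appends the configured file name to any request URI that ends in a
    trailing slash. Assumes request_uri carries no query string.
    """
    if default_page and request_uri.endswith("/"):
        return request_uri + default_page
    return request_uri  # no trailing slash, or setting left blank


# Four raw URIs collapse into two report rows once "index.html" is set.
hits = ["/", "/index.html", "/folder/", "/folder/index.html"]
merged = sorted({apply_default_page(uri, "index.html") for uri in hits})
print(merged)  # ['/folder/index.html', '/index.html']
```

Leaving the setting blank (an empty string here) leaves every URI as it was.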

Unfortunately, the name of the setting is misleading and tempts people into entering what they consider the “default page” of their site: their home page. But if you enter “http://www.example.com/index.html” as your default page, the real result would be that any page that ends in a trailing slash will have the full home page URL appended to it:

www.example.com/folder/

would become this in the reports:

www.example.com/folder/http://www.example.com/index.html

This is obviously not desirable, so please do not put your full home page URL as your “Default Page”. If you have a site that sometimes uses index.html or index.php, then you may want to specify THAT as your default page, so all pages with a trailing slash would consistently have index.html appended to them. Otherwise, leave the setting blank.

2. SO WHAT DO I DO ABOUT THOSE TRAILING SLASHES?

The default page setting cannot be used for what most people WANT to use it for: standardizing whether or not a page ends with a trailing slash. If you give in to the temptation to simply put a “/” in this setting, then “folder/” and “folder” won’t merge together as desired. Instead, “folder/” would become “folder//”, and “folder” would stay the same (remember, the setting only looks for pages that end in a trailing slash, then appends the setting value to them).
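The same simplified model shows both pitfalls. As before, apply_default_page is a hypothetical helper that appends the setting’s value to any request URI ending in “/” and assumes no query string. It uses the request path /folder/ rather than the full www.example.com/folder/ from the earlier example.

```python
def apply_default_page(request_uri: str, default_page: str) -> str:
    # Simplified model (assumption): append the setting's value to any
    # request URI ending in "/"; URIs without a trailing slash are untouched.
    if default_page and request_uri.endswith("/"):
        return request_uri + default_page
    return request_uri


# Pitfall 1: the full home page URL entered as the "Default Page".
print(apply_default_page("/folder/", "http://www.example.com/index.html"))
# -> /folder/http://www.example.com/index.html

# Pitfall 2: a lone "/" entered in the hope of merging "folder/" and "folder".
print(apply_default_page("/folder/", "/"))  # -> /folder//
print(apply_default_page("/folder", "/"))   # -> /folder (still a separate row)
```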

If you would like to remove all trailing slashes as the standard, so that example.com/folder/ and example.com/folder appear as the same line item in the Content report (and who wouldn’t want that?), you will need to set up a custom filter that removes all trailing slashes:

Field A -> Extract A should be set to Request URI, with the pattern “^/(.*?)/+$”
Output To -> Constructor should be set to Request URI, with the value “/$A1”
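If you want to sanity-check the pattern before saving the filter, here is a rough Python equivalent of the Extract A and Constructor pair. The helper name strip_trailing_slashes is hypothetical, and the sketch assumes a URI that doesn’t match the pattern passes through unchanged (my understanding of an advanced filter with Field A required). Python’s re module reads this particular pattern the same way, but GA’s regex engine may differ in other cases.

```python
import re

# Field A -> Extract A pattern from the filter settings above.
TRAILING_SLASH = re.compile(r"^/(.*?)/+$")


def strip_trailing_slashes(request_uri: str) -> str:
    """Emulate Extract A plus Constructor "/$A1" (simplified model).

    Assumption: a URI that does not match the pattern is left unchanged.
    """
    match = TRAILING_SLASH.match(request_uri)
    if match is None:
        return request_uri
    return "/" + match.group(1)  # Constructor "/$A1": "/" + capture group 1


for uri in ["/folder/", "/folder", "/a/b//", "/"]:
    print(uri, "->", strip_trailing_slashes(uri))
```

The home page “/” is left alone, because after the leading slash there is no second slash for “/+” to match. Because the pattern is anchored with “$”, a URI that still carries a query string, such as /folder/?x=1, would not match either and would keep its slash.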

Please note, much to my chagrin, that such a filter would make your profile ineligible for Real Time Analytics, which (for now) doesn’t play well with filters. I promise this isn’t as big a deal as you might think, though I’ll save my reasoning about the unimportance of “real time” analytics for another blog post.

3. EXCLUDE PARAMETERS!

Most GA implementations I’ve seen have at least a few query string parameters excluded, but I don’t think I’ve seen anyone get it “just right” yet (admittedly, my level of nitpickiness may be a tad unrealistic). The problem with not excluding every parameter that doesn’t identify content is that those parameters cause one page to show up as several separate items in the content report. For instance, if I want to report on how many pageviews promotions/springlanding.html got, I might need to pull all three of the following URIs into my report, just to cover one piece of content:

promotions/springlanding.html
promotions/springlanding.html?secured=true
promotions/springlanding.html?type=4

This isn’t the end of the world; using filters in my reports, I can usually get the info I need, though it does make trending harder. But it’s such an easy fix!
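As a sketch of what excluding parameters accomplishes, the Python function below drops a given set of parameter names from a URI and keeps everything else. The name exclude_params is hypothetical, and treating secured and type as safe to exclude is an assumption for this example; it models the outcome of the exclusion setting, not GA’s implementation.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def exclude_params(request_uri: str, excluded: set[str]) -> str:
    """Drop excluded query parameters, keeping the rest in original order.

    Simplified model of query-parameter exclusion (not GA's own code).
    """
    parts = urlsplit(request_uri)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in excluded
    ]
    # An empty query string makes urlunsplit omit the "?" entirely.
    return urlunsplit(parts._replace(query=urlencode(kept)))


variants = [
    "promotions/springlanding.html",
    "promotions/springlanding.html?secured=true",
    "promotions/springlanding.html?type=4",
]
print({exclude_params(uri, {"secured", "type"}) for uri in variants})
# -> {'promotions/springlanding.html'}
```

Parameters outside the excluded set pass through untouched. Note that urlencode may re-encode some values (a space becomes “+”, for example).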

To see which query parameters might have escaped your settings, go to your Top Content report and search for “?”. If a variety of those pernicious params show up, you may want to use an advanced filter to filter them out one at a time, to be sure you’ve got them all. Now you have a handy list of parameters to take to your configuration settings for exclusion. If you want to track one of the parameters, but not necessarily in your content report, remember that you can use a Profile Filter to extract a query parameter and put it into another field, such as a user defined variable, or to clean up parameters in general.
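If you can export the URIs from that report, a short script can tally parameter names for you. The helper count_param_names is hypothetical and the exported list below is made-up sample data. Treat the result as a candidate list only, since some names (a product identifier, for example) may genuinely identify content.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def count_param_names(request_uris: list[str]) -> Counter:
    """Count how often each query-parameter name appears across URIs.

    Only URIs containing "?" are inspected, mirroring a report search for "?".
    """
    names = Counter()
    for uri in request_uris:
        if "?" not in uri:
            continue
        for name, _value in parse_qsl(urlsplit(uri).query, keep_blank_values=True):
            names[name] += 1
    return names


exported = [  # hypothetical rows copied out of a content report
    "/promotions/springlanding.html",
    "/promotions/springlanding.html?secured=true",
    "/promotions/springlanding.html?type=4",
    "/products.html?sku=12345&type=2",
]
for name, count in count_param_names(exported).most_common():
    print(name, count)
```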

Be careful not to exclude parameters that actually matter for identifying content. For instance, a products page may have a ?sku=12345 parameter that specifies which product is being viewed. This is a rather critical piece of information for some types of analysis, and it should not be excluded.

Please be aware that users can add whatever parameters they want to your URLs, so you will never have full control here. Tools such as Google Translate can wreak havoc on URIs, but they generally account for a very small percentage of pageviews.

Cleaning up your Content Report is an easy quick win: it doesn’t take a lot of effort and can make analysis much easier. For questions about identifying content in Google Analytics or SiteCatalyst, contact me on Twitter: @Jenn_Kunz.