Note: This article is intended for people who already use Google Analytics, but don’t know the magic behind the machine.
Few people who use Google Analytics are aware of the flaws and idiosyncrasies in the way it collects and processes data.
For example, if you’re a content marketer, you likely look at the “time on site” or “time on page” metrics frequently to assess whether your content is engaging or not. But did you know that the number Google Analytics reports isn’t accurate?
Here are 3 things you should know about this popular web analytics platform.
1. How Google Analytics collects and configures data
Once your data is collected, Google Analytics will process it based on your specific filters and settings. If you, for example, exclude a certain IP address from your data with a filter, the visits from that IP address are completely deleted. They’re not just hidden.
Finally, Google Analytics pulls together reports from the data it has collected.
Key takeaway #1: If you don’t install the Google Analytics tracking code on every page, you’ll only get data for the pages where it’s installed. Make sure you install it in a template that propagates to your entire website. (This is easy on WordPress, especially with a plugin.)
Key takeaway #2: Make sure any filters that you set up are working properly, because otherwise you may be destroying your data. The Google Analytics Platform Principles course recommends having a backup view with all the data, a main view that you usually use, and a test view where you can test new filters.
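To make the "deleted, not hidden" point concrete, here is a minimal Python sketch of filters being applied at processing time. It is a toy model, not how Google Analytics is implemented: the Hit class, the exclude_ip function, the view names, and the IP addresses are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    ip: str
    page: str


def exclude_ip(hits, ip):
    """Return only the hits that did not come from the given IP address."""
    return [h for h in hits if h.ip != ip]


# Hypothetical raw hits; the addresses are documentation-only examples.
raw_hits = [
    Hit("203.0.113.5", "/pricing"),
    Hit("198.51.100.7", "/blog"),
    Hit("203.0.113.5", "/blog"),
]

# Each view keeps only what survives its own filters when the data is processed.
views = {
    "backup": list(raw_hits),                      # no filters, full copy
    "main": exclude_ip(raw_hits, "203.0.113.5"),   # the filter you rely on
    "test": exclude_ip(raw_hits, "198.51.100.7"),  # a new filter being tried out
}

for name, hits in views.items():
    print(f"{name}: {len(hits)} hits")
```

In this model the main view has no record that the excluded hits ever existed, so only the unfiltered backup view can tell you what a bad filter threw away.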
2. The Google Analytics data model
Google Analytics tracks users, sessions, and interactions. Users are the individual people who access your website. If it’s their first time on your site, they’ll be counted as a new user. This tracking is imperfect: if they switch browsers or change computers, they’ll be counted as a new user again despite being the same person.
Sessions are the individual visits a user makes to your site. A session lasts as long as the user is active on your site, and it ends once their activity stops for 30 minutes or more. A user can have many sessions over time.
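The 30-minute rule can be sketched in a few lines of Python. This is a simplified illustration that only models the inactivity timeout; the function name split_into_sessions and the sample timestamps are invented for the example, and the real system may apply other rules as well.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # the default described above


def split_into_sessions(hit_times, timeout=SESSION_TIMEOUT):
    """Group one user's hit timestamps into sessions.

    A new session starts whenever the gap since the previous hit
    is at least `timeout`.
    """
    sessions = []
    for t in sorted(hit_times):
        if sessions and t - sessions[-1][-1] < timeout:
            sessions[-1].append(t)   # still within the current session
        else:
            sessions.append([t])     # inactivity gap: start a new session
    return sessions


# Hypothetical hits from one user on one morning.
hits = [
    datetime(2015, 1, 1, 9, 0),
    datetime(2015, 1, 1, 9, 10),
    datetime(2015, 1, 1, 9, 50),   # 40 minutes after the previous hit
    datetime(2015, 1, 1, 10, 5),
]

print(len(split_into_sessions(hits)))                                # 2
print(len(split_into_sessions(hits, timeout=timedelta(minutes=60))))  # 1
```

With the default timeout, the 40-minute pause splits the morning into two sessions. With a 60-minute timeout, the same hits count as one.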
Interactions are things users do on your site. By default, the only interactions that are really recorded are page visits. This results in one of Google Analytics’ biggest weaknesses: if someone leaves your page without clicking any more links, they are considered a bounce. You could have very engaged readers on your blog who open an individual post and read for an hour, but since they didn’t click anything else before they left, that data is lost and Google Analytics thinks they were there for 0 seconds.
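Here is a rough Python sketch of why the hour-long reader shows up as 0 seconds. It assumes a simplified model in which time is measured as the gap between one recorded pageview and the next; the function names and page paths are made up for illustration.

```python
def time_on_pages(pageviews):
    """pageviews: ordered list of (page, seconds_since_session_start).

    Time on a page is the gap until the next recorded hit. The last
    page of a session has no next hit, so it gets 0 seconds.
    """
    durations = []
    for i, (page, ts) in enumerate(pageviews):
        if i + 1 < len(pageviews):
            durations.append((page, pageviews[i + 1][1] - ts))
        else:
            durations.append((page, 0))
    return durations


def session_duration(pageviews):
    """Seconds between the first and last recorded hit of a session."""
    return pageviews[-1][1] - pageviews[0][1]


# A reader who spends an hour on one post and then leaves (a bounce).
bounce = [("/blog/long-post", 0)]

# A reader who spends an hour on the post and then clicks through.
two_pages = [("/blog/long-post", 0), ("/pricing", 3600)]

print(session_duration(bounce))      # 0
print(session_duration(two_pages))   # 3600
print(time_on_pages(two_pages))      # [('/blog/long-post', 3600), ('/pricing', 0)]
```

Both readers spent the same hour on the post, but only the second one produced a later hit to measure against.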
Key takeaway #1: User data isn’t perfect. Think of each “user” as an individual computer and browser, not an actual person. A whole family could use the same computer.
Key takeaway #2: Session length is pretty inaccurate. As mentioned above, very engaged readers will be recorded as bounces in Google Analytics if they don’t click another link on your site after arriving.
Key takeaway #3: If your website contains content that you expect readers to take more than 30 minutes to consume, increase the default session timeout in your Google Analytics settings.
3. Google Analytics sampling
If you request a custom report that Google Analytics doesn’t provide by default, and you have a large amount of data, Google Analytics may use sampling to answer your query. What does that mean?
Let’s say you want to check up on a specific email campaign that you sent in the last week. You open up the Source / Medium report, then add a secondary dimension of Campaign to find emails that were sent on a particular topic. By adding a secondary dimension, you are creating a custom report that isn’t available by default in Google Analytics.
If there is too much data for Google Analytics to sort through, it will grab a percentage of the sessions that match your query and display data based on those. Otherwise it would take a long time for it to process the data on the fly.
When this happens, a yellow tooltip appears on the report to tell you that the data is sampled.
For most use cases, this is fine. But if you need perfect accuracy, or if you want to grab individual UserIDs that match your query, this just won’t do.
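The idea behind sampling can be sketched in Python as follows. This is only an illustration of estimating a total from a random subset of sessions, not Google Analytics’ actual algorithm; the sampled_count function, the sample_rate and seed parameters, and the made-up session data are all assumptions for the example.

```python
import random


def sampled_count(sessions, predicate, sample_rate, seed=0):
    """Estimate how many sessions match `predicate` from a random sample."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    sample_size = max(1, int(len(sessions) * sample_rate))
    sample = rng.sample(sessions, sample_size)
    matches = sum(1 for s in sample if predicate(s))
    # Scale the sample result back up to the full number of sessions.
    return round(matches * len(sessions) / sample_size)


def is_email(session):
    return session["medium"] == "email"


# Hypothetical data: 100,000 sessions, one in ten from an email campaign.
sessions = [
    {"medium": "email" if i % 10 == 0 else "organic"}
    for i in range(100_000)
]

exact = sum(1 for s in sessions if is_email(s))
estimate = sampled_count(sessions, is_email, sample_rate=0.10)

print(exact)     # 10000
print(estimate)  # close to 10000, but usually not identical
```

The estimate is good enough for spotting trends. It also shows why a sample cannot give you the exact list of users who matched: most of them were never looked at.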
Key takeaway #1: Buy Google Analytics Premium if you don’t want sampling in your life.
Key takeaway #2: Try narrowing the date range for your query. Limiting the data might help you avoid sampling.
Key takeaway #3: You can tell Google Analytics to use a larger sample size, but there’s a limit to how much it’ll use. And it will be much slower to load the report.
How to learn more
If you’re interested in more insights into how Google Analytics functions, you should consult the Google Analytics Platform Principles course. It’s dry (the host, Justin Cutroni, is a big fan of the hand steeple and looks visibly uncomfortable in front of the camera), but it’s useful information.