Making BigQuery a Single Source of Truth for Marketing Data Analysis

Single source of truth for marketing data

Is your marketing data currently under the control of multiple vendors and platforms? Do you have to mix together siloed data and reporting tools to answer questions about marketing ROI and your customer’s journey? It’s time to take control of your data!

Cloud databases (i.e. BigQuery) and various data management tools have made it possible for marketers to build data pipelines without a big investment in hardware, software, and custom development.

In this article, we will walk through the steps of gathering data from Google Analytics, Google Ads, Google Search Console, Facebook, CRM and several other sources into Google BigQuery data warehouse and making it the single source of truth for all marketing data.


  • Learn why most businesses need a marketing data warehouse
  • Learn how to plan the best marketing data warehouse for your business
  • Learn the basics of working with marketing data stored in BigQuery

Why (almost) every business needs a marketing data warehouse?

In the past few years, the number of marketing tools an average marketing person uses has grown rapidly. Doesn’t matter whether it’s a SaaS, lead gen or an e-commerce business, besides one or two analytics platforms there usually are a few ads platforms in play, a CRM, CMS, several social media platforms, an email automation system and probably a few more tools and platforms.

All of those tools and platforms are there to make our work as marketers, business owners or data analysts easier and more effective. In reality, though, you will end up having a bunch of silos – systems that don’t really communicate with each other and rarely agree on any of the important KPIs or metrics.

Data silos are the source of confusion and disagreement between teams within a business, leading to a situation where, at the end of the day, no-one knows which tool or numbers they can trust.

The solution? A marketing data warehouse.

Below are a few reasons that should explain quite vividly that your business needs a marketing data warehouse.

– Single source of truth: the main benefit you get from having all of your marketing data in one central data warehouse is that every team in your business will look at the same metrics, calculated the same way. This benefit alone will save you and your colleagues from lots of misunderstanding and possible conflicts between teams and functions.

– Access to raw data: most marketing tools show you aggregated data and calculated metrics. While it’s okay for a more basic user, it is definitely limiting your ability to do your own calculations. It is also the main reason why data from different tools never match – they simply measure and calculate the core metrics like users, sessions, conversion rate, revenue etc. in a different way. When working with raw data, you define your own aggregation and calculation rules – this should be agreed on company level and leads to everyone looking at the same numbers.

– Visualization and BI tools: sadly, not all marketing tools have a connector with most major data visualization and BI tools (Tableau, Looker, Google Data Studio etc.). Working with CSV or Excel files isn’t automatic or very scalable. Data warehouses like BigQuery have a native integration with nearly all better-known BI and data visualization tools.

– More accurate attribution: tools like Google/Adobe Analytics are great when you’re just getting started with marketing channel attribution. What most tools lack, though, are access to all user touch-points (online and offline) and the ability to use custom attribution models (FBA, ML-based, Markov etc.).

– Machine learning: with tools like BigQuery ML, Auto ML and others being widely available, getting started with machine learning has never been easier or more affordable. The only part that’s missing, though, is access to high-quality raw hit-level data that is required for training your models. With all your marketing data in your data warehouse, you can create models from product recommendations to smarter remarketing campaigns.

– Manual data joining doesn’t scale: exporting data from various sources and joining them in Excel/Sheets can give you interesting insights you wouldn’t have seen in any single tool but let’s be honest, this system isn’t very scalable. Working with a data warehouse, you can pre-join commonly combined data into views or pull everything together in a dashboard.

– A complete overview of the user journey: no one tool knows everything about your users thus looking at them doesn’t show you the complete user journey. Combining data from all tools and sources, both online and offline gives you a much more detailed picture of the entire user journey. Furthermore, this gives you much more accurate long term metrics like users’ lifetime value.


I could go on for long writing about the benefits that explain having a proper marketing data warehouse. What about you? Let us know of the reasons your business needs (or already has) a marketing data warehouse in the comments below.

How to plan and get the best marketing data warehouse for your business?

Having a marketing data warehouse is becoming the new norm. Setting one up, though, is not a trivial task.

There are some tools that offer sort of a plug-and-play solution but just like every business is different, so are their requirements for a marketing data warehouse. There is no good solution that would meet the requirements of all (or even most) businesses.

Mapping data sources

Before dealing with anything else, start by mapping all the data sources your business has. This may include, but isn’t limited to, Google Analytics, CRM, CMS, offline data, Ads platforms, email automation tools, SMS and push notification platforms etc.

After knowing your sources, try to group all of them into a document and sort them in a logical order. For example, (1) analytics platforms, (2) social media platforms, (3) email etc.

Next, think more deeply about each of the listed data sources. Take a look at their documentation and data structure. In your document, take note of what kind of data and in what format you’d need in your marketing data warehouse. Start by thinking of how your team is using this data today, what kind of dashboards you’ve built etc.

Furthermore, take note of how fresh each dataset should be in the data warehouse. Do you need it in real-time, hourly, or maybe a daily update is fine? Probably you’d want your click-stream data to be as fresh as possible while having your ad cost data update once a day may be fine.

As an extra, I recommend you check if each data source has a public API or if they allow automated data export. If that’s not the case, it’s highly likely you can’t easily export data from that tool and I’d look for an alternative for that tool. Every decent marketing tool should have a public API for exporting data these days.

Mapping use cases

Based on the list of data sources you just listed, it is now time to think about all the great things you could do with a marketing data warehouse that includes all the data.

Once again, I recommend you start a document and structure it logically based on some category. For example, (1) user behavior analysis, (2) attribution, (3) alerts etc.

Of course, you don’t have to think about everything beforehand but it helps you later on deciding the right schema, load frequency (batch load vs stream), pre-calculated views, metrics etc.

Start by thinking of all the ways you are currently using the data available within each marketing tool your company has. Next, think of ways you’ve joined the data in the past using tools like Excel or Google Sheets. Then, I recommend, you go completely creative. Think of things you were never able to get from your data but would really like to. For example, analyzing user journey across all touchpoints, online and offline.

Build or buy?

The truth is, that building a data warehouse has never been easier nor cheaper. Every cloud platform has its own offering and several solutions have been built on top of major cloud platforms by other companies.

To be honest, setting up a data warehouse is actually the easy part. What makes it difficult though, is getting data in the data warehouse – automatically, reliably, fast and free of duplicate records. This part is known as a data pipeline and this is the system that feeds your data warehouse and, ultimately, your business with trustworthy data.

Now, where do you get a solid data pipeline? You have three options.

1. Build: if your company has enough resources (money, time, developers, data scientists, data engineers) then building everything in-house might be a good idea. There are some useful building blocks available in most cloud platforms (i.e. Cloud Dataflow, Pub/Sub, Cloud Functions in GCP). What I would recommend, though, is to make sure you have at least one person in the team that has led a data pipeline build before or at least someone you could consult with. Otherwise, there are things that will probably go wrong in your first attempt.

2. Ready-made solutions: depending on the level of customization you require, some of the ready-made data pipeline solutions might fit most of your needs. Some tools to check out are Stitch, Fivetran, Funnel and Segment. All of them do a great job at what they’re built for – offering fixed or semi-flexible data pipeline to feed your data warehouse with data. Unfortunately, though, each of them has their shortcomings. For example, none of them is able to send raw hit-level Google Analytics data into your data warehouse and process it into sessions based on your rules. Oh, and setting such a tool up can still be a considerable amount of work.

3. Data Pipeline as a Service: to gain full control of your marketing data, you need a system that’s tailored to your business’ needs. A way to get there is to hire a company that has all the necessary infrastructure (connectors, integrations, batch loaders, monitoring systems etc.) and is willing to take a personal approach with every client. This means getting to know your business, interviewing your team and building a solid data pipeline using the building blocks they already have – and code everything they don’t.


Start by mapping your data sources and use cases as described above. Then see if any of the ready-made solutions could fit your needs. If you need a more customized/advanced solution, think about your internal resources. If you need help with deciding which way to go or want to learn more about data pipeline as a service, talk to experts at RD.

The basics of working with marketing data stored in BigQuery

As BigQuery is one of the most popular data warehousing solutions for marketing data, let’s use it in our examples here as well.

Having all of your marketing data available in BigQuery is great but quite useless if don’t use it or if you only use it for things that are easily doable within Google Analytics or some other tool. E.g. for checking how many sessions your website received, the bounce rate etc.

You have to use SQL

Coming from a tool that lets you build reports by dragging and dropping various metrics, dimensions and filters in the editor, getting started with BigQuery can look like a bit of a learning curve.

When going to BigQuery, this is literally all your welcomed with.

BigQuery home screen
BigQuery home screen

No fancy data explorer features or a drag and drop report builder. Just a blank white text input area for your SQL queries.

Yes, there are tools like Looker and Google Data Studio that connect with BigQuery and automatically understand your schema. Using these tools with little or no SQL skills, though, you’re likely better off using the reporting features in your data sources (Google Analytics, CRM, Ads) instead.

The power of BigQuery becomes apparent only through (quite advanced) SQL queries. This is what allows you to join various sources, calculate your own metrics, define your own attribution models and so much more.

As a marketer, you really have two options: (1) learn SQL (here’s a super good getting started guide from Mode) or (2) work closely with a developer/data scientist/analyst that knows SQL really well.

No matter which route you take, know that SQL is no rocket science but there is no way around it when getting into more advanced data analysis.

Now, let’s take a look at a few use cases for a marketing data pipeline a.k.a your single source of truth.

Analyzing the journey of a single user

In this example, we are using raw hit-level Google Analytics data sent to BigQuery using RD Marketing Data Pipeline.

What’s good about BigQuery is that, unlike Google Analytics, you can use it to work with personally identifiable information (PII) – for example, our dataset includes users’ email addresses. This makes it easy to connect incoming leads with their activity on the site.

Let’s start by looking at how many sessions our lead has had on our website in the selected time frame. For this example, we’re connecting all sessions using email address, IP address and the client ID.

Count of users sessions in BigQuery
Count of users sessions in BigQuery

Seems like this user has visited the site a whopping 151 times (the number of sessions)!

Next, you might be interested in where all those sessions initiated. You could do so with a query like this.

Count of users sessions by source/medium in BigQuery
Count of users sessions by source/medium in BigQuery

While this gives you decent overview of how they found your site, you should also look into the landing pages (especially for organic, to understand what they’re looking for) and all the pages they’ve visited prior to submitting the contact form.

At Reflective Data, we do a similar analysis with all leads before we contact them – just to understand how much they already know and what they might be looking for.

Custom attribution models

Most tools out there default to one specific attribution model. For example, Google Analytics uses the last non-direct click model with a 6-month look-back window. Some tools let you modify the look-back window and/or compare different models but usually, the options are quite limited.

An attribution model is a really important part of marketing analysis. Different models can yield completely different results. Picking or building an attribution model should be a well-reasoned decision that’s based on the actual behavior of your customers, buying cycle, types of campaigns etc.

In the following example, we are using the same dataset as in the previous example. The default attribution model used there is “last interaction” but we’d like to compare it against the more commonly used “last non-direct model”.

First, let’s see the numbers using the “last interaction” attribution model.

Count of sessions by source/medium in BigQuery (last interaction)
Count of sessions by source/medium in BigQuery (last interaction)

And now let’s see the numbers using the “last non-direct click” attribution model.

The query here is a bit bulkier but it’s actually quite simple and logical when you take a closer look.

Count of sessions by source/medium in BigQuery (last non-direct click)
Count of sessions by source/medium in BigQuery (last non-direct click)

While the difference isn’t massive for this dataset, we can see that there’s a lot less direct/none traffic using the “last non-direct click” model – which makes total sense, because all direct clicks that came after a non-direct click were attributed to that other source.


We aren’t going in too much detail with all the attribution models you could build using the raw data in BigQuery but let me say that lately, many of our clients have requested us to build them all sorts of custom attribution models like linear and funnel-based or Markov, and even some machine-learning-based models.

Various data sources

While Google Analytics raw hit-level data (from GA 360 or using the parallel tracker) tends to be the most commonly used dataset in BigQuery or any other marketing data warehouse, you can (and should) include data from all marketing tools and platforms. This allows you to join data from various sources and spot insights and connections that weren’t possible before.

For example, one might capture a daily snapshot of their Twitter account in BigQuery and run a quick query to see the data for a current month.

Twitter account data in BigQuery
Twitter account data in BigQuery

The data above would fit perfectly in a social media marketing dashboard.

In another example, let’s query a table containing Mailchimp data to see how many new subscribers were acquired per day.

Mailchimp subscribers in BigQuery
Mailchimp subscribers in BigQuery

As you’ve probably realized, you can (and should) send data from all marketing tools into one unified data warehouse. As BigQuery and most other solutions are available at a rather reasonable pricing point, I would recommend collecting data from all sources, even ones you don’t plan on using in the nearest future because it’s always good to have access to historical data for a more advanced ad-hoc analysis.

Moving forward

If your company already has a marketing data warehouse that’s built on a modern cloud infrastructure and contains data from all the marketing tools that your business is using – good job! You’re working at a good, forward-thinking company. Just make sure this data is actually put in use as well.

If that’s not the case and your company doesn’t have a proper marketing data warehouse in place, start by thinking of all the marketing (and other relevant) tools that are producing data. You’ll probably realize that what you have is a bunch of silos, limiting the efficiency of your teams and holding your business back from being a truly data-driven one.

Whether you’re planning on building the data pipeline that feeds your data warehouse with high-quality data yourself or to buy it in as a service, it never hurts to consult with the experts that have been building data management systems for years.

If you have any questions, feel free to shoot them in the comments below.

Leave a Reply

Your email address will not be published. Required fields are marked *

Sign up for Reflective Data

5,000 sessions / month ×

Please enter your email


After submitting your email, we will send you the registration form.