Testing the Waters: Key Questions to Ask Before Jumping into a Data Lake


While most marketers are aware of the important role data plays in their strategies, the real challenge in today’s omni-channel consumer environment is making sense of all the disparate data points from both online and in-store interactions.

IDC estimates that in 2025 the world will create and replicate 163 trillion gigabytes (163 zettabytes) of data, a tenfold increase over the amount created in 2016.

As a result, demand is growing for ways to get more insight out of all that data. Marketers have already shown interest in data management tools: according to AdExchanger, 80% of them either use such tools today or plan to implement them by 2018. This has fueled the rise of data platforms such as Data Warehouses, data management platforms (DMPs) and, most recently, Data Lakes.

Many people think Data Lakes are simply a new take on Data Warehouses or, for advertisers, DMPs. In fact, the only trait Data Lakes share with DMPs and Data Warehouses is that they store and organize data. Data Lakes do much more than that, which is why they are emerging as a powerful solution for brands.

A Data Lake is a centralized, flexible, cloud-based platform that can store and process massive amounts of structured and unstructured data. Because data is kept in its raw form, companies have greater flexibility: they structure and query it only when it is needed for analysis. A Data Lake also lets brands link raw and structured data from multiple sources and channels more seamlessly, for more comprehensive reporting.
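The "query only when needed" approach is often called schema-on-read: raw records are stored as they arrive, and structure is applied only when a specific query runs. Here is a minimal Python sketch of that idea. It assumes hypothetical JSON event records held in a list as a stand-in for files in cloud storage, with made-up field names; it is an illustration, not how any particular Data Lake product works.

```python
import json

# Hypothetical raw events stored as-is (a stand-in for files in cloud storage).
# Records from different channels do not share one fixed schema.
RAW_EVENTS = [
    '{"channel": "web", "email": "a@example.com", "page": "/shoes"}',
    '{"channel": "store", "email": "b@example.com", "sku": "123", "amount": 59.99}',
    '{"channel": "store", "email": "a@example.com", "sku": "456", "amount": 20.01}',
]


def query_store_sales(raw_lines):
    """Apply a schema at read time: parse each raw record and keep only
    the fields this particular query needs (in-store sales per email)."""
    totals = {}
    for line in raw_lines:
        record = json.loads(line)
        if record.get("channel") != "store":
            continue  # this query ignores web events entirely
        email = record["email"]
        totals[email] = totals.get(email, 0.0) + float(record["amount"])
    return totals


print(query_store_sales(RAW_EVENTS))
```

A different question (say, pages viewed per email) would reuse the same raw records with a different parsing step, which is the flexibility the paragraph above describes.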

The challenge for most marketers approaching a Data Lake as a potential solution is not just understanding what it is, but knowing whether it's right for their company. Consider your goals, the capabilities that matter most to you, and the resources you have to exploit the power of the platform.

Before you dive in, take a look at a few key questions you should ask yourself when considering a Data Lake platform:

1. What Resources Are Available to You?

The central resource you will need is people, specifically data scientists. A Data Lake typically requires data scientists who can query and pull data from the platform to produce actionable insights. If your organization lacks the staff to support a Data Lake, however, that doesn't mean implementing a Data Lake platform is out of reach.

Though less common, some Data Lake platforms let users create workflow templates, much as Data Warehouses do. These templates let other functions in your organization take advantage of your data just by changing a simple parameter within the query. That either frees up your data scientists or reduces the resources needed for maintenance.
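A workflow template is, at its core, a saved query with parameters. The Python sketch below uses an in-memory SQLite table as a stand-in for data in a Data Lake; the table, columns and template name are hypothetical. A data scientist writes the query once, and another team reruns it for a different campaign by changing only the `campaign` value.

```python
import sqlite3

# Hypothetical reusable "workflow template": written once by a data scientist;
# other teams change only the :campaign parameter.
SALES_BY_CHANNEL_TEMPLATE = """
    SELECT channel, COUNT(*) AS orders, SUM(amount) AS revenue
    FROM purchases
    WHERE campaign = :campaign
    GROUP BY channel
    ORDER BY channel
"""


def run_template(conn, campaign):
    """Run the saved template for one campaign; only the parameter changes."""
    return conn.execute(SALES_BY_CHANNEL_TEMPLATE, {"campaign": campaign}).fetchall()


# Tiny in-memory table standing in for Data Lake data (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (campaign TEXT, channel TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [
        ("spring", "online", 40.0),
        ("spring", "store", 60.0),
        ("spring", "store", 25.0),
        ("fall", "online", 10.0),
    ],
)

print(run_template(conn, "spring"))
print(run_template(conn, "fall"))
```

Binding the value as a parameter, rather than pasting it into the SQL text, also keeps a non-technical user's input from altering the query itself.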

2. What Kind of Access Do You Have to Customer Data?

Your customer relationship management (CRM) database is an important piece of the puzzle. It acts as the anchor that allows other data points, such as email, IP address, device ID and purchase behavior, to be deterministically tied to real people. In a Data Lake, this is the crucial link that lets marketers tie disparate data sources together without relying on probabilistic algorithms.
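Deterministic matching means a record from another source is attached to a customer only when it shares an exact identifier with the CRM, rather than being matched on a statistical guess. This is a minimal Python sketch, assuming hypothetical CRM, device and in-store records keyed on email. Normalizing and hashing the email with SHA-256 is an illustrative choice here, not a requirement of any specific platform.

```python
import hashlib


def email_key(email):
    """Normalize an email (trim, lowercase) and hash it so sources can be
    joined on an exact key. SHA-256 is an illustrative choice."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


# Hypothetical CRM records: the anchor tying identifiers to real people.
crm = [
    {"customer_id": "C1", "email": "Ana@Example.com"},
    {"customer_id": "C2", "email": "bo@example.com"},
]

# Hypothetical records from other sources, each carrying an email.
device_events = [
    {"email": "ana@example.com ", "device_id": "D-9"},
    {"email": "unknown@example.com", "device_id": "D-1"},  # no CRM match
]
store_purchases = [{"email": "BO@example.com", "amount": 42.0}]


def link_to_customers(crm_rows, *sources):
    """Deterministic match: attach a source record to a customer only when the
    normalized email keys are identical; unmatched records are dropped."""
    profiles = {
        email_key(row["email"]): {"customer_id": row["customer_id"], "records": []}
        for row in crm_rows
    }
    for source in sources:
        for record in source:
            profile = profiles.get(email_key(record["email"]))
            if profile is not None:
                profile["records"].append(record)
    return {p["customer_id"]: p["records"] for p in profiles.values()}


linked = link_to_customers(crm, device_events, store_purchases)
print(linked["C1"])
print(linked["C2"])
```

The record for `unknown@example.com` is left out because nothing in the CRM matches it exactly. A probabilistic approach would instead try to infer a likely match from signals such as IP address or browsing patterns.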

One of the most important customer data points is email, which enables brands to bridge the offline and digital worlds. If you do not have access to emails or other registration data such as mailing addresses, weigh that carefully when selecting your Data Lake platform, because these links unlock insights into how campaigns drove sales, both online and offline. We'll explain more in the next question.

3. Does the Platform Have Data and Device Graphs Available for Purchase?

Many Data Lakes do not have integrated device graphs with first-party data on actual consumers. If your brand does not collect data like email addresses from its customers, it is imperative that the platform you select offers an integrated device graph, first-party data and third-party data you can tap into. With these added layers, you can deterministically sync data between offline and online channels. Even for brands that collect email, platforms with first- and third-party data sets can expand the number of connections you make, helping you unlock the full potential of your data.

A Data Lake platform with first-party data and data partnerships can provide you with people-based insights into a consumer's retail purchasing habits, media consumption, social activities, and more. For example, an automaker could identify in-market shoppers and measure the impact TV or digital ads had on dealership foot traffic and daily sales.

Being able to use both online and offline data for targeting, and to accurately measure the complete customer journey from the first touchpoint to conversion (online or in-store), is necessary in today's omni-channel world. Accomplishing this requires immense computing power and inexpensive storage for the massive amount of data being collected.

This is where a Data Lake really shines. It helps marketers take data from multiple first-, second-, and third-party sources and house it in one centralized, cloud-based platform for immediate analysis and activation.

For more information on Data Lakes, read Viant’s explainer guide, Demystifying Data Lakes.
