As companies work to become more data-driven, they often start by assessing the state of their data. Many questions will come to mind, but they may not be the right ones.
Commonly asked questions include:
- Do I have gaps in my data?
- How good is my data?
- Is my data clean enough?
However, none of these questions makes sense until you ask “for what?” Your business has strategic processes or objectives that depend on data; the relevant data must be evaluated in this context. For example, data can be used to attract new customers, target selected customers, or inform high-volume, important decisions.
More appropriate questions are:
- Do I have gaps in my data? … for understanding customer purchase behavior
- How good is my data? … for predicting quarterly sales
- Is my data clean enough? … for automating production
What counts as good data depends on what you are trying to do, so you must consider your data through the lens of your business objectives. For example, data that isn’t clean enough for financial forecasting could still be incredibly useful for delivering a differentiating user experience. Put simply, it is important to understand what you want to accomplish as a business and assess your data against those goals.
In this post, we will discuss what “real” gaps in data look like and how to find them in your organization. In a future post, we’ll take a look at communicating what you’ve found, how to prioritize what to tackle first, and work through an end-to-end example.
Looking for gaps?
To begin looking for gaps, you must trace back from your business objectives to the data.
Let’s look at an example. Consider a medium-sized bank that is trying to reduce the money lost to fraud by 10%. One approach is to detect and prevent account takeovers, but does the bank have the data that could make this possible? When IT starts modifying the online application to detect fraudsters, they may find out too late that the application is based on a nightly batch process; they lack the real-time infrastructure and, therefore, the relevant data needed to identify and act on account takeovers as attempts occur. Data that is not available for analysis quickly enough is a clear gap that is holding them back.
Mapping data to business objectives
Looking at functional requirements and technical use cases can help break down complex business objectives to understand the role data plays in their execution. Functional requirements describe what capabilities need to be built in order to satisfy a particular business objective. Use cases then identify the technical steps required to fulfill those requirements, which in turn lead us to the data.
When writing these use cases, look at them from an engineering or data science point of view. The easiest way to do this is to think of the different stages of the Data Value Chain, and what is needed in the pre-, during, and post- stages to support your goal. For example, if the high-level functional requirement is “validate authenticity of a user,” some of the use cases might include the following:
- acquire and load customer, device, and transaction history data
- integrate historic data from different sources with active session data
- perform exploratory data analysis to produce insights on user authenticity, which could include network or linkage analysis and mapping of user to specific behavioral archetypes
- engineer features that describe the user for modeling and store this data
- develop a model that scores the transaction for likelihood of fraud
- expose fraud prediction model outputs as triggers within the online application to initiate additional security challenges when applicable
As you go through this decomposition for each business objective, ask yourself “What data is necessary?” for the respective use cases. You want to be able to explicitly identify the data categories that relate to each technical use case and map back to each business objective. This establishes the framework for your data gap assessment. The figure below shows this flow of thinking, from the business objectives to the data and how it will be used.
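The same decomposition can also be written down as a simple data structure. The sketch below is one possible way to do it, not a prescribed method: the class names, the data category labels ("customer", "device", and so on), and the set of available data are all hypothetical and chosen only to mirror the bank example. It maps an objective to requirements, use cases, and data categories, then lists the categories that are needed but not available, which covers only the "coverage" kind of gap.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class UseCase:
    """A technical use case and the data categories it depends on."""
    description: str
    data_categories: List[str] = field(default_factory=list)


@dataclass
class FunctionalRequirement:
    """A capability to build, broken down into technical use cases."""
    description: str
    use_cases: List[UseCase] = field(default_factory=list)


@dataclass
class BusinessObjective:
    """A business objective, broken down into functional requirements."""
    description: str
    requirements: List[FunctionalRequirement] = field(default_factory=list)

    def required_data(self) -> Dict[str, List[str]]:
        """Map each data category to the use cases that need it."""
        needed: Dict[str, List[str]] = {}
        for requirement in self.requirements:
            for use_case in requirement.use_cases:
                for category in use_case.data_categories:
                    needed.setdefault(category, []).append(use_case.description)
        return needed


def find_coverage_gaps(objective: BusinessObjective,
                       available: Set[str]) -> Dict[str, List[str]]:
    """Return needed-but-unavailable categories and the use cases they block."""
    return {
        category: use_cases
        for category, use_cases in objective.required_data().items()
        if category not in available
    }


# Hypothetical category labels attached to two of the use cases listed above.
objective = BusinessObjective(
    description="reduce the money lost to fraud by 10%",
    requirements=[
        FunctionalRequirement(
            description="validate authenticity of a user",
            use_cases=[
                UseCase(
                    "acquire and load customer, device, and transaction history data",
                    ["customer", "device", "transaction history"],
                ),
                UseCase(
                    "integrate historic data from different sources with active session data",
                    ["transaction history", "active session"],
                ),
            ],
        )
    ],
)

# Assumed inventory of what the organization already has (illustrative only).
available_categories = {"customer", "transaction history"}

gaps = find_coverage_gaps(objective, available_categories)
for category, blocked in gaps.items():
    print(f"Missing '{category}' data, needed to: {'; '.join(blocked)}")
```

Because every gap is reported together with the use cases it blocks, each finding can be traced back to the business objective it affects.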
Now that you know where to search for data gaps and have worked through identifying the technical use cases, the next step is to determine if your data is holding you back. Do you have gaps?
What do data gaps look like?
The most obvious kind of gap is data you need but do not have. However, that is not the only way data can hold you back. Data “coverage” is just one dimension to consider when evaluating your data. Other dimensions can reveal underlying technical capabilities that are missing altogether or not working as well as they could.
If you’re performing the gap assessment for a technical audience—say a data architect or set of project managers—it makes sense to focus on dimensions that are highly technical, like breadth, depth, frequency, and latency. These stakeholders will be concerned with understanding whether existing approaches to data collection or data storage are fulfilling the needs of the business. For a higher-level audience—perhaps a new CDO—focusing on high-level dimensions like data coverage, accessibility, or ease of use is more likely to inform the kinds of decisions they’ll want to make from the assessment. They’ll be grappling with questions like: Do we have the data we need to operate the business? Do we need to form data partnerships with suppliers or go-to-market partners to get a complete view of our customers?
Picking the most relevant dimensions means understanding which requirements your organization values most. Once you’ve determined which dimensions you’re going to evaluate, you’ll need a measuring stick to decide what you’re going to call a gap. There are two schools of thought:
- Purists: If a data dimension (as described above) isn’t completely met, it’s a gap, regardless.
- Pragmatists: If you can still get the job done, it isn’t a gap.
Both views can be valuable ways of looking at your analysis. In the fraud detection example from earlier, a purist might say that the use case calls for real-time account transaction history; if it is only available via a nightly batch update, that is a gap in latency. A pragmatist, however, may suggest a compromise that gets by with nightly updates to transaction history, since you have some real-time activity data as the customer interacts with the website. Thus, from the pragmatist’s view, there is no gap.
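To make the two measuring sticks concrete, here is a minimal sketch of how each view might be encoded for the latency dimension. Everything in it is an assumption made for illustration: the `LatencyAssessment` class, the five-second target standing in for "real-time", and the 24-hour figure standing in for the nightly batch. The purist rule flags any unmet requirement; the pragmatist rule flags it only when no workaround lets the job get done.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LatencyAssessment:
    """How fresh a data category needs to be versus how fresh it actually is."""
    data_category: str
    required_max_delay_seconds: float  # what the use case ideally calls for
    actual_delay_seconds: float        # what the current pipeline delivers
    workaround: Optional[str] = None   # other data that still gets the job done


def is_gap_purist(assessment: LatencyAssessment) -> bool:
    """Purist: any requirement that is not completely met is a gap."""
    return assessment.actual_delay_seconds > assessment.required_max_delay_seconds


def is_gap_pragmatist(assessment: LatencyAssessment) -> bool:
    """Pragmatist: it is only a gap if there is no way to get the job done."""
    return is_gap_purist(assessment) and assessment.workaround is None


# Illustrative numbers: "real-time" is taken as 5 seconds, nightly batch as 24 hours.
transaction_history = LatencyAssessment(
    data_category="account transaction history",
    required_max_delay_seconds=5,
    actual_delay_seconds=24 * 60 * 60,
    workaround="real-time activity data from the customer's website session",
)

print("Purist sees a gap:", is_gap_purist(transaction_history))
print("Pragmatist sees a gap:", is_gap_pragmatist(transaction_history))
```

Recording the workaround as text rather than a simple flag keeps the reasoning behind a "no gap" verdict visible to whoever reviews the assessment later.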
We highly recommend leaning toward the pragmatist view so that you can focus on getting things done, especially when you’re first getting started. The point of the gap assessment isn’t to stand in your way; it is to help you achieve goals quickly.
Motivated companies know that being data-driven means focusing on business strategies to unlock the value in their infrastructure and data. Recognizing your data gaps is an excellent first step. Next, you must communicate these gaps to your stakeholders and put together a plan of attack. We’ll look at how to do that in a future post.
In the meantime, check out our Data Strategy Position Paper to learn more about strengthening your business.