Minding Your Data Gaps

Knowing which gaps to plug  |  April 13th, 2017

In an earlier post, we explained that to understand data gaps, you must start with your strategic business objectives (what you want to do with the data), understand the data being used, analyze the dimensions of data that reflect your needs, and look at how well your current data fulfills those needs. The next step is to represent this visually, including some of the multi-dimensional information needed to portray your business data needs. This is a powerful way to engage senior leadership and get the resources to enable “minding the gap.”

The hardest part usually isn’t plugging the gap; it is knowing which gaps to plug when you can’t possibly do it all at once. Let’s look at an illustrative example: fraud detection.

Preventing fraud by plugging gaps

According to the 2012 “Faces of Fraud” Survey conducted by Information Security Media Group, in 82% of cases involving identity fraud, the consumer uncovered the theft before the company. Not surprisingly, 26% of organizations surveyed reported losing consumers to competitors following a fraud incident. Our strategic business objective here, therefore, is to prevent fraud.

As mentioned in “Is Your Data Holding You Back?”, preventing account takeover is a common way of addressing fraud risk. In an account takeover, a fraudster obtains the online account credentials of a legitimate customer with the intent of misusing the account (e.g., wiring funds out of it into their own). The victim understandably feels violated and may blame the company that holds the account. Preventing account takeover, then, is one functional requirement of our objective to prevent fraud; there will likely be other functional requirements tied to this objective as well.

Let’s dig a little deeper and consider how account takeover can be prevented. There are several ways our hypothetical organization could approach this, and those approaches become our technical use cases. One such use case is device recognition. It is often used at login to compare the signature of a laptop, desktop, or mobile phone to that of the legitimate owner’s devices, to a database of devices previously associated with fraud, or both. Device recognition therefore requires both customer data and device data.
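To make the idea concrete, here is a minimal Python sketch of device recognition at login. Everything in it is hypothetical: the `DeviceSignals` attributes, the SHA-256 fingerprint, and the allow/step-up/block outcomes are assumptions for illustration, not a description of any particular product. It shows why both kinds of data are needed: the customer’s known devices come from customer data, and the list of fraud-linked devices comes from device data.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class DeviceSignals:
    """Hypothetical attributes a login request might expose about the device."""
    user_agent: str
    os: str
    screen: str
    timezone: str

    def signature(self) -> str:
        # Hash the attributes into a stable fingerprint. Real device recognition
        # uses many more signals and fuzzy matching; this sketch matches exactly.
        raw = "|".join([self.user_agent, self.os, self.screen, self.timezone])
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()


@dataclass
class DeviceRegistry:
    # Customer data: signatures of devices previously seen for each customer.
    known_devices: dict[str, set[str]] = field(default_factory=dict)
    # Device data: signatures previously associated with fraud.
    fraud_devices: set[str] = field(default_factory=set)

    def assess_login(self, customer_id: str, device: DeviceSignals) -> str:
        sig = device.signature()
        if sig in self.fraud_devices:
            return "block"      # device linked to earlier fraud
        if sig in self.known_devices.get(customer_id, set()):
            return "allow"      # matches the legitimate owner's device
        return "step_up"        # unfamiliar device: ask for extra verification


# Usage with made-up devices.
registry = DeviceRegistry()
laptop = DeviceSignals("Mozilla/5.0", "Windows 10", "1920x1080", "America/New_York")
phone = DeviceSignals("Mozilla/5.0 (iPhone)", "iOS 10", "375x667", "America/New_York")
registry.known_devices["cust-001"] = {laptop.signature()}
print(registry.assess_login("cust-001", laptop))  # allow
print(registry.assess_login("cust-001", phone))   # step_up
registry.fraud_devices.add(phone.signature())
print(registry.assess_login("cust-001", phone))   # block
```

Checking the fraud list before the owner’s known devices is one design choice among several; an organization might instead score both signals together rather than returning the first match.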

Can the organization employ device recognition, or does it have data gaps holding it back? Since this is an assessment of data adequacy from the perspective of application architects (as opposed to, say, a view you’re providing to the CDO), we’ll focus on application-relevant dimensions: breadth, depth, frequency, and latency. The customer and device data must be broad enough to include the attributes needed to associate the correct information with the login event; deep enough, with enough history, to contain a useful number of known fraudulent devices to compare against; refreshed frequently enough that the latest information is used; and low-latency enough to trigger the appropriate action when fraud is suspected.

Therefore, when considering what it takes to build device recognition, we need to assess whether there are currently gaps in these specific dimensions (breadth, depth, frequency, and latency) for both customer and device data. We can extend this exercise by performing the same assessment for the other use cases for preventing account takeover, and for the other functional requirements of preventing fraud. This builds a larger picture of the effort needed to start reducing fraud.
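One simple way to frame this assessment is to write a use case’s needs as thresholds on the four dimensions and compare them with a profile of the data you actually have. The Python sketch below does exactly that; the `DataRequirement` and `DataProfile` fields, their units, and the numbers in the usage example are all hypothetical, chosen only to illustrate the comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataRequirement:
    """What a use case (e.g., device recognition) needs from one data category."""
    attributes: frozenset[str]   # breadth: fields that must be present
    min_history_days: int        # depth: how far back the data must go
    max_refresh_minutes: float   # frequency: longest acceptable time between refreshes
    max_latency_seconds: float   # latency: longest acceptable delay before data is usable


@dataclass(frozen=True)
class DataProfile:
    """What the organization currently has for that data category."""
    attributes: frozenset[str]
    history_days: int
    refresh_minutes: float
    latency_seconds: float


def find_gaps(req: DataRequirement, have: DataProfile) -> dict[str, bool]:
    """Return True for each dimension where the current data falls short."""
    return {
        "breadth": not req.attributes <= have.attributes,
        "depth": have.history_days < req.min_history_days,
        "frequency": have.refresh_minutes > req.max_refresh_minutes,
        "latency": have.latency_seconds > req.max_latency_seconds,
    }


# Hypothetical device-data requirement for device recognition.
device_req = DataRequirement(
    attributes=frozenset({"device_id", "customer_id", "fraud_flag"}),
    min_history_days=365,
    max_refresh_minutes=15,
    max_latency_seconds=2,
)
device_have = DataProfile(
    attributes=frozenset({"device_id", "customer_id"}),
    history_days=400,
    refresh_minutes=1440,  # refreshed once a day
    latency_seconds=1,
)
print(find_gaps(device_req, device_have))
# {'breadth': True, 'depth': False, 'frequency': True, 'latency': False}
```

Running the same comparison for customer data, and for each other use case, yields one set of gap flags per data category and use case.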

Communicating data gaps

Imagine that you’ve gone through this process for all of your important business objectives and you’ve identified several areas with data gaps. Now what? Closing data gaps is a complex, multi-dimensional problem. Finding a way to manage that complexity is key. It isn’t easy to alter people’s mindset from “we must clean the data until it is pristine” to “we need data good enough to solve this important problem.”

Communicating what you’ve found visually is a great way to advise and persuade your stakeholders. Here is an example visualization we often use. It shows whether gaps exist along the key dimensions (such as the breadth, depth, frequency, and latency identified in our fraud example) that would prevent the business objective from being met.

Visualization of data gaps

You would typically use a more granular view of the data categories, but this example illustrates the broad information categories (customer, device, and so on) and how they support the capabilities, which are shown in priority order. Questions to consider when deciding how to categorize your data for assessment include:

  • What is creating the data (e.g., a person, a sensor, software)?
  • Who is collecting the data (e.g., a division within the organization, a third party)?
  • What type of data is it (e.g., unstructured, structured, geospatial, image)?
  • What is the data describing (e.g., your customers, your vendors, manufacturing processes, lab experiments)?

You will also need to customize the dimensions shown in the four-box key (here latency, frequency, breadth, and depth) to focus on the ones that matter for what your business is trying to accomplish. Assessing four dimensions not only allows for the lovely figure above; we have also found that narrowing the focus to four dimensions helps an organization identify the key blockers holding it back from meeting its objectives.

This view can be read in several ways to spot trends: box by box, column by column, and row by row.

  • Box by box: This illustrates the gaps in one data requirement for achieving one capability. For example, the box in the first row and column shows gaps in breadth and depth of customer data for the capability “Login Score Tuning.” These gaps exist because not enough information has been collected from the customer over enough time to measure loyalty.
  • Column by column: This illustrates the impact that gaps in data requirements have on one capability. For example, no category of information meets the “latency” requirement for the capability “Mobile Specific Monitoring.” Monitoring is typically a real-time activity that requires near-immediate data collection. If mobile-specific monitoring were a high-priority capability, this column would indicate that one of the most important next steps is to build the infrastructure needed to process data in real time.
  • Row by row: This illustrates the opportunity that closing the gaps in one data category creates across all of the capabilities. For example, device data is collected inconsistently; each device sends information at a different frequency. Collecting device information consistently and more frequently would close the frequency gaps across almost all capabilities. A row-by-row reading of the figure can help prioritize the next steps that would have the most impact across the capabilities.
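The three readings can also be expressed as simple queries over the assessment grid. In this hypothetical Python sketch, each cell maps a (data category, capability) pair to the set of dimensions with a gap; the cells loosely mirror the examples above, but the values are invented for illustration. `column` counts gaps within one capability, `unmet_everywhere` finds dimensions that no category satisfies, and `row` counts how many capabilities each gap in one data category affects, which is the tally a row-by-row reading uses to prioritize.

```python
from collections import Counter

# Hypothetical assessment: (data category, capability) -> dimensions with a gap.
# Values are illustrative only; a missing cell means no gaps.
GAPS: dict[tuple[str, str], set[str]] = {
    ("Customer", "Login Score Tuning"): {"breadth", "depth"},
    ("Customer", "Mobile Specific Monitoring"): {"latency"},
    ("Device", "Login Score Tuning"): {"frequency"},
    ("Device", "Mobile Specific Monitoring"): {"frequency", "latency"},
}


def box(category: str, capability: str) -> set[str]:
    """Box by box: gaps in one data category for one capability."""
    return GAPS.get((category, capability), set())


def column(capability: str) -> Counter:
    """Column by column: how many data categories have each gap for one capability."""
    counts = Counter()
    for (_, cap), dims in GAPS.items():
        if cap == capability:
            counts.update(dims)
    return counts


def unmet_everywhere(capability: str) -> set[str]:
    """Dimensions that no data category satisfies for this capability."""
    categories = {cat for cat, _ in GAPS}
    return {dim for dim, n in column(capability).items() if n == len(categories)}


def row(category: str) -> Counter:
    """Row by row: how many capabilities each gap in one data category affects."""
    counts = Counter()
    for (cat, _), dims in GAPS.items():
        if cat == category:
            counts.update(dims)
    return counts


print(sorted(box("Customer", "Login Score Tuning")))  # ['breadth', 'depth']
print(unmet_everywhere("Mobile Specific Monitoring"))  # {'latency'}
print(row("Device").most_common(1))                    # [('frequency', 2)]
```

In a real assessment the grid would hold many more categories and capabilities, and you might weight each capability by its priority before ranking the rows.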


Protecting your customers from fraud shows them that you value them, builds trust, and helps protect the bottom line. Evaluating data gaps and making informed choices about where to invest in plugging them is a smart way to proceed. You can’t possibly do it all at once, so focus on where you can make the highest impact.