Rethinking Data Governance

New Rules for a New Data Ecosystem  |  July 11th, 2017

As data and analytics technologies change, they bring new data governance challenges with them. Of course, there are various definitions of data governance, depending on whom you ask. For the purposes of this post, we’re defining it as the practices that ensure the proper usage, integrity, and security of data at every point in the data lifecycle. In other words, it’s the plan, or “steering,” for how to manage your company’s data. And it’s an ever-evolving ecosystem.

Traditional governance measures that covered Enterprise Data Warehouse (EDW) systems still apply, but roles, best practices, and capabilities are changing. Governance practices must take modern data technologies and use cases into account. For example, many companies now store raw data sources in a data lake and integrate them “on the fly” for unanticipated use cases. In other cases, roles are changing along with the technology. Data stewards are now advocates, not gatekeepers, as data sharing is encouraged across different lines of business.

As the expectations and use cases for data grow and evolve, both the way you think about governing your data and the requirements for doing so will change. You will need to develop an updated governance strategy that addresses the changes brought about by distributed data platforms, changing governance roles, the explosion of data use cases, and the evolving expectations of the business.

In this post, we will describe in detail what is changing in the data governance space, how these changes can help you get more value out of your data, and what you and your organization can do to adapt to these changes.

The old ways are changing

In the traditional data platform, data came in from a small number of curated sources, was loaded into a relational database (RDB) or EDW system, and was analyzed through a BI reporting tool (Figure 1).

Figure 1: Example Traditional Data Pipeline

In such environments, data stewards were able to curate the data ahead of time, as the number of data analysis use cases was known and finite. Because of the effort involved in curating the data, adding fields to or modifying the schema was a lengthy process. New data sources were only added after careful vetting and extensive reviews.

Today, however, companies need access to many new types of data for exploration and analysis. Data is no longer stored solely in an EDW for reporting, as companies have put these new, unstructured data sources in a distributed data platform. Some data files are even “schema on read,” where the schema is not defined until the use case is determined and data access takes place.
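To make “schema on read” concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a reference to any particular platform: the sample records, the `read_with_schema` helper, and the two schemas are all hypothetical. The raw records are stored exactly as they arrived, and each use case applies its own schema only when it reads them.

```python
import json

# Hypothetical raw records, stored as-is. No schema is enforced at write time,
# so the two records do not even share the same set of fields.
RAW_EVENTS = [
    '{"user": "a17", "amount": "19.99", "ts": "2017-07-01T10:00:00"}',
    '{"user": "b42", "amount": "5", "channel": "web"}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema (field name -> converter function) at read time.

    Fields missing from a record become None; fields that are not part of
    the schema are ignored for this particular read.
    """
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        row = {}
        for field, convert in schema.items():
            value = record.get(field)
            row[field] = convert(value) if value is not None else None
        rows.append(row)
    return rows


# Two use cases read the same raw data through different (hypothetical) schemas.
billing_schema = {"user": str, "amount": float}
marketing_schema = {"user": str, "channel": str}

billing_rows = read_with_schema(RAW_EVENTS, billing_schema)
marketing_rows = read_with_schema(RAW_EVENTS, marketing_schema)
```

The point of the sketch is that neither schema had to exist when the data was stored. The governance consequence is that field meanings and types are decided by each reader, which is one reason shared metadata matters more in this model.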

Adding to these challenges, data quality and data lineage tracking are becoming even more important with the added complexity of these on-the-fly integration cases, especially when regulation and compliance are involved. Today’s data pipelines are much more complex (Figure 2). Data comes from more sources, is stored in many places, is continually refined, and can be used for many new types of services and applications.

Figure 2: Example Modern Data Pipeline

A common knee-jerk reaction is for a company to take its existing data governance practices, the ones that were designed when the only data store was the EDW, and apply them unchanged to these new technologies and needs. This may feel comfortable, but ultimately the policies and use cases that were designed to work with less flexible systems end up stifling innovation rather than enabling it. This is where the impetus to modernize data governance comes from.

What needs to change?

To examine what needs to change in data governance, let’s start with three high-level areas:

  1. A data governance operating model
  2. Views, capabilities, and processes on data stewardship (we define stewardship as being in charge of the management and usability of data)
  3. Views, capabilities, and processes for information lifecycle management

Note: it’s not these three components themselves that are changing, but how the related activities, practices, or technologies are applied to the data.

Data governance operating model changes

Previously, a single person or group within IT often ran governance for a line of business. Now, with much more data sharing, data governance professionals must collaborate across lines of business. As such, data governance is much more collaborative, with both centralized and hybrid operational models in practice and, likely, a Chief Data Officer (CDO) leading the charge.

Data stewardship changes

The roles and objectives of data stewardship now focus on identifying and enabling the ways you can get value out of your data. Data stewards now act as data advocates, encouraging the sharing of data across lines of business that previously worked only in silos. Instead of being responsible for meticulously cleaning and curating the data, like a data janitor, stewards now shepherd the data, making sure that it is usable and can be integrated for unanticipated use cases. And instead of doing by-the-book data monitoring, stewards now act as data caretakers, taking a more holistic view of potential use cases when thinking about data quality.

Data lifecycle management changes

Now that data architectures include both EDW and Distributed Data Platform (DDP) systems, governance processes for information lifecycle management have changed accordingly. The policies for EDW data still exist, but new ones have emerged for data going into a DDP. As we mentioned earlier, the technology is more flexible, so it no longer takes weeks or months to ingest a new data source. Because new data does not have to adhere to a strict schema, it can be added in a matter of days, or even more quickly. Once new data is ingested, it’s stored in its original format rather than in a pre-defined schema. Without these pre-defined schemas, metadata capture and management have also become more important, because they enable data classification and exploration.
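As a rough illustration of metadata capture at ingestion time, the Python sketch below builds a small catalog entry for a raw payload without changing the payload itself. Everything in it is an assumption made for the example: the `capture_metadata` function, the catalog fields, the `web_orders` source name, and the classification labels are hypothetical and do not come from any specific governance tool.

```python
import hashlib
import json
from datetime import datetime, timezone


def capture_metadata(source_name, raw_bytes, classification="unclassified"):
    """Build a hypothetical catalog entry describing a raw payload.

    The payload is stored in its original format elsewhere; this function
    only records facts about it so it can be classified and explored later.
    """
    # Best effort: if the payload is newline-delimited JSON, record which
    # field names appear. Lines that are not valid JSON are skipped.
    fields = set()
    for line in raw_bytes.decode("utf-8", errors="replace").splitlines():
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict):
            fields.update(record)

    return {
        "source": source_name,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "size_bytes": len(raw_bytes),
        "sha256": hashlib.sha256(raw_bytes).hexdigest(),
        "classification": classification,
        "observed_fields": sorted(fields),
    }


# Example usage with a hypothetical payload and source name.
payload = b'{"user": "a17", "amount": "19.99"}\n{"user": "b42", "channel": "web"}\n'
entry = capture_metadata("web_orders", payload, classification="internal")
```

Because the entry is derived from the raw bytes, it can be produced in the same step as ingestion, which keeps the fast onboarding described above while still giving stewards something to classify and search.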

Figure 3 gives a summary of the changes across these three dimensions.

Figure 3: Changes in Data Governance

While they may initially appear overwhelming, these changes can actually bring your company a significant competitive edge. Proper data governance can help to reduce costs, enable better decisions, and enhance collaboration through data sharing.

What are your next steps?

To truly take advantage of the capabilities modern data governance provides, you must make an honest assessment of where you’re starting from. Follow these steps:

  • Ensure that you have a plan and goals for data governance at your organization, including an identification of your governance priorities. All of this should line up with your larger business goals.
  • Take a serious look at your data governance roles, policies, and practices. Are any of them outdated?
  • Stay focused on the outcomes and behaviors you want to create. Make sure you are balancing risk appropriately with the capabilities to get additional value out of your data.

Use these steps to build a solid foundation, making it easier to get more value out of your data.