Editor’s note: This was originally posted on the O’Reilly Data blog as “When building an enterprise data strategy, consider ‘why?’ before ‘how?’”
At the 2015 Strata + Hadoop World in New York, members of consulting firm Silicon Valley Data Science (SVDS) responded to audience questions and discussed best practices in creating and implementing data strategies in a session called “Ask Me Anything: Developing a Modern Enterprise Data Strategy.” Panelists included SVDS’ VP of advisory services Scott Kurth, chief technology officer John Akred, and director of communications Julie Steele.
Central to the discussion was the notion that an enterprise data strategy must be driven by business goals, and not technology. Kurth, Akred, and Steele also discussed some corollary topics with attendees, including when to start selecting tools for a data science project and how to drive team consensus about data science roadmaps.
Establish a data strategy by first considering business goals, not technology solutions
In the many years Kurth, Akred, and Steele have worked in the data space, they have witnessed several different approaches to creating enterprise data strategies. Some organizations start with a specific, trending technology in mind, such as Hadoop or Spark. Some ponder whether they should start with a specific tool, such as “bare Kafka,” and then migrate or upgrade to something else. Others come out of the gate fairly certain they want to do specific types of projects, such as image recognition. “The range of pet projects for technologies in the client space surprises me all the time,” said Akred.
To this, Kurth offered a simple line of advice: “Don’t start by talking about technology.” Instead, start with what you’re trying to achieve as an organization. “What’s the needle that you’re trying to move in your business?” Once those questions are answered, start talking about how the technology can get you there.
Business needs often change, which causes business goals to change, which in turn causes technology solutions to change. With this in mind, Kurth advised that data strategies be crafted quickly, in roughly six to eight weeks, and that they reflect today’s business needs as well as projected needs 18 to 36 months out.
Rank projects based on business goals
Once you have a clear understanding of your business goals, said Kurth, you can then sort your queue of projects in a way that corresponds to those goals. While you may not be tackling projects at the pace you desire, you’re solving the most important problems first.
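As a rough illustration of sorting a project queue by business goals, here is a minimal Python sketch. It is not a method the panel described: the goal names, weights, project names, and 0-to-5 scores are all hypothetical, and a weighted sum is only one simple way to rank.

```python
from dataclasses import dataclass

@dataclass
class Project:
    name: str
    goal_scores: dict  # hypothetical: goal name -> contribution, scored 0 to 5

# Hypothetical business goals and their relative importance (weights sum to 1).
goal_weights = {"reduce_churn": 0.5, "grow_revenue": 0.3, "cut_costs": 0.2}

# Hypothetical project queue.
projects = [
    Project("Rebuild data warehouse", {"cut_costs": 3}),
    Project("Churn prediction model", {"reduce_churn": 5, "grow_revenue": 2}),
    Project("Image recognition pilot", {"grow_revenue": 1}),
]

def priority(project, weights):
    """Weighted sum of a project's contribution to each business goal."""
    return sum(weights.get(goal, 0.0) * score
               for goal, score in project.goal_scores.items())

# Highest-priority project first.
ranked = sorted(projects, key=lambda p: priority(p, goal_weights), reverse=True)

for p in ranked:
    print(f"{p.name}: {priority(p, goal_weights):.1f}")
```

When the goals change, only the weights need to be revisited; the project scores and the ranking logic stay the same.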
It’s helpful to think about projects as incremental steps toward building a larger platform, said Kurth. As new projects are added to your platform, you will have more capabilities in place, so the pace at which you’re building should accelerate. Focus on building your architecture a bit at a time while solving business problems, continuing to build other pieces, and keeping constituents happy.
Akred followed up by invoking the immutable laws of physics: “If you have a fixed set of resources, you can only do so much so fast.”
Make tool choices when you’re ready to use the tools
If you adopt, install, and configure a tool too early, you run the risk of having it sit idle while you determine a meaningful use case. Given the constant evolution of competing technologies and the possibility that your organization’s mission could shift at any time, an idle tool may become obsolete before it’s ever used.
“Look at your body of projects, the work that you think you have in your future—not just fixing old problems, but aspiring to do new things—and say, ‘what’s the platform we need to accomplish this?’” said Kurth. Forget about any legacy software and resist the urge to simply build on top of it. “Don’t shy away from anything and just use the wrong technology because it’s what you already have,” he noted. “Take it as an opportunity to get the technology that you really want and need.”
Another consideration when adopting tools is their longevity. In order for your investment to stay relevant amid shifting business needs, it’s important to select a tool that satisfies current and anticipated business needs. “[Start] with pure open source as a way to experiment cheaply before you invest, and prove that you can actually do something well before you go down the path of buying either a commercial software or supported open source,” said Kurth.
Akred also encouraged the audience to consider the communities using those tools—both inside an organization and outside. Can your team’s skills support the tools you have or want? Also, is there momentum and energy around a particular product? Does it have a diverse and active community of users? The Kafka community, for example, is extremely energetic, so being a part of it has benefits.
Put architectural issues in perspective with business goals
As with projects and tools, architectural issues should be prioritized by their impact on business goals. One attendee asked the panel about rebuilding a “deficient data warehouse,” and where this task might fall in the list of priorities.
“One of the most common deficiencies that we see is people trying to use the data warehouse for something for which it was never intended,” said Kurth. Understanding the nature of your architectural deficiencies and what your architecture is trying to achieve for your business, he said, are important issues to tackle early on when establishing a strong data strategy.
If the data warehouse has become a roadblock to achieving your business goals, said Kurth, then start making a list of the business problems you’re trying to solve but don’t yet have solutions for. Rank problems in order of importance. “Rebuilding the data warehouse” may be part of that list, but it may actually be low in priority. In that case, other problems should probably take precedence. “A lot of issues that seem like technology issues are actually political issues,” said Steele.
Drive consensus around your data strategy through collaboration and leadership
To drive consensus around your data strategy and technology roadmap, Akred recommended scheduling collaborative sessions with the business, technology, and product teams. “We lock the door and we don’t let anybody out until we have gone through and mapped that business priority to the technology roadmap and thought about things like dependencies,” he said.
Steele added that many companies are hiring chief data officers (CDOs) to bridge the business and technology sides of an organization with an aim to achieve business goals. Another important function of CDOs, she said, is to facilitate dialog and determine how the data will flow between and among organizational silos. “To have a part of the company that’s focused on just coordinating among silos, centralizing priorities, all that kind of stuff, it’s definitely a rising trend for a reason,” said Steele.
To effectively perform these tasks, a CDO needs to have a clear view of the business landscape as well as a deep knowledge of data science tools and platforms. Ideally, this person also possesses incredible political skills. “It needs to be somebody who can say ‘no’ a lot to people,” Steele added.
Coordinating among silos can have a tremendous impact on an organization’s data strategy. As Akred discussed, when the business units that own the data have no incentive to support the global optimum, and their incentive is instead to drive the local optimum, projects will likely fail. One strategy to address this, he said, is to allow business units to make value cases for technology projects, and get those projects subsidized so the data can be made broadly available to the organization at no cost. “State Street, the financial services company, has a pretty good case study there, if you’d be interested in seeing some people who have done some interesting things to drive better global optima,” said Akred.
Wash, rinse, repeat: Continue to re-evaluate your strategy and practices
You’ve defined and prioritized your business goals, you’ve force-ranked the projects and problems, and you’ve selected the best tools. You’re finished, right? Wrong. “As you make progress toward your horizons, you’re re-evaluating based on the experiences you’re getting in the projects you’re doing now, and using that as a feedback loop to that roadmap,” said Akred.
Akred recommended preserving data quality along the way by developing a data catalog, documenting what you’ve done to data sets in a Python-style notebook, or keeping careful track of your database logs so you can refer to them later. “One of the really exciting areas of technological innovation in the data space is around automated metadata creation and interoperability of that metadata,” he said. Several interesting vendors are working in this space, creating tools that make it easier to derive a data catalog. Akred mentioned a Trifacta tool that captures all the transformations of your data, creating process documentation. Alation, he added, looks at user behavior to glean information about data. “They trail all your system logs and they find out that Heather writes 60% of the queries that hit this field,” said Akred. “Well, even if I don’t have an officially designated data steward, I’ve now gained some really useful information, and it might be logical for me to reach out to Heather should I have some questions about what that field means or how I should interpret it.”
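To make the query-log idea concrete, here is a minimal Python sketch that finds, for each field, the user who writes the largest share of the queries touching it. This is an illustration only and not how any vendor’s product works: the log format, user names, and field names are hypothetical, and a real system would first have to parse queries to work out which fields they reference.

```python
from collections import Counter, defaultdict

# Hypothetical, already-parsed query log: (user, fields the query referenced).
query_log = [
    ("heather", ["orders.status"]),
    ("heather", ["orders.status", "orders.total"]),
    ("heather", ["orders.status"]),
    ("raj", ["orders.status"]),
    ("mei", ["orders.status", "customers.region"]),
]

def top_querier_by_field(log):
    """Map each field to (most frequent user, that user's share of its queries)."""
    counts = defaultdict(Counter)
    for user, fields in log:
        for field in set(fields):  # count each field once per query
            counts[field][user] += 1

    result = {}
    for field, users in counts.items():
        user, n = users.most_common(1)[0]
        result[field] = (user, n / sum(users.values()))
    return result

experts = top_querier_by_field(query_log)

for field, (user, share) in sorted(experts.items()):
    print(f"{field}: {user} writes {share:.0%} of queries")
```

In this made-up log, heather accounts for three of the five queries that touch orders.status, so she is a natural first contact for questions about that field even without a formally designated data steward.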
Finally, a critical part of the re-evaluation stage, said Akred, is having regular conversations with all team members about the projects, progress, and priorities to ensure everyone remains on board and on the same page.
Take-aways and key ideas
Communication and agility drive a successful enterprise data strategy. Technology is simply a tool to assist with the implementation. Clarify your business goals, evaluate (and re-evaluate) project placement in the pipeline, and keep the focus on top business priorities. Let that discussion and understanding lead you to the technology that fits your needs.