How Do You Build a Data Product?

December 15th, 2016

Editor’s note: Welcome to Throwback Thursdays! Every third Thursday of the month, we feature a classic post from the earlier days of our company, gently updated as appropriate. We still find them helpful, and we think you will, too! The original version of this post can be found here.

Taken narrowly, the term “data product” might suggest that data products are just about selling data. In fact, the definition is wider and more profound: data products are those whose core functions leverage data, whether they are physical products, software, or services.

Data products are the pinnacle of data-driven business, bringing insight and intelligence into the customer experience, getting better with every use, and enabling new ecosystem-based business models.

What makes a data product?

Data products incorporate data science into the operation of a product or service, using data in smart ways to provide value. It’s more than just analysis: it’s putting insight into production. Every day we use the archetypal data product, Google search, and every one of our interactions with the service makes it better. Another famous early data product is LinkedIn’s “people you may know” feature, which helps you locate people in your social networks.

At SVDS, we’ve learned a few things first-hand about data products by building one of our own over several years as part of our R&D program: the Caltrain Rider app. Caltrain Rider is a mobile app that predicts travel times for the Bay Area’s Caltrain service, with better results than any other published source.

Let’s look at some of the hallmarks of data products.

Data products can input data from their own usage to improve

By observing how users interact with your product, you can learn a lot. By instrumenting the user interface, analyzing logs, or otherwise deriving data from usage, you can gain extra signals that help improve your data models. In the case of Caltrain Rider, we are using GPS data from users’ phones to better understand the movement of trains within the system.
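As a hypothetical illustration of turning raw usage data into a modeling signal, the sketch below estimates a train’s average speed from two consecutive GPS fixes reported by a rider’s phone, using the haversine great-circle distance. The GpsFix type, its fields, and the idea of using per-interval speed as a signal are assumptions made for this example; they are not a description of how Caltrain Rider actually processes GPS data.

```python
import math
from dataclasses import dataclass
from typing import Optional

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


@dataclass
class GpsFix:
    """Hypothetical GPS reading uploaded by a rider's phone."""
    lat: float        # latitude in degrees
    lon: float        # longitude in degrees
    timestamp: float  # seconds since epoch


def haversine_m(a: GpsFix, b: GpsFix) -> float:
    """Great-circle distance between two fixes, in meters."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlam = math.radians(b.lon - a.lon)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def speed_mps(a: GpsFix, b: GpsFix) -> Optional[float]:
    """Average speed between two fixes in m/s, or None if time does not advance."""
    dt = b.timestamp - a.timestamp
    if dt <= 0:
        return None  # duplicate or out-of-order readings carry no speed information
    return haversine_m(a, b) / dt


# Two hypothetical fixes one minute apart, about 1.1 km apart along a meridian.
p1 = GpsFix(lat=37.4430, lon=-122.1650, timestamp=0.0)
p2 = GpsFix(lat=37.4530, lon=-122.1650, timestamp=60.0)
print(speed_mps(p1, p2))  # about 18.5 m/s
```

In practice, single phone fixes are noisy, so a signal like this would typically be aggregated across many riders and intervals before feeding a model.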

Data products are bootstrapped and then evolve

Good data products are rarely “done”: through usage and continued investigation, you come to better understand the problem you’re trying to solve. One of the characteristics of working with data is that it’s best to work in an agile way: often you don’t even know the right question to ask until you’ve explored the problem space. Get a product in use early, then learn, adapt, and evolve the product.

Data products are best built with nimble, multifunctional teams

The rapid cycle of product evolution is best served by a multifunctional team of data scientists, engineers, product managers, and architects. To move fast with data, data scientists need to get the data from engineers, and the insights and discoveries from data science inform product direction. If these people sit in disconnected departments, product development moves slowly and can be derailed by poor communication.

Multi-source data, because “GIGO” still applies

Every student learns that “garbage-in, garbage-out” is true of computer systems, and data products aren’t any different. If you don’t have good data going in, you won’t get a good result. However, that doesn’t mean you throw weak data away. Instead, by using as many diverse data sources as possible, you can create models that are robust to any individual source failing or being wrong. With Caltrain Rider, we’re bringing in audio, video, GPS, and social signals in addition to schedule and API data.
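One simple way to see why diverse sources help is to take the median of independent estimates: a single source that fails or reports nonsense cannot drag the combined value far. The sketch below assumes each hypothetical source yields an estimated train delay in minutes, or None when it has nothing to report. The source names and values are invented, and the median is a swapped-in illustration of robust combination, not the model Caltrain Rider uses.

```python
from statistics import median
from typing import Mapping, Optional


def fuse_delay_estimates(estimates: Mapping[str, Optional[float]],
                         min_sources: int = 2) -> Optional[float]:
    """Combine per-source delay estimates (minutes) with a median.

    Sources that returned None (failed or silent) are skipped. Unlike a
    mean, a median tolerates a minority of wildly wrong values.
    """
    values = [v for v in estimates.values() if v is not None]
    if len(values) < min_sources:
        return None  # too little evidence to say anything useful
    return median(values)


# Hypothetical readings: the schedule API is down and one feed is corrupted.
readings = {
    "schedule_api": None,
    "rider_gps": 4.0,
    "social_posts": 5.0,
    "station_audio": 4.5,
    "corrupted_feed": 90.0,
}
print(fuse_delay_estimates(readings))  # 4.75
```

A plain mean of the same four available values would be 25.875 minutes, badly skewed by the corrupted feed; the median stays close to what the healthy sources agree on.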

Data products can learn things from a system that’s otherwise closed

One of the most exciting aspects of data science is that we can use observed data signals to predict the behavior of a system that we can’t directly access or comprehend. For example, without understanding the semantic import of every web page, search engines can still work out which pages are most useful. With Caltrain Rider, we’re working to predict the behavior of the train system without any special access to the system itself. This opens a world of opportunity for innovation and entrepreneurship. If you can use data, you can crack pretty much any problem area you want. That’s why data-driven companies are challenging the grocery, taxi, and entertainment industries, to name just a few.

Data products solve a real problem that people have

Technology is important, for sure. It can often make new things possible and transform whole industries. But for successful products and companies, the problem always comes first. A great data product focuses relentlessly on solving the problem that the user has, using whatever data and techniques will help. With platforms, tools, and languages proliferating, the only practical way to navigate the choices is with a laser focus on how they can help solve the human and business problems at hand.
