One of the benefits of being co-chair for this June’s Spark Summit (June 5-7, San Francisco) is the insight into how this technology is being used in leading companies. In particular, I am excited to see evidence of an important pattern: the creation of internal service platforms to meet the data science and analytic needs of organizations.
These data science platforms give communities of business analysts and data scientists within an organization access to data and computing power, while conferring the benefits of managed access and scalability on the organization.
I expect this model to become the norm for analytics organizations in enterprises over the next five years. There are two factors that are driving this change.
Firstly, there is the rapid increase in demand for data. In a digital world, data is the way we understand ourselves, our customers, and our competition. Everyone needs data to do their jobs. That sounds like a great thing, but it comes with headaches: if it’s hard to get at data, people will hoard it; if data is shared point-to-point, people will duplicate it and stash it wherever they can. The result is a data governance nightmare, and it makes it really hard for people to build on each other’s work. Strict rules or poor service levels just drive bad behavior underground. The answer is to provide an organized data platform that gives better service.
Secondly, we now have the technology to make organization-wide data platforms economical. As we move to commodity, scale-out analytics platforms, we have to worry less about guarding resources and policing usage. We can develop a more sophisticated infrastructure that looks a lot more like a data community. Analysts can help each other, sharing data and models.
The advantages of a successful analytic platform are clear: better service levels for data users in the organization, and the prospect of making data governance a feasible endeavor.
Data management professionals face a transition in their roles: from data custodians to data evangelists; from functioning as a utility to providing a user-facing product, data as a service. It’s an exciting time, and I’m glad there are some great examples to learn from. These sessions at Spark Summit tell this story further:
- PayPal’s Spark Compute as a Service
- Comcast’s Data Science as a Service platform
- StitchFix’s self-service data science
To see these talks and more, consider joining me at Spark Summit. Register with the code EDD2017 and get a 15% discount.