Enterprise Data World 2017

Name: Enterprise Data World 2017
Start: 2017-04-02T00:00:00-04:00
End: 2017-04-07T23:59:59-04:00
Location: Omni Atlanta Hotel

Enterprise Data World focuses on data-driven business. Several of us will be there this year, talking about data platforms and enterprise data science. If you can’t make it to Atlanta, you can sign up on this page to receive our slides.

Monday, April 3

Architecting a Big Data Platform

8:30am-11:45am

John Akred, Stephen O'Sullivan, Mark Mims

What are the essential components of a data platform? This tutorial will explain how the various parts of the Hadoop, Spark, and big data ecosystem fit together in production to create a data platform supporting batch, interactive, and real-time analytical workloads.
By tracing the flow of data from source to output, we’ll explore the options and considerations for components, including:

Acquisition: from internal and external data sources
Ingestion: offline and real-time processing
Storage
Analytics: batch and interactive
Providing data services: exposing data to applications

We’ll also give advice on:

Tool selection
The function of the major Hadoop components and other big data technologies such as Spark and Kafka
Integration with legacy systems

Managing Data Science in the Enterprise

1:30pm-4:45pm

John Akred, Heather Nelson

Organizing around data is a concern for the whole business. The lone ranger data scientist is a myth: leveraging data effectively requires cross-functional collaboration, organizational adaptation, and a shared understanding of what it takes to create business value from data.

In this tutorial, we will share our methods and observations from three years of effectively deploying data science in enterprise organizations. Attendees will learn how to build, run, and get the most value from data science teams and how to work with and plan for the needs of the business.

Agenda:

Data science in the enterprise
Building a data-driven culture
Organizational concerns for data science
Data science techniques
Methods for running a data science project
Hiring and managing data scientists
Tools and platforms
Deploying data science: from the lab to the factory
Data science maturity models

Wednesday, April 5

Instant and Repeatable Data Platforms

11:45am-12:30pm

Heather Nelson, Mark Mims

Configuring a data platform and data science environment can be a tedious, error-prone process. It spans development, continuous integration, QA, staging, and production environments, and each one often has to be configured from scratch. By combining cloud platforms such as AWS or Azure with Terraform and Ansible, we can create a repeatable data science infrastructure.
In this talk, we’ll discuss our “push button” infrastructure tool and how attendees can use it in their own projects to create a cloud-agnostic environment that spins up quickly and is easy to configure as required.

We will cover:

Use cases, such as bringing up the same cluster repeatedly or recovering from a disaster
How to parameterize your cloud environment
Creating a data lab for the data scientist, with all the tools they require for their exploration
The development and release process, including integration testing
How to model costs in real time to weigh price against desired performance