Strata + Hadoop World

Name: Strata + Hadoop World
Start: 2016-03-28T00:00:00-08:00
End: 2016-03-31T23:59:59-08:00
Location: San Jose Convention Center

Many of us will be at the Strata Conference + Hadoop World 2016 in San Jose, and we’d love to see you there!

Stop by booth 737 in the Expo Hall to meet us and check out some of the R&D we are doing. And please join us for our tutorial and presentation sessions:

Tuesday, March 29

Architecting a Data Platform

9:00am–12:30pm in Room LL21 C/D
John Akred, Stephen O’Sullivan & Gary Dusbabek

What are the essential components of a data platform? John Akred and Stephen O’Sullivan explain how the various parts of the Hadoop, Spark, and big data ecosystems fit together in production to create a data platform supporting batch, interactive, and real-time analytical workloads.

By tracing the flow of data from source to output, John and Stephen explore the options and considerations for components, including:

Acquisition: from internal and external data sources
Ingestion: offline and real-time processing
Storage
Analytics: batch and interactive
Providing data services: exposing data to applications

The Business Case for Spark, Kafka, and Friends

9:30–10:00am in Room LL20 B
Edd Wilder-James

Spark is white-hot at the moment, but why does it matter? The secret power of big data technologies is that they promote flexible development patterns and economic scaling and are ready to adapt to business needs. Yet years of focusing on the label “big” have obscured much of that value for those approaching the topic. Skepticism and hype-fatigue are understandable reactions.

Developers are usually the first to understand why some technologies cause more excitement than others. Edd Wilder-James relates this insider knowledge, providing a tour through the hottest emerging data technologies of 2016 to explain why they’re exciting in terms of both new capabilities and the new economies they bring. Edd explores the emerging platforms of choice and explains where they fit into a complete data architecture and what they have to offer in terms of new capabilities, efficiencies, and economies of use.

Developing a Modern Enterprise Data Strategy

1:30–5:00pm in Room LL21 C/D
Edd Wilder-James & Scott Kurth

Big data and data science have great potential for accelerating business, but how do you reconcile the business opportunity with the sea of possible technical solutions? Fundamentally, data should serve the strategic imperatives of a business—those key strategic aspirations that define the future vision for an organization. A data strategy should guide your organization in two key areas: what actions your business should take to get started with data and where to realize the most value.

Edd Wilder-James and Scott Kurth explain how to solve real business challenges with data.

Topics include:

Why have a data strategy?
Connecting data with business
Devising a data strategy
The data value chain
New technology potentials
Project development style
Organizing to execute your strategy

How to Eat Change for Breakfast: Building an Experimental Enterprise

4:10–4:55pm in Room 211 C
Sanjay Mathur
NOTE: this talk is part of the concurrent Cultivate conference.

The world of data—from practitioner skill sets to consumer assumptions and even employee expectations—is changing. Our current and potential consulting clients recognize this: the recruits they’re talking to are eager to employ data in their decision-making processes. To truly take advantage of this fact—to thrive—your business must adapt.

An experimental enterprise is, fundamentally, an organization that thrives on change and uses data as a catalyst. Becoming an experimental enterprise means reshaping the way you and your company understand things like failure, the role of technology, and your own gut instinct. But the benefits are fantastic learning and growth. Sanjay Mathur offers three key questions to ask yourself and three pitfalls to avoid along the way.

Thursday, March 31

Format Wars: From VHS and Beta to Avro and Parquet

1:50–2:30pm in Room 230 A
Silvia Oliveros & Stephen O’Sullivan

Picking your distribution and platform is just the first decision of many you need to make in order to create a successful data ecosystem. In addition to things like replication factor and node configuration, the choice of file format can have a profound impact on cluster performance.

Silvia Oliveros and Stephen O’Sullivan cover the four major data formats (plain text, SequenceFile, Avro, and Parquet) and provide insight into what they are and how to best use and store them in HDFS. Each of the data formats has different strengths and weaknesses, depending on how you want to store and retrieve your data. For instance, Silvia and Stephen have observed performance differences on the order of 25x between Parquet and plain text files for certain workloads. However, it isn’t the case that one is always better than the others.

Drawing from a few real-world use cases, Silvia and Stephen cover the hows, whys, and whens of choosing one format over another and take a closer look at some of the tradeoffs each offers.

Ask Us Anything: Developing a modern enterprise data strategy

4:20–5:00pm in Room 211 A–C
John Akred, Scott Kurth & Colette Glaeser

The team behind the tutorial “Developing a modern enterprise data strategy” fields a wide range of detailed questions. Even if you don’t have a specific question, join in to hear what others are asking.

Event Speakers

John Akred

With over 15 years in advanced analytical applications and architecture, John is dedicated to helping organizations become more data-driven. He combines deep expertise in analytics and data science with business acumen and dynamic engineering leadership.

Edd Wilder-James

Founder of the pioneering data conference, O’Reilly Strata, Edd is a respected voice in the worlds of data, open source and the web. Bringing together deep technical know-how with market understanding, Edd makes sense of information technology and its trajectory.

Stephen O’Sullivan

A leading expert on big data architecture and Hadoop, Stephen brings over 20 years of experience creating scalable, high-availability data and application solutions. A veteran of WalmartLabs, Sun and Yahoo!, Stephen leads data architecture and infrastructure.

Scott Kurth

Building on 20 years of experience making emerging technologies relevant to enterprises, Scott crafts vision and strategy for organizations. With a background in architecture and engineering, he combines deep technical knowledge with a broad perspective to focus on business value.

Silvia Oliveros

With a background in computer engineering and visual analytics, Silvia has worked on several projects helping clients explore and analyze their data.

Gary Dusbabek

An Apache Cassandra committer and PMC member, Gary specializes in building distributed systems. Recent experience includes creating an open source high-volume metrics processing pipeline and building out several geographically distributed […]
