Archive for the ‘Spark’ Category

Data Ingestion with Spark and Kafka

In this tutorial, we will walk you through some of the basics of using Kafka and Spark to ingest data.


Managing Spark and Kafka Pipelines

In this post, we will cover some of the basics of monitoring and alerting as they relate to data pipelines in general, and to Kafka and Spark in particular.

From Data Managers to Platform Providers

We are seeing evidence of an important pattern: the creation of internal service platforms to meet the data science and analytics needs of organizations.


Making Spark and Kafka Data Pipelines Manageable with Tuning

In this post, we’ll walk you through how to use tuning to make your Spark/Kafka pipelines more manageable.

Spark Summit: Ignition in the Enterprise

We are excited to announce that Edd Wilder-James will be joining Reynold Xin as co-chair of the Spark Summit program for Spark Summit 2017 in San Francisco.

Structured Streaming in Spark

This post gives you a quick overview of the new Structured Streaming feature in Spark 2.0 and illustrates why it’s an exciting addition.

Building a Prediction Engine using Spark, Kudu, and Impala

In this post, Richard walks you through a demo based on the streaming API to illustrate how to predict demand in order to adjust resource allocation.

Reshaping Data with Pivot in Spark

Andrew gives you a deep dive into pivoting data with SparkSQL. This piece was originally posted on the Databricks blog.

Pivoting Data in SparkSQL

Andrew Ray, Senior Data Engineer, contributed to the most recent release of Spark. This post gives examples of how to use the pivot functionality he contributed in PySpark.