Archive for the ‘Tools’ Category

5 Reasons Why Spark Matters to Business

It’s been hard to miss Apache Spark over the last year. Many systems integrators, including us, have been enthusiastic about it.

Two Tips for Optimizing Hive

Hadoop is only beneficial if it is used efficiently. Apache Hive, which runs on Hadoop, is frequently used to handle ad-hoc queries and regular ETL workloads.

Using Docker to Build a Data Acquisition Pipeline

In this post, we walk through an example of using Docker to develop a data acquisition pipeline to ingest mobile app GPS data using Kafka and HBase.

Ignition Spark: Mike Franklin joins SVDS as Advisor

It gives us great pleasure to announce that a key member of the Spark team, Professor Michael Franklin, has joined our advisory board.

Flexible Data Architecture with Spark, Cassandra, and Impala

An important aspect of a modern data architecture is the ability to use multiple execution frameworks over the same data.

Storing and Visualizing Time Series with Graphite

Graphite is a tool that does two things rather well: storing numeric time-series data (metric, value, epoch timestamp), and rendering graphs of this data on demand.
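The (metric, value, epoch timestamp) shape described above maps directly onto Carbon's plaintext protocol, which is one common way to feed data into Graphite: one `<path> <value> <timestamp>` line per data point, sent over TCP (port 2003 by default). The sketch below is illustrative and is not taken from the post. The helper names `format_metric` and `send_metrics`, the `localhost` default, and the metric path `servers.web01.cpu.load` are all hypothetical.

```python
import socket
import time
from typing import Iterable, Optional

# Default port of Carbon's plaintext listener (assumed unchanged in carbon.conf).
CARBON_PLAINTEXT_PORT = 2003


def format_metric(path: str, value: float, timestamp: Optional[float] = None) -> str:
    """Render one data point as a Carbon plaintext line: '<path> <value> <epoch>\\n'.

    Hypothetical helper: if no timestamp is given, the current time is used.
    """
    if timestamp is None:
        timestamp = time.time()
    # Fields are whitespace-separated, so a path containing spaces would corrupt the line.
    if any(ch.isspace() for ch in path):
        raise ValueError("metric path must not contain whitespace")
    # Graphite stores timestamps as whole seconds since the Unix epoch.
    return f"{path} {value} {int(timestamp)}\n"


def send_metrics(lines: Iterable[str], host: str = "localhost",
                 port: int = CARBON_PLAINTEXT_PORT) -> None:
    """Send pre-formatted plaintext lines to a Carbon listener (hypothetical helper)."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)
```

For example, `format_metric("servers.web01.cpu.load", 0.42, 1420070400)` yields `"servers.web01.cpu.load 0.42 1420070400\n"`. Formatting is kept separate from sending so that lines can be batched into a single `send_metrics` call.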