Tools Archives - Page 5 of 5 - Silicon Valley Data Science

5 Reasons Why Spark Matters to Business

It’s been hard to miss Apache Spark in the last year. Many systems integrators, including ourselves, have also been enthusiastic about it.

EDD WILDER-JAMES

March 17, 2015

Two Tips for Optimizing Hive

Hadoop is only beneficial if it is used efficiently. Apache Hive, which runs on top of Hadoop, is frequently used to handle ad-hoc queries and regular ETL workloads.

March 5, 2015

Using Docker to Build a Data Acquisition Pipeline

In this post, we walk through an example of using Docker to develop a data acquisition pipeline that ingests mobile app GPS data using Kafka and HBase.

KEVIN ZIELNICKI

March 3, 2015

Ignition Spark: Mike Franklin joins SVDS as Advisor

It gives us great pleasure to announce that a key member of the Spark team, Professor Michael Franklin, has joined our advisory board.

SANJAY MATHUR

October 30, 2014

Flexible Data Architecture with Spark, Cassandra, and Impala

An important aspect of a modern data architecture is the ability to use multiple execution frameworks over the same data.

RICHARD WILLIAMSON

September 30, 2014

Storing and Visualizing Time Series with Graphite

Graphite is a tool that does two things rather well: storing numeric time-series data (metric, value, epoch timestamp), and rendering graphs of this data on demand.

STEPHEN O’SULLIVAN

September 12, 2013