Archive for the ‘Tools’ Category

Big Data is About Agility

Any technology is only as good as the way in which you use it.

Jupyter Notebook Best Practices for Data Science

We present some best practices that we adopted after working with the Notebook, and that might help your data science teams as well.

Structured Streaming in Spark

This post gives you a quick overview of the new structured streaming feature in Spark 2.0, illustrating why it’s an exciting addition.

Brain Monitoring with Kafka, OpenTSDB, and Grafana

A team of our data scientists recently won second place in Confluent’s Kafka Hackathon. In this post, explore their project: streaming EEG data and visualizing it.

Materialized Views with Cassandra

In this screencast, Principal Engineer and Cassandra committer Gary Dusbabek provides an overview of Materialized Views.

Noteworthy Links: Hadoop Edition

Hadoop is 10 years old! Check out these related links.

Jupyter Notebook for Data Science Teams

Data Scientist Jonathan Whitmore has just released a screencast tutorial for Jupyter Notebooks.

Building a Prediction Engine using Spark, Kudu, and Impala

In this post, Richard walks you through a demo based on the Meetup.com streaming API to illustrate how to predict demand in order to adjust resource allocation.

Why Notebooks Are Super-Charging Data Science

There is little limit to what can be done with a notebook. As well as the data science work you might expect, such as manipulating and graphing data, we’ve used them for sharing work on analytical tasks such as motion detection in video. In this post, Edd takes a look at why we’re seeing notebooks everywhere.