Posts Tagged ‘Data science’

SVDS at Strata San Jose 2016

Several of our presenters were interviewed at Strata San Jose. If you missed the conference, check out these interviews below to catch up on some of the topics that were on our minds.

Why Notebooks Are Super-Charging Data Science

There is little limit to what can be done with a notebook. As well as the data science work you might expect, such as manipulating and graphing data, we’ve used them for sharing work on analytical tasks such as motion detection in video. In this post Edd takes a look at why we’re seeing notebooks everywhere.

Analyzing Caltrain Delays: What We Can Learn

In this post, we will explore some aspects of the train delay data we’ve been collecting from the Caltrain API over the past few months. The goal is to get our heads into the data before setting off on building a prediction model.

Crossing the Development to Production Divide

We know what it’s like to deal with complex production deployments that cover the gamut from infrastructure upgrades, to feature deployments, to data migrations, where each step threatens to derail the plan. In this post she’ll give an overview of obstacles she’s faced (you may be able to relate) and talk about solutions to overcome these obstacles.

The Basics of Classifier Evaluation: Part 2

A previous blog post made the point that classifiers shouldn’t use classification accuracy as a performance metric. The next part in this series was going to discuss other evaluation techniques such as ROC curves, profit curves, and lift curves. However, there are several important points to be made first. Here I present a sequence that shows the progression and inter-relation of the issues.

Advanced Spark Meetup Recap

Our audience of engineers got right into the guts of Spark’s GraySort benchmark win last year with Chris Fregly from IBM Spark Technology Center. Here are a few highlights from the meetup.

How Do You Build a Data Product?

Data products are the reason data scientists are lately treated like rockstars. Along the way at SVDS, we’ve learned a few things about data products, which we shared as we told the story of the Caltrain Rider app.

Jupyter Notebook Best Practices for Data Science

We present here some best-practices that SVDS has implemented after working with the Jupyter Notebook in teams and with our clients.

Zero to Kaggle in 30 Minutes

We’ll walk through the steps for competing in Kaggle’s “Digit Recognizer” contest using SQL-based machine learning tools to identify hand-written digits.