Author Archive

image processing feature

Image Processing in Python

In this post, we use a Jupyter Notebook go over the steps for creating a proof of concept for the image processing piece of our Caltrain work.

Introduction to Trainspotting

In this post we’ll start looking at the nuts and bolts of making our Caltrain work possible: image processing, video analysis, and image recognition.

predix transform iot image

Predix Transform 2016

We detail insights learned while attending the recent Predix Transform conference.

Learning from Imbalanced Classes

This post gives insight and concrete advice on how to tackle imbalanced data.

SVDS Strengthens Executive and Advisory Team

I’m excited to announce two new members of our team: Antony Falco (VP, Product & Innovation) and Nayla Rizk (Advisor).

Scaling Data Science: Dream Big, Start Medium-ish

On July 13th we welcomed the Open Data Science Conference meetup series to our HQ—our speaker talked about thinking critically about the size of your data.

How I Learned to Stop Worrying and Love Ephemeral Storage

This post will show architects and developers how to set up Hadoop to communicate with S3, use Hadoop commands directly against S3, use distcp to perform transfers between Hadoop and S3, and how distcp can be used to update on a regular basis based only on differences.

Structured Streaming in Spark

This post gives you a quick overview of the new structured streaming feature in Spark 2.0, illustrating why it’s an exciting addition.

Why You Need a Data Strategy

While it would be great for everyone if you could just “buy a Hadoop” and skip straight to “Profit!”, in reality there’s a lot of work involved, and 95% of it is unique to your business. How do you determine the steps of a big data project, and ensure it delivers results early? This post talks about where to start.