Tools Archives - Page 4 of 5 - Silicon Valley Data Science

Reshaping Data with Pivot in Spark

Andrew gives you a deep dive into pivoting data with SparkSQL. This piece was originally posted on the Databricks blog.

ANDREW RAY

February 16, 2016

Data Day and Graph Day Texas Slides

Check out the slides from our recent presentations at Data Day TX and Graph Day.

February 2, 2016

Space Shuttle Problems: Long-term Planning Amid Changing Technology

How can you manage your implementation in a way that allows you to take maximum advantage of technology innovation as you go, rather than having to freeze your view of technology to today’s state and design something that will be outdated when it launches? You must start by deciding which pieces are necessary now, and which can wait.

JOHN AKRED

January 28, 2016

Pivoting Data in SparkSQL

Andrew Ray, Senior Data Engineer, contributed to the most recent release of Spark. This post gives examples of how to use his pivot commit in PySpark.

ANDREW RAY

January 5, 2016

Advanced Spark Meetup Recap

Our audience of engineers got right into the guts of Spark’s GraySort benchmark win last year with Chris Fregly from IBM Spark Technology Center. Here are a few highlights from the meetup.

CHRISTIAN PEREZ

December 3, 2015

From Impala to Hive with Love

While on paper it should be a seamless transition to run Impala code in Hive, in reality it’s more like playing a relentless game of whack-a-mole. This post provides hints to make the transition easier.

MAURICIO VACAS

October 20, 2015

Develop Spark Apps on YARN Using Docker

Rather than get bitten by the idiosyncrasies involved in running Spark on YARN vs. standalone when you go to deploy, here’s a way to set up a development environment for Spark that more closely mimics how it’s used in the wild.

MARK MIMS

October 15, 2015

Jupyter Notebook Best Practices for Data Science

We present here some best-practices that SVDS has implemented after working with the Jupyter Notebook in teams and with our clients.

JONATHAN WHITMORE

September 10, 2015

Use Cases for Apache Spark

The Apache Spark big data processing platform has been making waves in the data world, and for good reason.

June 15, 2015

Archive for the ‘Tools’ Category

Sign In