Andrew Ray

With a background in mathematics and graph theory, Andrew is always looking for simple and elegant solutions to complex problems. He enjoys working at the intersection of engineering and data science.

Andrew is passionate about big data and has extensive experience working with Hadoop, Hive, MapReduce, and Spark. He is an active contributor to the Apache Spark project, including SparkSQL and GraphX. Prior to joining SVDS, he was a Data Scientist at Walmart, where he was a principal engineer of a new big data customer analytics platform that integrated data from the different retail channels of Walmart and Sam’s Club, from stores to eCommerce transactions. His match-and-compare analysis used Java MapReduce, a custom decision tree framework, and a custom graph algorithm to do fuzzy matching on over a billion customer records, helping Walmart identify and match customers across its sales channels.

Andrew’s work resulted in six granted patents for Walmart and won him the ‘Innovator of the Year’ award. He also led the adoption of Apache Spark at Walmart from proof-of-concept to production, and has led large training sessions.

Andrew earned his Ph.D. in Mathematics from the University of Nebraska, where he worked on extremal graph theory. He also holds a Master of Science in Mathematics and a Bachelor of Science in Computer Science and Mathematics from the University of Central Missouri.

Recent Posts

Structured Streaming in Spark

This post gives you a quick overview of the new structured streaming feature in Spark 2.0, illustrating why it’s an exciting addition.

Reshaping Data with Pivot in Spark

Andrew gives you a deep dive into pivoting data with SparkSQL. This piece was originally posted on the Databricks blog.

Data Day and Graph Day Texas Slides

Check out the slides from our recent presentations at Data Day TX and Graph Day.

Pivoting Data in SparkSQL

Andrew Ray, Senior Data Engineer, contributed to the most recent release of Spark. This post gives examples of how to use his pivot commit in PySpark.

Past Events


  • Spark Summit 2017

    San Francisco

    Join us at our Spark Summit sessions in San Francisco, where we’ll be giving a tutorial on data platforms, as well as sessions on PySpark and Graph Algorithms. Find CTO John Akred, VP of Engineering Stephen O’Sullivan, or Principal Data Engineer and Spark Contributor Andrew Ray to talk more.

  • Best Practices for Spark in Production

    Principal Engineers Richard Williamson and Andrew Ray will be on Pepperdata’s webinar panel of industry experts, talking about Spark trends and use cases. Sign up here to attend the webinar, or get the recording.


  • StampedeCon 2016

    St. Louis, MO

    Senior Data Engineer and Spark contributor Andrew Ray will be at the conference, giving a talk on structured streaming and datasets in Spark 2.0.

  • Spark Summit East

    New York, NY

    We’re excited to have Senior Data Engineer Andrew Ray speaking about pivoting data with SparkSQL. Andrew contributed the pivot commit to Spark 1.6.0.

  • Graph Day

    Austin, TX

    Join us as Senior Data Engineer Andrew Ray presents a talk called “Intro to Pregel and PowerGraph.”