Mauricio Vacas

With years of experience working with cloud computing and distributed data architectures, Mauricio is passionate about creating value with technology. He is an industry-recognized leader in technical architecture for cloud-hosted data solutions.

Mauricio is a Senior Data Engineer at Silicon Valley Data Science. He has experience working with distributed storage and processing systems such as Hadoop, Spark, Cassandra, and related tools in the ecosystem, as well as with application and web services development in Spring Java and Python. He has also designed and built cloud technical architectures in AWS and NTT. Mauricio has deployed models built with Spark, R, Impala, Hive, and other tools into production, and works to bridge the gap between model development and deployment. Prior to joining SVDS, Mauricio was a technical architecture manager in Accenture’s R&D group and Big Data practice. He managed a team of data scientists and engineers that built web-scale recommender and network analytics streaming services on cloud infrastructure, and he presented the work at the Strata and DataStax NYC* conferences. He was also a main developer of Accenture’s Cloud Platform, which is used in over 30 client solutions and over 1,600 managed servers. He has experience working with clients in the retail, healthcare, and banking industries.

Mauricio holds a Master of Science in Computer Engineering from the University of Florida.

Recent Posts

Models: From the Lab to the Factory

Deploying a model without a rigorous process in place has consequences. We go over techniques for successful deployment and management.

How I Learned to Stop Worrying and Love Ephemeral Storage

This post shows architects and developers how to set up Hadoop to communicate with S3, run Hadoop commands directly against S3, use distcp to transfer data between Hadoop and S3, and use distcp for regular updates that copy only the differences.

From Impala to Hive with Love

While on paper it should be a seamless transition to run Impala code in Hive, in reality it’s more like playing a relentless game of whack-a-mole. This post provides hints to make the transition easier.

Past Events

2017

  • TDWI Accelerate Boston 2017

    Boston, MA

    We’ll be in Boston covering a variety of topics—from running agile data teams, to visual storytelling with data. Let us know if you’ll be there, or sign up to receive all our slides.

2016

  • Strata + Hadoop World New York 2016

    New York, NY

    The SVDS crew will be in New York this year, talking about data platforms, data strategy, and making the business case for Spark. Come by our talks, or catch us in the hallway track.

Bryan Walker

With a diverse background in technology strategy and solution architecture, Bryan is passionate about helping clients solve their biggest technology problems. He brings experience from several different industries to help clients develop strategies for how to best utilize their data and build a strong competitive advantage.

Bryan combines a background in technology strategy and solution architecture, areas in which he has advised clients across many industries. Prior to joining SVDS, Bryan was the operations lead for Accenture’s Digital Customer initiative within Accenture Technology Labs. In this role, he developed strategies and advised clients on how to leverage big data architectures and real-time analytics technologies to create more personalized and engaging experiences for retail customers. Before working on the Digital Customer initiative, Bryan helped lead the build-out of some of Accenture’s first assets for analyzing unstructured text data. He was also one of Accenture’s early cloud computing specialists, designing the new technical architecture for a global energy client’s fuel retail division and authoring a cloud and virtualization migration strategy for a large state university.

Along the way, Bryan has earned a patent in cloud computing technology and built skills in AWS, Python, Hadoop, Cassandra, and Tableau.

Bryan holds a B.S. in Electrical Engineering from the University of Illinois.

Recent Posts

Rethinking Data Governance

A look at what is changing in data governance, how these changes can help you get more value out of your data, and what you can do to adapt.

Getting Value Faster with a Data Strategy

In this post, we’ll look at the components that make up a modern data strategy and how they work together to bring you business value quickly.

The ROI of a Modern Data Strategy

In this post, we look at the three components you can use to determine your data strategy’s ROI.

Getting Value Faster with a Data Strategy

Companies that demand immediate technical results often don’t want to take the time to develop a data strategy up front, but doing so will actually accelerate business value for your company. In this post, I’ll describe the components of a data strategy and how they fit together to bring you business value quickly.

Past Events

2017

  • Data Architecture Summit 2017

    Chicago, IL

The Data Architecture Summit provides in-depth education from leading experts specializing in data architecture. We will be there discussing data platforms and data governance. Let us know if you’ll be attending and would like to chat.

Jonathan Whitmore

Jonathan, who held a postdoctoral position in astrophysics, is a sought-after speaker on computing and astronomy. He is excited by the application of machine learning and statistical techniques to industry problems and has developed novel data analysis techniques.

Jonathan has a diverse range of interests and is excited by the challenges and possibilities in the field of data science and engineering. He comes to SVDS after participating in the Insight Data Science Fellowship to prepare for his transition from academic research into the tech industry. His Insight project was a pricing prediction analysis for art auctions. Before Insight, Dr. Whitmore completed an astrophysics postdoc at Swinburne University in Melbourne, Australia, where his research focused on determining whether the physical constants of the universe have changed over cosmological time. This research took him to world-class observatories to observe with the largest optical telescopes in existence. He also has a long-standing commitment to the public understanding of science and technology, most notably through his co-starring role in the 3D IMAX film “Hidden Universe,” which is currently playing in theatres around the world.

Jonathan received his PhD in physics from the University of California, San Diego, and graduated from Vanderbilt with a Bachelor of Science and a triple major in physics, philosophy, and mathematics.

Recent Posts

Themes from JupyterCon 2017

This past August was the first JupyterCon—an O’Reilly-sponsored conference around the Jupyter ecosystem, held in NYC. In this post we look at the major themes from the conference, and some top talks from each theme.

Exploratory Data Analysis in Python

We summarize the objectives and contents of our PyCon tutorial, and then provide instructions for following along so you can begin developing your own EDA skills.

How to Navigate the Jupyter Ecosystem

In this post, we’ll be talking through a few tools that help make data science teams more productive.

Embracing Experimentation at AstroHackWeek 2016

Senior Data Scientist Jonathan Whitmore talks about experimentation and agility, based on his time at the unconference.

Jupyter Notebook Best Practices for Data Science

We present some best practices that we implemented after working with the Notebook—and that might help your data science teams as well.

Jupyter Notebook for Data Science Teams

Data Scientist Jonathan Whitmore has just released a screencast tutorial for the Jupyter Notebook.

Jupyter Notebook Best Practices for Data Science

We present here some best practices that SVDS has implemented after working with the Jupyter Notebook in teams and with our clients.

Past Events

2017

  • PyCon 2017

    Portland, OR

PyCon, the largest annual Python conference, will be in Portland, OR, this year. Our team will be there, talking about exploratory data analysis. Let us know if you’ll be there, or come say hi at our tutorial.

  • TDWI Accelerate Boston 2017

    Boston, MA

    We’ll be in Boston covering a variety of topics—from running agile data teams, to visual storytelling with data. Let us know if you’ll be there, or sign up to receive all our slides.

2016

  • Astro Hack Week 2016

    Berkeley, CA

Data Scientist Jonathan Whitmore will be attending Astro Hack Week; please find him and say hi if you’ll be there.

  • PyData San Francisco 2016

    San Francisco, CA

    We’ll be at PyData, looking to learn more about how data scientists are using Python. Have a cool story, or questions of your own? Be sure to come find us.

2015

  • OSCON 2015

    Portland, OR

SVDS presents two sessions at the Open Source Convention: a tool-agnostic tutorial for those who want to elevate the look and feel of their data visualizations, and a talk exploring overall best practices for sharing IPython Notebook code within a data science team.

  • SciPy 2015

    Austin, TX

Because of its flexibility, working with the Jupyter Notebook on data science problems in a team setting can be challenging. We present here some best practices that SVDS has implemented after working with the Notebook in teams and with our clients.

Richard Williamson

Richard has been at the cutting edge of big data since its inception, leading multiple efforts to build multi-petabyte Hadoop platforms, maximizing business value by combining data science with big data. He has extensive experience creating advanced analytic systems using data warehousing and data mining technologies.

Richard is an expert in big data architecture, platform deployment, and large-scale data science. Prior to joining SVDS, he led development of a multi-petabyte Hadoop platform at WalmartLabs. The platform included two separate Impala data warehouses that hosted production and ad-hoc workloads serving hundreds of billions of rows of log and transactional data. The warehouses also included HBase instances set up with active-active replication in two separate data centers, serving millions of operations per second on near real-time data feeds from Flume. Prior to WalmartLabs, Richard launched the first Hadoop system at Walmart Stores, taking it from idea to multi-petabyte production system starting in 2009. This included proposing Hadoop as a complement to the data warehouse and data mining platform, then rapidly moving from proof of concept to a full, secure production deployment that enabled customers to perform analyses that could not be done in existing systems.

He has also built several advanced analytics applications, including a distributed optimization engine for workforce scheduling; forecasting systems that use over ten years of history to predict daily, weekly, and monthly sales; transportation route scheduling systems; supply chain optimization systems; price modeling systems; and various data mining efforts.

Richard holds a Bachelor of Science in Mathematics and Computer Science from Missouri Southern State University.

Recent Posts

Building a Prediction Engine using Spark, Kudu, and Impala

In this post, Richard walks you through a demo based on the Meetup.com streaming API to illustrate how to predict demand in order to adjust resource allocation.

Flexible Data Architecture with Spark, Cassandra, and Impala

An important aspect of a modern data architecture is the ability to use multiple execution frameworks over the same data.

Past Events

2017

  • Best Practices for Spark in Production

    Principal Engineers Richard Williamson and Andrew Ray will be on Pepperdata’s webinar panel of industry experts, talking about Spark trends and use cases. Sign up here to attend the webinar, or get the recording.

2015

  • NWA Tech Summit

    Rogers, AR

    SVDS Principal Engineer Richard Williamson presents a session on “Leveraging Multiple Persistence Layers in Spark to Build a Scalable Prediction Engine.”

  • StampedeCon

    St. Louis, MO

    SVDS presents two sessions at StampedeCon: one that examines the benefits of using multiple persistence strategies to build an end-to-end predictive engine; and a look at how to choose an HDFS data storage format: Avro vs. Parquet and more.

  • Hadoop Summit

    San Jose, CA

    SVDS presents two sessions at Hadoop Summit: one that maps the central concepts in Spark to those in the SAS language, including datasets, queries, and machine learning; and a look at how to choose an HDFS data storage format: Avro vs. Parquet and more.

  • Strata + Hadoop World 2015

    San Jose, CA

    Several of us will be presenting and we’d love to see you there. Join us for our tutorials and sessions, or come visit us at our booth in the Expo Hall.