
Models: From the Lab to the Factory
Deploying a model without a rigorous process in place has consequences. We go over techniques for successful deployment and management.
With years of experience working with cloud computing and distributed data architectures, Mauricio is passionate about creating value with technology. He is an industry-recognized leader in technical architecture for cloud-hosted data solutions.
Mauricio is a Senior Data Engineer at Silicon Valley Data Science. He has experience working with distributed storage and processing systems such as Hadoop, Spark, and Cassandra, along with related tools in the ecosystem; developing applications and web services in Java (Spring) and Python; and designing and building cloud technical architectures on AWS and NTT. Mauricio has deployed production models built with Spark, R, Impala, Hive, and other tools, and works to bridge the gap between model development and deployment. Prior to joining SVDS, Mauricio was a technical architecture manager in Accenture’s R&D group and Big Data practice. He managed a team of data scientists and engineers building a web-scale recommender system and network analytics streaming services on cloud infrastructure, and presented the work at the Strata and DataStax NYC conferences. He was also one of the main developers of Accenture’s Cloud Platform, which is used in over 30 client solutions and on over 1,600 managed servers. He has experience working with clients in the retail, healthcare, and banking industries.
Mauricio holds a Master of Science in Computer Engineering from the University of Florida.
This post shows architects and developers how to set up Hadoop to communicate with S3, run Hadoop commands directly against S3, use distcp to transfer data between Hadoop and S3, and schedule incremental distcp updates that copy only what has changed.
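As a taste of the mechanics, here is a minimal sketch of driving those steps from Python; the bucket and paths are invented for illustration, and it assumes the hadoop binary is on your PATH with S3 credentials (fs.s3a.*) already configured in core-site.xml.

```python
# Minimal sketch: shelling out to the Hadoop CLI from Python.
# Assumes `hadoop` is on PATH and fs.s3a.* credentials are configured;
# the bucket and paths below are made up for illustration.
import subprocess

def hadoop(*args):
    """Run a Hadoop CLI command, raising if it exits non-zero."""
    subprocess.run(["hadoop", *args], check=True)

# Use Hadoop filesystem commands directly against S3.
hadoop("fs", "-ls", "s3a://example-bucket/logs/")

# One-time bulk transfer from HDFS to S3 with distcp.
hadoop("distcp", "hdfs:///data/logs", "s3a://example-bucket/logs")

# Scheduled incremental sync: -update copies only files that differ
# between source and target, so repeated runs move just the changes.
hadoop("distcp", "-update", "hdfs:///data/logs", "s3a://example-bucket/logs")
```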
While on paper it should be a seamless transition to run Impala code in Hive, in reality it’s more like playing a relentless game of whack-a-mole. This post provides hints to make the transition easier.
We’ll be in Boston covering a variety of topics—from running agile data teams, to visual storytelling with data. Let us know if you’ll be there, or sign up to receive all our slides.
With a diverse background in technology strategy and solution architecture, Bryan is passionate about helping clients solve their biggest technology problems. He brings experience from several different industries to help clients develop strategies for how to best utilize their data and build a strong competitive advantage.
Bryan brings together a unique background in technology strategy and solution architecture, spaces in which he has advised clients across many different industries. Prior to joining SVDS, Bryan was the operations lead for Accenture’s Digital Customer initiative within Accenture Technology Labs. In this role, he developed strategies and advised clients on how to leverage big data architectures and real-time analytics technologies to create more personalized and engaging experiences for retail customers. Before the Digital Customer initiative, Bryan helped lead the build-out of some of Accenture’s initial assets for analyzing unstructured text data. He was also one of Accenture’s first practitioners in cloud computing, designing the new technical architecture for a global energy client’s fuel retail division and authoring a cloud and virtualization migration strategy for a large state university.
Along the way, Bryan has earned a patent in cloud computing technology and built skills in AWS, Python, Hadoop, Cassandra, and Tableau.
Bryan holds a B.S. in Electrical Engineering from the University of Illinois.
We look at what is changing in data governance, how these changes can help you get more value out of your data, and what you can do to adapt.
In this post, we’ll look at the components that make up a modern data strategy, and how they work to bring you business value quickly.
In this post, we look at the three components you can use to determine your data strategy’s ROI.
Companies that demand immediate technical results often don’t want to take the time to develop a data strategy up front, but doing so actually accelerates business value. In this post, I’ll describe how the components of a data strategy fit together and work to bring you business value quickly.
Following a postdoctoral position in astrophysics, Jonathan has become a sought-after speaker on computing and astronomy. He is excited by the application of machine learning and statistical techniques to industry problems, and has developed novel data analysis techniques.
Jonathan has a diverse range of interests and is excited by the challenges and possibilities in the field of data science and engineering. He comes to SVDS after participating in the Insight Data Science Fellowship to prepare for the transition from academic research into the tech industry; his Insight project predicted art auction prices. Before Insight, Dr. Whitmore completed an astrophysics postdoc at Swinburne University in Melbourne, Australia, where his research focused on determining whether the physical constants of the universe have changed over cosmological time. This research sent him to world-class observatories to observe on the largest optical telescopes in existence. He also has a long-standing commitment to the public understanding of science and technology, most notably co-starring in the 3D IMAX film “Hidden Universe,” which is currently playing in theatres around the world.
Jonathan received his PhD in physics from the University of California, San Diego, and graduated with a Bachelor of Science from Vanderbilt University with a triple major in physics, philosophy, and mathematics.
This past August was the first JupyterCon—an O’Reilly-sponsored conference around the Jupyter ecosystem, held in NYC. In this post we look at the major themes from the conference, and some top talks from each theme.
We summarize the objectives and contents of our PyCon tutorial, then provide instructions for following along so you can begin developing your own exploratory data analysis (EDA) skills.
In this post, we’ll be talking through a few tools that help make data science teams more productive.
Senior Data Scientist Jonathan Whitmore talks about experimentation and agility, based on his time at the unconference.
We present some best practices that we implemented after working with the Notebook—and that might help your data science teams as well.
Data Scientist Jonathan Whitmore has just released a screencast tutorial for Jupyter Notebooks.
We present here some best practices that SVDS has implemented after working with the Jupyter Notebook in teams and with our clients.
PyCon is the largest annual Python conference, and will be in Portland, OR this year. Our team will be there, talking about exploratory data analysis. Let us know if you’ll be there, or come say hi at our tutorial.
We’ll be in Boston covering a variety of topics—from running agile data teams, to visual storytelling with data. Let us know if you’ll be there, or sign up to receive all our slides.
Data Scientist Jonathan Whitmore will be attending Astro Hack Week. Please find him to say hi if you’ll be there.
We’ll be at PyData, looking to learn more about how data scientists are using Python. Have a cool story, or questions of your own? Be sure to come find us.
SVDS presents two sessions at the Open Source Convention: A tool-agnostic tutorial for those who want to elevate the look and feel of their data visualizations; and a talk that will explore some overall best practices for sharing IPython Notebook code within a data science team.
Because of its flexibility, working with the Jupyter Notebook on data science problems in a team setting can be challenging. We present here some best practices that SVDS has implemented after working with the Notebook in teams and with our clients.
Richard has been at the cutting edge of big data since its inception, leading multiple efforts to build multi-petabyte Hadoop platforms, maximizing business value by combining data science with big data. He has extensive experience creating advanced analytic systems using data warehousing and data mining technologies.
Richard is an expert in big data architecture, platform deployment, and large-scale data science. Prior to joining SVDS, he led development of a multi-petabyte Hadoop platform at WalmartLabs. The platform included two separate Impala data warehouses that hosted production and ad-hoc workloads serving hundreds of billions of rows of log and transactional data, as well as HBase instances set up with active-active replication across two data centers, serving millions of operations per second on near real-time data feeds from Flume. Before WalmartLabs, Richard launched the first Hadoop system at Walmart Stores, taking it from idea to multi-petabyte production system starting in 2009: he proposed building Hadoop as a complement to the existing data warehouse and data mining platform, then rapidly moved from proof of concept to a full, secure production deployment that enabled customers to perform analysis that could not be done in existing systems.
He has also built several advanced analytics applications, including a distributed optimization engine for workforce scheduling; forecasting systems that use over ten years of history to predict daily, weekly, and monthly sales; transportation route scheduling systems; supply chain optimization systems; price modeling systems; and various data mining efforts.
Richard holds a Bachelor of Science in Mathematics and Computer Science from Missouri Southern State University.
In this post, Richard walks you through a demo based on the Meetup.com streaming API to illustrate how to predict demand in order to adjust resource allocation.
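For a flavor of the raw material, here is a rough sketch (not the code from Richard’s demo) of consuming Meetup’s public RSVP stream and tracking a crude demand signal; the endpoint and field names follow Meetup’s old streaming API, which has since been retired, so treat this as illustrative only.

```python
# Rough sketch: read Meetup's (now-retired) RSVP stream and keep a
# running count of RSVPs per city as a crude demand signal.
import json
from collections import Counter

import requests

STREAM_URL = "http://stream.meetup.com/2/rsvps"  # historical endpoint

counts = Counter()
with requests.get(STREAM_URL, stream=True) as resp:
    for line in resp.iter_lines():
        if not line:
            continue  # skip keep-alive newlines
        rsvp = json.loads(line)
        city = rsvp.get("group", {}).get("group_city", "unknown")
        counts[city] += 1
        # A real predictor would feed these counts into a time-series
        # model; here we just print the hottest cities as they change.
        if sum(counts.values()) % 100 == 0:
            print(counts.most_common(5))
```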
An important aspect of a modern data architecture is the ability to use multiple execution frameworks over the same data.
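As a toy illustration of that idea, the sketch below has Spark write a Parquet dataset that a second engine (pyarrow) then reads directly from the same files; the path and schema are invented for the example.

```python
# Toy example: two execution frameworks over one copy of the data.
# Spark writes a Parquet dataset; pyarrow reads the same files with
# no Spark involved. Path and columns are illustrative only.
from pyspark.sql import SparkSession
import pyarrow.parquet as pq

spark = SparkSession.builder.appName("shared-data-demo").getOrCreate()

# Framework 1: Spark produces the dataset.
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
df.write.mode("overwrite").parquet("/tmp/shared/events")
spark.stop()

# Framework 2: pyarrow queries the very same files.
table = pq.read_table("/tmp/shared/events")
print(table.to_pandas())
```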
Principal Engineers Richard Williamson and Andrew Ray will be on Pepperdata’s webinar panel of industry experts, talking about Spark trends and use cases. Sign up here to attend the webinar, or get the recording.
SVDS Principal Engineer Richard Williamson presents a session on “Leveraging Multiple Persistence Layers in Spark to Build a Scalable Prediction Engine.”
SVDS presents two sessions at StampedeCon: one that examines the benefits of using multiple persistence strategies to build an end-to-end predictive engine; and a look at how to choose an HDFS data storage format: Avro vs. Parquet and more.
SVDS presents two sessions at Hadoop Summit: one that maps the central concepts in Spark to those in the SAS language, including datasets, queries, and machine learning; and a look at how to choose an HDFS data storage format: Avro vs. Parquet and more.
Several of us will be presenting and we’d love to see you there. Join us for our tutorials and sessions, or come visit us at our booth in the Expo Hall.