
Realize the Business Power of Your Data with DevOps
If you are on the path to being a data-driven company, you have to be on the path to being a development-enabled company.
Combining his expertise in emerging technologies with cross-industry experience, Fausto helps clients architect big data platforms and build data-driven products. He has extensive experience with data platforms, analytical processes, and distributed systems. His work has encompassed various architectures and techniques, including cloud-based distributed architectures, stream processing, distributed pub-sub, complex event processing, distributed in-memory caching, collaborative filtering, and market-mix optimization. He has experience architecting and developing solutions using a wide range of technologies, including Hadoop, Cassandra, Storm, Kafka, Hive, Pig, and Pentaho.
Prior to joining SVDS, Fausto was a technical lead at Accenture Technology Labs, where he helped lead the development of multiple analytical systems including large-scale recommender systems and real-time predictive analytics platforms. He was also responsible for the redesign of the data management strategy for a proprietary marketing analytics platform.
Fausto has presented his work at industry conferences including Strata + Hadoop World and Cassandra Summit.
In this post, we’re going to go over the capabilities you need to have in place in order to successfully build and maintain data systems and data infrastructure.
In this post, Fausto talks about the characteristics that differentiate data infrastructure development from traditional development, and highlights key issues to look out for.
In this post, we explain why anyone transforming their company into a data-driven organization should care about software development best practices, even if they don’t consider themselves a software company.
SVDS presents two sessions at the Cassandra Summit: a look at the migration of our client Allant’s CDI-keying engine from Oracle to Cassandra; and a how-to on using Cassandra as a platform for building a custom distributed system.
Modern data architectures look radically different as we move towards a new idea of data platforms. During this “ask us anything” webinar we will discuss our experiences building new data architectures and take your questions.
A problem-solver by nature, Heather is passionate about helping organizations leverage data to drive competitive advantage. She draws on a diverse background in business and technology consulting to find the best solutions for her clients’ toughest data problems.
Heather has led a wide range of data science and data engineering projects across a variety of industries including health, financial services, and retail. In particular, she has extensive experience in unstructured data text extraction, data analysis, data conversions, data visualization, and business case development. She also has hands-on experience with many data tools and technologies such as Tableau, WEKA, SQL, Java, R, Hadoop, and Pig.
Heather is particularly passionate about facilitating data-driven business decisions, and leverages her background in technology to marry the right solutions with the right business problems. At SVDS, she has led implementation teams to build real-time inventory management systems that serve the eCommerce website at a Fortune-50 retailer, among other projects.
Heather holds a BS in Computer Science from the University of Missouri, where she graduated with highest honors.
In this post we’ll give an overview of obstacles we’ve faced (you may be able to relate) and talk about solutions to overcome these obstacles.
A quick overview of the motivation behind our instant and repeatable data platform tool.
In this post we explore how data is changing the insurance industry, through the lens of auto insurance underwriting.
We know what it’s like to deal with complex production deployments that run the gamut from infrastructure upgrades to feature deployments to data migrations, where each step threatens to derail the plan. In this post she’ll give an overview of obstacles she’s faced (you may be able to relate) and talk about ways to overcome them.
The Strata Data Conference is where cutting-edge science and new business fundamentals intersect—and merge. Several of us will be there in September, discussing platforms, strategy, and tools. Let us know if you’ll be attending and would like to chat.
The Data Strategy track of our webinar series focuses on creating and continuously updating your data strategy. Register now!
OSCON is a long-running conference focused on open source technology and communities. We’ll be there talking about our “push button” infrastructure tool.
Enterprise Data World focuses on data-driven business. Several of us will be there this year, talking about data platforms and enterprise data science. Let us know if you’ll be there, or you can sign up to receive our slides.
Several of us will be in Chicago this year, presenting tutorials on data strategy, data platforms, and how to manage data science in the enterprise. CTO John Akred will also be taking part in a panel about how to strengthen your data strategy skills.
Matt comes to SVDS with over 13 years of experience bringing data solutions to large organizations in a variety of leadership and technical positions. Matt’s experience solving difficult problems with unique and innovative solutions is fueled by his passion for speed, efficiency, and value to the customer.
In this post we’ll look at some real world examples of managing headaches while moving to Hadoop.
Ryan has over 13 years of experience creating enterprise applications for giants in both the retail and credit card processing industries. He is an expert in Java development and has served on open source and Java standards committees.
Coming from a background in geophysics and hydrology, Chloe is well-versed in leveraging data to make predictions and provide valuable insights. She has experience working on a wide variety of problems ranging from developing a data strategy for a pharmaceutical company to devising a methodology for performing longitudinal consumer impact studies at a large retail company. With experience in both academic research and engineering, she tackles novel problems and creates practical, effective solutions. She has researched, written, and spoken on the subject of data valuation, both for monetization and for making internal decisions within an organization.
Chloe holds a PhD in Environmental Engineering from Stanford University. Her research there focused on developing methods for obtaining hydrologic insights from electrical data taken from the subsurface to better inform groundwater management decisions.
We summarize the objectives and contents of our PyCon tutorial, and then provide instructions for following along so you can begin developing your own EDA skills.
In this post, we will give a high-level overview of what EDA typically entails and then describe three of the major ways EDA is critical to successfully building a model and interpreting its results.
In this post, we will look at driving product engagement with behavioral data, as well as building an integrated analytical environment.
The promise of data and analytics for product companies is that they can help you understand usage, and improve your ability to build, deploy, and service products to customers much more accurately and efficiently. In this post, we look at understanding the customer life cycle.
In this post, we use a Jupyter Notebook to go over the steps for creating a proof of concept for the image processing piece of our Caltrain work.
In this post we’ll start looking at the nuts and bolts of making our Caltrain work possible: image processing, video analysis, and image recognition.
This article is the first in a series that I will be posting on the topic of thinking about data as an intangible asset, and how to value it as such.
PyCon is the largest annual Python conference, and will be in Portland, OR this year. Our team will be there, talking about exploratory data analysis. Let us know if you’ll be there, or come say hi at our tutorial.
We’ll be in Boston covering a variety of topics—from running agile data teams, to visual storytelling with data. Let us know if you’ll be there, or sign up to receive all our slides.
Join us as CTO John Akred gives a talk on alternative approaches to valuing data within an organization, and Data Scientist Chloe Mawer demonstrates the power of Jupyter notebooks using a real-world train-detection problem. We’ll also present a tutorial on building data pipelines with Kafka and Spark.
Data Scientist Chloe Mawer will be in Portland giving a presentation about our Caltrain research. Our VP of Data Science, Jeffrey Yau, will also be attending the conference. Be sure to find us and say hi!
You can find Chloe’s slides here.
VP of Data Science Jeffrey Yau, along with Data Scientists Chloe Mawer and Daniel Margala, will be presenting on predicting train delays. See more about our train work here.
With a background in theoretical physics research, Harrison brings a broad knowledge of computational and mathematical techniques for solving complex problems. He enjoys finding patterns and building predictive models from all types of datasets.
In addition to having a strong background in mathematics, Harrison is proficient in Python, Java, Scala, SQL, Spark, HBase, and Cassandra. While at SVDS, he built a PDF extractor based on natural language processing and machine learning, and developed a distributed data platform and APIs to execute supply-demand models for a Fortune 100 retailer’s customer-facing inventory management system. Harrison also implemented a Spark job for the retailer which was in production during the 2014 holiday season. He has published his work on these topics at O’Reilly OSCON and Strata.
Harrison holds a PhD in theoretical physics from the University of Illinois at Urbana-Champaign and a BA in physics from Harvard University.
On May 6th, SVDS hosted an Open Data Science Conference (ODSC) Meetup in our Mountain View headquarters. Data Engineer Harrison Mebane and Data Scientist Christian Perez presented on our Caltrain project.
Milenko has extensive experience architecting and implementing software solutions across a variety of industries. He enjoys finding simple solutions to complex problems. Above all, he is passionate about getting things done.
With a background in cognitive psychology and neuroscience, Matt has extensive experience in hypothesis testing and the analysis of complex datasets. He is excited about using predictive models and other statistical methods to solve real-world problems.
In this post, we’ll provide a short tutorial for training an RNN for speech recognition; we’re including code snippets throughout, and an accompanying GitHub repository. The software we’re using is a mix of borrowed and inspired code from existing open source projects.
In early December we hosted a meetup, featuring Dr. Alli Gilmore discussing topological data analysis, and Dr. Andrew Zaldivar covering practical usage of TensorFlow.
One might reasonably judge how well Congress reflects the views of the citizenry by examining the proportion of citizens who think Congress is doing a good job.
With a background in computer engineering and visual analytics, Silvia has worked on several projects helping clients explore and analyze their data. She is interested in building and optimizing the infrastructure and data pipelines used to gather insights from various datasets.
Silvia has given multiple talks at data science industry conferences, and is an author of multiple academic papers.
Silvia holds an MS in Computer Engineering from Purdue University, and a BS in Computer Engineering from Michigan Technological University.
In this post we provide a framework for choosing a data format, and provide some example use cases.
On April 21st, SVDS hosted the WWCode Silicon Valley chapter in our Mountain View office; we gave a talk titled Working Effectively in Data Science Teams.
Several of our presenters were interviewed at Strata San Jose. If you missed the conference, check out these interviews below to catch up on some of the topics that were on our minds.
It’s easy to become overwhelmed when it comes time to choose a data format. In this post Silvia gives you a framework for approaching this choice, and provides some example use cases.
DataEngConf features talks and workshops aimed at bridging the gap between data scientists, data engineers, and data analysts. We’ll be there, giving tips on choosing the right format for your data.