
Crossing the Development to Production Divide
In this post we’ll give an overview of obstacles we’ve faced (you may be able to relate) and talk about solutions for overcoming them.
The Data Architecture Summit provides in-depth education from leading experts specializing in data architecture. We will be there discussing data platform and data governance. Let us know if you’ll be attending and would like to chat.
In this post, we will discuss how dealing with small files is different if you are using MapR-FS rather than the traditional HDFS installation.
How can you manage your implementation so that you take maximum advantage of technology innovation as you go, rather than freezing your view of technology at today’s state and designing something that will be outdated when it launches? Start by deciding which pieces are necessary now, and which can wait.
In this tutorial, we will walk you through some of the basics of using Kafka and Spark to ingest data.
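As a taste of what’s involved — this is a minimal sketch, not the tutorial’s own code — here is how reading a Kafka topic from Spark looks with Structured Streaming. It assumes a broker at localhost:9092, a hypothetical topic named events, and the spark-sql-kafka connector on the classpath:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-ingest-sketch").getOrCreate()

# Subscribe to the hypothetical "events" topic as an unbounded stream.
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "events")
          .load())

# Kafka delivers keys and values as bytes; cast the value to a string.
lines = events.selectExpr("CAST(value AS STRING) AS line")

# Echo the stream to the console; swap in a real sink for production use.
query = lines.writeStream.format("console").start()
query.awaitTermination()
```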
In this post we provide a framework for choosing a data format, and provide some example use cases.
If you are on the path to being a data-driven company, you have to be on the path to being a development-enabled company.
In this post we’ll look at some real world examples of managing headaches while moving to Hadoop.
A quick overview of the motivation behind our instant and repeatable data platform tool.
In this post, we’ll walk you through how to use tuning to make your Spark/Kafka pipelines more manageable.
This post looks at four business analysis capabilities that connect the dots between promising applications of data assets for telecommunications companies.
Building or rebuilding a data platform can be a daunting task, as most questions that need to be asked have open-ended answers. But that doesn’t mean you have to rely on guesswork and gut feel.
Any technology is only as good as the way in which you use it.
In this revamped classic, Edd looks at the challenges of moving forward with a new architecture, and where you need to start.
In this post, we cover what’s needed to understand user activity, and we look at some pipeline architectures that support this analysis.
In this post, we’re going to go over the capabilities you need to have in place in order to successfully build and maintain data systems and data infrastructure.
In this post, Fausto talks about the characteristics that differentiate data infrastructure development from traditional development, and highlights key issues to look out for.
In this post, Richard walks you through a demo based on the Meetup.com streaming API to illustrate how to predict demand in order to adjust resource allocation.
She knows what it’s like to deal with complex production deployments that cover the gamut from infrastructure upgrades, to feature deployments, to data migrations, where each step threatens to derail the plan. In this post she’ll give an overview of obstacles she’s faced (you may be able to relate) and talk about solutions for overcoming them.
Building or rebuilding a data platform can be a daunting task, as most questions that need to be asked have open-ended answers. This post aims to help.
While on paper it should be a seamless transition to run Impala code in Hive, in reality it’s more like playing a relentless game of whack-a-mole. This post provides hints to make the transition easier.
Rather than get bitten by the idiosyncrasies involved in running Spark on YARN vs. standalone when you go to deploy, here’s a way to set up a development environment for Spark that more closely mimics how it’s used in the wild.
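One habit that helps with exactly this — sketched here as an illustration, not the post’s own recipe — is to keep the cluster manager out of your application code and choose it at submit time, so the identical script runs locally, standalone, or on YARN. The app name and file name below are placeholders:

```python
from pyspark.sql import SparkSession

# Note: no .master() call here. The cluster manager is supplied by
# spark-submit, so the same script runs unchanged in every environment.
spark = SparkSession.builder.appName("dev-prod-parity").getOrCreate()

df = spark.range(1000)
print(df.count())

spark.stop()
```

During development you might run it with `spark-submit --master "local[4]" app.py`, then deploy with `spark-submit --master yarn --deploy-mode cluster app.py`, without touching the code.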
Today, the currency supply supported by the Bitcoin blockchain is worth four billion dollars. So, what have we learned? There are five essential properties any good blockchain must have.
In this post, we look at how to set up a microservice development environment for success.
In this post, we offer some strategies for effective communication when developing microservice environments.
In this post, we tackle the challenge of maintaining consistency in an environment with distributed development teams.
If you have identified microservices as the best solution to the technical problems you are facing, then consider the following guidelines to help you get started.
Microservices are a popular topic in developer circles because they are a means of solving problems that have plagued monolithic software projects for decades: namely, tardiness and bugs, both caused by complexity.
If it’s easy, it’s probably wrong.
It’s clear from the explosion of interest in newer platforms and technologies that the old tools, and their licensing costs, no longer meet new business needs.
Since the blockchain is both easily accessible and immutable, it is incredibly useful for other purposes. Issuing a tiny fraction of a Bitcoin (called dust) with embedded data allows anyone to easily store data permanently and publicly.
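For concreteness: the mechanism most commonly used for this today is an OP_RETURN output, which carries a small payload (capped around 80 bytes by default relay policy) in a provably unspendable script. A minimal sketch, assuming the python-bitcoinlib package and a hypothetical payload; building and funding the surrounding transaction is omitted:

```python
from bitcoin.core.script import CScript, OP_RETURN

# Hypothetical payload; OP_RETURN data is limited to a few dozen bytes.
payload = b"hello, permanent public record"

# An unspendable output script that embeds the payload in the blockchain.
script = CScript([OP_RETURN, payload])
print(script.hex())
```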
The Apache Spark big data processing platform has been making waves in the data world, and for good reason.
Who knows what unimaginable technology will exist in the next 20 years based on blockchains?
Hadoop is only beneficial if you can use it efficiently. Hadoop’s Apache Hive is frequently used to handle ad hoc queries and regular ETL workloads.
In this post, we walk through an example of using Docker to develop a data acquisition pipeline to ingest mobile app GPS data using Kafka and HBase.
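The walkthrough itself isn’t reproduced here, but the shape of the producing side is easy to sketch. Assuming the kafka-python package, a broker at localhost:9092, and a hypothetical gps-events topic:

```python
import json
from kafka import KafkaProducer

# Serialize each record as JSON before it hits the wire.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# A hypothetical GPS reading from a mobile client.
producer.send("gps-events", {"device_id": "abc123", "lat": 37.7749, "lon": -122.4194})
producer.flush()
```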
An important aspect of a modern data architecture is the ability to use multiple execution frameworks over the same data.
Databases sure ain’t what they used to be—it takes more than a relational database to put together a modern data architecture.