Posts Tagged ‘Architecture’

Crossing the Development to Production Divide

In this post, we’ll give an overview of obstacles we’ve faced (you may be able to relate) and talk about ways to overcome them.

Data Architecture Summit 2017

The Data Architecture Summit provides in-depth education from leading experts specializing in data architecture. We will be there discussing data platforms and data governance. Let us know if you’ll be attending and would like to chat.

Handling Small Files in MapR-FS

In this post, we will discuss how dealing with small files is different if you are using MapR-FS rather than the traditional HDFS installation.

Space Shuttle Problems: Long-term Planning Amid Changing Technology

How can you manage your implementation in a way that allows you to take maximum advantage of technology innovation as you go, rather than having to freeze your view of technology to today’s state and design something that will be outdated when it launches? You must start by deciding which pieces are necessary now, and which can wait.

Data Ingestion with Spark and Kafka

In this tutorial, we will walk you through some of the basics of using Kafka and Spark to ingest data.

How to Choose a Data Format

In this post, we provide a framework for choosing a data format, along with some example use cases.

Realize the Business Power of Your Data with DevOps

If you are on the path to being a data-driven company, you have to be on the path to being a development-enabled company.

Data Pipelines in Hadoop

In this post, we’ll look at some real-world examples of managing headaches while moving to Hadoop.

Easily Spinning up Data Platforms

A quick overview of the motivation behind our instant and repeatable data platform tool.

Making Spark and Kafka Data Pipelines Manageable with Tuning

In this post, we’ll walk you through how to use tuning to make your Spark/Kafka pipelines more manageable.

Four Data Capabilities for Telecommunications

This post looks at four business analysis capabilities that connect the dots between promising applications of data assets for telecommunications companies.

The Data Platform Puzzle

Building or rebuilding a data platform can be a daunting task, as most questions that need to be asked have open-ended answers. But that doesn’t mean you have to guess and use your gut.

Big Data is About Agility

Any technology is only as good as the way in which you use it.

We Need a New Data Architecture: What Next?

In this revamped classic, Edd looks at the challenges of moving forward with a new architecture, and where you need to start.

Building Pipelines to Understand User Behavior

In this post, we cover what’s needed to understand user activity, and we look at some pipeline architectures that support this analysis.

Building Data Systems: What Do You Need?

In this post, we’re going to go over the capabilities you need to have in place in order to successfully build and maintain data systems and data infrastructure.

Understanding Modern Data Systems

In this post, Fausto talks about the characteristics that differentiate data infrastructure development from traditional development, and highlights key issues to look out for.

Building a Prediction Engine using Spark, Kudu, and Impala

In this post, Richard walks you through a demo based on the Meetup.com streaming API to illustrate how to predict demand in order to adjust resource allocation.

Crossing the Development to Production Divide

We know what it’s like to deal with complex production deployments that run the gamut from infrastructure upgrades, to feature deployments, to data migrations, where each step threatens to derail the plan. In this post, we’ll give an overview of obstacles we’ve faced (you may be able to relate) and talk about ways to overcome them.

The Data Platform Puzzle

Building or rebuilding a data platform can be a daunting task, as most questions that need to be asked have open-ended answers. This post aims to help.

From Impala to Hive with Love

While on paper it should be a seamless transition to run Impala code in Hive, in reality it’s more like playing a relentless game of whack-a-mole. This post provides hints to make the transition easier.

Develop Spark Apps on YARN Using Docker

Rather than get bitten by the idiosyncrasies involved in running Spark on YARN vs. standalone when you go to deploy, here’s a way to set up a development environment for Spark that more closely mimics how it’s used in the wild.

5 Things a Blockchain Needs to Succeed

Today, the currency supply supported by the Bitcoin blockchain is worth four billion dollars. So, what have we learned? There are five essential properties any good blockchain must have.

Developing in a Microservice Environment: Part 4

In this post, we look at how to set up a microservice development environment for success.

Developing in a Microservice Environment: Part 3

In this post, we offer some strategies for effective communication when developing microservice environments.

Developing in a Microservice Environment: Part 2

In this post, we tackle the challenge of maintaining consistency in an environment with distributed development teams.

Developing in a Microservice Environment: Part 1

If you have identified microservices as the best solution to the technical problems you are facing, then consider the following guidelines to help you get started.

Evaluating Microservices: Real World Lessons

Microservices are a popular topic in developer circles because they are a means of solving problems that have plagued monolithic software projects for decades: namely, tardiness and bugs, both caused by complexity.

The Basics of Classifier Evaluation: Part 1

If it’s easy, it’s probably wrong.

We Need a New Data Architecture: What Next?

It’s clear from the explosion of interest in newer platforms and technologies that the old tools and licensing costs don’t work to meet new business needs.

Dust in the Blockchain

Since the blockchain is both easily accessible and immutable, it is incredibly useful for other purposes. Issuing a tiny fraction of a Bitcoin (called dust) with embedded data allows anyone to easily store data permanently and publicly.

Use Cases for Apache Spark

The Apache Spark big data processing platform has been making waves in the data world, and for good reason.

The Blockchain is Forever

Who knows what unimaginable technology will exist in the next 20 years based on blockchains?

Two Tips for Optimizing Hive

Hadoop is only beneficial if using it is efficient. Hadoop’s Apache Hive is frequently used to handle ad-hoc queries and regular ETL workloads.

Using Docker to Build a Data Acquisition Pipeline

In this post, we walk through an example of using Docker to develop a data acquisition pipeline to ingest mobile app GPS data using Kafka and HBase.

Flexible Data Architecture with Spark, Cassandra, and Impala

An important aspect of a modern data architecture is the ability to use multiple execution frameworks over the same data.

Data Architecture Reading List

Databases sure ain’t what they used to be—it takes more than a relational database to put together a modern data architecture.