Building or rebuilding a data platform can be a daunting task, as most questions that need to be asked have open-ended answers. But that doesn’t mean you have to guess or rely on your gut.
Archive for the ‘Throwback Thursday’ Category
In this post, we will explore some aspects of the train delay data we’ve been collecting from the Caltrain API.
A basic mantra in statistics and data science is that correlation is not causation: just because two things appear to be related to each other doesn’t mean that one causes the other. This is a lesson worth learning.
Data products are those whose core functions leverage data, be they physical products, software, or services. Edd dives deeper into building data products here.
Data strategy matters to both business and tech. It’s a problem that sits in the center of a Venn diagram, and if we get stuck thinking of those two domains as existing solely in completely separate silos, we’ll lock ourselves out of that key middle ground where the really important problems get solved.
In this revamped classic, Edd looks at the challenges of moving forward with a new architecture, and where you need to start.
We present some best practices that we implemented after working with the Notebook—and that might help your data science teams as well.
Failure is appealing as a stepping stone along the path to innovation, but it’s very scary in practice—especially when you can’t yet see where the path is leading. We’d like to suggest the following five guidelines as a place to start.
While it would be great for everyone if you could just “buy a Hadoop” and skip straight to “Profit!”, in reality there’s a lot of work involved, and 95% of it is unique to your business. How do you determine the steps of a big data project, and ensure it delivers results early? This post talks about where to start.