Data science is an exciting, changing field. Curious minds and enthusiastic investigators can often get bogged down by algorithms, models, and new technology. If we’re not careful, we forget what we’re actually here to do: solve real problems. And if what we do is just theory, what’s the point?
To be relevant and useful, the day-to-day activities of data scientists must:
- prioritize use of technology, so that it produces the best results;
- design technology and products with the consumer in mind; and
- collaborate well with partners and customers.
In short, data science should result in real applications. The reality of this is multi-faceted. One important problem is managing data teams to get to that real-world result. By using agile data science methods, we help data teams do fast, directed work and manage the inherent uncertainty of data science and application development.
In this post, I’ll look at the practical ingredients of managing agile data science.
What are agile data science teams and why do we need them?
Data science results are probabilistic and unpredictable. At the start of a project, it can often look like there’s an obvious route from A to B. Once you get started, it’s rarely that simple. Agile teams do away with strict planning and go into projects with a creative mindset; they embrace uncertainty instead of shying away from it.
This comes in handy when a roadblock pops up: traditionally run data science teams can get stuck deciding among their options, while flexible agile data science teams are more likely to find a new solution. Unpredictability and the need to adapt quickly to problems don’t scare them; they excite them.
At the same time, the agile planning method focuses hard on application to the customer’s problem. Without that focus, it’s easy for us to get lost down the rabbit hole of stringent rules about hypotheses, models, and results. When that happens, we end up producing things that work, in the sense that they validate our hypotheses, but that have little application to the real-world scenario we’re producing them for. Wasting time is not good for us or our customers.
There are some key concepts that underpin the agile method we employ at SVDS. Collectively they provide us with the goals for a project, the top level strategy for investigation, and day-to-day action plans.
- The charter—Why are we doing this project? What outcomes or conclusions do we hope to reach? What does my customer need at the end of this project?
- Investigation themes—How do I gather and understand this data? What can I directly observe? What can I implement to help me understand the data?
- Epics—Break down the investigation themes into one or more work plans.
- Stories—Units of work that make up epics. These are concrete activities that can be completed in a given amount of time.
It’s great to have a method, but it helps to see how it’s used to solve a real problem. At SVDS, we used this method to create a system that tells train riders when the Caltrain is running late to a stop, and its approximate time of arrival. Let’s dive into how that worked.
A Caltrain example
I’ll give a brief overview of our Caltrain work, but if you want to learn more, check out our project page. The point of this project, its charter, was to create an app that would tell the user when the Caltrain was running late, and how long it would be until it arrived at a designated stop. The Caltrain system has its own app, but it was inaccurate and didn’t tell riders whether a train was late or how late it was. No one likes being late for work, so we wanted to create a solution for riders.
The next step was to define the investigation themes, which started with the question: “how do I know the train is late?”
The epics portion included all the smaller questions and tasks required to answer the big questions posed in the investigation themes. Epics included undertakings such as “develop a working model for the Caltrain system under regular working conditions,” and “classify catastrophic events in the system that prevent the regular working model from applying.”
As the epics were broken down into units of work, the stories emerged. Example stories included “can I accurately and consistently use Twitter to find data on the train’s late times?” and “can I identify the direction of the train using video?”
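To make the breakdown concrete, here is a minimal Python sketch of the charter, theme, epic, and story hierarchy, filled in with the Caltrain examples above. The class names, field names, and the `all_stories` helper are hypothetical and chosen only for illustration. The post does not say which story belonged to which epic, so the assignment below is an assumption, not a record of how the project was organized.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Story:
    """A concrete unit of work that can be completed in a given amount of time."""
    question: str


@dataclass
class Epic:
    """A work plan that serves an investigation theme."""
    goal: str
    stories: List[Story] = field(default_factory=list)


@dataclass
class Theme:
    """An investigation theme: a big question about the data."""
    question: str
    epics: List[Epic] = field(default_factory=list)


@dataclass
class Charter:
    """Why the project exists and what the customer needs from it."""
    purpose: str
    themes: List[Theme] = field(default_factory=list)

    def all_stories(self) -> List[Story]:
        # Flatten the hierarchy: themes -> epics -> stories.
        return [s for t in self.themes for e in t.epics for s in e.stories]


# NOTE: which story sits under which epic is assumed for illustration only.
caltrain = Charter(
    purpose="Tell riders when the Caltrain is running late and when it will arrive",
    themes=[
        Theme(
            question="How do I know the train is late?",
            epics=[
                Epic(
                    goal="Develop a working model for the Caltrain system "
                         "under regular working conditions",
                    stories=[
                        Story("Can I identify the direction of the train using video?"),
                    ],
                ),
                Epic(
                    goal="Classify catastrophic events that prevent the "
                         "regular working model from applying",
                    stories=[
                        Story("Can I accurately and consistently use Twitter "
                              "to find data on the train's late times?"),
                    ],
                ),
            ],
        )
    ],
)

for story in caltrain.all_stories():
    print(story.question)
```

Keeping every story attached to an epic, and every epic to a theme, makes it easy to trace any piece of day-to-day work back to the charter.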
Sprints, standups, and review meetings
Given the breakdown of work into epics and stories, how do you manage its execution and planning? This happens through sprints, standups, and review meetings. You’ll find that different agile practitioners have differing spins on these meetings, but the fundamentals are all similar.
Sprints. Stories are completed during sprints. A sprint is a set chunk of time to work on tasks, typically two weeks, with the goal of producing new results. A sprint starts with sprint planning, where we decide, with customer stakeholders, which epics to start or continue. Data scientists then break these down into stories for that sprint.
Standups. Each day during the sprint, the team gathers in a standup meeting. Here they report their progress, say what they’re going to do next, and coordinate to remove blockers where people are stuck on their work. Standups, as their name suggests, aren’t for long discussions or problem solving. The point is to get information out there quickly, and set up any further discussion. Optionally, a customer stakeholder may attend these standup meetings. Alternatively, we schedule one or two updates with them separately each week. It’s important to keep them closely involved with progress.
Review meetings. The last step in the sprint process is to hold a review meeting, where the team presents and evaluates results. Customer stakeholders also attend this meeting. We present the work, and the group discusses it. Is it good enough? Should we keep working on it? Will it be useful, or should we abandon it now? We typically combine our review meetings, which are an assessment of the work done, with our sprint retrospective meetings, which are an assessment of how the work was done.
This keeps the group from spending long periods of time working on things that won’t actually benefit customers. If the work is incomplete, you discuss how to move forward. If it’s finished, you discuss what you learned from the experience. Agile teams are always learning from previous work.
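One way to picture the review step is as a triage of each story into three outcomes, mirroring the questions above: it is good enough, it needs more work, or it should be abandoned. The Python sketch below is a hypothetical illustration of that idea, not a tool described in this post. The names `ReviewDecision`, `ReviewedStory`, and `next_sprint_backlog` are assumptions, and the story titles are placeholders.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class ReviewDecision(Enum):
    """Possible outcomes for a story at the sprint review (assumed names)."""
    DONE = "done"          # good enough for the customer
    CONTINUE = "continue"  # promising, but needs more work
    ABANDON = "abandon"    # not useful, so stop now


@dataclass
class ReviewedStory:
    title: str
    decision: ReviewDecision


def next_sprint_backlog(reviewed: List[ReviewedStory]) -> List[str]:
    """Return the titles of stories that carry over into the next sprint."""
    return [s.title for s in reviewed if s.decision is ReviewDecision.CONTINUE]


# Placeholder stories and decisions, for illustration only.
review = [
    ReviewedStory("Story A", ReviewDecision.DONE),
    ReviewedStory("Story B", ReviewDecision.CONTINUE),
    ReviewedStory("Story C", ReviewDecision.ABANDON),
]

print(next_sprint_backlog(review))
```

Only stories marked to continue return to the backlog; finished and abandoned stories feed the discussion of what was learned.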
Agile data science teams work in a way that is adaptable, collaborative, and produces usable results. They subscribe to the idea that data science can be creative and innovative. They embrace the unknown instead of making assumptions, and they don’t waste time beating their heads against a wall over things that aren’t working.
Agile teams are the future of data science: creative teammates who work together to make things that are useful and that solve real-world problems. The future is fickle, and we must be flexible to succeed.
Editor’s note: We are grateful for the contributions of Amber McClincy and Edd Wilder-James.