Working Effectively in Data Science Teams

Meetup Recap  |  May 10th, 2016

Women Who Code (WWCode) is a non-profit organization dedicated to inspiring women to excel in technology-related careers. On April 21st, SVDS hosted the WWCode Silicon Valley chapter in our Mountain View office, where we gave a talk titled Working Effectively in Data Science Teams. The presentation focused on how data science teams collaborate effectively, both within their own teams and with other teams on a project.

During the talk, we highlighted three groups that we’ve often seen collaborating on projects, both at SVDS and in other positions: Data Strategy, Data Science, and Data Engineering. More importantly, we looked at how these groups connect to each other, and how collaboration occurs. Each speaker represented one of the roles: Anjali Samani (Sr. Data Scientist), Chen Huang (Data Strategist), and Silvia Oliveros (Data Engineer).

Women in the audience came from various backgrounds; they included developers, scientists, data scientists, analysts, and students, all with the common goal of learning more about real examples of data science work.

Takeaways

Different Roles in a Data Science Project

A Data Strategist is responsible for bridging the gap (communication, process, approach, etc.) between business and technology. It is up to this person to make sure that everyone in the project is clear on what the data scientists are analyzing, and why it is important to the business. The project will have better outcomes if everyone involved is grounded in the practical needs of the business.

However, Data Strategists don’t work alone; they gather input from Data Scientists and Data Engineers to build the roadmap. Data Scientists, in turn, look at all the available data and combine their business acumen with their tech knowledge to come up with models that can be used to answer the business questions. Anjali described the process of doing Data Science as similar to the scientific method, an idea captured in the Carl Sagan quote, “Science is a way of thinking, much more than it is a body of knowledge.”

Finally, none of this would be possible without Data Engineers providing a system to efficiently gather data. In a broader context, the Data Engineer is in charge of architecting, designing, creating, and extending data platforms. This includes, but is not limited to, designing data pipelines, creating ETL jobs, acquiring data, scaling services, logging, monitoring, and making sure the workflows are production ready.

Data Science Project Phases

Using a scenario based on e-commerce, Chen discussed common questions that businesses would like to answer. More specifically, let’s imagine a scenario where an e-commerce company is about to launch a new product. Brands in the grocery and CPG space spend $300 billion a year to promote new products by means of increased visibility in stores. Presently, among those new product launches, only 8 percent succeed. Since launching a new product is such a big endeavour for a company, it is important that before the launch they understand their customers’ shopping behaviors and patterns for new product launches. Specific questions that the company might have are:

  • Who are my customers?
  • What are they buying?
  • How can I get them to buy more?

A Data Scientist can help provide answers to the business questions. Anjali explained that we can focus on several specific aspects in order to understand this company’s current, and target, customers:

  • Sales history data provides insight into spending behaviors: what customers buy, how much, how frequently, and when.
  • Customer demographics will tell us their income, gender, age, and location.
  • Social media posts help determine tastes, preferences, and successful marketing companies, and help identify influencers who drive sales of a certain product.
  • Behavior analysis helps us understand how often people interact with the e-commerce site so that we can identify pain points, frequency of visits, and cart abandonment rates.
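As a rough illustration of the sales history and behavior analysis points above, here is a minimal Python sketch. The records, field layout, and function names are all made up for illustration and were not part of the talk; a real project would read this data from the company's own systems.

```python
from collections import defaultdict

# Hypothetical sales history: (customer_id, order_total_usd)
orders = [
    ("c1", 20.0),
    ("c1", 35.0),
    ("c2", 12.5),
    ("c3", 50.0),
    ("c1", 5.0),
]

# Hypothetical site sessions: (session_id, added_to_cart, purchased)
sessions = [
    ("s1", True, True),
    ("s2", True, False),
    ("s3", False, False),
    ("s4", True, False),
    ("s5", True, True),
]


def spend_summary(orders):
    """Return {customer_id: (order_count, total_spend)}."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    for customer_id, total in orders:
        counts[customer_id] += 1
        totals[customer_id] += total
    return {c: (counts[c], totals[c]) for c in counts}


def cart_abandonment_rate(sessions):
    """Share of sessions with items in the cart that ended without a purchase."""
    with_cart = [s for s in sessions if s[1]]
    if not with_cart:
        return 0.0
    abandoned = sum(1 for s in with_cart if not s[2])
    return abandoned / len(with_cart)


if __name__ == "__main__":
    print(spend_summary(orders))
    print(cart_abandonment_rate(sessions))
```

Even a small summary like this gives the Data Strategist something concrete to discuss with the business: who buys most often, how much they spend, and how many carts are left behind.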

After presenting the use case scenario, we described the Data Project Cycle and examples of what kind of work is performed during each of the stages. The image below shows an example of the data project cycle and how different roles might interact during the process:

[Image: wwcode_meetup, the data project cycle and how the different roles interact at each stage]

As you can see in the image, the three different roles participate during different stages, and they depend on one another to answer the business questions.

Data Science is a Team Sport

Data Science is a team sport in several aspects:

  • It involves collaboration between different disciplines (statistics, machine learning, math).
  • It works better with collaboration from other roles: Data Strategists, Data Architects, and Data Engineers.
  • It needs the full support and collaboration of the business owners (stakeholders) and data guardians, as well as other business areas such as legal and compliance, so that all the goals align and we can achieve success.

A holistic approach and communication are the keys to success within a data science team.

Next steps/follow-ups

Women Who Code Silicon Valley has a series of meetups dedicated to Data Science. You can find more about future events here.

SVDS offers services in Data Strategy, Architecture Advisory, and Agile Build. Check out our newsletter to learn more about the company and current projects.