With Data, Ask “What” Before “How”

Thoughts from Strata + Hadoop World 2016 in New York  |  October 6th, 2016

At SVDS, we encourage our clients to distinguish between what they want to accomplish with data and how they’re going to accomplish it. The what should focus on strategic business outcomes, whereas the how is about the tools, techniques, and architecture required to drive those outcomes. At the Strata + Hadoop World conference in New York last week, we gave two tutorial sessions.

In addition to half-day tutorials, the conference offered an impressive 16 tracks of session talks. A lot of them focused on the tools that everyone is excited about right now: there was a clear emphasis on Spark, and plenty of sessions about Hadoop and its various ecosystem components such as HDFS, as well as some sessions on Google tools like BigQuery and TensorFlow. But I was more curious about the what—the goals people are using data science to accomplish, and the value they’re creating.

Three-dimensional visualizations for interaction

Data visualization is a topic near and dear to my heart, and I’ve been fascinated for years by the ways in which our brains inherently interpret certain kinds of visual information (cognitive perception). An object’s size and its position in space are the two most significant cues we use to judge its importance, and enlisting these properties to represent the relationships between data points is fraught with challenges.

This is true whether you’re working in two dimensions or three, but the transition from page or screen to the entire environment around you is particularly complicated. It was inspiring, then, to hear Brad Sarsfield, Principal Software Architect in Data Science, explain in his session, “Holographic data visualizations: welcome to the real world,” how the Microsoft HoloLens is addressing some of these challenges.

I admit that I went in full of skepticism and low expectations, but I left very impressed. His team has clearly put a lot of thought into the entire user experience: not only how to represent data with an additional dimension, but also how to make sure the user can manipulate it easily. Instead of simply moving a 2D experience into virtual reality and slapping on a bit of depth perception, they’ve carefully curated the entire feature set. Gone are traditional menus, which work well in two dimensions but are incredibly awkward in three. In their place are motion cues that help convey relative depth, and “handles” on graphs and charts so they can be easily grabbed, rotated, or otherwise manipulated in space. Transitions have been carefully crafted to avoid inducing motion sickness.

While there is still room for additional polish and fine-tuning, this was the first demonstration of 3D data visualization that opened up an entirely new realm of possibility for me: interacting with information in the same three dimensions we live in every day.

Targeted information for first responders

Bart van Leeuwen is a Data Architect who runs his own company, Netage, but he’s also a volunteer firefighter. His talk, “Smart Data for Smarter Firefighters,” explained how hard it can be to get the necessary information in a timely manner. When the alarm goes off at the station, firefighters are in the truck and on the move right away, and then there may only be a few precious minutes on the road in which to learn a wide variety of information such as:

  • Traffic conditions that may determine the fastest route to the fire
  • Building conditions or structures that may determine their fire-fighting strategy
  • Occupancy or other information about who may be inside
  • Special circumstances, such as chemicals stored at the scene, that may determine whether typical water hoses can be used

He described how fire trucks are currently stocked with multiple devices to try to provide this information in real time—but also demonstrated how little time there really is to look at them all. And once the fire truck has reached the scene, as he put it: “No one wants to see you stand around and look at an iPad for two minutes. They expect you to get in there.”

The data science challenge here is to turn big data—geographic data, traffic data, blueprint data, residency data, business data, and more—into small data: targeted information that can be absorbed by first responders on demand and on the fly in a matter of moments. This is the challenge that van Leeuwen is working to address, and it’s one that can make a very real difference to everyday people.

Transparent algorithms for crime prevention

Brett Goldstein is a Managing Partner at Ekistic Ventures, “a fund that is actively cultivating a portfolio of disruptive companies that bring new solutions to critical urban problems.” He has previously served as the Commissioner and Chief Data Officer for the Department of Innovation and Technology in the City of Chicago, and before that as Director of the Predictive Analytics Group, Counterterrorism and Intelligence Division, for the Chicago Police Department. His talk, “Thinking Outside the Black Box: the imperative for accountability and transparency in predictive analytics,” focused on the critical issue of bias in predictive policing models.

Goldstein argued that the familiar exhortation to “show your work” (the one we all rolled our eyes at in high school math class) has never been more important than it is with algorithms, and he used predictive policing as a prime example. He shared test results for the Chicago algorithm that classifies people by their risk of committing a crime; they showed false positive rates for people of color and false negative rates for white people that both exceeded 40%.

The ability to explain how a model works, and to test its accuracy across different parameters, is critical to achieving public accountability and improving the model. When it comes to something like predictive policing, this kind of algorithmic transparency is essential to ensuring both justice and the successful prevention of actual crime.
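To make the idea of testing a model’s accuracy across groups a little more concrete, here is a minimal Python sketch of the kind of check Goldstein described: it computes false positive and false negative rates separately for each group. The function name `error_rates_by_group`, the record layout, and the sample data are all hypothetical illustrations, not the Chicago model or its actual test data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (group, actually_offended, predicted_high_risk).
Record = Tuple[str, bool, bool]


def error_rates_by_group(records: Iterable[Record]) -> Dict[str, Dict[str, float]]:
    """Return the false positive and false negative rate for each group.

    FPR = FP / (FP + TN): share of people who did NOT offend but were flagged.
    FNR = FN / (FN + TP): share of people who DID offend but were not flagged.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for group, actual, predicted in records:
        c = counts[group]
        if predicted and actual:
            c["tp"] += 1
        elif predicted and not actual:
            c["fp"] += 1
        elif not predicted and not actual:
            c["tn"] += 1
        else:
            c["fn"] += 1

    rates = {}
    for group, c in counts.items():
        negatives = c["fp"] + c["tn"]  # people who did not offend
        positives = c["fn"] + c["tp"]  # people who did offend
        rates[group] = {
            # NaN signals that the rate is undefined for this group.
            "fpr": c["fp"] / negatives if negatives else float("nan"),
            "fnr": c["fn"] / positives if positives else float("nan"),
        }
    return rates


if __name__ == "__main__":
    # Hypothetical toy data for illustration only; not real policing data.
    sample = [
        ("A", False, True), ("A", False, False), ("A", True, True), ("A", True, True),
        ("B", False, False), ("B", False, False), ("B", True, False), ("B", True, True),
    ]
    for group, r in sorted(error_rates_by_group(sample).items()):
        print(f"group {group}: FPR={r['fpr']:.2f}, FNR={r['fnr']:.2f}")
```

Running it prints `group A: FPR=0.50, FNR=0.00` and `group B: FPR=0.00, FNR=0.50`. The toy model is 75% accurate for both groups, yet it wrongly flags people in one group and misses actual offenders in the other, which is why a single overall accuracy number can hide exactly the kind of disparity described above.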

How about you?

These are a few of the sessions that jumped out at me as particularly interesting and significant. Were you at Strata + Hadoop World this year? What jumped out at you? I’d love to hear about your favorite sessions in the comments—or about what you’re doing with data in your own organization.

To see videos from Strata + Hadoop World, check out their YouTube playlist. And if you’re interested in the slides from our tutorials, you can request them here.