Techniques and Technologies: Topology and TensorFlow

Meetup recap  |  December 22nd, 2016

On December 7, 2016, we hosted a meetup featuring Dr. Alli Gilmore (Senior Healthcare Data Scientist at One Medical), and Dr. Andrew Zaldivar (Senior Strategist in Trust & Safety at Google). Despite the drizzle and gloom outside, the atmosphere of the room was bright and buzzing. The lively audience engaged with both speakers throughout their talks, lending the event the feeling of an intimate small group discussion among peers.

Dr. Gilmore spoke about the user experiences that come with applying machine learning algorithms. Carefully considering the experience of using a particular algorithm, she argued, is what will make artificial intelligence more productive and useful to people. She walked through using the unsupervised Mapper topological data analysis algorithm to group similar types of medical claims, discussed the varied reactions of subject matter experts to its outputs, and envisioned a more interactive and satisfying version of the process.

Dr. Zaldivar illuminated the path to harnessing TensorFlow's powerful capabilities without the complex configuration by using a set of high-level APIs called TFLearn. He showed us how to quickly prototype and experiment with various classification and regression models in TensorFlow with only a few lines of code, as well as how to access other useful functionality in the TF package.


Unsupervised Topological Data Analysis

Dr. Gilmore presenting

After an introduction to topological data analysis, Dr. Gilmore summarized how domain experts tend to react to unsupervised clustering algorithms: they find the results difficult to interpret, and they are underwhelmed by how little they can contribute to the grouping process. It can be unsatisfying when interpreting the clusters feels like a guessing game, when there are seemingly duplicate groups, or even when the groups are really obvious. Similarly, it's frustrating when people want to contribute their expertise but can't. They may also want to reinforce the model's results when it does something well, but it's not necessarily easy to tell the system to do more of a particular thing.
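To make the discussion concrete: the Mapper algorithm Dr. Gilmore applied works by choosing a filter (lens) function, covering its range with overlapping intervals, clustering the points that fall in each interval, and connecting clusters that share points. The following is only a toy pure-Python sketch of that pipeline, not code from the talk; the function name, the gap-based clustering rule, and all thresholds are illustrative simplifications.

```python
def mapper_1d(points, lens, n_intervals=4, overlap=0.25, gap=1.0):
    """Toy 1-D Mapper: cover the lens range with overlapping
    intervals, cluster the points in each interval (here: split
    wherever consecutive values are more than `gap` apart), and
    connect clusters that share points."""
    values = [lens(p) for p in points]
    lo, hi = min(values), max(values)
    length = (hi - lo) / n_intervals
    nodes = []  # each node is a frozenset of point indices
    for i in range(n_intervals):
        a = lo + i * length - overlap * length
        b = lo + (i + 1) * length + overlap * length
        members = sorted((j for j, v in enumerate(values) if a <= v <= b),
                         key=lambda j: values[j])
        cluster = []
        for j in members:
            if cluster and values[j] - values[cluster[-1]] > gap:
                nodes.append(frozenset(cluster))
                cluster = []
            cluster.append(j)
        if cluster:
            nodes.append(frozenset(cluster))
    # an edge links any two clusters that share at least one point
    edges = {(m, n) for m in range(len(nodes))
             for n in range(m + 1, len(nodes)) if nodes[m] & nodes[n]}
    return nodes, edges

nodes, edges = mapper_1d([0.0, 0.1, 0.2, 5.0, 5.1, 9.9, 10.0],
                         lambda x: x, n_intervals=2)
```

The resulting node-and-edge structure is the "shape" summary that experts are then asked to interpret, which is exactly where the reactions above come in.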

How can we overcome the drawbacks that accompany unsupervised methods? Put a human in the loop! Make using the algorithm a positive and fruitful experience by leveraging what people can do confidently while avoiding things that are hard. For example, users can likely explain which features are relevant (this is what they know and care about), but they may have a difficult time saying how many groups should exist in the data. Let them influence the algorithm on these kinds of terms, perhaps by providing labels for the grouping process via exemplar selection, and by propagating labels through a question–answer feedback loop from machine to human and back. I'm sure every data scientist has imagined the day when they can more colloquially interact with an algorithm to get better results, even if the majority of today's feedback only involves cursing that falls on deaf ears.
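The simplest version of that exemplar-based loop is easy to picture: an expert labels a handful of representative items, and the labels spread to everything else. This is only a toy 1-D sketch to illustrate the idea, not an algorithm from the talk; the function name and nearest-neighbor rule are illustrative.

```python
def propagate_labels(points, seeds):
    """Toy exemplar labeling: `seeds` maps a few point indices to
    expert-provided labels; every other point takes the label of
    its nearest labeled exemplar (1-D distance)."""
    labels = {}
    for i, p in enumerate(points):
        if i in seeds:
            labels[i] = seeds[i]  # keep the expert's label as-is
        else:
            nearest = min(seeds, key=lambda j: abs(p - points[j]))
            labels[i] = seeds[nearest]
    return labels

# Expert labels two exemplars; the rest inherit from them.
labels = propagate_labels([0.0, 0.2, 5.0, 5.1], {0: "routine", 3: "complex"})
```

In an interactive system the machine would then ask targeted questions ("is this item also 'complex'?") and fold the answers back into `seeds`, closing the feedback loop Dr. Gilmore envisioned.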

Practical TensorFlow

Dr. Zaldivar presenting

Dr. Zaldivar took the audience through the steps required to build a relatively simple convolutional neural network (CNN) using the low-level TensorFlow Python API. It took four slides of code to cover all of the setup, an implementation that demands real expertise but demonstrates how specific you can be when needed. He contrasted this with implementing a deep neural network in just four lines of code using functions from the TFLearn module. He recommended running models at the highest level of abstraction first, and only digging down into the details if performance is suboptimal. After all, more lines of code means more to debug when something goes wrong.

Peeking under the hood at the underlying architecture, we got a brief overview of the graphical nature of TF networks. At the lowest level, functional operations like multiply and add are nodes in a graph, and tensors (the data) flow through the graph. The operations become coarser-grained as you move up the abstraction stack to TFLearn, which sits at a similar level of abstraction to Keras. At this high level, TFLearn models should already feel familiar to anyone who has used scikit-learn-style fit/predict methods.
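That graph model is worth pausing on. The sketch below is not TensorFlow code; it is a minimal pure-Python illustration of the same idea, with multiply and add as nodes and data flowing in at the leaves. The `Node` class and the feed-dict convention are assumptions made for the example.

```python
import operator

class Node:
    """A node in a toy dataflow graph: an operation plus the
    nodes whose outputs feed into it."""
    def __init__(self, op, *inputs):
        self.op, self.inputs = op, inputs

    def run(self, feed):
        # Leaves are named placeholders that look themselves up in
        # the feed dict; interior nodes evaluate their inputs
        # first, then apply their operation to the results.
        if not self.inputs:
            return feed[self.op]
        return self.op(*(n.run(feed) for n in self.inputs))

# Build the graph for y = x * w + b; nothing is computed yet.
x, w, b = Node("x"), Node("w"), Node("b")
y = Node(operator.add, Node(operator.mul, x, w), b)

# Only running the graph with concrete data produces a value.
result = y.run({"x": 3.0, "w": 2.0, "b": 1.0})
```

The separation between building the graph and running it with data is the core of low-level TF's design; the higher-level APIs simply hide increasingly large subgraphs behind single calls.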

Falling somewhere between the core TF API and TFLearn is another module called TF-Slim, whose API can implement a CNN in far fewer lines of code than the initial low-level approach. Slim focuses on larger operations but can be intertwined with the low-level API to give greater control than TFLearn. With the extensible capabilities of this module, you can also fine-tune a pre-trained model on your own dataset, providing yet another way to get up and running quickly with state-of-the-art networks like Inception-ResNet-v2.

Next steps

You can find Dr. Gilmore’s slides here, and Dr. Zaldivar’s slides here. The decks contain a number of links to resources related to their talks—the interested reader is encouraged to peruse the slides to find gems related to the interactive machine learning field, topological data analysis, logging and monitoring capabilities in TensorFlow, additional built-in neural networks, Jupyter notebook examples, and tutorials. We’ve also put recordings of Dr. Gilmore’s and Dr. Zaldivar’s presentations on YouTube.

SVDS offers services in data science, data engineering, and data strategy. Check out our newsletter to learn more about the company and current projects, and to hear about future meetups hosted at our offices.