In late 2016 I spoke with Travis Oliphant, co-founder of Continuum Analytics. We covered many topics, including building a community and balancing enterprise with open source. I’ve broken our conversation up into a series of posts, which will be published over the next several weeks. In this first part of our interview, we discuss breaking down silos, the importance of effectively communicating about cutting-edge technology, and where Anaconda is going next.
What are you most excited about right now? I’m looking for a gut reaction.
A gut reaction. I think that the future is going to be very bright because of the innovation engines that are exploding around open source. The opportunities for machine learning get me excited. I am an old-school, statistical signal processing guy who is also an applied mathematician. When I see machine learning, I see a specialized application of techniques that have been used for decades. Machine learning is re-energizing applied math in the world in a way that you can see. Business intelligence, which used to be dumbed-down math and statistics, can now be driving the education of the world into better solutions.
I’m excited about that trend and I am excited about the integrations that are occurring. Right now, people end up re-implementing everything in silos, and I see how that can be broken down. I see how we can actually reuse each other’s code and algorithms in a way that was never possible before. This could happen as early as next year. I mean, it is starting to happen a little bit now using the filesystem as an intermediate store, so we aren’t just competing in silos between the Scala, Python, and R worlds. Those worlds can actually start cooperating.
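Editor’s sketch: to make "the filesystem as an intermediate store" concrete, here is a minimal Python example in which one step writes a table to a shared file and a second step reads it back. The file name, column names, and the `write_table` and `read_table` helpers are hypothetical and chosen only for illustration. In practice the reader could just as well be an R or Scala process, since CSV is a format all three ecosystems can parse.

```python
import csv
import tempfile
from pathlib import Path


def write_table(path, header, rows):
    """Producer side: store a small table as CSV on the filesystem."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


def read_table(path):
    """Consumer side: load the table back as a list of dicts (values are strings)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


# A temporary directory stands in for a location shared between tools.
with tempfile.TemporaryDirectory() as tmp:
    shared_file = Path(tmp) / "scores.csv"  # hypothetical file name
    write_table(shared_file, ["name", "score"], [["a", 1.5], ["b", 2.0]])
    records = read_table(shared_file)

print(records)
```

The trade-off of this approach is that every tool must agree on a file format and re-parse the data, which is part of why it is only a first step toward the cooperation described above.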
So when you say “we,” do you mean people in tech in general? Or are you thinking about specific initiatives that Continuum is taking part in?
I’m thinking about tech in general, like the world. Certainly Continuum is playing its role in all of that. Towards the end of my previous statement I was specific—there are some things that we are doing around breaking down silos that I’m very excited about. What I’m excited about with Continuum is just how we have a bunch of disparate products in the marketplace and those are becoming unified behind a single data science platform.
I think you are right. We are talking a lot on our side about moving away from silos and democratizing data. As you look at all these changes that are happening, what do you see as the biggest challenge to actually making it real as soon as possible?
Our biggest challenge actually comes from previous success. Now we need to break down some of the communication barriers that have emerged. We also need to unlearn incorrect assumptions that stand in the way. People have made assumptions that aren’t correct, or that don’t have to be true but become true because people believe them. Does that make sense?
It does. So it is a people issue?
It is a people issue. It is helping break down the mental models, the world views that don’t have to be the way they are, but are because of a lack of prior art. People have to see something; it is really hard to communicate abstractly. You have to kind of see a thing and then you can abstract around it. So the biggest challenge is that we see how the world could work, and have to communicate that to the world in a way that is consistent, or connected, or somehow brings them from where they are currently thinking into how the world could be.
For example, I was talking with some people from a large company yesterday and realized they are two innovation cycles behind. It was a little bit frustrating, I have to admit, because I hadn’t been around somebody who felt like they were from the ’90s for so long. And I was just like, whoa, I don’t even know how to talk. I don’t know how to communicate because your world view and perspective is so foreign to me now. Now, it is a bit arrogant for me to assume it is all that person. I recognize things take a while to change. So even though the ability to do it is there, you have got all kinds of work to do to connect it to somebody’s day-to-day, to what they are working in right now.
I was just in an auto parts store, and there was an old dot matrix line printer from the early to late ’70s. Still working and still connected to their point-of-sale device. This is life, you know? People don’t just immediately swap out everything they were doing that was working.
The IRS is still on an old mainframe for managing tax returns. They are currently in the middle of a 10-year project to modernize it, but the challenge of course is they are modernizing to yesterday’s technology. But that is always a challenge. I think education will always remain a big part of the challenge we face. There is a lot of stuff that could be if we can communicate about it effectively. So effective communication and getting the word out is key. We need to help break down barriers caused by other perspectives that are not incompatible, just under-informed.
Which group, if any, do you think owns the responsibility to start that conversation in an organization? The business people, the tech people, or anyone who happens to get it?
First, anybody who happens to get it owns the responsibility to tell the story and recruit the people to help them do it better. There are documentation people who are really good at helping, and design and graphics people who can put pictures and videos together. There are obviously public relations and marketing efforts, and any organization can be a part of that.
Here at Continuum, a bunch of different people end up helping that happen over time. It has to be pursued from multiple angles, because you are talking about breaking down barriers, and you are talking about communicating, and people are different. Some ways of communicating will resonate with certain people, and other ways will resonate with other groups.
Do you feel that the rate of change in tech right now is on par with the past several years, or do you think things are really starting to speed up?
I think it is accelerating. Certainly there is demand for new and improved capability: lots of data is interconnecting, and there is demand to have that data be useful. So it is driving innovation, and at the same time you have got this open source ecosystem that is lit up, kind of an undercurrent of innovation that was always there, but was underutilized. It was sort of hidden behind the corporate structures of how work got done, and old academic structures.
Now there is enough momentum where corporations are funding it, and academics are funding it—of course, now the problem is integration. And the other problem is that there is this really powerful innovation across the board, but it is all scattered. Bringing it together so it can be applied usefully is still a significant effort.
That is what Anaconda is. That’s its whole purpose. The way it’s designed, it is all about recognizing a world full of disparate packages and projects that can be brought together to do amazing things.
I’ve been excited about bringing technology together since I was a grad student. I want to connect the ideas in libraries (optimization libraries, integration libraries, visualization libraries, and analytics libraries) and pull them together in a way that can be accessible and used. My first big project, the SciPy project, was really just a large distribution of software. You look at what SciPy was: it brought together a whole bunch of disparate ideas into a single library. In fact, it should have brought them together into a distribution of separate libraries.
Humans struggle to interact meaningfully on teams that are bigger than 7 to 11 maybe (and really I think the number is 5). The cognitive load of understanding the team members well enough to make real progress, to find out all the different perspectives and concerns and really build those bonds of trust that produce viable results, means those teams can’t be very big if our brains are to keep up. So innovation is modular.
As the SciPy community grew, it needed to organize around the modules, and those needed to support thousands of people developing, but they couldn’t do it in one place, so really the distribution was the problem. That really was the impetus for Anaconda: recognizing that the problems of SciPy (some of which were carried into NumPy) were really problems of packaging and distribution. Anaconda grew up out of that recognition and a desire to try to make things better. In the process, we realized that it solves a whole slew of other problems between the parts of any significant software project. Software builds on other software, and these can be brought together by other people to create new artifacts and solutions. This creates a dependency tree, a tree of interconnections, that has to be managed so it can be updated, deployed, and all these things can be reproduced and governed. That is Anaconda; that is what it’s about.
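Editor’s sketch: the dependency tree described here can be illustrated with a few lines of Python. This is not how Anaconda itself resolves packages; it is a minimal example, assuming a hypothetical four-package dependency map and a hypothetical `install_order` helper, of why the interconnections must be tracked: a package can only be installed or updated after everything it builds on. It uses `graphlib.TopologicalSorter` from the standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each package maps to the set of packages
# it builds on. Real package metadata also carries version constraints,
# which this sketch ignores.
dependencies = {
    "numpy": set(),
    "scipy": {"numpy"},
    "pandas": {"numpy"},
    "analysis-app": {"scipy", "pandas"},
}


def install_order(deps):
    """Return one valid order in which every package follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = install_order(dependencies)
print(order)
```

Several orders can be valid (for instance, `scipy` and `pandas` may appear in either order), but `numpy` always comes first and `analysis-app` last. A real package manager must also reconcile version constraints across the whole tree, which is a considerably harder problem than ordering.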
So it starts with just getting all the stuff. The reason I created SciPy in the first place was to help people get the stuff. And a lot of work was spent doing that, and then Anaconda just helps people get more stuff more quickly and in a more repeatable way. But underneath that is the architecture that solves the fundamental problem of developer interconnection.
You talked a bit about the genesis of Anaconda—what do you see it evolving into in the future?
Yeah. It has become the foundation of our platform, and it started with Python but it has evolved to include R, Scala, Julia, Node, any package from any language. We call it an open data science platform, instead of just a distribution of Python. So it is evolving into an ecosystem which brings data science together.
What I see as its future is a place where free stuff is shared freely, and then a place where people can sell to each other as well. You can sell modules because they are easily plugged into this system, and then of course we are trying to make it easy for people to sell, which is a separate problem. Now you are looking at, how do I make it easy for people to buy each other’s work on top of the free stuff? How do you help people interact with each other?
It all comes down to the people, doesn’t it?
It does. That is why markets are so hard to understand and predict, because ultimately markets evolve with groups of people. And you try to understand—you fundamentally solve somebody’s problem and help some people. But then how that interacts with other problems that are being solved at the same time by other people can be hard to predict.
For us, Anaconda solves the heterogeneity problem in an exploding world of innovation. I have written a blog post about this notion that Anaconda helps to normalize enterprise deployment. Or, how does enterprise consume open source? How do they do that? If you don’t use Anaconda, you have to create something like Anaconda. So Anaconda is a place where it can all come together. And for us, the free is free—it will always be free—and then on top of that we believe there is another enterprise layer that is necessary, that maybe open source people won’t create quickly, but enterprises will need to pay for and will pay for. Because that’s a decision a lot of enterprises incorrectly make—rather than amortize the cost of that shared layer across multiple customers, they each independently build it and then pay the cost to maintain it themselves, because they are not going to find an open source community to maintain it. They are going to have to do it, and they just do it rather than have a common layer.
Editor’s note: The above has been edited for length and clarity. In the next installment of this interview, we’ll cover more on how enterprise and open source goals can work together, and advice for building a community.