Themes from JupyterCon 2017

September 7th, 2017

This past August saw the first JupyterCon, an O’Reilly-sponsored conference on the Jupyter ecosystem, held in NYC. I attended on behalf of Silicon Valley Data Science (SVDS) and presented a poster. We make extensive use of Jupyter (Notebook, Hub, nbconvert, etc.) in our data science consulting work and love to show our support for open source projects. JupyterCon was one of the best conferences that I’ve been to, and I learned a great deal in the few days that I was there. Several themes emerged during the conference that I would like to highlight:

reproducible science and collaboration
Jupyter for teaching
future possibilities for Project Jupyter

In this post I will present a number of talks grouped by their themes, with some thoughts surrounding them.

Reproducible (data) science and collaboration

Project Jupyter grew out of the IPython framework that was started by an academic (Fernando Perez) as an “afternoon hack.” From the beginning, the project focused on how to better use computational tools to solve problems faced by working scientists. This pedigree still shows today: reproducibility and collaboration are key concepts in science, and a number of talks at JupyterCon addressed them.

The following two keynotes spoke at a high level about collaboration and reproducibility.

In Data science without borders, Wes McKinney spoke about writing code that can be used in a variety of computing environments, from Python to R and beyond. More importantly, he discussed how he is spending his time making this goal a reality with Apache Arrow.
Fernando Perez gave a history of Project Jupyter, and tied it into the openness required of science in his talk, Project Jupyter: From interactive Python to open science.

The next two talks get more into implementation specifics.

In Design for reproducibility, Lorena Barba tackled the challenge of reproducibility directly.
In How Jupyter makes experimental and computational collaborations easy, Zach Sailer explained how his collaborators combine the various pieces of the Jupyter ecosystem (what he called orbit) to develop, communicate, and share their science. His slides are available here, and a YouTube video will hopefully come online soon.
I presented a poster based on my previous work on collaboration for data science teams. A number of people stopped by during the poster session to chat about the challenges they face in working on teams and to share ideas for solutions.

Jupyter for teaching

Jupyter Notebooks allow teachers to give students a document that interleaves narrative and description with interactive code snippets and challenges. This makes them an excellent pedagogical tool when used properly. Of course, deploying code that is meant to be altered by students on their own laptops, with every conceivable hardware configuration, can be a daunting task. A number of talks described how the speakers tackled this challenge.

Watch the excellent talk How the Jupyter Notebook helped fast.ai teach deep learning to 50,000 students by Rachel Thomas (co-founder of fast.ai, a deep learning educational startup). Rachel talked not only about the technical challenge, but also about the overall philosophical framework for how they approach teaching.
Two educators from the Data Incubator gave a nice talk about their software stack in Teaching from Jupyter notebooks. They described a number of tools that automatically test for changes in the notebooks and explained how they deploy DigitalOcean boxes to each of their students. They mentioned at the end that they will probably migrate to JupyterHub and JupyterLab in the next sessions that they teach.
I wasn’t able to attend Managing a 1,000+ student JupyterHub without losing your sanity, but I eagerly await the YouTube video.

The future of Jupyter/JupyterLab

What’s next for Project Jupyter? Throughout the conference, the message was JupyterLab, a new frontend to many of the tools that exist in the Jupyter ecosystem. JupyterLab was demoed in tutorials and talks throughout, and the newest version (0.27) was released on the first day of the conference.

Jupyter Notebooks aren’t going anywhere, though: they feature prominently within JupyterLab. Having played around with earlier versions of JupyterLab, I was very happy with the newest release, which feels like it has come a long way. A few talks from the Jupyter team demonstrated what JupyterLab offers.

I recommend watching JupyterLab: The next-generation Jupyter frontend when it comes out on YouTube. The core developers of Project Jupyter demonstrated the newest version of JupyterLab. They did a nice job selling the improvements over a simple Jupyter Notebook server.

Other forward-looking talks centered on deploying something interactive from Jupyter so that other users can gain insight from an analysis.

Building interactive applications and dashboards in the Jupyter Notebook gave a nice overview of the details of turning a Jupyter Notebook into a dashboard that can be used by others.

Wrapping up

Finally, I want to highlight a talk that doesn’t really fit any of these themes, but simply blew me away.

A billion stars in the Jupyter Notebook by Maarten Breddels. Keep an eye out for the video, but let me assure you that it’s far more than simply plotting a billion stars. The talk demonstrated the amazing capabilities of several visualization libraries; from the conference description: “Maarten Breddels offers an overview of vaex, a Python library that enables calculating statistics for a billion samples per second on a regular n-dimensional grid, and ipyvolume, a library that enables volume and glyph rendering in Jupyter notebooks. Together, these libraries allow the interactive visualization and exploration of large, high-dimensional datasets in the Jupyter Notebook.” Maarten fully delivered on these promises!
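To give a feel for what “calculating statistics on a regular grid” means, here is a minimal pure-Python sketch. It is not vaex’s API or implementation; the function name grid_mean and its parameters are hypothetical, chosen only for illustration. The idea is to drop each sample into a cell of a fixed grid and keep running sums and counts per cell, so the result is small enough to plot no matter how many samples went in.

```python
def grid_mean(xs, ys, values, bins, x_range, y_range):
    """Count samples and average `values` in each cell of a bins x bins grid.

    Hypothetical helper for illustration only (not part of vaex).
    Samples outside the half-open ranges [min, max) are ignored.
    """
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    sums = [[0.0] * bins for _ in range(bins)]
    counts = [[0] * bins for _ in range(bins)]

    for x, y, v in zip(xs, ys, values):
        if not (x_min <= x < x_max and y_min <= y < y_max):
            continue
        # Map each coordinate to a cell index; clamp to guard against
        # floating-point rounding at the upper edge.
        i = min(int((x - x_min) / (x_max - x_min) * bins), bins - 1)
        j = min(int((y - y_min) / (y_max - y_min) * bins), bins - 1)
        sums[i][j] += v
        counts[i][j] += 1

    # Empty cells have no mean, so they are reported as None.
    means = [
        [s / c if c else None for s, c in zip(sum_row, count_row)]
        for sum_row, count_row in zip(sums, counts)
    ]
    return counts, means


# Three samples in range plus one outside it, on a 2 x 2 grid over [0, 1).
counts, means = grid_mean(
    xs=[0.1, 0.2, 0.9, 5.0],
    ys=[0.1, 0.2, 0.9, 5.0],
    values=[1.0, 3.0, 5.0, 100.0],
    bins=2,
    x_range=(0.0, 1.0),
    y_range=(0.0, 1.0),
)
print(counts)
print(means)
```

A loop like this is far too slow for a billion rows; the point is only that the output size depends on the number of grid cells, not on the number of samples, which is what makes interactive exploration feasible.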

Overall, it was an excellent conference, and I learned a lot. Were you there? Tell us about your favorite sessions in the comments.