How to Set Up Your JupyterLab Project Environment

--

Create and customize your containerized, script-controlled JupyterLab project environment in a minute.

This post is part of the book: Hands-On Quantum Machine Learning With Python.

Get the first three chapters for free here.

TL;DR:

The JupyterLab-Configurator lets you easily create a JupyterLab configuration that runs JupyterLab in a container and automates the whole setup using scripts. A container is a separate environment that encapsulates the libraries you install in it without affecting your host computer. Scripts automate executing all the commands you would otherwise need to run manually. Because you can review and edit the scripts, you keep full control of your configuration at any time.

In this post, you’ll see how this JupyterLab configuration works and how you can customize it to cater to your needs.

I am a Data Scientist, not a DevOps Engineer

Dammit, Jim. I’m a Data Scientist, not a DevOps Engineer

The list of requirements for a Data Scientist is very long. It includes math and statistics, programming and databases, communication and visualization, domain knowledge, and much more.

“Please, don’t add DevOps to the list,” you think? Ok! How do these processes sound to you?

Create your JupyterLab configuration:

  1. Use the JupyterLab-Configurator to easily create your custom configuration
  2. Download and unzip your configuration
  3. Customize it to your needs (optional)

The following picture shows the JupyterLab configuration in action. Use it with two simple steps:

  1. Execute sh {path_to_your_project}/run.sh
  2. Open localhost:8888 in a browser
Using the JupyterLab configuration

And what if I am interested in how this configuration works? How much time does it take to explain it?

Certainly, some hours, Sir. But ya don’t have some hours, so I’ll do it for ya in a few minutes.

The remainder of this post gives you an overview of how this JupyterLab configuration works, conceptually. It explains the building blocks and enables you to customize the configuration to your needs, e.g.

  • add software packages
  • add your own Python modules
  • customize the Jupyter notebook server

Why do I need a JupyterLab-Configuration anyway?

In 2018, Project Jupyter launched JupyterLab — an interactive development environment for working with notebooks, code, and data. JupyterLab has full support for Jupyter notebooks and enables you to use text editors, terminals, data file viewers, and other custom components side by side with notebooks in a tabbed work area.

Provided you run a Unix-based operating system (macOS or Linux), you can install and start JupyterLab with two simple commands:

python -m pip install jupyterlab
jupyter lab

But wait! As simple as the manual setup of JupyterLab may look at first sight, it is just as unlikely to cater to all the things you need to do in your data science project. You may also need:

  • Jupyter-kernels (e.g. bash, Javascript, R, …)
  • File converters (e.g. Pandoc, Markdown, …)
  • Libraries (e.g. NumPy, SciPy, TensorFlow, PyTorch, …)
  • Supporting software (Git, NbSphinx, …)

Installing these dependencies directly on your computer is not a good idea, because you would have a hard time keeping your computer clean.

  • What if you had different projects that require different versions of a library? Would you uninstall the old version and install the correct version every time you switch between the projects?
  • What if you do not need a library anymore? Would you remove it right away and reinstall it if you discover that you need it after all? Or would you keep it until you forgot to remove it altogether?

Installing these dependencies manually is not a good idea, either, because you would have no record of all the things you installed.

  • What if you wanted to work on this project on another computer? How much time and work would it require you to set up the project again?
  • What if someone asked you for all the third-party-libraries you are using? Among all the libraries you installed on your host computer, how would you identify those you are using in this project?

A Containerized Configuration

A container is a virtual environment that is separated from the host computer. It creates its own runtime environment that can adapt to your specific project needs, and it interacts with its host only in specified ways. Changes inside the container do not affect your host computer, and vice versa. Docker is one of the most prominent and widely used platforms for containerizing project environments.

The following picture depicts the Docker process that contains two steps: (1) build an image from the Dockerfile and (2) run the image in a container.

The Docker process

Our configuration automates this process in the run.sh script. This is a shell script (sh or bash) that runs on the host computer, and it is your entry point to start your JupyterLab project. Simply open a terminal and run:

sh {path_to_your_project}/run.sh

The Dockerfile is the script that tells Docker how to configure the system within the container. During the docker build-step, Docker creates an image of this system. An image is an executable package that includes everything needed to run an application — the code, a runtime environment, libraries, environment variables, and configuration files.

While Docker supports building up systems from scratch, it is best practice to start from an existing image, e.g. an image containing an operating system or even a full configuration.
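To illustrate the pattern (this is a minimal sketch, not the repository's actual Dockerfile — the base image, package, and startup flags are examples), a Dockerfile that starts from an existing image could look like this:

```dockerfile
# Minimal sketch: start from an existing image, add software,
# and define what to execute when the container starts.
FROM python:3.7-slim

RUN pip install jupyterlab

CMD ["jupyter", "lab", "--ip=0.0.0.0", "--port=8888", "--allow-root"]
```

The docker build step turns such a file into an image; the docker run step executes that image in a container.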

Our configuration starts from such an existing image. You can find the corresponding Dockerfile in this GitHub repository. The image contains the following software and libraries:

  • Ubuntu 18.04
  • Python 3.7.0
  • Pip
  • Jupyter and JupyterLab
  • Bash and Jupyter Bash-Kernel
  • Document (pdf) tools (pandoc, texlive-xetex)
  • Build tools (e.g. build-essential, python3-setuptools, checkinstall)
  • Communication tools (openssl, wget, requests, curl)
  • Various Python development libraries

If you require further software libraries, the Dockerfile is your place to go. Just add a new line after the FROM statement. This new line needs to start with RUN and contains any shell command you may want to execute, usually something like apt-get install or pip install. For example, you can use pip to install some major data science packages with the following statements:

RUN pip install numpy
RUN pip install scipy
RUN pip install pandas

Changes in the Dockerfile become effective during the build step. If you have already started the container, you need to stop it (e.g. with ctrl+c in your terminal) and restart it (sh {path_to_your_project}/run.sh). When you have edited your Dockerfile, the build step may take some time. Because Docker reuses existing image layers, subsequent starts are very fast when you have not changed anything.

If you remove commands from your Dockerfile and rerun the run.sh script, Docker creates a new image of the system. You do not need to uninstall anything, because the removed command was never part of the resulting system. This keeps your configuration clean at all times. You can experimentally install libraries without worrying: if you don’t need them, just remove them, and you get a system image that never installed them in the first place.

The following image depicts how the Dockerfile configures the system: it installs the software as specified in its RUN-commands.

The Dockerfile specifies the configuration of the system

The docker run-command executes this image in a container. Further, it defines how the system running within the container connects to the outside world, i.e. the host computer.

There are two main types of connections: volumes and ports. A volume is a link between a directory on the host computer and one in the container. These directories synchronize, i.e. any change in the host directory affects the directory in the container and vice versa. A port mapping lets Docker forward any request (e.g. an HTTP request) made to a port of the host computer to the mapped port of the container.
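Expressed as a plain docker run command, a volume and a port mapping look like the following sketch (the image name and container path are hypothetical; run.sh issues the actual command for you):

```shell
# -v links the current host directory to a directory in the container (volume);
# -p forwards requests on host port 8888 to container port 8888 (port mapping).
docker run -it \
  -v "$(pwd)":/usr/local/bin/myproject \
  -p 8888:8888 \
  myproject:latest
```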

The following image depicts our configuration thus far. The run.sh-script takes care of the Docker build and run steps. Once you execute the script, it creates a running container that connects with your host computer via a file system volume and a port mapping.

The run.sh script automates the Docker process

File system

When you download the files from the GitHub repository, you get the following file structure in the .zip file:

{path_to_your_project}/
├─ config/
│ ├─ {projectname}.Dockerfile
│ ├─ jupyter_notebook_configuration.py
│ └─ run_jupyter.sh
├─ libs/
│ └─ nbimport.py
├─ notebooks/
│ └─ …
└─ run.sh
  • The config-folder contains the configuration files of your JupyterLab project. These files configure the Docker-container, install the software packages, and configure the JupyterLab environment.
  • The libs-folder contains the software libraries that are not installed as packages but that you add as files, e.g. Python-modules that you wrote yourself in other projects.
  • The notebooks-folder is the directory where we put the Jupyter-Notebooks.

In the Dockerfile, we set environment variables that point to these directories. Because the scripts in the configuration use these environment variables, you can edit them if you like. Just make sure that the path in the variable matches the actual path.

ENV MAIN_PATH=/usr/local/bin/{projectname}
ENV LIBS_PATH=${MAIN_PATH}/libs
ENV CONFIG_PATH=${MAIN_PATH}/config
ENV NOTEBOOK_PATH=${MAIN_PATH}/notebooks
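Since these are ordinary environment variables, the derived paths expand as you would expect. A small shell sketch with a hypothetical project name "myproject":

```shell
# Mirror the ENV lines above, using a hypothetical project name.
MAIN_PATH=/usr/local/bin/myproject
LIBS_PATH=${MAIN_PATH}/libs
CONFIG_PATH=${MAIN_PATH}/config
NOTEBOOK_PATH=${MAIN_PATH}/notebooks

echo "${NOTEBOOK_PATH}"   # prints /usr/local/bin/myproject/notebooks
```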

In the configuration, we map the current working directory ({path_to_your_project}) to the ${MAIN_PATH}-folder in the container. So, any file you put into this directory is available in your JupyterLab project. Vice versa, any file you add or change within JupyterLab (e.g. Jupyter notebooks) will appear on your host computer.

Further, with the EXPOSE command in the Dockerfile, we specify that the configuration provides JupyterLab on port 8888. This port inside the container is mapped to a port of your host computer.

The following image depicts how the container connects its file system and port to the host computer.

Connect the file system and the port

JupyterLab-specific Configuration

The final command in our Dockerfile is the CMD command. It tells Docker what to execute whenever the container starts. In our configuration, it executes the run_jupyter.sh script. This script allows us to do some last-minute preparations, like:

  • put the jupyter_notebook_configuration.py file at the location where JupyterLab expects it
  • configure a custom Jupyter-Kernel that automatically loads the nbimport.py Python module

The jupyter_notebook_configuration.py lets you configure the Jupyter notebook server, e.g. setting a password to use for web authentication. A list of available options can be found here.
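For example, a jupyter_notebook_configuration.py might contain entries like the following (illustrative values, not the repository's actual file; the option names come from the classic Notebook server's configuration system):

```python
# Illustrative jupyter_notebook_configuration.py entries.
# 'c' is the configuration object Jupyter injects when loading this file.
c.NotebookApp.ip = '0.0.0.0'   # listen on all interfaces inside the container
c.NotebookApp.port = 8888      # must match the exposed port
c.NotebookApp.token = ''       # disable token authentication (local use only)
```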

The custom Python kernel adds the ${LIBS_PATH} to your Python sys.path. This allows you to import any Python module from the ${LIBS_PATH}-folder, e.g. import libs.nbimport. This nbimport.py-module further enables you to import Jupyter-notebooks that are located in the ${NOTEBOOK_PATH}-folder. Whenever you start a Jupyter notebook with a Python kernel, the system does these things automatically for you.
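Conceptually, the kernel startup does something like the following sketch (the path is a hypothetical stand-in for ${LIBS_PATH}; the actual kernel setup lives in the repository's scripts):

```python
import sys

# Hypothetical value of ${LIBS_PATH} inside the container.
LIBS_PATH = "/usr/local/bin/myproject/libs"

# Prepend the libs folder so its modules take precedence on import.
if LIBS_PATH not in sys.path:
    sys.path.insert(0, LIBS_PATH)

# Modules placed in libs/ are now importable, e.g.: import nbimport
print(LIBS_PATH in sys.path)  # prints True
```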

Finally, the run_jupyter.sh-script starts JupyterLab. You can now open localhost:8888 in a browser, where 8888 is the port you specified.

The following image depicts the complete JupyterLab configuration.

The complete JupyterLab configuration

Summary

The JupyterLab-Configurator lets you easily create your custom configuration. This JupyterLab-Configuration runs JupyterLab in a container. It separates the environment JupyterLab runs in from the host environment. Thus, you can change the JupyterLab environment (e.g. un-/installing packages) without affecting your host computer or any other project.

This JupyterLab-Configuration automates the whole setup using scripts. These scripts:

  • Enable you to run JupyterLab with a single command (e.g. sh run.sh)
  • Make your project portable: just move or copy the directory to another host computer
  • Reveal what is part of your configuration and allow you to review and edit your configuration
  • Make your configuration part of your sources. You can version-control them like you can version-control your code

The GitHub repository provides the whole source code.

Using the JupyterLab configuration is very easy:

  1. Execute sh {path_to_your_project}/run.sh
  2. Open localhost:8888 in a browser
Using the JupyterLab configuration

NOTE: The ability to run Docker containers is the only requirement this JupyterLab-Configuration places on the host computer. Docker is available on Windows and recently gained the ability to run Linux-based containers. Thus, there is no reason why JupyterLab should not run on Windows. If you want to try it, you need Docker running, and you must move the docker build and docker run commands of run.sh to a .cmd file that you can execute on Windows.
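Such a .cmd file could be sketched along these lines (an untested batch sketch; the image name and paths are hypothetical and would need to match your configuration):

```batch
REM run.cmd -- hypothetical Windows counterpart of run.sh
docker build -t myproject -f config\myproject.Dockerfile .
docker run -it -v "%cd%":/usr/local/bin/myproject -p 8888:8888 myproject
```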

--


Are you interested in quantum computing and machine learning but don't know how to get started? Let me help!