Docker Containers, Python Virtual Environments, & Virtual Machines
One of the biggest headaches in software development is having different versions of dependency libraries and environments clash and crash. This happens all the time, and it's terrible! You update your Keras to version 2.2.5 and suddenly your TensorFlow 1.13 no longer works. Yet tomorrow is your deadline for that product shipment or for that conference abstract submission. Finally you get things to sort of work, but you're now scared to upgrade to TensorFlow 2.0 in spite of all the improvements you've heard it has. And though you did ship that software, you are not sure it will work well on your client's computer. You are holding your breath. Sound familiar? We've all been there. But does life have to be so hard for software developers? No it does not … thanks to containers, virtual environments, and virtual machines. Let's jump right in!
This is a practical, hands-on guide and introduction to Docker containers, Python virtual environments, and virtual machines. In the course of this tutorial we will demonstrate each of these, so grab your laptop and a cup of coffee and follow along.
Modularity, isolation, independence, and portability are key tenets of good software development. We want our code to function without clashing with other code, and without depending on code that may not be available how and when we need it. We also want our code to be portable, enabling us to easily move it around as needed. Docker containers, virtual machines, and Python virtual environments are designed to meet various parts of these needs. They overlap in functionality but are not interchangeable in general.
How Do They All Differ?
Docker containers package all the dependencies needed to run a piece of software. Python virtual environments allow separation of sets of 3rd party Python site-packages, i.e. pip installables like TensorFlow, Keras, Matplotlib, Requests, etc., such that one virtual environment may have TensorFlow 1.13 and Keras 2.1.1 while another has TensorFlow 2.0 and Keras 2.3.1. Virtual machines are simulations of entire computers; they have their own "operating system," a guest OS running atop a hypervisor. The computing instance nodes we spin up in the cloud are virtual machines. Let's dive deeper!
Tenets of Good Software Development
- Modularity
- Isolation
- Portability
- Independence
Docker Containers
Docker containers provide code isolation, independence, and portability. Docker (or a related container technology) is essentially a necessity if one intends to deploy an application into production. Docker containers have fully prescribed dependencies with which they can be created. These dependencies, as well as the instructions on how specifically to create the container, are stored in the container's image.
From Gene (Dockerfile) to Protein (Docker Container): The Dockerfile is analogous to the DNA instruction set with which the docker image is created; the docker image is analogous to the RNA instruction set with which the container is created; and the container is analogous to a fully manifest protein built to carry out a specific function.
The image can be thought of as the RNA instruction set with which a protein (the docker container) can be translated (built) in a series of steps. An important part and precursor of the image is the Dockerfile, which is where the build instructions and dependencies are specified. One can think of the Dockerfile as the DNA which encodes the instruction set with which the image is created (transcription in our nucleic acid analogy). The image of a container is portable and can be registered on one of a number of registries/repositories/hubs. Once there, anyone can "pull" it so long as they know its unique name. It can also be registered on a private repository such as Google Container Registry (gcr.io), Elastic Container Registry (ECR) from Amazon Web Services, Azure Container Registry from Microsoft, or a private Docker Hub repository.
After installing docker, it’s time to start using it!
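If you don't yet have docker, one simple way to install it on Ubuntu 18.04 is via apt (this uses Ubuntu's docker.io package; the official Docker documentation also covers installing docker-ce from Docker's own repository):
$ sudo apt update
$ sudo apt install docker.io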
Common Docker Commands
Let’s look at a few docker commands. My system is ubuntu-18.04.
The command I use most frequently by far is:
$ sudo docker ps
This reveals any running containers. As you can see, I have one running container on my system. The command displays the container ID, image name, the command that is executed upon container creation or start, duration since creation, status, any ports via which the container is bound to the system, and finally a name for the container, in this case "friendly_galileo." The name is randomly chosen by the system if not explicitly specified at the time of container creation.
To see which containers have been created and remain in existence on the system, do the following:
$ sudo docker ps -a
And we see a list of containers on my system. To remove a container, e.g. the "keen_bartik" container, do the following:
$ sudo docker rm keen_bartik
The container name is immediately echoed back, confirming the container has been removed. Alternatively, one can remove the container using the container ID or any unique prefix of its initial characters, e.g.:
$ sudo docker rm 6728ebfb8a67
or
$ sudo docker rm 672
By repeating the sudo docker ps -a command, we can confirm that the container has indeed been removed.
Of note, only a stopped container can be removed in this way (the -f flag can force-remove a running one). A running container will first need to be stopped before one can remove it. To stop a running container, do, e.g.:
$ sudo docker stop [Container name, or Container ID]
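For instance, to stop the running container we saw earlier:
$ sudo docker stop friendly_galileo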
In our case, after this, re-checking the list of running containers shows there are none:
To start an existing container, do, e.g.:
$ sudo docker start [Container name or Container ID]
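For instance, to start the container we just stopped:
$ sudo docker start friendly_galileo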
After which a reassessment of running containers yields:
To create a new container instance from a given image, do:
$ sudo docker run [Image name or Image ID]
and append any flags and arguments associated with the app which will run in that container. For instance, the image named "mighty-server-image" is a customized TensorFlow Serving image designed for creation of a container that serves ML models via a specified HTTP/REST port. I detailed customization of the base tensorflow/serving image in this prior tutorial. We use the "-p" flag to specify the port and its binding to our machine. We do:
$ sudo docker run -p 8501:8501 -t mighty-server-image
This yields a long response log which ends in a message informing us that our server has been created and is running an HTTP/REST API at localhost on port 8501:
Now rechecking the list of running containers, we should expect to see a new one:
And we do.
If we attempt to create an instance based on an image that is not in our local repository, i.e. based on an image that does not show up in our "sudo docker images" list, docker automatically checks Docker Hub to attempt to "pull" that image and run it. Calling "run" directly is equivalent to first calling "sudo docker pull [image name]" followed by "sudo docker run [image name]." To test this out, let's first inspect our images repository and then attempt to create a container based on an image not in the repo.
Note that we do not have a wordpress image in our repository. Let's nonetheless try to create a wordpress container, and let's plan to call it "Blogospherical." Do:
$ sudo docker run --name Blogospherical -t wordpress
Notice what happened upon execution of our command: docker informed us that it was unable to find the wordpress image locally, and therefore proceeded to pull it from Docker Hub.
Let’s now take another look at our local images repository.
See that the wordpress image has shown up, as has a container named Blogospherical based on the wordpress image:
Of note, when we create our own images, we are able to host them on the public docker repository (Docker Hub), subject to a unique name of the form username/myImage. There are two types of images: official and user. The official images are built and maintained by Docker itself, and at the time of this writing there are about 167 official images, including ubuntu, python, wordpress, docker, hello-world, node, mysql, etc. Images can have tags, typically distinguishing various versions. For instance, consider the python image. Some sample tags are '2.7.0', 'latest', or 'slim'. The resulting images are python:2.7.0, python:latest, and python:slim. Slim is a minimalistic, lightweight version of python, while '2.7.0' and 'latest' indicate versions.
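For example, to pull a specific tagged image from Docker Hub, one could do:
$ sudo docker pull python:slim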
To enter into a container via an interactive shell (pseudo-tty), do:
$ sudo docker exec -it [Container_name] sh
An interactive shell opens and we are inside the container, from where we can execute any of the standard unix commands. For instance, we can use ls to list the contents of the container's root directory:
We can use mkdir to create a new directory in the container. Let's call the directory BLOGOSPHERICAL_BLOGGERS as follows:
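Inside the container, the session might look something like this (the # prompt being the container's root shell):
# mkdir BLOGOSPHERICAL_BLOGGERS
# ls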
We exit the container using exit.
After one creates a container from an image, one can make modifications to that container, as we did for Blogospherical. To get an image of the modified container (or any container for that matter), one needs to commit the container. This means to take its image and store it in the local images repository. To do so, use the following command:
$ sudo docker commit [Container_name] [image_name]
where image_name is any name you choose to call the image.
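In our case, committing the Blogospherical container to an image named modified-wordpress-image would look like this:
$ sudo docker commit Blogospherical modified-wordpress-image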
We see a sha256 hash echoed upon creation, indicating a reference to the newly added layer encoding the modification. This hash in effect represents the difference between the parent image and the newly created one. To further check, we can view the images again:
And indeed we see that modified-wordpress-image was newly created 43 seconds ago. Any container created using this image will have the BLOGOSPHERICAL_BLOGGERS directory in its root.
Now let us see how to build our own docker image.
The Dockerfile
Here is an example of a Dockerfile:
FROM python:2.7.14
MAINTAINER Stephen G. Odaibo, MD
COPY . /app
WORKDIR /app
RUN pip install --upgrade pip
RUN pip install --no-cache-dir -r requirements.txt
ENTRYPOINT ["python", "hdf5_to_pb.py"]
Anatomy of a Simple Dockerfile:
FROM indicates the parent docker image from which the current one we are building will inherit.
MAINTAINER names the person responsible for maintaining this image on the public repository Docker Hub.
COPY is an instruction to copy something, typically from the current directory into the container. In this case we are copying all the contents of the current directory on the host into the /app directory in the container.
WORKDIR declares a directory on the container as the work directory. In this case we are declaring /app as our work directory, after having copied all the files and folders we need into it.
RUN instructs the docker daemon to run some instruction during the build. This could be anything; in this case we are doing a pip install upgrade followed by pip installation of our Python 3rd party library dependencies. The "--no-cache-dir" flag tells pip not to create a cache directory, which keeps the image light. Recall that docker containers are typically ephemeral, easily created as needed.
ENTRYPOINT specifies the command(s) to be executed automatically upon startup of the container. In this case, it will look something like this when executed:
~$ python hdf5_to_pb.py
Of note, in the exec form used here (a JSON array), the command runs directly; in the shell form (a plain string), it would instead be wrapped in a default shell as /bin/sh -c "python hdf5_to_pb.py".
CMD supplies what is to be passed in after the entrypoint, i.e. its contents are appended as arguments immediately after the entrypoint instructions.
ENV declares in the Dockerfile the names and values of environment variables that can then be used inside our application running in the container. It, along with ENTRYPOINT, CMD, and bash scripting, presents a powerful set of options for passing arguments to our application running in the container. For instance, one may pass a my_shell_script.sh to CMD, and that script may call on various environment variables which have been set.
The requirements.txt file contains all the Python 3rd party site-package dependencies, i.e. the "pip installables" that do not come with standard Python. An example of some of its contents may look like this:
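For illustration (hypothetical package pins; your file will list whatever your app actually needs):
numpy==1.16.4
tensorflow==1.13.1
Keras==2.2.5
requests==2.22.0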
To get the requirements.txt file from inside a virtualenv, simply do:
pip freeze > requirements.txt
ENTRYPOINT and CMD: How do they Differ?
ENTRYPOINT executes first, while the CMD contents are passed in afterwards as arguments to the entrypoint. Runtime arguments override a Dockerfile-specified CMD but not a Dockerfile-specified ENTRYPOINT; the entrypoint can only be overridden by explicitly using the --entrypoint flag at docker runtime. Notably, there are more than 10 different ways to accomplish any task of passing arguments to a docker container.
For example, say we have the following in the Dockerfile:
ENTRYPOINT ["myApp.py", "x1", "x2"]
At runtime, this will yield:
~$ myApp.py x1 x2
We can get the same result by:
ENTRYPOINT ["myApp.py"]
CMD ["x1", "x2"]
or by:
ENTRYPOINT ["myApp.py"]
and then passing in x1 x2 as arguments to the runtime command, e.g.:
$ sudo docker run -t myImage x1 x2
or by:
ENV ARG_1=x1 ARG_2=x2
ENTRYPOINT ["myApp.py"]
CMD ["$ARG_1", "$ARG_2"]
where the environment variables ARG_1=x1 and ARG_2=x2 have been set, here in the Dockerfile via ENV (they can also be supplied at docker run time with the -e flag). Note that since variable substitution requires a shell, in practice this last variant uses the shell form or a wrapper script.
And many many more combinations and permutations exist.
In summary, the ENTRYPOINT is executed first and the CMD contents are passed in as arguments to the entrypoint. When ENTRYPOINT is not used, the CMD arguments are the first ones into the process.
Building the Docker Image
Now let us create a Docker image for an app which will take a machine learning model in h5 or hdf5 format and convert it to protobuf (pb) format. You can replace this part with any Python script and follow along. I created a directory in which I stored the Dockerfile, our application file (a Python script called hdf5_to_pb.py), the requirements.txt file, and folders from which the application will read and write data. In particular, we will load the hdf5 formatted model into the hdf5_model folder and will have the app write the pb formatted model into the pb_model folder.
From inside this directory we execute the following command:
$ sudo docker build --rm -t new-h2pb-image .
Of note, the --rm flag instructs docker to remove all intermediate containers generated during the build once the build is complete. The period at the end is important and indicates that the Dockerfile and other ingredients reside in the current directory. Note that the Dockerfile must be saved without any extension.
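A minimal sketch of what such a conversion script might look like, assuming TensorFlow 2.0's tf.keras and tf.saved_model APIs (the actual hdf5_to_pb.py may differ):
import argparse
import tensorflow as tf

# hypothetical sketch; parse the image height and width the model was trained on
parser = argparse.ArgumentParser()
parser.add_argument("height", type=int, help="training image height")
parser.add_argument("width", type=int, help="training image width")
args = parser.parse_args()

# load the Keras model saved in HDF5 format from the hdf5_model folder
model = tf.keras.models.load_model("hdf5_model/saved_model.hdf5")
print("Loaded model for %d x %d images" % (args.height, args.width))

# export it in SavedModel (protobuf) format into the pb_model folder
tf.saved_model.save(model, "pb_model")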
As the sketch suggests, our application script will be looking inside an hdf5_model directory in the container for a saved_model.hdf5 model. Thankfully we included the COPY . /app instruction in our Dockerfile, so during the build we expect the necessary transfers to have occurred and for things to go well. Also, we have set our export path to the pb_model directory, which we also copied into our container. Additionally, we use argparse to pass two parameters into our Python script: the height and width of the images on which the model was trained to do inference. The build command yields:
And a check of our images repository reveals successful creation, 3 minutes ago:
Now we create a container based on the newly built image by executing the following command:
$ sudo docker run -dit new-h2pb-image 700 700
Where the 700s are the height and width arguments into our Python application. A check of running containers reveals that our new container was created 11 seconds ago:
We can enter the container to inspect whether our saved_model.pb was created and saved as intended:
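For instance, using the container name docker auto-assigned to it:
$ sudo docker exec -it adoring_perlman sh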
And all looks good:
Now, to get our data out of the container and to where we need it on the host system, there are multiple options. One of them, in this case, is to simply copy the files from the container to the host as follows. First we verify that the pb_model folder on our host system is indeed empty:
Then execute the following copy:
$ sudo docker cp adoring_perlman:/app/pb_model pb_model
And we check again to find the import of our pb formatted model was successful:
And now we can stop and remove the container.
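For instance:
$ sudo docker stop adoring_perlman
$ sudo docker rm adoring_perlman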
Docker is a general purpose software containerization technology which can facilitate various types of tasks. One such task is the serving of machine learning models. For more on this read my previous tutorial on TensorFlow Serving and RESTful APIs. In addition, docker containers are highly compatible with Kubernetes container orchestration platforms and technologies which further facilitate production enterprise scale deployments of software solutions. For more on this as relates to machine learning, see my previous article on TensorFlow Serving, Docker, and Google Kubernetes Engine.
For more on Docker containers, see the full documentation.
Virtual Environments
Python virtual environments are a mechanism to prevent incompatibility clashes and other forms of conflict that arise when 3rd party Python libraries share space. For instance, an update of TensorFlow from 1.13 to 2.0 may break any applications that relied specifically on TensorFlow 1.13. To avoid this problem, one would like to configure environments that have specific signatures as pertains to 3rd party Python packages. For instance, one virtual environment could be our TF2.0/Keras 2.2.5/Python 2.7.14 environment, while another is our TF2.0/Keras 2.0/Python 3.6.8 environment, and yet another our TF1.10-gpu/Keras 2.3.0/Python 3.6.0 environment. This setup facilitates sandboxing and encourages experimentation by greatly decreasing the risk that we will break anything.
Setting up virtualenv
To set up virtualenv, we proceed as follows:
sudo apt update
sudo apt install python3-dev python3-pip
sudo pip3 install -U virtualenv
When I run these, I see the following on my screen, but you'll see something longer if you do not yet have virtualenv and updated dev and pip packages on your system.
Now that we have virtualenv, let us set up a virtual environment. We can call it anything, say venvGood.
$ virtualenv --system-site-packages -p python3 ./venvGood
A folder named venvGood has now been created in the current directory. Now, to activate and "enter" our venvGood virtual environment, we proceed with the following command:
$ source ./venvGood/bin/activate # sh
We are now in our venvGood virtual environment (notice the (venvGood) prefix that now appears on the shell prompt) and can install any pip installable. First use pip list to take a look at what we have already; you'll see a long list of packages. Use pip freeze to get a list properly formatted for, say, conversion into a requirements.txt file. Let's install TensorFlow 2.0 in our virtual env. Just do:
$ pip install tensorflow==2.0
And now it shows up on our pip list. Here’s a short cropped-out segment of the list:
Jupyter Notebook Virtual Environments
The data science community is comfortable with Jupyter notebooks, and it is natural to want to work within notebooks in the scope and context of our virtual environments. To work in, for instance, our venvGood virtual environment in a Jupyter notebook, let's first check what a notebook's kernel options normally look like:
Of note, the options on my Ubuntu machine's Jupyter notebook shown below look different, as I have already wired a variety of virtualenvs to its backend. Nonetheless take note, as we are about to add one more.
Now let us create an IPython kernel for Jupyter such that the kernel is associated with a virtualenv on which we will install TensorFlow 2.0, Python 3.7.0, and Keras 2.3.0. Let us create and call this new kernel environment venvGoodKernel. To do so, first create a new directory called venvGoodKernel, and enter into this new directory.
$ sudo mkdir venvGoodKernel
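And step into it:
$ cd venvGoodKernel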
From within this directory, create a new virtual environment of the same name, venvGoodKernel:
$ sudo virtualenv --system-site-packages -p python3 ./venvGoodKernel
$ source venvGoodKernel/bin/activate # sh
Now from inside the venvGoodKernel virtual environment we will perform the following installations:
$ sudo pip install ipykernel
$ sudo ipython kernel install --user --name=venvGoodKernel
Now install tensorflow 2.0 and keras 2.3.0:
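Concretely, something like:
$ pip install tensorflow==2.0 keras==2.3.0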
And let’s now check our notebook kernel options. We see venvGoodKernel:
Create and enter a new notebook and check versions of our software.
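A quick check in a notebook cell might look like:
import sys
import tensorflow as tf
import keras

# print the Python, TensorFlow, and Keras versions in this kernel
print(sys.version)
print(tf.__version__)
print(keras.__version__)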
We are pleased!
For more on virtualenv and IPython, see the full official documentation: the IPython documentation and the virtualenv documentation. And for more on Python and the vast 3rd party library of packages supporting it, visit the Python Package Index, which at the time of this writing hosted 212,790 active projects.
Virtual Machines
Virtual machines are simulations of a full computer. They have a layer called the hypervisor, which virtualizes the underlying hardware and on which the guest (VM) operating system sits. On top of these are the bins and libraries needed for various software applications to work.
The mainstay of cloud computing infrastructure is in the form of virtual machines. You may also hear them referred to as compute instances; they are the things we most often spin up in Google Cloud, AWS, Azure, or any of several other cloud offerings.
They often have convenient GUI dashboards as well as command line interfaces. Let's do a quick walk-through of initiating a virtual machine in Google Cloud.
One proceeds by specifying how many resources one needs: what type of processor (GPU or not, which type, and how many?), how much persistent storage, what region and zone the VM should be located in, etc. The dashboards are fairly intuitive and straightforward across the various platforms, with only a bit of a learning curve. Go ahead and complete the setup of a VM on GCloud, AWS, or Azure and have fun along the way.
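On Google Cloud, for example, the same thing can be done from the command line with the gcloud CLI. A minimal sketch (hypothetical instance name, zone, and machine type) might be:
$ gcloud compute instances create my-vm --zone=us-central1-a --machine-type=n1-standard-4 --boot-disk-size=100GB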
That concludes this tutorial on Docker, Python virtual environments, and virtual machines. I hope you found it helpful! If so, please share and clap.
REFERENCES
1) TensorFlow Serving of Multiple ML Models Simultaneously to a REST API Python Client
2) Machine Learning in Production: Serving Up Multiple ML Models at Scale the TensorFlow Serving + Kubernetes + Google Cloud + Microservices Way
3) Docker Documentation
4) Virtualenv Documentation
5) IPython Documentation
6) Python Packages Index
BIO
Dr. Stephen G. Odaibo is CEO & Founder of RETINA-AI Health, Inc., and is on the Faculty of the MD Anderson Cancer Center. He is a Physician, Retina Specialist, Mathematician, Computer Scientist, and Full Stack AI Engineer. In 2017 he received UAB College of Arts & Sciences' highest honor, the Distinguished Alumni Achievement Award. And in 2005 he won the Barrie Hurwitz Award for Excellence in Neurology at Duke University School of Medicine, where he topped the class in Neurology and in Pediatrics. He is author of the books "Quantum Mechanics & The MRI Machine" and "The Form of Finite Groups: A Course on Finite Group Theory." Dr. Odaibo chaired the "Artificial Intelligence & Tech in Medicine Symposium" at the 2019 National Medical Association Meeting. Through RETINA-AI, he and his team are building AI solutions to address the world's most pressing healthcare problems. He resides in Houston, Texas with his family.