Channel: balena Blog

Docker Containers for Edge AI: Should You Use Pre-Built or Custom Images on NVIDIA Jetson Devices?


In this post we’ll cover the important considerations when deciding between pre-built and custom container images, as well as some tips, examples, and best practices if you decide that building your own images is the way to go. Most of our focus will be on containerized edge AI projects that use NVIDIA Jetson devices, but the overall information applies to a wide range of applications.

Docker Container Review

In case you need an extremely quick terminology review, here are the simple definitions of the three main components we’ll be discussing:

  • Container – A self-contained, runnable software application or service that executes on the edge device. A container runtime such as Docker Engine, balenaEngine, LXC, or Podman turns images into containers at runtime.
  • Image – A standalone, read-only file that contains everything needed for a container to run. Images can be stored in registries and are created from Dockerfiles.
  • Dockerfile – The instructions and configuration for building an image. A Dockerfile is a text file and can be considered the “blueprint” for an image.
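To make the relationship between the three concrete, here is a minimal illustrative Dockerfile. The file names (app.py) are hypothetical placeholders, not part of any particular project:

```dockerfile
# Base image pulled from a registry (the FROM instruction is required)
FROM python:3.11-slim

# Copy the (hypothetical) application into the image
WORKDIR /app
COPY app.py .

# Command the container runs at startup
CMD ["python", "app.py"]
```

Running `docker build -t my-app .` turns this blueprint into an image, and `docker run my-app` turns that image into a running container.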

Now that we have that out of the way, let’s look at the pros and cons of using prebuilt images such as those from a public registry or catalog. We’ll start with the pros:

  • Simplicity – When you’re starting your project or just experimenting and want to do a quick test, prebuilt images are ideal – they typically have everything you need to get up and running quickly. For example, if you want to start developing a solution using the machine learning platform TensorFlow, the official Docker image on Docker Hub is a great way to get started.
  • Reliability – Typically the official prebuilt images are maintained by the same development team, ensuring you have the latest version of the software as well as all of the required dependencies in one container. That can save time during initial development.
  • Performance – Generally an official image will be fine-tuned by the developer to run about as efficiently as possible. It’s likely a best-case environment for running the software and many if not all of its features. (However, this can lead to unnecessary container bloat, as discussed below.)
  • Security – Since the prebuilt images you download will become live containers at runtime, it’s important to keep security in mind. Official images from trusted publishers are generally vetted, but you must still trust the source of the image, because running an untrusted image is a security risk.

Docker Hub has a set of curated Docker repositories known as “Docker Official Images” that promote best practices and can be reasonably (though not absolutely) assumed to be safe. Other images on Docker Hub or similar registries may require some research on the source to determine if they can be trusted. This may not be as crucial for development or tinkering, but if you plan to use a prebuilt image in production, you’ll need to do your due diligence. In that case, you’ll want to at least verify the publisher’s signatures on all source and binary releases, and make sure that all included dependencies come from trusted sources.

As mentioned earlier, some of the simplicity and performance features of using a prebuilt image can come at a price, as we take a look at the cons:

  • Potentially bloated – In order to satisfy as many general use cases as possible, a prebuilt image may contain a number of packages, plug-ins, features, or libraries that you don’t need. This may be fine for development purposes, but not ideal for production images where it is desirable to have the leanest images possible. These images produce containers that take up as little space as possible and are easier to update over low bandwidth connections – both ideal properties when using containers on edge devices.
  • Limited customization – Conversely, a prebuilt image may not include all of the dependencies and packages that you need, especially if your project is highly customized.

If the prebuilt image you want to use is too large, not trustworthy, or is missing required libraries, the next option to consider is building your own image. Here are the pros for building your own image:

  • Full Control – You get to decide what goes into your image so it can include only what you need for your specific requirements. All dependencies, libraries and configurations can be customized for your project.
  • Minimal Image Size – Since you have full control over your image and install only what you need, the image size should be smaller than a prebuilt one and also take less time to build.
  • Integration with CI/CD – Use your existing CI/CD pipeline to build Docker images whenever changes are committed to your Dockerfiles and continuously push updates to your users.

We should mention that with full control and customization of your image comes the main downside: it’s more effort on your part. Not only are you responsible for creating and maintaining your Dockerfile, but you also have to look out for any dependency changes or compatibility issues.

In general, then, it makes sense to start with official images for simple projects or experimentation. For complex projects or production environments consider building your own image for more control and optimization. 

As an example, if you need a database such as MySQL or MongoDB, you can probably rely on the official prebuilt images, because databases typically don’t need a lot of customization (at least none that can’t be done with environment variables and config settings). On the other hand, if you’re deploying a DeepStream application with a lot of dependencies and customization, building your own image could be more appropriate.

Getting started building your own image

There are a number of well-made guides online that can assist you when building your own image; here are a couple that we found:

Docker Build: A Beginner’s Guide to Building Docker Images

Building Images (from the docker docs)

What you will notice in these guides is that your Dockerfile must start with the FROM instruction, which defines its base image. One rarely builds a Dockerfile completely from scratch; instead, you extend the functionality of a prebuilt image, which is referred to as the Dockerfile’s base image.

Even though we are building our own image, we are still basing it on a prebuilt image, so all of the caveats about trust and security still apply – your base image must come from a trusted source.  

Choosing a base image

Unless you need the smallest container size possible, it makes sense to use one of the prebuilt images such as TensorFlow, DeepStream, PyTorch, TensorRT, etc. as your base image; it trades a little image size for a lot of convenience. This is known as “extending” the base image: you start with it and then install any additional custom software or dependencies you need for your project. Here’s an example of such a Dockerfile that extends the PyTorch image.
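As a sketch of what extending a base image looks like on a Jetson device, the Dockerfile below starts from one of NVIDIA's L4T PyTorch images and adds project-specific dependencies. The image tag and the inference.py script are illustrative assumptions; match the tag to the JetPack/L4T release on your device:

```dockerfile
# Tag is illustrative -- check NVIDIA's NGC catalog for one matching your JetPack release
FROM nvcr.io/nvidia/l4t-pytorch:r35.2.1-pth2.0-py3

# Extend the base image with the extra packages your project needs
RUN pip3 install --no-cache-dir pillow requests

# Hypothetical application script
COPY inference.py /app/inference.py
CMD ["python3", "/app/inference.py"]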

But what if creating the smallest possible image is your goal? In that case you will need to start with a small generic base image and install just what you need in your Dockerfile. Many lean images use Alpine as a base – an extremely stripped-down OS. However, at around 5MB, it is so minimal that you may need to do a lot of work to install everything your image needs. At around 75MB, an Ubuntu image is usually a better choice for building AI images.
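A minimal sketch of this approach, starting from a generic Ubuntu base and installing only what the project needs (the package choices here are illustrative):

```dockerfile
FROM ubuntu:22.04

# Install only the packages the project actually needs,
# and clean up the apt cache to keep the layer small
RUN apt-get update && \
    apt-get install -y --no-install-recommends python3 python3-pip && \
    rm -rf /var/lib/apt/lists/*

# Pin and install just the Python libraries you require
RUN pip3 install --no-cache-dir numpy
```

The `--no-install-recommends` flag and the `rm -rf /var/lib/apt/lists/*` cleanup are common tricks for keeping apt-based layers lean.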

Building your image

If you’re building your image from a generic base image like Ubuntu, how do you know what to include or where to start? You can often look at how the closest official prebuilt image was built and modify it to suit your needs. For instance, let’s look at the Docker Hub entry for PyTorch, a deep learning framework for Python.

Ideally, the image maintainer will provide a link to a GitHub repo that contains the actual Dockerfile (or a script used to generate the Dockerfile) that you can use as a template for your own. In this case we can see a link to the PyTorch website, which in turn links to the GitHub repo that houses the Dockerfile.

There is no requirement for images to provide a Dockerfile, so another method to see how an image was created is to click on one of the tags and review the “Image Layers” listed on the left. Each instruction in a Dockerfile creates a new layer which is added to this list. Clicking on a layer displays the command on the right.

Image layers can be helpful if no Dockerfile is available, but they likely don’t contain enough detail to completely reconstruct the original Dockerfile. Another tool that can help determine an image’s construction is the docker history command. You can pull an image to your local machine and then run docker history --no-trunc <IMAGE_ID> to see an approximation of the Dockerfile used to define the image. (Note that the layers are listed in reverse order!)

Multistage builds

Another container feature you can use to optimize your Dockerfile and container size is multistage builds. These allow you to build your Dockerfile in stages, copying only the files you need from one stage to another. This can be very helpful with AI/ML images. For example, OpenCV is often added to containers in this manner.
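A minimal sketch of a multistage build: compilers and build tools live only in the first stage, and the final image copies out just the finished artifact. The hello.c source file is a hypothetical placeholder; a real OpenCV build would follow the same pattern with its build dependencies in the builder stage:

```dockerfile
# Stage 1: build tools exist only in this stage
FROM ubuntu:22.04 AS builder
RUN apt-get update && \
    apt-get install -y --no-install-recommends build-essential
COPY hello.c .
RUN gcc -o /hello hello.c

# Stage 2: the final image contains only the compiled artifact,
# not the compiler or build dependencies
FROM ubuntu:22.04
COPY --from=builder /hello /usr/local/bin/hello
CMD ["hello"]
```

Only the final stage ends up in the image you ship, which keeps build-time dependencies off your edge devices.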

How balena helps to manage Edge AI projects

Everything we’ve covered so far works on virtually any container runtime, but balenaEngine, the container runtime in balenaOS, offers some additional features geared toward edge devices:

  • True container deltas: bandwidth-efficient updates using binary diffs that are 10-70x smaller than pulling layers in common scenarios.
  • Small footprint: 3.5x smaller than Docker CE and packaged as a single binary.
  • Minimal wear and tear: layers are extracted as they arrive to prevent excessive disk writes, protecting your storage from eventual corruption.
  • Conservative resource usage: uses RAM and storage sparingly and focuses on atomicity and durability when pulling containers.

Going further

We hope the resources provided in this post help you decide on the best container development process for your needs. If you have any questions, feel free to post them in the comments below. If you’d like to learn more about balenaCloud, the container-based platform for deploying IoT fleets, send us a message or check out our getting started guide.

Reaching out

If you have further questions about device compatibility or customization, feel free to contact us through this form, e-mail us, or ask in our forums.


