In the previous post, we created the hello world application that we’ll be using. Now it’s time to dockerize the app. Dockerizing means creating a Docker image that can be used to run the app.

Docker is a fairly new technology, going back just 4 years. It is similar to a virtual machine, in the sense that it allows running multiple apps inside the same physical host in isolation, protecting them from library and dependency conflicts with whatever else is installed on that host. It solves these problems with a significantly lower resource overhead, however: a virtual machine emulates an entire operating system, while a Docker container shares the kernel of the host and therefore needs far fewer resources.

To create our Docker image, we need a Dockerfile. This file contains instructions that tell Docker how to build our image. Let’s see the first version:

FROM node:alpine
EXPOSE 3000
RUN mkdir /app
WORKDIR /app
ADD . /app
RUN npm install --only=production
CMD ["node", "index.js"]

That’s quite a lot for a first draft, but let’s take it step by step.

Docker images can be built on top of existing images. This allows you to start from more generic, publicly available images that cover popular requirements. In our case, we need a Docker image that already supports Node.js. That’s what the FROM instruction is for. The image is node. We also specify an explicit tag (similar to a version) to use: alpine. A Docker image can have multiple tags (versions, if you prefer). The alpine flavor is based on Alpine Linux, which is popular in the Docker community due to its small footprint. The Node.js version is the LTS, which is currently 8.9.
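
One thing to be aware of: the alpine tag is a moving target, so the Node.js version it points to will change over time. If you want more reproducible builds, you can pin the tag more precisely, e.g. (the exact tag must of course exist on Docker Hub):

FROM node:8.9-alpine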

The next instruction, EXPOSE, tells Docker that our app listens on port 3000. Pretty straightforward.

Now the fun begins. We need to put our application code inside the Docker image. First, we create a folder where the app will live with RUN mkdir /app. The RUN instruction runs commands during the build phase of the image. Since our image is based on Alpine Linux, we can run any command that is bundled in that OS. mkdir /app creates a new folder, /app, in the filesystem of the image. It’s very important to understand that these commands are executed inside the image being built, not on the host.
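
If you want to see this for yourself, you can add a throwaway instruction that prints something from the build environment (purely illustrative, not part of our final Dockerfile):

RUN node --version

During the build, this prints the Node.js version bundled in the base image, regardless of what is (or isn’t) installed on your machine.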

The next instruction, WORKDIR, defines the working directory (or current directory if you prefer) as /app. Subsequent instructions will take this into account.

Now we add the code of the application into the Docker image with the ADD instruction. It adds everything from the current directory of the host (the computer we’re using to build the Docker image) into the /app directory inside the Docker image.

Next step: install our npm dependencies. Our image is based on node:alpine, so the npm command is available. We run it with the --only=production flag to install only our production dependencies. Dev dependencies shouldn’t be bundled in a production image, because they aren’t needed at runtime.
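
If you ever want to verify that the dev dependencies really stayed out, one way (once the image is built, which we’ll do in a moment) is to list the installed packages inside a throwaway container:

docker run --rm blog-helm npm ls --depth=0

Only the packages from the dependencies section of package.json should show up (npm may warn about the missing dev dependencies, which is exactly the point).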

Last part: define what the image should do when someone runs it. That’s the CMD instruction. It says the image should run node index.js (remember that the working directory is set to /app).
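
Note that CMD only provides a default; whoever runs the image can override it with a different command. For example, once we’ve built the image, this would list the application files instead of starting the server:

docker run --rm blog-helm ls /app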

That’s quite a lot for a small Dockerfile, but it can be improved further. More on that in a second.

To build this image, run:

docker build -t blog-helm .

This will build a Docker image named blog-helm and it will use the Dockerfile found in the current directory.
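
A quick way to confirm that the image exists is to list it:

docker images blog-helm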

When the image is built, you can run it with:

docker run -p 3000:3000 blog-helm

Notice that even though we declared that the app listens on port 3000, we still need to explicitly map that port: EXPOSE merely documents the port, it doesn’t publish it to the host. At this point you can try http://localhost:3000/ and see the hello world message again, only this time coming from the Docker container (a container being a runtime instance of a Docker image).
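
The two sides of the mapping don’t have to match, by the way. If port 3000 is already taken on your machine, you can map any free host port to the container’s port 3000:

docker run -p 8080:3000 blog-helm

and then browse to http://localhost:8080/ instead.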

How can we further improve this build process? First of all, we can avoid sending unnecessary files to the Docker daemon at build time, which speeds up the process. This is done with another file, .dockerignore. Similar in concept to a .gitignore file, it contains filename patterns that should be excluded. In our case, a simple .dockerignore file is enough:

node_modules

This small change will improve the build time of the image (at least locally, where we have a node_modules folder lying around).
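
Depending on what ends up in your working copy, you might want more patterns in there. A slightly fuller version could look like this (the extra entries are just illustrative):

node_modules
.git
*.log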

An even more interesting optimization has to do with how Docker caching works. Behind the scenes, every instruction in a Dockerfile creates an intermediate image (a layer). Docker is smart enough to reuse these intermediate images when it can. In our example, the first four instructions are not affected by any files from the outside world, so Docker will happily cache and reuse them. It makes sense, after all: the intermediate product of those instructions is a Docker image, based on Node.js, listening on port 3000, with a folder /app as working directory. Nothing specific to our app.
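
You can actually inspect these layers: docker history lists each layer of an image together with the instruction that created it:

docker history blog-helm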

The next instruction adds our code into the image, and that is of course specific to our app. Any code change we make invalidates the cache, and once the cache is invalidated, all subsequent steps have to be re-run as well. Unfortunately, the very next step is a time-consuming one: installing npm dependencies.

Therefore, in our current setup, if we modify index.js (or add a new CSS file, whatever), we invalidate the cache, and Docker will need to install the npm dependencies all over again. That’s just a waste of time. We should only re-install npm dependencies if package.json has changed.
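
You can observe this with a small experiment: build the image, touch a source file, and build again. With the current Dockerfile, everything from the ADD step onwards re-runs, including the slow npm install:

docker build -t blog-helm .
touch index.js
docker build -t blog-helm .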

Well, we can change our Dockerfile to add the code in two steps: first, only add package.json (and its buddy package-lock.json of course). Then, install npm dependencies. And only after that, add the rest of the code. Something like this:

FROM node:alpine
EXPOSE 3000
RUN mkdir /app
WORKDIR /app
ADD package.json /app
ADD package-lock.json /app
RUN npm install --only=production
ADD . /app
CMD ["node", "index.js"]

This optimization is a life saver, as the costly part that installs npm dependencies will only be run when package.json changes (browse code).
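
Repeating the earlier experiment with the new Dockerfile shows the difference: touching index.js only invalidates the final ADD step, while the npm install step above it is served from the cache, so the rebuild finishes almost instantly:

touch index.js
docker build -t blog-helm .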

That was a lot for this post, but we’re not quite done with Docker yet. We dockerized the app, but we forgot to test it (our test suite consists only of linting for this example app). We could just run npm run lint, but when we’re configuring the CI server, we need to make sure it has the Node.js version we want. Another team might need a different Node.js version. Yet another team might need more and more dependencies, risking conflicts and increasing server management work. Well, we can dockerize our build environment too. The only requirement for the CI server will be that it can run Docker. Each team can then define its own requirements independently. More on that in the next post, where we’ll switch to looking at things from the CI server’s point of view.