Musings around a Dockerfile for Jekyll

If you like cool stories about how an engineer, faced with an impossible problem, overcame all odds and solved it, this post is not for you. This is a story of how I spent a non-trivial amount of time, how I hit a couple of walls, and how I nearly came back to square one. Why do I write it?

The first reason is for me: I want to document my journey so that if I ever think about trying again in the future, I'll have some arguments against it. Second, contrary to the current zeitgeist, you learn a lot more by failing than by succeeding. Writing about what one has learned "buries" it deeper into memory. Last but not least, this is not LinkedIn/Facebook/the latest social media where people brag continuously, but my blog, so I do as I want.

A bit of context

Before diving into the problem, we need a bit of context.

My blog uses Jekyll. Generation happens on GitLab: when I push a commit to master, a job clones the repo and launches the build. The job runs on a Docker image that I created via a Dockerfile and uploaded to my GitLab Docker registry.

FROM jruby:9.2-alpine                                                #1

ADD Gemfile /builds/nfrankel/nfrankel.gitlab.io/Gemfile                #2
ADD Gemfile.lock /builds/nfrankel/nfrankel.gitlab.io/Gemfile.lock      #2

RUN gem update --system \                                            #3
 && bundle config set clean 'true' \
 && apk update \                                                     #4
 && apk add --no-cache bash fontconfig git ttf-dejavu graphviz \      #5
 && apk add --virtual build autoconf automake g++ make \             #6
 && rm -Rf /opt/jruby/samples \                                      #7
           /opt/jruby/tool \
           /opt/jruby/bin/*.bat \
           /opt/jruby/bin/*.exe \
           /opt/jruby/bin/*.dll \
           /opt/jruby/lib/ruby/gems/shared/gems/*/man \
 && bundle install \                                                 #8
 && apk del build                                                    #9
  1. Start from Alpine with JRuby
  2. Add build configuration files
  3. Update the system gems
  4. Refresh the package manager cache
  5. Install packages required by PlantUML for schema generation
  6. Install packages to build Ruby gems with C-native extensions
  7. Remove unnecessary files
  8. Install Gems
  9. Remove packages installed in step 6 above

Items 5 & 6 deserve a bit of explanation: we need to install two kinds of packages.

I write my posts in Asciidoctor format. For UML diagrams, I use PlantUML via the Asciidoctor Diagram integration. PlantUML requires Java and Graphviz. The latter requires the fontconfig library. Because Jekyll needs a Ruby runtime, JRuby allows me to fulfill both Ruby and Java requirements. Note that after working on the build, I noticed that installing git, bash and ttf-dejavu is not necessary.

Other packages are build-related. Remember that I'm no Ruby expert, so please correct me if you are. Some Ruby Gems come with a C-native extension. Among those, some have a pure Java-based implementation in JRuby e.g. eventmachine and ffi.

Other Gems offer no such implementation. During the installation of a Gem with a C-native extension, the extension is built using the platform's tools: make and a C compiler. Among the dependencies I'm using, I happen to have one single such Gem: sassc. It's a thin Ruby wrapper around libsass, a component that generates CSS out of SASS.

The problems

Every time I change my Gem dependencies, I need to rebuild my Docker image from scratch. I mean, not FROM scratch: my point is that there's a single layer.

114 MB  FROM a736288dc771c2e
 16 MB  apt-get update && apt-get install -y --no-install-recommends   ca-certificates
 18 MB  set -ex;  if ! command -v gpg > /dev/null; then   apt-get update;   apt-get inst
 12 MB  set -eux;  apt-get update;  apt-get install -y --no-install-recommends   bzip2
  27 B  { echo '#/bin/sh'; echo 'echo "$JAVA_HOME"'; } > /usr/local/bin/docker-java-home
125 MB  set -eux;   dpkgArch="$(dpkg --print-architecture)";  case "$dpkgArch" in   amd6
 26 MB  apt-get update && apt-get install -y libc6-dev --no-install-recommends && rm -rf
 39 MB  mkdir /opt/jruby   && curl -fSL https://repo1.maven.org/maven2/org/jruby/jruby-d
  45 B  mkdir -p /opt/jruby/etc  && {   echo 'install: --no-document';   echo 'update: -
2.5 MB  gem install bundler rake net-telnet xmlrpc
   0 B  mkdir -p "$GEM_HOME" "$BUNDLE_BIN"  && chmod 777 "$GEM_HOME" "$BUNDLE_BIN"
 407 B  #(nop) ADD file:e2eab5a061a6ff9e4f0e7405fec10501bbe399d6dbdc6aeb1d8aeeee7b51f39f
2.9 kB  #(nop) ADD file:365c9a1ac7e13720d4adaba19b0c5684fc3db9065992e0da31038e43ac50d82e
484 MB  gem update --system  && bundle config set clean 'true'  && apt-get update  && ap // 1
  1. This is the single layer built from my Dockerfile; the other ones are either no-ops or come from the parent image

The resulting image is also huge.

REPOSITORY                                        TAG      IMAGE ID       SIZE
registry.gitlab.com/nfrankel/nfrankel.gitlab.io   latest   0f3d75c9f0f7   837MB

Reducing the image size

I first focused on how to reduce the image size. Let's check the hierarchy of the JRuby image:

#  Name                        Image              Size diff  Total size
1  debian:buster               Registry / Source  114MB      114MB
2  buildpack-deps:buster-curl  Registry / Source  34MB       148MB
3  openjdk:11-jre-buster       Registry / Source  138MB      286MB
4  jruby:9.2-jre11             Registry / Source  67MB       353MB

The problem lies in the topmost image(s) that are based on Debian. My idea was to start from a small base image such as Alpine Linux and add the JRuby package. Unfortunately, the build fails with the following error message:

Gem::Ext::BuildError: ERROR: Failed to build gem native extension.

current directory:
/usr/share/jruby/lib/ruby/gems/shared/gems/sassc-2.2.1/ext
/usr/share/jruby/bin/jruby -I /usr/share/jruby/lib/ruby/stdlib -r
./siteconf20201205-1-rwydvz.rb extconf.rb
mkmf.rb can't find header files for ruby at
/usr/share/jruby/lib/ruby/include/ruby.h

extconf failed, exit code 1

I wanted to install the missing file, so I checked around. GitHub is pretty explicit:

#error JRuby does not support native extensions

I don't know how the initial image manages to build native extensions! I've checked, and the above file has the same content there. In any case, that's back to square one.

I'm nothing if not creative. If JRuby doesn't work, let's install Ruby and Java separately. This results in a ~25% reduction in image size.

REPOSITORY                                        TAG         IMAGE ID       SIZE
registry.gitlab.com/nfrankel/nfrankel.gitlab.io   ruby-java   cbee0061f7e2   623MB

It also fails at runtime with the following message:

bundler: failed to load command: jekyll (/usr/local/bundle/ruby/2.7.0/bin/jekyll)
ExecJS::RuntimeUnavailable: Could not find a JavaScript runtime. See https://github.com/rails/execjs for a list of available runtimes.

An easy fix is to add Node.js to the list of installed packages. The generated image is slightly bigger, but at least it works.

REPOSITORY                                        TAG              IMAGE ID       SIZE
registry.gitlab.com/nfrankel/nfrankel.gitlab.io   ruby-java-node   b53d0699e43b   652MB
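
As a quick sanity check on the figures above, here is a minimal shell sketch that computes the size reduction of each attempt relative to the original 837 MB image. The function name reduction_pct is made up for the occasion, and the sketch assumes awk is available.

```shell
# reduction_pct FROM_MB TO_MB: percentage saved when going from FROM_MB to TO_MB.
# Hypothetical helper for illustration; the sizes come from the image listings.
reduction_pct() {
  awk -v from="$1" -v to="$2" 'BEGIN { printf "%.1f\n", (from - to) / from * 100 }'
}

reduction_pct 837 623   # Ruby and Java installed separately: 25.6
reduction_pct 837 652   # same, with Node.js added: 22.1
```

Adding Node.js costs roughly 3.5 points of the saving.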

I then used this new slimmer image to build my site with an empty commit. It took twice the usual time, from 7 minutes to 14 minutes! I tried a second build to confirm, then went back to the previous image.

[Image: build-time.jpg]

In the end, a smaller image is not worth twice the build time.

Lessons learned

I still managed to add some improvements to the original Dockerfile.

  • Split into 3 layers:

    The Dockerfile is now organized around three layers: one for installing the necessary packages, one for bundling the Gems, and one for uninstalling the packages that are no longer needed. This allows updating Gems without rebuilding the first layer. Since this first layer doesn't change, the diff is smaller, and pushing to the registry is faster. Even though I don't build every day, faster uploads are nice.

  • Install a single package:

    In the original file, I installed make and g++ separately. I learned that Debian offers a single package for build-related tasks, build-essential.

  • Replace ADD with COPY:

    ADD can do everything COPY does, plus download from a URL and extract from an archive. I tend to favor the directive that has the capabilities I need, but no more.

  • Take advantage of WORKDIR:

    I noticed that in the original file, I set the WORKDIR but was still using absolute paths for the ADD destinations. It's useless, and it also makes the file harder to read and more prone to typos. Using a path relative to the WORKDIR, i.e., a single dot (.), solves both issues.

  • Remove packages:

    One idea to reduce the image size is to remove unnecessary packages and folders. This doesn't slim down the final image at all. Though the files are virtually gone, the layers that contain them are still present due to how Docker's layered filesystem works. There's one benefit, though: it removes unwanted capabilities, e.g., compilation.

  • Update the parent JRuby image:

    More than once, I assumed that tags are immutable, i.e., that they reference the same image over time. This is false. The original build warned about an illegal reflective access. An explicit docker pull on jruby:9.2-jre11 downloaded the image with the latest 9.2 JRuby, which solved it. Note that a new warning appeared: you can't win them all.

  • Clean the bundle:

    I initially set the bundle "clean". It removes Gems that were pre-installed on the system and not referenced by the Gemfile. The latest JRuby package warns before doing it:

      Cleaning all the gems on your system is dangerous! If you're sure you want to
      remove every system gem not in this bundle, run `bundle clean --force`.
    

    It made me a bit afraid so I decided to keep all Gems.
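
Two of the points above, mutable tags and layers that never shrink, can be checked from the command line. This is a sketch rather than part of my build: the helper names are made up, and the DOCKER variable only exists so that the commands can be dry-run with DOCKER=echo.

```shell
DOCKER="${DOCKER:-docker}"   # set DOCKER=echo to print the commands instead of running them
PARENT="jruby:9.2-jre11"
IMAGE="registry.gitlab.com/nfrankel/nfrankel.gitlab.io"

# Tags are mutable: pull explicitly to get whatever the tag points to today.
refresh_parent() {
  "$DOCKER" pull "$PARENT"
}

# Per-layer sizes: the layer that removes packages is stacked on top of the
# others, so it cannot shrink the layers below it.
show_layers() {
  "$DOCKER" history "$IMAGE"
}
```

Run refresh_parent before building and show_layers afterwards.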

With all of the above steps, there's a slight reduction in the final image size:

REPOSITORY                                        TAG      IMAGE ID       SIZE
registry.gitlab.com/nfrankel/nfrankel.gitlab.io   latest   dee58fb3bfdd   784MB

But the true benefit lies IMHO in the improved readability of the Dockerfile:

# docker build -t registry.gitlab.com/nfrankel/nfrankel.gitlab.io .

FROM jruby:9.2-jre11

MAINTAINER Nicolas Frankel <nicolas [at] frankel (dot) ch>

RUN apt-get update \
 && apt-get install -y graphviz build-essential

WORKDIR /builds/nfrankel/nfrankel.gitlab.io

COPY Gemfile .
COPY Gemfile.lock .

RUN gem update --system \
 && bundle install

RUN apt-get remove --purge -y build-essential unzip bzip2 gpg curl wget linux-libc-dev \
 && apt-get autoremove -y

Squashing or not squashing

The latest versions of the Docker daemon allow experimental features. One such feature is the squashing of all layers into a single one during build. It makes the usage of dedicated tools such as squash moot.

To enable experimental features, add the following JSON line to your daemon configuration (~/.docker/daemon.json):

{
  "experimental": true
}

Now, build the image as usual adding the --squash option:

REPOSITORY                                        TAG      IMAGE ID       SIZE
registry.gitlab.com/nfrankel/nfrankel.gitlab.io   squash   abbc150a52f3   527MB
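
The build command itself would look something like the sketch below. The image name comes from the comment at the top of the Dockerfile and the squash tag from the listing above; the DOCKER variable and the function name are only there for illustration and dry runs.

```shell
DOCKER="${DOCKER:-docker}"   # set DOCKER=echo to print the command instead of running it
IMAGE="registry.gitlab.com/nfrankel/nfrankel.gitlab.io"

# --squash only works when the daemon has experimental features enabled.
build_squashed() {
  "$DOCKER" build --squash -t "${IMAGE}:squash" .
}
```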

With squash, we've saved a lot of space! But the downside is that it's now one single layer that needs to be uploaded each time. Let's dive into it.

With the squashed image:

  Layers ┣━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Cmp   Size  Command
    527 MB  FROM 18574a14e20bdb6

 Layer Details ├──────────────────────────────────────────────────────────────

Tags:   (unavailable)
Id:     18574a14e20bdb6e36934fe80f5d5a57e6ea229d3fa03884b87c0181ef3bc5dd
Digest: sha256:be9a011056620f6ee518ec94711d7562d1f1bc30760f121d3d5a98bb1ca85bba
Command:


 Image Details ├──────────────────────────────────────────────────────────────


Total Image size: 527 MB
Potential wasted space: 0 B
Image efficiency score: 100 %

With the regular image:

┃ ● Layers ┣━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Cmp   Size  Command
    114 MB  FROM d57bfcbd0b4209f
     16 MB  apt-get update && apt-get install -y --no-install-recommends   ca-certificates
     18 MB  set -ex;  if ! command -v gpg > /dev/null; then   apt-get update;   apt-get in
     12 MB  set -eux;  apt-get update;  apt-get install -y --no-install-recommends   bzip2
      27 B  { echo '#/bin/sh'; echo 'echo "$JAVA_HOME"'; } > /usr/local/bin/docker-java-ho
    126 MB  set -eux;   arch="$(dpkg --print-architecture)";  case "$arch" in   arm64 | aa
     26 MB  apt-get update && apt-get install -y libc6-dev --no-install-recommends && rm -
     39 MB  mkdir /opt/jruby   && curl -fSL https://repo1.maven.org/maven2/org/jruby/jruby
      45 B  mkdir -p /opt/jruby/etc  && {   echo 'install: --no-document';   echo 'update:
    2.5 MB  gem install bundler rake net-telnet xmlrpc
       0 B  mkdir -p "$GEM_HOME" && chmod 777 "$GEM_HOME"
    299 MB  apt-get update  && apt-get install -y graphviz build-essential  && rm -rf /var
       0 B  #(nop) WORKDIR /builds/nfrankel/nfrankel.gitlab.io
     407 B  #(nop) COPY file:e2eab5a061a6ff9e4f0e7405fec10501bbe399d6dbdc6aeb1d8aeeee7b51f
    2.9 kB  #(nop) COPY file:2550136ae5a41a4abc0e54c09003b1728ed989b7d4552509578338369c76a
    130 MB  gem update --system  && bundle install
    1.4 MB  apt-get remove --purge -y build-essential unzip bzip2 gpg curl wget linux-libc

│ Layer Details ├─────────────────────────────────────────────────────────────────────────

Tags:   (unavailable)
Id:     4068ddc2416c7c4391aa86a9fb792ca6cfecfa8a375fbfe3643d9cc7c975c828
Digest: sha256:1849e79f32a3a144dbd959d2d6b66872452009183cdd7a76439cb8bdf6a437ce
Command:
gem update --system  && bundle install

│ Image Details ├─────────────────────────────────────────────────────────────────────────


Total Image size: 784 MB
Potential wasted space: 189 MB
Image efficiency score: 78 %

In the end, I chose to keep multiple layers. It's more efficient to send only the changed layers, which weigh ~132 MB in total, than the full 527 MB squashed image.
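
The arithmetic behind that choice, as a small shell sketch. It assumes awk is available and only counts the two RUN layers that are rebuilt when Gems change, with sizes as reported in the listing above (the two COPY layers add just a few kB); the function names are made up.

```shell
# Layers rebuilt after a Gemfile change, in MB: bundle install + package removal.
changed_layers_mb() {
  awk 'BEGIN { printf "%.1f\n", 130 + 1.4 }'
}

# Share of the squashed image (527 MB) that those layers represent, in percent.
upload_share_pct() {
  awk -v changed="$(changed_layers_mb)" 'BEGIN { printf "%.0f\n", changed / 527 * 100 }'
}

changed_layers_mb   # 131.4
upload_share_pct    # 25
```

In other words, a Gem update uploads about a quarter of what the squashed image would require.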

Unsolved question

After those changes in my build file, there's still one question left unsolved. The Debian-based JRuby image manages to build the sassc native extension, but the Alpine-based one doesn't. I wonder why and would appreciate any hint that points me to an answer.

Originally published at A Java Geek on December 27th 2020