To apt/yum update or not to update

I wish to share Packer templates for people to build AMIs for their own VFX rendering. Is it considered best practice to run yum/apt update on AMI builds?

I find this step to be error-prone, depending on the month. It may also open us up to more vulnerabilities, or fewer; I am unsure, but I’d like to hear what others think.

If I avoided it, I think workflows would likely be more reliable; however, so much documentation relies on the update being performed that it can be difficult to avoid. For example, even installing python-pip on an Ubuntu 18.04 base AMI will error without doing it.
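For concreteness, here is a minimal sketch of that failure mode using Ansible’s apt module (purely illustrative; a shell provisioner running apt-get install hits the same problem):

- name: Install python-pip (errors on a fresh Ubuntu 18.04 AMI unless the cache is refreshed first)
  apt:
    name: python-pip
    state: present
    update_cache: yes  # the equivalent of running "apt update" beforehand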

For what it’s worth, I will share my experience with this; perhaps it will help inform your decision.

We maintain a “common” image upstream that all workloads are built from. This is where we apply automatic updates, through a pipeline with associated tests. It is true that applying automatic updates can sometimes result in flaky bakes: either the process takes too long and subsequent tasks time out, or some broken package breaks everything. Discovering this during the deploy pipeline of an actual workload is inadvisable, so we (the SREs) like to discover it before the developers do.

The Packer templates that are shared with people, as you want to do, use these base images and do not need to run any updates.

This involves taking the base AMI we want to build off (usually from the marketplace), then applying our desired configuration to it, via a well-maintained, semantically versioned and peer-reviewed Ansible role.
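In Packer terms, the Ansible provisioner simply runs a small playbook that applies the role; a minimal sketch (the role name here is hypothetical):

---
# playbook.yml, invoked by Packer's ansible provisioner
- hosts: all
  become: true
  roles:
    - role: common_image  # hypothetical name for our versioned, peer-reviewed role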

The role itself has an InSpec profile associated with it, the logic being that “if you apply this role to any image, the resulting state should be the following”; we then make assertions about the state of the image. One of those assertions, for example, is “there are no packages with vulnerabilities present”. It could just as well be “there are no updates available for the packages on this image”.

The only thing that typically changes for a given version of that role is the initial state (base AMI) and the “layer” of package changes. If anything breaks in the update task, the Packer build fails and results in a failed job in our CI system, giving us something to go in and check on, address, or re-run once the culprit upstream has been resolved. In all of these cases, the people we serve are not impacted. When the build passes again, the result is a new image with the same name, which people can automatically consume using "latest": "true" in their Packer template.

So yes - you should deliver clean, secure images, but it’s better to do it behind the scenes in a separate process that transparently delivers known-good states to the people who will use them.

That’s a great answer, thanks!

While reading, you made me wonder about another perhaps useful thing to do: is it possible to take the list of packages that yum or apt would update to, and commit that list into version control? Then, instead of running apt/yum update directly, we would update to the explicitly defined package versions in that list.

Building a common image where you run yum/apt update is definitely good, but I figured this method in addition would make it easier to identify changes, and potentially make any given update more reproducible too.
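One way to capture such a list with Ansible might be to snapshot the installed package versions after a bake and commit the resulting file (a sketch; the file name is made up, and it records installed versions rather than the pending-update list, but it serves the same pinning goal):

---
# hypothetical playbook: record the baked image's package versions
- hosts: all
  tasks:
    - name: Gather installed package facts
      package_facts:
        manager: auto

    - name: Write a manifest to commit into version control
      copy:
        content: "{{ ansible_facts.packages | to_nice_yaml }}"
        dest: ./package-manifest.yml
      delegate_to: localhost
      become: false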

This can become tedious, but it’s a good goal to have. Maintaining white/black lists of packages can also become a bit scary once you start including dependency chains. Scanning tools like Trivy can help provide insight into which packages you are actually including in the image and whether they contribute to your attack surface. This will also help you prioritise which packages to add to a blacklist; if you’re using Ansible you can have a task that does

- name: Ensure things we don't want are not present
  package:
    name: "{{ blacklist_packages }}"
    state: absent

(where blacklist_packages is a list)

The list can even be built dynamically from the output of Trivy.
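A sketch of what that could look like, assuming Trivy is installed in the image being baked and that its JSON output follows the Results[].Vulnerabilities[].PkgName structure:

- name: Scan the filesystem with Trivy
  command: trivy rootfs --format json /
  register: trivy_scan
  changed_when: false

- name: Build the blacklist from packages Trivy flagged as vulnerable
  set_fact:
    blacklist_packages: >-
      {{ (trivy_scan.stdout | from_json).Results
         | selectattr('Vulnerabilities', 'defined')
         | map(attribute='Vulnerabilities') | flatten
         | map(attribute='PkgName') | unique | list }}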

The whitelist, on the other hand, is a good idea, since those packages provide the functionality of the image. Versions can be included so as to pin them:

---
# defaults/main.yml
required_packages:
  - name: thing-i-want
    version: good.version

In the playbook:

---
# playbook tasks
- name: Ensure things I want are present
  package:
    name: "{{ item.name }}={{ item.version }}"  # apt's name=version syntax; yum/dnf expects name-version
    state: present
  loop: "{{ required_packages }}"

This has the advantage of being very explicit, but in my experience it is very difficult to maintain and can make your git history quite noisy. It’s better to rely on the upstream package maintainers and just ask for “latest”.
