A Practical Introduction to Container Security
Securing containers is a complex task. The problem space is broad, vendors are on fire, there are tons of checklists and best practices, and it’s hard to prioritize solutions. So if you had to implement a container security strategy, where would you start?
I suggest starting from the basics: understand what container security is about and build a model to navigate the risks.
Follow the DevOps Life Cycle
Every security initiative is eventually constrained by where security controls can be implemented, so I find it practical to just follow the standard DevOps life cycle to surface patterns™ and unlock synergies™.
The DevOps life cycle is an infinite iteration of: plan, code, build, test, release, deploy, operate and monitor.
Containers are included in the application in the form of Dockerfiles but are not really part of it. As such, they don’t concern the planning and coding phases.
(no, writing Dockerfiles is not coding.)
Every other step is in scope from a security point of view, and I would group them like this:
- Build Time: build, test and release.
- Container Infrastructure: deploy and operate.
- Runtime: monitor.
Why? A security strategy is only effective if it can be implemented, and the steps in each group share a common facility where security controls can be injected without adding much friction:
- Build Time: The CI/CD infrastructure, the container registry
- Container Infrastructure: the container orchestrator
- Runtime: the production environment
Now we have three macro areas we can use as a starting point to do our risk assessments.
Security at Build Time
At build time the input is a bunch of source files and a Dockerfile, and the output is a Docker image.
This is where most vendors tend to cluster while trying to sell you the narrative of the importance of scanning container images and calling it a day. Container security scanning is important, yes, but it’s not enough.
The goal here is to minimize the risk of supply chain attacks.
Container Image Hygiene
First, decide what your images should look like, with a focus on how software dependencies are introduced:
- what base images are developers allowed to use?
- are software dependencies pinned? From where are they pulled?
- are there any labels that are needed to simplify governance and compliance?
- lint the Dockerfile
- follow Docker security best practices when writing Dockerfiles
All of these checks are static and can be implemented cheaply as a step in the build pipelines.
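To give an idea of how cheap such a check can be, here is a minimal sketch of a Dockerfile policy check. The allowed base images, the required labels and the function names are all hypothetical, and the parser is deliberately naive (it ignores line continuations and build arguments), so treat it as a starting point rather than a replacement for a real linter.

```python
# Hypothetical policy values, for illustration only.
ALLOWED_BASE_IMAGES = {
    "registry.example.com/base/python",
    "registry.example.com/base/debian",
}
REQUIRED_LABELS = {"owner", "service"}


def split_image(ref):
    """Split an image reference into (name, tag, digest)."""
    name, _, digest = ref.partition("@")
    tag = ""
    # Only a colon after the last slash is a tag; earlier ones are registry ports.
    if ":" in name.rsplit("/", 1)[-1]:
        name, _, tag = name.rpartition(":")
    return name, tag, digest


def check_dockerfile(text):
    """Return a list of policy violations for the given Dockerfile text."""
    violations = []
    labels = set()
    stages = set()  # names declared with "FROM ... AS <name>"
    for raw in text.splitlines():
        parts = raw.strip().split()
        if not parts or parts[0].startswith("#"):
            continue
        instruction = parts[0].upper()
        if instruction == "FROM":
            # Drop flags such as --platform=linux/amd64.
            args = [p for p in parts[1:] if not p.startswith("--")]
            if not args:
                continue
            image = args[0]
            # Earlier build stages and the empty "scratch" base are not checked.
            if image not in stages and image != "scratch":
                name, tag, digest = split_image(image)
                if name not in ALLOWED_BASE_IMAGES:
                    violations.append(f"base image not allowed: {name}")
                # Assumption: a non-"latest" tag counts as pinned.
                # Requiring a digest would be stricter, since tags are mutable.
                if not digest and tag in ("", "latest"):
                    violations.append(f"base image not pinned: {image}")
            if len(args) >= 3 and args[1].upper() == "AS":
                stages.add(args[2])
        elif instruction == "LABEL":
            for token in parts[1:]:
                if "=" in token:
                    labels.add(token.split("=", 1)[0].strip("\"'"))
    missing = REQUIRED_LABELS - labels
    if missing:
        violations.append("missing labels: " + ", ".join(sorted(missing)))
    return violations
```

A pipeline step could call `check_dockerfile` on the file contents and fail the build whenever the returned list is not empty.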
Container Image Scanning
Then we can move into scanning the container image.
Do not scan the image as a step in the build pipeline; instead, set up continuous scanning in the container registry.
Why? First, vulnerabilities are continuously discovered, while your services are not necessarily continuously built. Second, builds are additive: every build generates a new image. So, assuming your container orchestrator trusts your registry, every tag you publish can always be deployed and needs to be assessed.
(Scanning at build time is also very slow.)
This is where you start thinking about defining patch management and shelf life processes:
- patch management: results from the scanning will feed a patching process that will result in a new version of the image
- shelf life: unpatched/old/unsafe images are deleted from the registry
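The two processes can be sketched as a simple triage over what the registry knows about each tag. Everything here is an assumption made for illustration: the `ImageRecord` fields, the 90 day shelf life, and the rule that an image past its shelf life is deleted while a younger image with open findings goes to patching. Your registry and scanner will expose this data in their own way.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class ImageRecord:
    tag: str
    pushed_at: datetime
    open_findings: int  # unresolved vulnerabilities from the latest registry scan


def triage(images, now, shelf_life=timedelta(days=90)):
    """Sort image tags into 'delete', 'patch' and 'keep' buckets."""
    buckets = {"delete": [], "patch": [], "keep": []}
    for image in images:
        if now - image.pushed_at > shelf_life:
            # Past its shelf life: remove it so it can no longer be deployed.
            buckets["delete"].append(image.tag)
        elif image.open_findings > 0:
            # Still in use but vulnerable: rebuild it as a new, patched version.
            buckets["patch"].append(image.tag)
        else:
            buckets["keep"].append(image.tag)
    return buckets


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = [
        ImageRecord("api:1.0", now - timedelta(days=200), 0),
        ImageRecord("api:1.4", now - timedelta(days=10), 3),
    ]
    print(triage(sample, now))
```

Running this kind of triage on a schedule, rather than at build time, is what keeps old tags from lingering in the registry unassessed.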
(The next article will be about how to choose a container scanning solution. If you are facing that dilemma right now, feel free to ping me.)
Container Infrastructure Security
The container infrastructure comprises all the moving parts in charge of pulling your images from the registry and running them as containers in production.
It’s mostly going to be the container orchestrator: *cough* Kubernetes *cough*.
The goals here are to:
- Avoid platform misconfigurations with security implications
- Minimize the breadth of an attack from a compromised container
Security OF the Infrastructure: Misconfigurations
Container orchestrators are complex, Kubernetes in particular. As of now they fail the promise of DevOps, and I think we are still an abstraction layer (or two) away from them becoming a mainstream solution without too much operational overhead.
Every complex platform is prone to misconfiguration, and this is the part you want to focus on.
You have to threat model your infrastructure to ensure it can’t be abused. This particular threat model should focus on every actor except a compromised container (we will cover that next).
I can’t go into details here, because it really depends on what you are running. For Kubernetes a good starting point for threat modelling is this.
Additionally, if you are not doing it yet, this is also a good argument in favour of using a managed platform: the complexity is reduced if you can leverage a shared responsibility model with your (trusted) provider.
Security IN the infrastructure: Lateral Movements
Next we can talk about what happens when a container is compromised.
You want to minimize the attacker’s ability to move laterally, focusing on these two layers:
- The network layer
- The Identity and Access management (IAM) layer
The network should not be flat. You can start by brutally segmenting everything into subnetworks and work your way up to a full-fledged service mesh.
On the IAM layer, work your way toward having a single identity for each container in order to fine-tune the authorization grants. This is particularly important in multi-tenant platforms: without granular identities it’s impossible to achieve least privilege.
(Google Kubernetes Engine (GKE) has a nifty feature for this called Workload Identity)
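A quick way to see how far you are from one identity per container is to look for identities that are shared between workloads. This is a minimal sketch: the workload and identity names are made up, and I assume you can already export a workload-to-identity mapping from your platform.

```python
from collections import defaultdict


def shared_identities(workload_identities):
    """Map each identity used by more than one workload to those workloads."""
    by_identity = defaultdict(list)
    for workload, identity in workload_identities.items():
        by_identity[identity].append(workload)
    return {
        identity: sorted(workloads)
        for identity, workloads in by_identity.items()
        if len(workloads) > 1
    }


if __name__ == "__main__":
    # Hypothetical inventory: workload name -> identity it runs as.
    inventory = {"api": "sa-api", "worker": "sa-default", "cron": "sa-default"}
    print(shared_identities(inventory))
```

Every entry in the result is a place where a grant given to one workload is silently inherited by another, which is exactly what least privilege is meant to prevent.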
Finally, since containers are supposed to be immutable, a wonderful strategy is to reduce the amount of time they can run: the window of opportunity for attackers to move laterally and gain persistence is as long as the container’s running lifetime. Continuously shut down and spin up your containers.
And this final consideration allows me to smoothly move into the next area.
Runtime Security
The last piece of the puzzle is the security of your running workloads. At this point most of the hardening is done, and this is where we move into the realm of reactive security controls, the grim land of post-fail.
The goal here is to minimize the impact of an attack from a compromised container.
Detection and Incident Response
The best way to control the impact of an attack is to minimize the time between the breach and the moment the security team is alerted.
Detecting an ongoing breach is another area where vendors are scrambling to find a silver bullet. There are many approaches; most of them require sidecars and/or DaemonSets actively monitoring pods’ traffic and system calls.
Most solutions will provide some value, but my advice is to start simple and iterate: use your existing SIEM and ingest your platform, application and audit logs.
Incidents will happen, and it’s fine: have an incident response process.
The first bullet point of every post-mortem should be: “How can we detect this quicker next time?” Answering it will let you identify your blind spots, which you can then use to understand which signals you are missing and what makes sense to buy.
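If you want to know whether you are actually getting quicker, track the delay between breach and alert across incidents. The sketch below assumes your post-mortems record both timestamps; the function name and the choice of the median as the summary are mine, not a standard.

```python
from datetime import datetime, timedelta
from statistics import median


def median_time_to_detect(incidents):
    """Median delay between breach and alert across (breach, alert) pairs."""
    delays = [(alert - breach).total_seconds() for breach, alert in incidents]
    return timedelta(seconds=median(delays))


if __name__ == "__main__":
    # Hypothetical post-mortem data: (estimated breach time, first alert time).
    history = [
        (datetime(2024, 1, 5, 8, 0), datetime(2024, 1, 5, 18, 0)),
        (datetime(2024, 3, 2, 9, 0), datetime(2024, 3, 2, 12, 0)),
        (datetime(2024, 5, 9, 14, 0), datetime(2024, 5, 9, 15, 0)),
    ]
    print(median_time_to_detect(history))
```

The median is less sensitive than the mean to the one incident that went unnoticed for weeks, but that outlier is usually the post-mortem most worth rereading.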
Container security is a broad problem and it is not just about scanning images.
This is the model I built and used to reason about container risks and solutions. It’s very high level and of course, as with every model, it’s not necessarily the right one.
We all know that in reality each infrastructure is a snowflake, so start with your own threat model and use this one as inspiration.