Docker How-to: Custom Authentication to a Private Docker Registry with NGINX, Lua, and AWS ECR

Background

Our engineering team developed a Docker container for our application, Kloudless Enterprise, to simplify cluster deployments using industry-standard tools like Docker Swarm or Kubernetes.

However, our customers found downloading the container from our web portal was somewhat inconvenient. Users previously had to download the archived image and manually load it into their Docker daemon to use it. There also wasn’t a way to check which images were available without visiting the portal through a browser. To improve the experience, we decided to provide a private Docker Registry that would allow our users to not only pull images, but also query tags and take advantage of other useful features that the Docker Registry provides.

Private Docker Registry Architecture

To reduce our operational load, we use the Elastic Container Registry (ECR) that AWS provides as a managed Docker Registry. This allows us to work with Docker images without having to worry about maintaining the registry service or the underlying storage.

The primary concern is authenticating end-user access to this registry. ECR relies on short-lived auth tokens that are valid for 12 hours. This is problematic since we would either have to provision an IAM account for every user accessing our registry, or repeatedly hand out an auth token that our app generates from our IAM credentials. Neither option is very desirable, so we came up with an alternative.

Our API provides tokens that authorize our users to access and manage our platform. We leveraged this by having NGINX accept Docker Registry requests from clients that authenticate with our API's tokens, validate those tokens, and then swap them for the Docker ECR auth token before proxying the request on to ECR. At a high level, a request flows from the Docker client, through the NGINX proxy (which performs the token validation and substitution), and on to ECR.
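
For example, here is a rough sketch of what a raw Docker Registry API call through the proxy could look like from the client's side; the email, token, and repository name are placeholders:

# Hypothetical request: list tags via the proxy, authenticating with a Kloudless account email and API token
curl -u "dev@example.com:YOUR_API_TOKEN" https://docker.kloudless.com/v2/YOUR_REPO/tags/list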

ECR Authentication

It is straightforward to manage the proxy's access to ECR. Since we run the server in EC2, we can attach an IAM role that allows it to read the relevant repository, describe repositories, and provision authorization tokens for ECR:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "ecr:GetDownloadUrlForLayer",
                "ecr:BatchGetImage",
                "ecr:CompleteLayerUpload",
                "ecr:DescribeImages",
                "ecr:ListImages",
                "ecr:InitiateLayerUpload",
                "ecr:BatchCheckLayerAvailability",
                "ecr:GetRepositoryPolicy",
            ],
            "Resource": [
            "arn:aws:ecr:YOUR_REGION:123456789012:repository/YOUR_REPO",
                "arn:aws:ecr:YOUR_REGION:123456789012:repository/YOUR_REPO/*",
            ]
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "ecr:GetAuthorizationToken",
                "ecr:DescribeRepositories"
            ],
            "Resource": "*"
        }
    ]
}
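
As a sketch, the policy can be attached to an EC2 instance role with the AWS CLI; the role, instance profile, and file names below are placeholders, and the EC2 trust policy file is assumed to exist:

# Hypothetical role, profile, and file names; the policy above is saved as ecr-proxy-policy.json
aws iam create-role --role-name ecr-proxy --assume-role-policy-document file://ec2-trust-policy.json
aws iam put-role-policy --role-name ecr-proxy --policy-name ecr-proxy-access --policy-document file://ecr-proxy-policy.json
aws iam create-instance-profile --instance-profile-name ecr-proxy
aws iam add-role-to-instance-profile --instance-profile-name ecr-proxy --role-name ecr-proxy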


The server needs to store and refresh ECR authorization tokens to allow NGINX to perform requests to ECR. A simple cron job that executes every 8 hours handles this process:

#!/bin/bash
: ${AWS_REGION:="YOUR_REGION"}
: ${AWS_ECR_ID:="YOUR_ECR_ID"}
# Logging output to syslog instead of spamming cron emails
: ${LOGGER_TAG:="refresh_ecr_token"}
exec 1> >(logger -t "$LOGGER_TAG" -p cron.info)
exec 2> >(logger -t "$LOGGER_TAG" -p cron.err)
# Generating the token
TOKEN=$(aws ecr get-authorization-token --region "${AWS_REGION}" --registry-ids "${AWS_ECR_ID}" --output text --query 'authorizationData[].authorizationToken')
FILE="/etc/nginx/conf.d/ecr_token"
# Write without a trailing newline so the token can be used verbatim in an Authorization header
printf '%s' "${TOKEN}" > "${FILE}"
chmod go+r "${FILE}"
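
As for scheduling, a cron entry along the lines of the one below (the script path is hypothetical) runs the refresh every 8 hours, comfortably within the 12-hour token lifetime:

# /etc/cron.d/refresh_ecr_token -- hypothetical location for the script above
0 */8 * * * root /usr/local/bin/refresh_ecr_token.sh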



Proxying the Requests

Since our customers only require read access, we can directly proxy the Docker Registry API requests and replace the authentication (after validating the token, of course). We take advantage of the ngx_lua module to handle this within NGINX itself. The OpenResty bundle includes this module by default, but it can also be installed separately. The following configuration snippet demonstrates how to safely proxy the Docker Registry API requests:

lua_package_path "/usr/local/lib/lua/?.lua;;";
map $upstream_http_docker_distribution_api_version $docker_distribution_api_version {
    '' 'registry/2.0';
}
init_by_lua '
    -- External library for JSON parsing, preloaded so later blocks can require() it cheaply.
    local json = require("JSON")
    -- External lib for loading the ECR token; warm its cache at startup with the token file path.
    local aws = require("aws")
    aws.get_ecr_token("/etc/nginx/conf.d/ecr_token")
';
server {
    listen 80;
    server_name _;
    # AWS internal resolver
    resolver 169.254.169.253;
    # Disallowing client bodies
    client_max_body_size 0;
    location /health {
        return 200;
    }
    # Kloudless API Endpoint for later validation
    location /v1/meta/licenses/ {
        internal;
        set $server 'api.kloudless.com:443';
        proxy_pass https://$server;
    }
    # Docker Registry API Endpoints
    location ~* ^/v2/(?<channel>[a-z0-9_-]*)?(/.*)?$ {
        if ($http_user_agent ~ "^(docker\/1\.(3|4|5(?!\.[0-9]-dev))|Go ).*$") {
            return 404;
        }
        access_by_lua '
            -- Making sure that no modification requests (pushes or deletes) can take place
            local method_blacklist = {POST = true, PUT = true, PATCH = true, DELETE = true}
            if method_blacklist[ngx.var.request_method] then
                ngx.exit(403)
            end
            -- Handle login process. Returning 401 causes docker CLI to prompt user.
            if ngx.var.http_authorization == nil then
                ngx.header["WWW-Authenticate"] = "Basic realm=kloudless"
                ngx.exit(401)
            end
            -- !!! TODO: See the next blog section for Kloudless Auth.
            -- ...
            -- Get the AWS ECR HTTP API token and modify the Authorization header
            -- again using this token, so that upstream requests to the ECR succeed.
            -- The token expires every 12 hours, thus other means are required to
            -- update the token in the file.
            local aws = require("aws")
            local ecr_token = aws.get_ecr_token("/etc/nginx/conf.d/ecr_token")
            ngx.req.set_header("Authorization", string.format("Basic %s", ecr_token))
        ';
        add_header 'Docker-Distribution-Api-Version' $docker_distribution_api_version;
        proxy_pass https://[YOUR_ECR_ID].dkr.ecr.[YOUR_REGION].amazonaws.com;
        proxy_set_header Host "[YOUR_ECR_ID].dkr.ecr.[YOUR_REGION].amazonaws.com";
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-User $http_authorization;
        proxy_set_header X-Forwarded-Proto "https";
        proxy_pass_header Server;
        proxy_read_timeout 900;
    }
}


To keep request handling efficient, NGINX doesn't re-read the ECR token file on every request. Instead, the Lua helper caches the token and only reloads it when the file's modification time changes. The following Lua module, referenced above by require("aws"), handles this:

local _M = {}
-- Cached token and the mtime of the file it was last read from.
local ecr_token = ""
local last_mtime = 0

-- Read the token file and strip trailing whitespace/newlines so the token can be
-- used verbatim in an Authorization header.
function _M.read_ecr_token(path)
    local f = io.open(path)
    local token = f:read("*all")
    f:close()
    return (token:gsub("%s+$", ""))
end

-- Return the cached token, re-reading the file only when its mtime changes.
function _M.get_ecr_token(path)
    local f = io.popen("stat -c %Y " .. path)
    local mtime = tonumber(f:read())
    f:close()
    if mtime and mtime > last_mtime then
        ecr_token = _M.read_ecr_token(path)
        last_mtime = mtime
    end
    return ecr_token
end

return _M


Custom Authentication

The NGINX configuration shown earlier uses HTTP Basic Authentication to remain compatible with the Docker command line tools: the developer's email is the username, and their account's API token is the password. In the access_by_lua block, NGINX decodes the Basic Auth header, extracts the token, and uses it to perform a request to our API to list Licenses. This validates the token and also tells us which Docker releases the developer has access to. The TODO comment in the earlier NGINX config is replaced with the following snippet, which authorizes the request using an NGINX sub-request:

-- The JSON module was preloaded in init_by_lua, so this require() is effectively a cache lookup.
local json = require("JSON")
function has_access(lks_string, channel)
    -- Check whether any license object grants access to the requested release channel.
    local lks = json:decode(lks_string)
    if lks["objects"] == nil then
        return false
    end
    for _, obj in pairs(lks["objects"]) do
        if obj["release"] == channel then
            return true
        end
    end
    return false
end
-- Get the login credentials and decode them
local encoded_creds = string.sub(ngx.var.http_authorization, 7)
local decoded_creds = ngx.decode_base64(encoded_creds)
if decoded_creds == nil then
    ngx.exit(400)
end
-- Basic credentials are "email:token"; everything after the first colon is the API token.
local _, token = string.match(decoded_creds, "^([^:]*):(.*)$")
if token == nil then
    ngx.exit(400)
end
-- Our license endpoint doesn't like docker's Accept header
local org_accept = ngx.req.get_headers()["Accept"]
ngx.req.set_header("Accept", "application/json")
-- Check if the token is valid
ngx.req.set_header("Authorization", string.format("Bearer %s", token))
local res = ngx.location.capture("/v1/meta/licenses/")
if res.status ~= 200 then
    ngx.status = res.status
    ngx.say(res.body)
    ngx.exit(res.status)
end
-- Reset the Accept header to its original one
ngx.req.set_header("Accept", org_accept)
-- The channel capture is empty when only /v2/ is requested (e.g. during login).
local channel = ngx.var["channel"] or ""
if string.len(channel) > 0 and not has_access(res.body, channel) then
    ngx.exit(403)
end
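
From a developer's perspective this behaves like any other registry. As a rough sketch, assuming the proxy is reachable over HTTPS at docker.kloudless.com (see the next section) and using a placeholder repository name:

# Log in with your account email as the username and your API token as the password
docker login docker.kloudless.com
# Then pull images and query tags as usual
docker pull docker.kloudless.com/YOUR_REPO:latest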


Load-Balancing

Our setup uses an ELB in front of the proxy server for high availability and easier SSL/TLS termination. This allows us to easily scale up the proxies using auto-scaling groups and handle any individual instance’s failure. It also allows us to provide a much more user-friendly host name over HTTPS such as docker.kloudless.com rather than the long ECR domain name.

Conclusion

That’s it! The configuration above is modular enough that you can substitute any authentication or authorization method you want, including a different web service or a generic database. This private, read-only registry greatly simplifies how our customers get started with our Docker containers and appliance. Feel free to adapt this code to your use case as well.
