Architecture

Overview

This diagram shows the relationships between the major components involved in delivering content via the CDN.

digraph {
    ranksep="1.4";

    # These are arranged and labelled to communicate the
    # sequence of events when a request is processed.
    # Try to keep them in this order.
    client:sw -> controller [
        xlabel=< <table bgcolor="white" border="0" cellborder="0" cellpadding="0" style="rounded"><tr><td>1</td></tr></table> >
    ]

    controller:sw -> origin_request [
        xlabel=< <table bgcolor="white" border="0" cellborder="0" cellpadding="0" style="rounded"><tr><td>2</td></tr></table> >
    ]

    origin_request -> dbcontent [
        xlabel=< <table bgcolor="white" border="0" cellborder="0" cellpadding="0" style="rounded"><tr><td>3</td></tr></table> >,
        dir=both
    ]

    origin_request -> dbconfig [
        # This connection doesn't get a number since the reading of config is not
        # directly tied to the lifecycle of a request.
        dir=both
    ]

    origin_request -> controller:s [
        xlabel=< <table bgcolor="white" border="0" cellborder="0" cellpadding="0" style="rounded"><tr><td>4</td></tr></table> >
    ]

    controller -> S3 [
        xlabel=< <table bgcolor="white" border="0" cellborder="0" cellpadding="0" style="rounded"><tr><td>5</td></tr></table> >,
        dir=both
    ]

    controller:se -> origin_response [
        xlabel=< <table bgcolor="white" border="0" cellborder="0" cellpadding="0" style="rounded"><tr><td>6</td></tr></table> >,
        dir=both
    ]

    controller -> client:se [
        xlabel=< <table bgcolor="white" border="0" cellborder="0" cellpadding="0" style="rounded"><tr><td>7</td></tr></table> >
    ]

    # publishing tools are mentioned, but do not participate
    # in the request processing.
    # Connection order here is reversed to force the publishing tools to the bottom
    # of the graph, which makes them stand out a bit more.
    S3:s -> exodus_gw:ne [dir="back"]
    dbcontent:s -> exodus_gw:nw [dir="back"]
    dbconfig -> exodus_gw [dir="both"]

    exodus_gw -> exodus_rsync:n [dir="back"];
    exodus_gw -> native_tools:n [dir="back"];
    exodus_rsync -> legacy_tools:n [dir="back"];

    client [label="client"]
    exodus_gw [label="exodus-gw"];
    exodus_rsync [label="exodus-rsync"];
    legacy_tools [label="publishing tools (rsync)", style="dashed"];
    native_tools [label="publishing tools (exodus)", style="dashed"];

    dbcontent [
        shape=plaintext
        fontsize=9
        label=<

            <table border='1' cellborder='1' cellspacing='0'>
                <tr><td colspan='4'><font point-size="14"><b>☁ DynamoDB (content)</b></font></td></tr>
                <tr>
                    <td><b>web_uri</b></td>
                    <td><b>from_date</b></td>
                    <td><b>object_key</b></td>
                    <td><b>content_type</b></td>
                </tr>
                <tr>
                    <td>/content/dist/rhel/server/7/7Server/x86_64/os/Packages/t/tar-1.26-34.el7.x86_64.rpm</td>
                    <td>2020-03-26T01:07:39+00:00</td>
                    <td>8e7750e50734f...</td>
                    <td>application/x-rpm</td>
                </tr>
                <tr>
                    <td>/content/dist/rhel/server/7/7Server/x86_64/os/Packages/z/zlib-1.2.7-18.el7.x86_64.rpm</td>
                    <td>2020-03-26T01:07:39+00:00</td>
                    <td>db8dd5164d117...</td>
                    <td>application/x-rpm</td>
                </tr>
                <tr>
                    <td>/content/dist/rhel/server/7/7Server/x86_64/os/repodata/repomd.xml</td>
                    <td>2020-03-26T01:07:39+00:00</td>
                    <td>aec070645fe53...</td>
                    <td>application/xml</td>
                </tr>
                <tr>
                    <td>/content/dist/rhel/server/7/7Server/x86_64/os/repodata/repomd.xml</td>
                    <td>2020-01-22T02:07:20+00:00</td>
                    <td>5d70f436aa013...</td>
                    <td>application/xml</td>
                </tr>
                <tr><td colspan='4'>...</td></tr>
            </table>
        >
    ];

    dbconfig [
        shape=plaintext
        fontsize=9
        label=<
            <table border='1' cellborder='1' cellspacing='0'>
                <tr><td colspan='3'><font point-size="14"><b>☁ DynamoDB (config)</b></font></td></tr>
                <tr>
                    <td><b>config_id</b></td>
                    <td><b>from_date</b></td>
                    <td><b>config</b></td>
                </tr>
                <tr>
                    <td>exodus-config</td>
                    <td>2023-08-04 21:05:40</td>
                    <td>{"listing": {"/content/dist/rhel8": {...</td>
                </tr>
                <tr>
                    <td>exodus-config</td>
                    <td>2023-08-07 21:20:31</td>
                    <td>{"listing": {"/content/dist/rhel8": {...</td>
                </tr>
                <tr><td colspan='3'>...</td></tr>
            </table>
        >
    ];

    S3 [
        shape=plaintext
        fontsize=9
        label=<

                <table border='1' cellborder='1' cellspacing='0'>
                    <tr><td colspan='2'><font point-size="14"><b>☁ S3</b></font></td></tr>
                    <tr>
                        <td><b>key</b></td>
                        <td><b>object</b></td>
                    </tr>
                    <tr>
                        <td>8e7750e50734f...</td>
                        <td><i>[blob tar-1.26-34.el7.x86_64.rpm]</i></td>
                    </tr>
                    <tr>
                        <td>db8dd5164d117...</td>
                        <td><i>[blob zlib-1.2.7-18.el7.x86_64.rpm]</i></td>
                    </tr>
                    <tr>
                        <td>aec070645fe53...</td>
                        <td><i>[blob some repomd.xml]</i></td>
                    </tr>
                    <tr>
                        <td>5d70f436aa013...</td>
                        <td><i>[blob other repomd.xml]</i></td>
                    </tr>
                    <tr>
                        <td>49ae93732fcf...</td>
                        <td><i>[blob some primary.sqlite.bz2]</i></td>
                    </tr>
                    <tr><td colspan='2'>...</td></tr>
                </table>
        >
    ];

    subgraph cluster_0 {
        label=< <b>CloudFront CDN</b> >
        style="rounded";
        controller;
        subgraph cluster_1 {
            label=<<b>exodus-lambda</b>>;
            style="dashed";
            rank=same
            origin_request;
            origin_response;
        }
    }

    subgraph cluster_10 {
        label=< <b>publishing tools</b> >
        style="dashed,rounded";
        exodus_gw;
        exodus_rsync;
        subgraph cluster_11 {
            label="";
            style="invis";
            rank=same;
            legacy_tools;
            native_tools;
        }
    }

    # both DynamoDB and S3 would normally be on the same rank, which makes
    # the diagram way too wide. This ought to help by shifting the config
    # table downwards.
    { rank=same; exodus_gw; dbconfig; }
}
  • Numbered connections represent the sequence of events when the CDN processes a request.

  • For clarity, SHA256 checksums have been truncated (as in 8e7750e50734f...). In reality, the system stores complete checksums.

  • The CloudFront CDN shown in the above diagram may itself be hosted behind another CDN, so client requests may pass through additional layers not expressed here.

Components

client

A client requesting data from the CDN.

This could be dnf, yum, Satellite, curl, a web browser, etc.

CloudFront CDN

The Amazon CloudFront content delivery network.

controller

An abstract component representing the built-in behaviors of CloudFront, such as:

  • basic HTTP request handling

  • serving responses from cache

  • invoking Lambda functions

  • delegating requests to S3

…and so on.

DynamoDB

Amazon DynamoDB NoSQL database service.

DynamoDB (content)

A DynamoDB table which primarily contains mappings between URIs and S3 object keys. Used to look up content. Where multiple items match the same URI, the item with the latest from_date is used.
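
The lookup rule above can be illustrated with a minimal Python sketch. It assumes that "latest" means the greatest from_date among items sharing a web_uri. The in-memory list, the placeholder object keys and the lookup_object_key helper are hypothetical stand-ins for the real table and query, not the project's implementation.

```python
from datetime import datetime

# Hypothetical in-memory stand-in for the content table. Field names follow
# the columns shown in the diagram; the object keys are placeholders.
ITEMS = [
    {
        "web_uri": "/content/dist/rhel/server/7/7Server/x86_64/os/repodata/repomd.xml",
        "from_date": "2020-03-26T01:07:39+00:00",
        "object_key": "key-new",
    },
    {
        "web_uri": "/content/dist/rhel/server/7/7Server/x86_64/os/repodata/repomd.xml",
        "from_date": "2020-01-22T02:07:20+00:00",
        "object_key": "key-old",
    },
]


def lookup_object_key(items, uri):
    """Return the object_key of the latest item for uri, or None if absent."""
    matches = [item for item in items if item["web_uri"] == uri]
    if not matches:
        return None
    # Assumption: the item with the greatest from_date wins.
    latest = max(matches, key=lambda item: datetime.fromisoformat(item["from_date"]))
    return latest["object_key"]
```

With the sample data, a lookup of the repomd.xml path selects the item dated 2020-03-26, and a lookup of an unknown path yields None.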

For more information about the data contained here, see Schema Reference.

DynamoDB (config)

A DynamoDB table which holds configuration influencing the behavior of the CDN. Examples of configuration include the variables needed to respond to /listing requests, and information on aliases between paths (emulating symlinks between directories).
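
As a rough illustration of how path aliases can emulate symlinks between directories, the sketch below rewrites a URI prefix according to a list of aliases. The alias format (dictionaries with "src" and "dest" keys), the example paths and the resolve_aliases helper are assumptions made for this example; the actual configuration layout may differ.

```python
def resolve_aliases(uri, aliases):
    """Rewrite uri if it falls under an aliased directory.

    aliases is a list of {"src": ..., "dest": ...} dictionaries
    (a hypothetical format chosen for this sketch).
    """
    for alias in aliases:
        src, dest = alias["src"], alias["dest"]
        # Match the directory itself or anything beneath it, but not
        # sibling paths which merely share the same textual prefix.
        if uri == src or uri.startswith(src + "/"):
            return dest + uri[len(src):]
    return uri


# Hypothetical alias: one directory acting as a symlink to another.
ALIASES = [
    {"src": "/content/dist/rhel/server/7/7Server",
     "dest": "/content/dist/rhel/server/7/7.9"},
]
```

A request under the aliased directory is resolved to the target directory, while unrelated paths pass through unchanged.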

S3

Amazon S3, Simple Storage Service.

The CDN uses S3 to store the binary objects retrievable by clients. A single bucket is used, configured as the origin of the CloudFront CDN.

One object corresponds to one file which can be downloaded from the CDN; this includes files considered to be content (such as RPMs) and files considered to be metadata (such as yum repo metadata files).

Each object’s key is its own SHA256 checksum, ensuring that content accessible via many paths on the CDN need only be stored once.
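
The deduplication property can be shown with a short Python sketch. The dictionary standing in for the bucket and the helper names are hypothetical; only the use of a SHA256 checksum as the key comes from the description above.

```python
import hashlib


def object_key_for(data: bytes) -> str:
    """Return the hex SHA256 checksum of data, used here as the object key."""
    return hashlib.sha256(data).hexdigest()


def store(bucket: dict, data: bytes) -> str:
    """Store data in a dict standing in for the bucket and return its key."""
    key = object_key_for(data)
    # Identical content always maps to the same key, so storing it again
    # overwrites the existing entry rather than creating a second copy.
    bucket[key] = data
    return key
```

Storing the same bytes twice, for example because one file is published under two different paths, leaves a single object in the bucket.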

For more information about the data contained here, see Schema Reference.

exodus-lambda

A project including Python-based implementations of Lambda@Edge functions for the CDN.

You are currently reading the documentation of this project.

origin_request

A Lambda@Edge function connected to “origin request” events in CloudFront.

This function is primarily responsible for translating the path given in the client’s request into an S3 object key via a DynamoDB query. Assuming the client has requested existing content, this Lambda function will rewrite the request’s URI into a valid S3 object key before returning the request to the controller. The function does not itself request data from S3. In the typical case it also does not generate a response directly, although it does so for some edge cases.
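
The request rewriting described above might look roughly like the following sketch. It relies on the standard Lambda@Edge event layout, in which the request is found under Records[0].cf.request. The lookup callable stands in for the DynamoDB query, and the 404 response for missing content is an assumption chosen to illustrate one possible edge case; neither is taken from the project's code.

```python
def origin_request(event, context, lookup):
    """Sketch of an origin request handler.

    lookup is a hypothetical callable mapping a URI to an S3 object key,
    or to None when no content exists for that URI.
    """
    request = event["Records"][0]["cf"]["request"]
    key = lookup(request["uri"])
    if key is None:
        # Assumed edge case: answer directly instead of forwarding to S3.
        return {"status": "404", "statusDescription": "Not Found"}
    # Rewrite the URI so that CloudFront fetches the object from S3 by key.
    request["uri"] = "/" + key
    return request
```

Returning the modified request hands control back to CloudFront, which then fetches the object from the origin; returning a response dictionary short-circuits that step.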

For more information about this function’s behavior, see Function Reference.

origin_response

A Lambda@Edge function connected to “origin response” events in CloudFront.

This function is primarily responsible for tweaking certain response headers before allowing CloudFront to serve the response to clients. For example, caching behavior is influenced by setting a Cache-Control header for certain responses.
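
A minimal sketch of this kind of header adjustment is shown below. It uses the standard Lambda@Edge response layout, where headers are keyed by lowercase name and each value is a list of key/value entries. The rule applied here (set a max-age on application/xml responses) and the default of 600 seconds are hypothetical; the conditions used by the real function may differ.

```python
def header_value(response, name):
    """Return the first value of the named header, or None if it is absent."""
    entries = response["headers"].get(name.lower(), [])
    return entries[0]["value"] if entries else None


def origin_response(event, context, max_age=600):
    """Sketch of an origin response handler that sets Cache-Control."""
    response = event["Records"][0]["cf"]["response"]
    # Hypothetical rule: only XML responses receive an explicit max-age.
    if header_value(response, "Content-Type") == "application/xml":
        response["headers"]["cache-control"] = [
            {"key": "Cache-Control", "value": f"max-age={max_age}"}
        ]
    return response
```

Responses which do not match the rule are returned to CloudFront unmodified.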

For more information about this function’s behavior, see Function Reference.

exodus-gw

A microservice dedicated to writing data onto the CDN. This component exposes an HTTP API for use by publishing tools and enforces certain policies on published content.

It is the only component permitted to perform writes on DynamoDB and S3 (hence, the “exodus gateway”).

exodus-rsync

A drop-in replacement for the rsync command. This command has an interface which is partially compatible with rsync, but it performs publishes via API calls to exodus-gw rather than using the rsync protocol.

exodus-rsync is not fully rsync-compatible; it is engineered to support specific known publishing tools designed for rsync.

publishing tools (rsync)

Represents tools used by Red Hat to publish content onto the CDN which are designed to use rsync and are mostly unaware of the Exodus CDN architecture. These tools are made to publish to Exodus CDN by replacing the rsync command with exodus-rsync.

RHSM Pulp is an example of a publishing tool using rsync.

publishing tools (exodus)

Represents tools used by Red Hat to publish content onto the CDN which are explicitly designed for Exodus CDN.

These tools don’t need to use the exodus-rsync compatibility layer, and so may have improved performance or an extended feature set when compared with tools using rsync.