Implementing Staging and Production “Slot Swaps” with Azure Container Apps

A complete example for implementing this slot swap behavior is available at https://github.com/nshenoy/azure-containerapp-slots-example. Please check it out and feel free to open an issue/PR for feedback.

As a long-time user of Azure App Services for app deployments, I’ve gotten accustomed to using staging and production slots as a good practice. Slots provide an opportunity to test new functionality in the staging slot and then perform a “zero downtime” swap into live production. As I started playing around with the relatively new Azure Container Apps offering, I wanted to see if we could implement a similar zero-downtime deployment mechanism that gives the same opportunity to validate before going live. I did come across Dennis Zielke’s alternative (and excellent) blue/green implementation for Container Apps. However, I wanted to see if there was a different, more “supported” way to achieve this.

Though deployment slots are not explicitly implemented in Container Apps, there is the notion of a “revision”, defined as “an immutable snapshot of a container app version.” Assuming ingress is enabled, revisions allow ingress traffic rules to split traffic between separate revisions. The particularly interesting bit is that revisions can be given labels. Each individual revision is created with an Azure-generated unique string, and thus has its own URL to hit. However, revision labels give a deterministic URL based on the label name rather than the revision name. In other words, something labeled “staging” can always be hit with a URL similar to containerappname---staging.blahblah.azurecontainerapps.io. What’s more, the Azure CLI “az containerapp revision label swap” command allows revision labels to be swapped. Armed with the ability to create revisions, assign revision labels, and swap revision labels, we can implement something very close to what Azure App Service provides. We just need a little PowerShell and Bicep magic to do the work.
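In CLI terms, the two label operations that make this possible look like the following (resource and revision names here are hypothetical, and the commands assume a Container App with ingress enabled and at least two revisions):

```shell
# Apply a label to a specific revision (gives it the deterministic ---staging URL):
az containerapp revision label add -g my-rg -n my-app --label staging --revision my-app--abc1234

# Swap the revisions behind two labels -- the "slot swap":
az containerapp revision label swap -g my-rg -n my-app --source staging --target production
```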

Step 1: Determine the Current “production” Revision Label (if any)

The first main step is to run the Get-ContainerAppProductionRevision.ps1 script to determine if a revision with a production label exists.

# Finding production revision...
$productionRevision = (&az containerapp ingress show -g $resourceGroupName -n $containerAppName --query 'traffic[?label == `production`].revisionName' -o tsv)

if([System.String]::IsNullOrEmpty($productionRevision)) {
    $productionRevision = "none"
} 

return $productionRevision

The script uses az containerapp ingress show to determine whether a revision with a “production” label is in place. It returns either that revision’s name, or ‘none’ if the label doesn’t exist; the output then becomes a new environment variable called containerAppProductionRevision.
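For example, from a shell-based pipeline step, the script’s output could be captured like this (the script path and parameter names are assumed to mirror the other scripts in the repo, and pwsh is assumed to be available on the agent):

```shell
# Hypothetical pipeline step; requires pwsh and a logged-in az CLI.
production=$(pwsh -File ./deployment/scripts/Get-ContainerAppProductionRevision.ps1 \
  -resourceGroupName my-rg -containerAppName my-app)
echo "containerAppProductionRevision=$production"
```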

Step 2: Bicep Template Trickery

The Bicep template is then deployed. And here we have to do some trickery. The first trick is the containerapp_revision_uniqueid parameter:

...
param containerapp_revision_uniqueid string = newGuid()
...
          env: [
            ...
            {
              name: 'containerapp_revision_uniqueid'
              value: containerapp_revision_uniqueid
            }

To force a revision-scope change, we set this containerapp_revision_uniqueid param’s default value to a new GUID with each Bicep deployment.
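Note that Bicep only allows newGuid() in a parameter default, so every deployment that omits the parameter gets a fresh value automatically. If you prefer to control the value from the pipeline instead, one way to generate and pass it explicitly might look like this (a sketch, assuming a Linux build agent; the deployment command and file name are illustrative):

```shell
# Generate a unique value per deployment: use uuidgen where available,
# otherwise fall back to the kernel's random UUID source (Linux).
uniqueid=$(uuidgen 2>/dev/null || cat /proc/sys/kernel/random/uuid)
echo "$uniqueid"

# Hypothetical deployment call passing the value through:
# az deployment group create -g my-rg -f main.bicep \
#   -p containerapp_revision_uniqueid="$uniqueid"
```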

The next bit of trickery is setting the ingress properties of the Container App:

      ingress: containerAppProductionRevision != 'none' ? {
        external: useExternalIngress
        targetPort: containerPort
        transport: 'auto'
        traffic: [
          {
            latestRevision: true
            label: 'staging'
            weight: 0
          }
          {
            revisionName: containerAppProductionRevision
            label: 'production'
            weight: 100
          }
        ]
      } : {
        external: useExternalIngress
        targetPort: containerPort
        transport: 'auto'
      }

Here we use a ternary operator to switch behavior based on the containerAppProductionRevision parameter. If the previous Get-ContainerAppProductionRevision.ps1 step returned a revision name with a production label, then we have to set up the ingress traffic rules such that production keeps 100% of the traffic while the latest revision we’re deploying is set to 0%. In other words, don’t mess with the current production slot. Otherwise, if no previous production slot was defined, then there are no traffic rules to define (yet). This is the crux of getting this slot-like behavior to work.
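Assuming the deployment succeeds with an existing production revision, the traffic section we’re asking for (as later reported by az containerapp ingress show) looks roughly like this, with an illustrative revision name:

```json
[
  {
    "label": "staging",
    "latestRevision": true,
    "weight": 0
  },
  {
    "label": "production",
    "revisionName": "my-app--oldrevision",
    "weight": 100
  }
]
```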

Step 3: Apply the “staging” Label to the Latest Revision

Next we run the Set-ContainerAppStagingLabel.ps1 script to apply the staging label to the latest revision.

# https://github.com/nshenoy/azure-containerapp-slots-example/blob/main/deployment/scripts/Set-ContainerAppStagingLabel.ps1

[CmdletBinding()]
param(
    [Parameter(Mandatory=$true)]
    [string] $resourceGroupName,

    [Parameter(Mandatory=$true)]
    [string] $containerAppName
)

&az config set extension.use_dynamic_install=yes_without_prompt

# fetch latest revision
Write-Host "Finding latest revision..."
$latestRevision = (&az containerapp revision list -g $resourceGroupName -n $containerAppName --query "reverse(sort_by([].{name:name, date:properties.createdTime},&date))[0].name" -o tsv)

Write-Host "Latest revision: $latestRevision"

# Find revision with label of "staging" and remove revision.
Write-Host "Finding staging revision..."
$stagingRevision = (&az containerapp ingress show -g $resourceGroupName -n $containerAppName --query 'traffic[?label == `staging`].revisionName' -o tsv)

Write-Host "Finding production revision..."
$productionRevision = (&az containerapp ingress show -g $resourceGroupName -n $containerAppName --query 'traffic[?label == `production`].revisionName' -o tsv)


if([System.String]::IsNullOrEmpty($stagingRevision)) {
    Write-Host "No staging revision found."
} else {
    Write-Host "Staging revision: $stagingRevision"
    # Write-Host "Removing staging revision: $stagingRevision"
    # &az containerapp revision deactivate -g $resourceGroupName -n $containerAppName --revision $stagingRevision
    Write-Host "Removing staging label from revision: $stagingRevision"
    &az containerapp revision label remove -g $resourceGroupName -n $containerAppName --label staging
}

# Apply "staging" label to latest revision.
Write-Host "Applying staging label to latest revision..."
&az containerapp revision label add -g $resourceGroupName -n $containerAppName --label staging --revision "$latestRevision" --no-prompt --yes

Write-Host "Setting traffic weights..."
if([System.String]::IsNullOrEmpty($productionRevision)) {
    &az containerapp ingress traffic set -g $resourceGroupName -n $containerAppName --revision-weight latest=100 --label-weight staging=0
} else {
    &az containerapp ingress traffic set -g $resourceGroupName -n $containerAppName --label-weight production=100 staging=0
}

At this point, the latest container image revision is staged. We can then test to make sure it behaves as needed. The revision FQDN can be retrieved from the Azure portal by going to your Container App -> Revision management and then clicking on your staging labeled revision.

The “Label URL” will always be the Container App name with ---staging appended to the end.
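If you want the staging URL programmatically rather than from the portal, it can be derived from the app name, the label, and the environment’s default domain. A minimal sketch (the domain value here is a placeholder; in practice it can be read with az containerapp env show --query properties.defaultDomain):

```shell
# Build the deterministic label URL: <app>---<label>.<environment default domain>
label_fqdn() {
  local app="$1" label="$2" domain="$3"
  echo "${app}---${label}.${domain}"
}

label_fqdn my-app staging blahblah.azurecontainerapps.io
# -> my-app---staging.blahblah.azurecontainerapps.io
```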

Step 4: Swap “staging” and “production”

Finally the production job will run the Swap-ContainerAppRevisions.ps1 to swap revision labels and verify that the production label has 100% of the traffic.

# https://github.com/nshenoy/azure-containerapp-slots-example/blob/main/deployment/scripts/Swap-ContainerAppRevisions.ps1

[CmdletBinding()]
param(
    [Parameter(Mandatory=$true)]
    [string] $resourceGroupName,

    [Parameter(Mandatory=$true)]
    [string] $containerAppName
)

&az config set extension.use_dynamic_install=yes_without_prompt

Write-Host "Finding staging revision..."
$stagingRevision = (&az containerapp ingress show -g $resourceGroupName -n $containerAppName --query 'traffic[?label == `staging`].revisionName' -o tsv)

Write-Host "Staging revision: $stagingRevision"

Write-Host "Finding production revision..."
$productionRevision = (&az containerapp ingress show -g $resourceGroupName -n $containerAppName --query 'traffic[?label == `production`].revisionName' -o tsv)

if([System.String]::IsNullOrEmpty($productionRevision)) {
    Write-Host "No production revision found."
    Write-Host "Applying production label to staging revision..."
    &az containerapp revision label add -g $resourceGroupName -n $containerAppName --label production --revision $stagingRevision
} else {
    Write-Host "Production revision: $productionRevision"
    Write-Host "Swapping staging and production revisions..."
    &az containerapp revision label swap -g $resourceGroupName -n $containerAppName --source staging --target production
}

# set traffic for production=100 and staging=0
Write-Host "Setting traffic for production=100 and staging=0..."
if([System.String]::IsNullOrEmpty($productionRevision)) {
    &az containerapp ingress traffic set -g $resourceGroupName -n $containerAppName --label-weight production=100
} else {
    &az containerapp ingress traffic set -g $resourceGroupName -n $containerAppName --label-weight production=100 staging=0
}

Write-Host "Swap complete!"

What’s Next

The big thing still missing is cleanup of old revisions. At some point in the scripts above (perhaps the final step?) we need to deactivate any revisions that aren’t labeled. Also, it kind of sucks to have these scripts live in the repo. It seems like these should be implemented as a set of build tasks that can be easily included in the workflow.
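A sketch of that cleanup, reusing the same az queries as the scripts above: collect the revision names currently referenced by ingress traffic, then deactivate any other active revision. This is a hypothetical step, assuming $resourceGroupName and $containerAppName are set and the az CLI is logged in (note that entries using latestRevision may report a null revisionName, so a real implementation should resolve that first):

```shell
# Revision names currently referenced by ingress traffic rules (labeled or weighted).
labeled=$(az containerapp ingress show -g "$resourceGroupName" -n "$containerAppName" \
  --query 'traffic[].revisionName' -o tsv)

# Deactivate every active revision that ingress no longer references.
for revision in $(az containerapp revision list -g "$resourceGroupName" -n "$containerAppName" \
  --query '[?properties.active].name' -o tsv); do
  if ! echo "$labeled" | grep -qx "$revision"; then
    echo "Deactivating unreferenced revision: $revision"
    az containerapp revision deactivate -g "$resourceGroupName" -n "$containerAppName" --revision "$revision"
  fi
done
```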
