Troubleshooting

If something goes wrong, start here. Each entry has the symptom you’ll see, the cause, and the fix. If your problem isn’t here, please open an issue or start a discussion.

Error: building account: getting authenticated object ID: parsing JSON result

Cause: The Azure CLI is not logged in, or your token has expired.

Fix:

az logout
az login
az account set --subscription <your-sub-id>
az account show

AuthorizationFailed on terraform apply / az deployment

Cause: The identity running the deployment doesn’t have Contributor on the target subscription (or Management Group Contributor at tenant root for the MG module).

Fix:

# As subscription owner:
az role assignment create \
--assignee <upn-or-object-id> \
--role Contributor \
--scope /subscriptions/<sub-id>

For CI: confirm that the federated credential’s subject (repo:<owner>/<repo>:ref:refs/heads/main or repo:<owner>/<repo>:pull_request) matches your workflow trigger.

KeyVaultAccessForbidden after deploy succeeds

Cause: RBAC role assignments can take a few minutes to propagate, or your user has no Key Vault role.

Fix: Wait 5 minutes, then verify:

az keyvault secret list --vault-name <kv-name>
# If still forbidden:
az role assignment create \
--assignee <your-upn> \
--role "Key Vault Secrets Officer" \
--scope $(az keyvault show -n <kv-name> --query id -o tsv)
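
Because propagation time varies, polling can be less brittle than a fixed wait. The following is a minimal sketch in POSIX shell. retry_until is a hypothetical helper, not part of the repo, and the attempt count and delay are arbitrary choices:

```shell
# retry_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: re-runs COMMAND until it succeeds or ATTEMPTS is exhausted.
retry_until() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep only if another attempt remains.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

For example, retry_until 10 30 az keyvault secret list --vault-name <kv-name> would retry every 30 seconds for roughly five minutes before giving up.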

Backend configuration changed on terraform init

Cause: You switched scenarios (each scenario uses a different state file key) without re-initializing the backend.

Fix:

terraform init -reconfigure \
-backend-config="resource_group_name=$TFSTATE_RG" \
-backend-config="storage_account_name=$TFSTATE_SA" \
-backend-config="container_name=tfstate" \
-backend-config="key=foundation.<scenario>.tfstate"

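If you also use a Terraform workspace per scenario, use select -or-create, which selects the workspace and creates it if it does not exist (requires Terraform 1.4 or later):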

terraform workspace select -or-create <scenario>

Plan shows tag drift on every resource after upgrading

Cause: The default workload tag value changed between releases.

Fix: Either accept the diff (cheap, metadata-only) or pin the old value in your tfvars:

tags = {
  workload    = "smb-foundations" # or whatever you had
  iac         = "terraform"
  cost_center = "platform"
}

cidrhost: argument must be a CIDR on the on-prem CIDR var

Cause: You passed a host IP (e.g. 192.168.1.5) instead of a CIDR.

Fix: Use CIDR notation: 192.168.0.0/16.
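
To catch this before terraform plan, a rough pre-flight check can reject values that lack a prefix length. This is a minimal sketch in POSIX shell. is_cidr is a hypothetical helper that checks IPv4 syntax only, so it is not a substitute for Terraform's own validation:

```shell
# is_cidr VALUE
# Hypothetical helper: succeeds if VALUE looks like IPv4 CIDR notation (a.b.c.d/n).
is_cidr() {
  case $1 in
    */*) ;;              # must contain a prefix length
    *) return 1 ;;
  esac
  addr=${1%/*}
  prefix=${1##*/}
  # Prefix length must be an integer between 0 and 32.
  case $prefix in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$prefix" -le 32 ] || return 1
  # Address may contain only digits and dots.
  case $addr in
    ''|*[!0-9.]*) return 1 ;;
  esac
  # Address must be exactly four dot-separated octets, each 0-255.
  old_ifs=$IFS
  IFS=.
  set -- $addr
  IFS=$old_ifs
  [ "$#" -eq 4 ] || return 1
  for octet in "$@"; do
    [ -n "$octet" ] || return 1
    [ "$octet" -le 255 ] || return 1
  done
  return 0
}
```

With this in place, is_cidr 192.168.1.5 fails because the prefix length is missing, while is_cidr 192.168.0.0/16 succeeds.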

Cause: Provider can’t find a subscription. Either ARM_SUBSCRIPTION_ID env var is missing or the subscription_id variable is empty.

Fix:

export ARM_SUBSCRIPTION_ID=$(az account show --query id -o tsv)
# or pass via -var
terraform plan -var "subscription_id=$ARM_SUBSCRIPTION_ID" ...
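
A wrapper script can fail fast instead of letting the provider report the missing subscription later. This is a minimal sketch, and require_subscription_id is a hypothetical helper name rather than something the repo ships:

```shell
# require_subscription_id
# Hypothetical pre-flight check: fails when ARM_SUBSCRIPTION_ID is unset or empty.
require_subscription_id() {
  if [ -z "${ARM_SUBSCRIPTION_ID:-}" ]; then
    echo "ARM_SUBSCRIPTION_ID is not set; export it from: az account show --query id -o tsv" >&2
    return 1
  fi
  return 0
}
```

Calling require_subscription_id && terraform plan would then skip the plan entirely when the variable is missing.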

Cause: Bicep / azure-cli version is older than the API version the module uses.

Fix: Upgrade Bicep, then confirm both the Bicep and Azure CLI versions:

az bicep upgrade
az bicep version # should be >= 0.30
az version
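
Comparing version strings by eye is error-prone, because 0.4 sorts after 0.30 as text but is the older release. The sketch below compares dotted versions numerically in POSIX shell. version_at_least is a hypothetical helper, and it assumes you pass it the bare version number (for example 0.30.23) copied from the az bicep version output:

```shell
# version_at_least ACTUAL MINIMUM
# Hypothetical helper: succeeds if dotted version ACTUAL is >= MINIMUM.
# Compares up to three numeric components; missing components count as 0.
version_at_least() {
  actual=$1
  minimum=$2
  part=0
  while [ "$part" -lt 3 ]; do
    a=${actual%%.*}
    m=${minimum%%.*}
    a=${a:-0}
    m=${m:-0}
    if [ "$a" -gt "$m" ]; then return 0; fi
    if [ "$a" -lt "$m" ]; then return 1; fi
    # Drop the leading component; leave empty when none remain.
    case $actual in *.*) actual=${actual#*.} ;; *) actual= ;; esac
    case $minimum in *.*) minimum=${minimum#*.} ;; *) minimum= ;; esac
    part=$((part + 1))
  done
  return 0
}
```

For instance, version_at_least 0.4.1 0.30 fails, while version_at_least 0.30.23 0.30 succeeds.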

InvalidTemplateDeployment: The template deployment is not valid

Run what-if first — it tells you which resource fails and why:

az deployment sub what-if \
--location <region> \
--name foundation-debug \
--parameters infra/bicep/foundation/scenarios/<scenario>.bicepparam

Bicep parameter file fails build with Could not find type of resource

Cause: AVM-style modules are referenced via br/public: and the local Bicep registry cache is stale.

Fix:

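rm -rf ~/.bicep/cache
# If the error persists, force a fresh restore of the registry modules:
az bicep restore --file infra/bicep/foundation/main.bicep --force
az bicep build --file infra/bicep/foundation/main.bicep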

Subnet address overlap with on-prem after enabling VPN

Cause: Hub or spoke CIDR overlaps with one of your on-prem CIDRs.

Fix: Pick non-overlapping ranges. Defaults are 10.0.0.0/23 (hub) and 10.0.2.0/23 (spoke); change the VNet ranges in your tfvars or bicepparam, and use non-overlapping on-prem ranges when you add the post-deploy S2S connection:

address_space_hub = "10.20.0.0/23"
address_space_spoke = "10.20.2.0/23"
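
Overlap is easy to miss when the prefix lengths differ. Two IPv4 CIDR blocks overlap exactly when their addresses agree on the shorter of the two prefixes, which the following POSIX shell sketch checks. ip_to_int and cidrs_overlap are hypothetical helpers, and they assume well-formed input without leading zeros in the octets:

```shell
# ip_to_int A.B.C.D
# Prints the IPv4 address as an integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# cidrs_overlap CIDR1 CIDR2
# Hypothetical helper: succeeds if the two ranges share at least one address.
cidrs_overlap() {
  ip1=$(ip_to_int "${1%/*}")
  ip2=$(ip_to_int "${2%/*}")
  len1=${1#*/}
  len2=${2#*/}
  # Compare on the shorter (less specific) prefix length.
  if [ "$len1" -lt "$len2" ]; then len=$len1; else len=$len2; fi
  # A /0 covers every address, so it overlaps anything.
  if [ "$len" -eq 0 ]; then return 0; fi
  shift_bits=$((32 - len))
  [ $((ip1 >> shift_bits)) -eq $((ip2 >> shift_bits)) ]
}
```

Here cidrs_overlap 10.0.0.0/23 10.0.2.0/23 fails (the default hub and spoke ranges are adjacent, not overlapping), while cidrs_overlap 192.168.0.0/16 192.168.1.0/24 succeeds.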

Spoke VMs can’t reach the internet after switching to firewall scenario

Cause: Expected — the firewall scenario replaces NAT. The default route 0.0.0.0/0 → firewall private IP is in place but the firewall has no allow rules by default.

Fix: Add a network rule collection, either in the portal or with the CLI:

az network firewall network-rule create \
--collection-name allow-egress \
--firewall-name afw-<prefix>-<region> \
--resource-group rg-hub-<prefix>-<region> \
--name allow-https \
--action Allow --priority 100 \
--protocols TCP --source-addresses '*' \
--destination-addresses '*' --destination-ports 443

For VPN connectivity problems, check, in order:

  1. BGP / static routes: az network vnet-gateway list-bgp-peer-status shows the on-prem ASN.
  2. NSG on the spoke subnet doesn’t block on-prem source ranges.
  3. Route table on the spoke subnet has a UDR for the on-prem CIDR. In the vpn scenario it points to VirtualNetworkGateway (gateway transit), not the firewall; in the full scenario it points to the firewall.
  4. On-prem firewall allows the Azure-side IPsec endpoint IP and the negotiated subnets.

VpnGatewaySkuNotSupported in non-AZ regions

Cause: As of mid-2025 Azure requires AZ-class SKUs (VpnGw1AZ, VpnGw2AZ) even in regions without AZs. The repo defaults to VpnGw2AZ.

Fix: No action needed if you’re using the bundled scenarios. If you changed the SKU to VpnGw1, switch it back to VpnGw2AZ.

AuthorizationFailed deploying the MG module

Cause: Subscription Contributor isn’t enough — MG operations require Management Group Contributor at the tenant-root MG.

Fix: As the global admin (or someone with User Access Administrator at tenant root):

az role assignment create \
--assignee <upn> \
--role "Management Group Contributor" \
--scope /providers/Microsoft.Management/managementGroups/<tenant-root-id>

MG hierarchy created but policies aren’t taking effect

Cause: Policy assignments are inherited from the MG hierarchy but compliance evaluation runs every 24h. New resources are evaluated within ~30 min.

Fix: Trigger an on-demand scan:

az policy state trigger-scan \
--resource-group rg-spoke-prod-<prefix>-<region>

Error: Identity not allowed to access subscription

Cause: OIDC federated credential subject doesn’t match the workflow’s ref or environment.

Fix: In Entra → App registrations → your app → Federated credentials, verify the subject string. Common patterns:

  • repo:travishankins/azure-launchpad:ref:refs/heads/main
  • repo:travishankins/azure-launchpad:pull_request
  • repo:travishankins/azure-launchpad:environment:prod

You typically need one credential per trigger you use.
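
To compare against what Entra shows, it can help to generate the expected subject string for each trigger. The sketch below builds the three patterns listed above. oidc_subject is a hypothetical helper, and its kind names (branch, environment, pull_request) are labels chosen for this example:

```shell
# oidc_subject REPO KIND [VALUE]
# Hypothetical helper: prints the expected federated credential subject.
#   KIND=branch        VALUE is the branch name
#   KIND=environment   VALUE is the GitHub environment name
#   KIND=pull_request  no VALUE
oidc_subject() {
  repo=$1
  kind=$2
  value=${3:-}
  case $kind in
    branch)       echo "repo:${repo}:ref:refs/heads/${value}" ;;
    environment)  echo "repo:${repo}:environment:${value}" ;;
    pull_request) echo "repo:${repo}:pull_request" ;;
    *)
      echo "unknown kind: ${kind}" >&2
      return 1
      ;;
  esac
}
```

For example, oidc_subject travishankins/azure-launchpad environment prod prints the third pattern in the list. Note that a job which references a GitHub environment presents the environment form of the subject, even when a branch push triggered it.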

terraform-plan fails with Error acquiring the state lock

Cause: A previous run died without releasing the lock, or two runs are in progress.

Fix:

# Confirm no other run is still in progress, then break the stale lease on the .tfstate blob in the TF backend storage account.
az storage blob lease break \
--account-name <sa> \
--container-name tfstate \
--blob-name foundation.<scenario>.tfstate \
--auth-mode login

Pages site shows old content after pushing

Cause: site-deploy.yml only triggers on changes to site/**. If you only changed root-level files, Pages won’t rebuild.

Fix: Re-run the workflow manually:

gh workflow run site-deploy.yml --repo travishankins/azure-launchpad

Cause: Older builds of this site used a base path of /azure-launchpad/ to match the GitHub Pages project URL. The site now serves from / at the custom domain (azurelaunchpad.com), so links should resolve at the root.

Fix: Open http://localhost:4321/. If you’re on an older branch / fork that still sets base: '/azure-launchpad/' in astro.config.mjs, use http://localhost:4321/azure-launchpad/ instead, or update the config.

Cause: Browser blocks navigator.clipboard over plain HTTP.

Fix: The button falls back to “Press Ctrl-C” — the textarea is already selected, so just hit Ctrl/Cmd-C. In production (HTTPS) the clipboard API works normally.

The README costs are resource-only, in westcentralus, with no traffic, no log ingestion above the 30-day Free tier, and no backup data. Real-world overages usually come from:

  • Log Analytics ingestion — > 5 GB/month is billed; trim diagnostic settings if needed.
  • Egress data transfer — anything leaving the region incurs bandwidth.
  • Public IPs — Standard SKU public IPs cost ~$3.65/mo each even when idle. The firewall and VPN scenarios add 2.
  • Backup retention — RSV deploys with no policy by default; if you’ve added a policy, retained snapshots are billable.

Set up cost alerts, or enable the built-in budgets module (budget_enabled = true in Terraform, budgetEnabled: true in Bicep) for monthly spend caps with email notifications at 50/80/100%.

  • Check GitHub Discussions — someone may have hit it already.
  • Open a bug issue with the reproduction template.
  • For security issues, follow SECURITY.md.