Rescuing an AKS Namespace Stuck in Terminating State

A real-world story: how a nightly cleanup pipeline got blocked by a namespace stuck in Terminating, and the kubectl proxy + finalizer trick that finally got it unstuck.

If you’ve worked with Kubernetes long enough, you’ve probably hit it: you delete a namespace, and instead of going away, it just sits there in Terminating — for minutes, hours, sometimes days. This is the story of one of those mornings, and the fix that became muscle memory for our team.

A Bit of Context

I work on an enabler team. Our job is to provision AKS clusters and the surrounding Azure infrastructure for the development teams across dev, test, acceptance, and production. To make sure those environments stay reliable, we keep a few clusters of our own where we run daily and nightly builds that exercise the whole provisioning pipeline end-to-end.

Part of that pipeline is a cleanup job. Before each run, it tears down the leftovers from the previous build:

  • Azure SQL / Postgres databases
  • Azure Event Hubs
  • And on the AKS side, all the workloads running in their dedicated namespaces

The way the AKS cleanup works is simple — it just deletes the namespace and lets Kubernetes garbage-collect everything inside it. Most days that’s fine. But every now and then, a namespace would go rogue and refuse to leave, and the pipeline would fail right at that step.
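
In pipeline terms, the cleanup step amounted to little more than this (the namespace name and timeout here are illustrative, not the pipeline's exact values):

kubectl delete namespace my-test-ns --wait=true --timeout=5m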

The Morning It Broke

One morning a colleague pinged me: “The nightly pipeline failed again.”

Looking at the logs, the last thing the pipeline tried to do was delete a namespace. After that — silence, then a timeout. So I jumped onto the cluster:

kubectl get ns

Sure enough, there it was:

NAME              STATUS        AGE
my-test-ns        Terminating   3h

Three hours in Terminating. Not great.

Why Namespaces Get Stuck

When you delete a namespace, Kubernetes doesn’t just nuke it. The namespace object has a spec.finalizers list, and as long as anything is still in that list, the API server will not remove the object. Finalizers are essentially a “hold on, I still have cleanup to do” signal from a controller.

The most common culprit is the built-in kubernetes finalizer, which is driven by the namespace controller. If a controller responsible for cleaning up some resource type is unhealthy, missing, or has lost its CRD, the namespace can hang indefinitely waiting for it.

Let’s confirm that’s what’s happening:

kubectl get namespace my-test-ns -o json | jq '.spec'

You’ll typically see something like:

{
  "finalizers": [
    "kubernetes"
  ]
}

And the status block will tell you why it can’t finish:

kubectl get namespace my-test-ns -o json | jq '.status'
{
  "phase": "Terminating",
  "conditions": [
    {
      "type": "NamespaceFinalizersRemaining",
      "status": "True",
      "reason": "SomeFinalizersRemain",
      "message": "Some content in the namespace has finalizers remaining: kubernetes in 1 resource instances"
    }
  ]
}

That’s the smoking gun.
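
If you want to see which objects are actually holding things up, you can also enumerate everything still left in the namespace. A rough sketch (it queries every namespaced resource type, so it can take a moment on clusters with many CRDs):

kubectl api-resources --verbs=list --namespaced -o name \
  | xargs -n 1 kubectl get --show-kind --ignore-not-found -n my-test-ns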

Why kubectl edit Won’t Save You

The instinct is to just edit the namespace and remove the finalizer:

kubectl edit namespace my-test-ns

It looks like it works — the editor closes, no error. But if you re-check the namespace, the finalizer is still there. Why?

Because spec.finalizers on a namespace can't be changed through the normal PUT /namespaces/{name} endpoint: the API server simply discards edits to that field there. Removing namespace finalizers has its own dedicated subresource, /finalize. kubectl edit doesn't go through that subresource, so your change is silently ignored.

That’s the trap. You need to PUT to the /finalize subresource directly.

The Fix: kubectl proxy + curl to /finalize

Here’s the recipe that’s saved us many times.

1. Dump the namespace JSON

kubectl get namespace my-test-ns -o json > ns.json

2. Remove the finalizers

Open ns.json and set spec.finalizers to an empty array:

"spec": {
  "finalizers": []
}

Or do it in one shot with jq:

kubectl get namespace my-test-ns -o json \
  | jq '.spec.finalizers = []' > ns.json

3. Start kubectl proxy

In a separate terminal:

kubectl proxy

You should see:

Starting to serve on 127.0.0.1:8001

This gives you an authenticated local endpoint to the API server, so you don’t have to fiddle with bearer tokens by hand.
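
You can sanity-check the proxy (and the exact URL you're about to use) with a plain GET first:

curl http://127.0.0.1:8001/api/v1/namespaces/my-test-ns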

4. PUT the JSON to the /finalize subresource

curl -H "Content-Type: application/json" \
  -X PUT \
  --data-binary @ns.json \
  http://127.0.0.1:8001/api/v1/namespaces/my-test-ns/finalize

The moment that request returns 200 OK, the namespace disappears:

kubectl get ns my-test-ns
# Error from server (NotFound): namespaces "my-test-ns" not found
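
As an aside: if you'd rather not run a separate proxy, reasonably recent versions of kubectl can send the same request themselves. A one-line alternative, assuming your kubectl supports the --raw flag:

kubectl replace --raw "/api/v1/namespaces/my-test-ns/finalize" -f ./ns.json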

A Quick Word of Caution

This trick works because you’re telling the API server “trust me, the cleanup is done.” That’s fine when:

  • You know the controller behind the finalizer is gone or broken.
  • The namespace is on a non-production / disposable cluster.
  • You’ve already accepted that any leftover resources may become orphaned.

It’s not a fix for the underlying problem — if a controller is repeatedly leaving namespaces stuck, you should also figure out why (a missing CRD, a crashing operator, an APIService pointing at a dead webhook are common causes). Running kubectl api-resources and kubectl get apiservice is a good first stop.
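
For example, an aggregated API whose backing service is gone will show up as unavailable, and that alone can block namespace cleanup cluster-wide. A quick check (a sketch using jq; any APIService not reporting Available=True deserves a closer look):

kubectl get apiservice -o json \
  | jq -r '.items[] | select(.status.conditions[]? | select(.type == "Available" and .status != "True")) | .metadata.name'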

The Right Way to Clean Up a Namespace

The /finalize trick is a rescue tool — not a cleanup strategy. The real lesson from this incident was that deleting the namespace first is the wrong order of operations. Namespaces should be the last thing to go, after the workloads inside them have been removed gracefully so their controllers can run their finalizers properly.

A safer order to follow:

1. Uninstall Helm releases, then optionally clean up CRDs

If the workloads were installed with Helm, let Helm tear them down first. It knows about hooks and ordering in a way that a blunt kubectl delete ns doesn’t. If the chart also installed CRDs and you truly need them removed, treat that as a separate follow-up step after the release is gone.

# List every release in the namespace
helm list -n my-test-ns -q | while read release; do
  helm uninstall "$release" -n my-test-ns --wait
done

A few things to be careful about:

  • CRDs are not removed by helm uninstall by default (Helm’s design). If a chart installed CRDs and you really want them gone, delete them explicitly after the release is uninstalled:

    # Identify the CRD names that belong only to the chart you just removed.
    # CRDs from one chart usually share an API group, e.g. *.example.com,
    # so a grep on the group (adjust the pattern) narrows the list quickly:
    kubectl get crd -o name | grep '\.example\.com$'
    # Then delete the matching CRDs explicitly:
    kubectl delete crd <crd-name-1> <crd-name-2>

    Only do this if no other namespace depends on those CRDs.

  • Use --wait so Helm blocks until resources are actually gone — this is what gives controllers time to run their finalizers.

2. Delete remaining workloads explicitly

For anything not managed by Helm (raw manifests, kustomize, hand-applied YAML), delete the workload kinds directly so their controllers do the right thing:

kubectl delete deploy,sts,ds,job,cronjob,rs --all -n my-test-ns --wait=true
kubectl delete svc,ingress,configmap,secret --all -n my-test-ns
kubectl delete pvc --all -n my-test-ns

Deleting Deployment / StatefulSet / DaemonSet first lets the controllers gracefully scale down pods and detach volumes — instead of being killed mid-flight when the namespace goes away.
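
Before moving on, a quick check that nothing significant is left behind never hurts (extend the resource list to whatever your workloads actually use):

kubectl get all,pvc,ingress -n my-test-ns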

3. Then delete the namespace

Now the namespace should be empty (or close to it), and kubectl delete ns will finish in seconds:

kubectl delete namespace my-test-ns --wait=true

4. Keep the /finalize workaround as a safety net

Even with the steps above, things can still go wrong — a flaky operator, a stale APIService, a webhook pointing nowhere. Keep the /finalize recovery path in your back pocket, but treat it as the fallback, not the primary cleanup mechanism.
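
In our pipeline, that translated into a small guard around the namespace delete: try the graceful path with a timeout, and only clear the finalizers if it genuinely hangs. A minimal sketch, with illustrative names and timeout, assuming a kubectl new enough to support --raw:

# Try the graceful delete first; clear finalizers only if it times out.
NS=my-test-ns
if ! kubectl delete namespace "$NS" --wait=true --timeout=120s; then
  echo "Namespace $NS is still terminating; clearing finalizers as a last resort"
  # Same /finalize trick as above, minus the proxy.
  kubectl get namespace "$NS" -o json \
    | jq '.spec.finalizers = []' \
    | kubectl replace --raw "/api/v1/namespaces/$NS/finalize" -f -
fi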

TL;DR

  • A namespace stuck in Terminating almost always means a finalizer hasn’t been removed.
  • kubectl edit won’t fix it — updates to spec.finalizers on a terminating namespace are ignored on the normal endpoint.
  • Use kubectl proxy + a PUT to /api/v1/namespaces/<name>/finalize with the finalizers cleared.
  • The real fix is don’t delete the namespace first: helm uninstall your releases (and clean up CRDs if needed), delete deployments / statefulsets / daemonsets, then delete the namespace.
  • Keep the /finalize trick as a break-glass tool, not a daily habit.

This one-liner has earned its place in our team runbook — hopefully it earns a place in yours too.
