A Practical Guide to Kubernetes Security: Hardening Your Cluster in 2025
BEST-PRACTICESSeptember 7, 2025

A Practical Guide to Kubernetes Security: Hardening Your Cluster in 2025

This practical guide walks you through essential hardening techniques, best practices, and actionable steps to secure Kubernetes clusters in 2025. Learn how to defend nodes, control planes, and workloads from common threats.

Sealos
Sealos
Share at:

Kubernetes won the container orchestration wars by making distributed systems feel manageable. But with great power comes a large attack surface. Misconfigurations, over‑privileged workloads, weak boundaries, and insecure defaults can turn your cluster into a liability. The good news: most risks are predictable and defendable with disciplined practices and a few modern tools.

This practical guide walks you through what Kubernetes security is, why it matters, how the security model works, and how to harden your clusters in 2025 with hands‑on examples you can apply today.


What Is Kubernetes Security?

Kubernetes security is the practice of protecting a Kubernetes environment across its lifecycle: from the supply chain that builds container images, to the control plane and data plane that run workloads, to the runtime behavior of applications and the boundaries between them.

Key layers you must secure:

  • Supply chain: source code, dependencies, container images, signatures, SBOMs.
  • Control plane: API server, etcd, controller manager, scheduler.
  • Data plane: nodes, kubelet, CNI, container runtime.
  • Workloads: pods, deployments, sidecars, secrets.
  • Access: authentication (AuthN), authorization (AuthZ), admission control.
  • Network: policies, ingress/egress, service mesh, TLS/mTLS.
  • Observability: audit logs, runtime detection, forensics, backups.

Kubernetes is not “secure by default.” Its power is its flexibility; your job is to install guardrails and enforce least privilege.


Why It’s Important in 2025

  • Rising complexity: multi‑cluster, multi‑cloud, and edge deployments expand the attack surface.
  • Supply chain threats: dependency hijacking, typosquatting, and poisoned images remain common.
  • Identity shifts left: service identities (not users) drive access; misbinding identities can expose data.
  • Regulatory pressure: SOC 2, ISO 27001, PCI, HIPAA require controls and evidence.
  • Evolving Kubernetes: features like Pod Security Admission and policy engines have matured; attackers know them too.

Security is now a continuous program, not a one‑time setup.


How Kubernetes Security Works (At a Glance)

  • Authentication: Users, service accounts, and nodes authenticate via certificates, tokens, OIDC.
  • Authorization: RBAC (role‑based access control) determines what identities can do.
  • Admission control: Policies validate or mutate resources before they persist (e.g., Pod Security Admission, ValidatingAdmissionPolicy, Kyverno).
  • Runtime isolation: Linux primitives (namespaces, cgroups, seccomp, AppArmor/SELinux) and container runtimes isolate processes.
  • Network segmentation: CNI plugins and NetworkPolicies control pod‑to‑pod and pod‑to‑external traffic.
  • Data protection: etcd encryption at rest; TLS in transit; secret management integrations.
  • Observability and audit: API audit logs, runtime detection (e.g., Falco), and metrics create feedback loops.

With that mental model, let’s harden a real cluster.


Foundations: Threat Model and Principles

Before turning knobs, define a threat model:

  • Who are the actors? Developers, CI/CD systems, cluster admins, tenants, attackers.
  • What are assets? API server, credentials, secrets, workloads, data stores.
  • What are risks? Privilege escalation, lateral movement, data exfiltration, persistence.

Guiding principles:

  • Least privilege everywhere (humans and workloads).
  • Secure by default (deny by default, explicit allow).
  • Verified supply chain (signed, scanned, attested).
  • Defense in depth (multiple, independent controls).
  • Immutable infrastructure (reconcile drift, audit changes).
  • Evidence‑based (logs and metrics you can prove).

Hardening the Control Plane

The control plane is the brain. Protect it like production databases: limited access, strong auth, encryption, and monitoring.

API Server Basics

  • Require TLS v1.2+ for all connections.
  • Disable anonymous auth.
  • Enforce RBAC.
  • Turn on audit logging.
  • Encrypt secrets at rest in etcd.

Example kube‑apiserver flags (managed control planes expose these as settings rather than flags):

kube-apiserver \
  --authorization-mode=Node,RBAC \
  --anonymous-auth=false \
  --audit-log-path=/var/log/kubernetes/audit.log \
  --audit-policy-file=/etc/kubernetes/audit-policy.yaml \
  --encryption-provider-config=/etc/kubernetes/encryption-config.yaml \
  --tls-min-version=VersionTLS12

Audit policy example (log sensitive writes, decline noisy reads):

# /etc/kubernetes/audit-policy.yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  - level: None
    resources:
      - group: ""
        resources: ["configmaps"]
    namespaces: ["kube-system"] # reduce noise; tune per org
  - level: Metadata
    verbs: ["get", "list", "watch"]
    resources:
      - group: ""
        resources: ["pods", "services", "endpoints"]
  - level: RequestResponse
    verbs: ["create", "update", "patch", "delete"]
    resources:
      - group: ""
        resources: ["secrets", "configmaps"]
      - group: "rbac.authorization.k8s.io"
        resources:
          ["rolebindings", "clusterrolebindings", "roles", "clusterroles"]

Secrets encryption at rest (also supported in managed services):

# /etc/kubernetes/encryption-config.yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - kms:
          name: kmsplugin
          endpoint: unix:///var/run/kmsplugin/socket.sock
          cachesize: 1000
          timeout: 3s
      - aescbc:
          keys:
            - name: key1
              secret: <base64-encoded-32-byte-key>
      - identity: {}

Enable with the --encryption-provider-config flag and rotate keys regularly.

etcd Security

  • Use TLS client and peer certs; restrict access to control plane only.
  • Enable disk encryption on nodes hosting etcd.
  • Regular backups with encryption and offsite storage.
  • Network segmentation: etcd should not be Internet‑reachable.

Hardening Worker Nodes and Kubelet

Attackers often jump from a container to the node. Reduce blast radius.

Kubelet Configuration

  • Disable unauthenticated access; disable read‑only port (10255).
  • Use webhook authz/authn.
  • Enforce TLS v1.2+.
  • Enable seccomp default profile when supported by your version.
# /var/lib/kubelet/config.yaml
authentication:
  anonymous:
    enabled: false
  webhook:
    enabled: true
authorization:
  mode: Webhook
readOnlyPort: 0
tlsMinVersion: VersionTLS12
protectKernelDefaults: true
eventRecordQPS: 5
serverTLSBootstrap: true
featureGates:
  SeccompDefault: true

Start kubelet with:

kubelet --config=/var/lib/kubelet/config.yaml --rotate-certificates=true

Host OS and Runtime

  • Use minimal, auto‑updating OS (e.g., Bottlerocket, Flatcar, COS).
  • Keep kernel and container runtime (containerd, CRI‑O) patched.
  • Limit SSH (ideally disable, manage via SSM/OSLogin); enforce MFA for break‑glass.
  • SELinux/AppArmor enforcing; disable unnecessary kernel modules.
  • Consider sandboxed runtimes for untrusted workloads (gVisor, Kata Containers) via RuntimeClass.

RuntimeClass example:

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc

Use in a Pod:

apiVersion: v1
kind: Pod
metadata:
  name: sandboxed
spec:
  runtimeClassName: gvisor
  containers:
    - name: app
      image: ghcr.io/acme/app:1.2.3

Authentication and Authorization (RBAC) Done Right

RBAC mistakes are common. Start with deny by default.

  • Disable legacy ABAC if present.
  • Prefer Roles/RoleBindings scoped to namespaces; reserve ClusterRoles for cluster‑wide resources.
  • Bind service accounts explicitly; avoid default service account access in namespaces.

Example least‑privilege Role and RoleBinding:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: reader
  namespace: payments
rules:
  - apiGroups: [""]
    resources: ["pods", "services", "configmaps"]
    verbs: ["get", "list", "watch"]
 
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: payments-bot
  namespace: payments
 
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: payments-bot-reader
  namespace: payments
subjects:
  - kind: ServiceAccount
    name: payments-bot
    namespace: payments
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: reader

Best practices:

  • Turn off auto‑mounting tokens when not needed: set automountServiceAccountToken: false on Pods or ServiceAccounts.
  • Use short‑lived, projected service account tokens (TokenRequest API) and cloud‑native workload identity for cloud APIs (e.g., EKS IRSA, GKE Workload Identity, AKS Managed Identities).

Admission Control and Policy

Admission controllers enforce security before resources are persisted.

Pod Security Admission (PSA)

Pod Security Policy is removed; use Pod Security Admission via namespace labels.

  • Levels: privileged, baseline, restricted.
  • Enforce restricted wherever possible.

Apply restricted to a namespace:

kubectl label ns prod \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/audit=restricted \
  pod-security.kubernetes.io/warn=restricted

ValidatingAdmissionPolicy (CEL)

Recent Kubernetes releases support policy enforcement with CEL expressions without webhooks. Example: require allowPrivilegeEscalation: false.

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: disallow-priv-esc
spec:
  matchConstraints:
    resourceRules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["pods"]
  validations:
    - expression: "object.spec.containers.all(c, has(c.securityContext) && has(c.securityContext.allowPrivilegeEscalation) && c.securityContext.allowPrivilegeEscalation == false)"
      message: "allowPrivilegeEscalation must be false for all containers"

Bind it:

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: disallow-priv-esc-binding
spec:
  policyName: disallow-priv-esc
  validationActions: ["Deny"]

Policy Engines (Kyverno/Gatekeeper)

For richer policies and supply chain verification, use:

  • Kyverno: Kubernetes‑native, policy‑as‑YAML; supports image signature verification.
  • OPA Gatekeeper: Rego‑based, very flexible.

Kyverno example to verify container image signatures (Sigstore Cosign):

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: verify-signed-images
spec:
  rules:
    - name: verify-ghcr
      match:
        any:
          - resources:
              kinds: ["Pod"]
      verifyImages:
        - imageReferences:
            - "ghcr.io/acme/*"
          attestors:
            - entries:
                - keys:
                    publicKeys: |-
                      -----BEGIN PUBLIC KEY-----
                      MIIBIjANBgkq...
                      -----END PUBLIC KEY-----
          failureAction: Enforce

Namespace and Multi‑Tenancy Isolation

Namespaces are your first tenancy boundary.

  • Assign each team/app its own namespace.
  • Apply PSA labels per namespace.
  • Use ResourceQuotas and LimitRanges to prevent noisy neighbor issues.
  • Use NetworkPolicies to isolate traffic across namespaces.
  • Optionally, pin tenants to dedicated nodes with taints/tolerations and nodeSelectors.

Example resource limits:

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: team-a
spec:
  limits:
    - default:
        cpu: "500m"
        memory: "256Mi"
      defaultRequest:
        cpu: "100m"
        memory: "128Mi"
      type: Container

Workload Hardening: Pod and Container Security

Most incidents start with a vulnerable or misconfigured pod. Lock them down.

Checklist for Pod spec:

  • Run as non‑root.
  • Drop Linux capabilities; add only what you need.
  • Read‑only root filesystem.
  • No privilege escalation.
  • Seccomp profile set to RuntimeDefault or custom.
  • AppArmor/SELinux profiles enforced.
  • Limit resources (CPU/memory) to curb DoS.
  • Avoid hostPath volumes; if needed, read‑only and narrow paths.
  • Disable hostNetwork, hostPID, hostIPC unless absolutely necessary.

Secure Pod example:

apiVersion: v1
kind: Pod
metadata:
  name: secure-app
  labels:
    app: secure-app
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault
    runAsNonRoot: true
    runAsUser: 10001
    fsGroup: 20001
  containers:
    - name: app
      image: ghcr.io/acme/secure-app:2.0.0
      resources:
        requests:
          cpu: "200m"
          memory: "256Mi"
        limits:
          cpu: "500m"
          memory: "512Mi"
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]
          add: [] # add only if required
      volumeMounts:
        - name: tmp
          mountPath: /tmp
  volumes:
    - name: tmp
      emptyDir:
        medium: Memory

Prefer distroless or minimal images, pin versions with digests, and avoid shell and package managers in production images.


Network Segmentation: Deny by Default

Without NetworkPolicies, most CNIs allow all pod‑to‑pod traffic. That’s lateral‑movement heaven.

  1. Default deny egress/ingress in each namespace:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: prod
spec:
  podSelector: {}
  policyTypes: ["Ingress", "Egress"]
  1. Allow only what’s needed. For example, web pods can talk to db pods on 5432:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: web-to-db
  namespace: prod
spec:
  podSelector:
    matchLabels:
      app: web
  policyTypes: ["Egress"]
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              name: prod
          podSelector:
            matchLabels:
              app: db
      ports:
        - protocol: TCP
          port: 5432

Advanced:

  • Use CNIs like Cilium or Calico for robust policy features; Cilium can enforce L7 policies and visibility.
  • Egress control with FQDN policies and DNS restrictions to prevent data exfiltration.

Ingress, TLS, and Service Mesh mTLS

  • Terminate TLS for all external endpoints; use cert‑manager to automate certificates.

cert‑manager example:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: sre@acme.com
    privateKeySecretRef:
      name: letsencrypt-key
    solvers:
      - http01:
          ingress:
            class: nginx

Ingress with TLS:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt
spec:
  tls:
    - hosts: ["app.acme.com"]
      secretName: web-tls
  rules:
    - host: app.acme.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web
                port:
                  number: 80
  • Use a service mesh (e.g., Istio, Linkerd) to enable mTLS by default and apply zero‑trust policies.

Istio mTLS enforcement:

apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: prod
spec:
  mtls:
    mode: STRICT

Secrets Management: Do It Right

Kubernetes Secrets are base64‑encoded by default. Treat them as sensitive.

  • Enable encryption at rest (control plane section).
  • Restrict RBAC to secrets; avoid listing secrets broadly.
  • Avoid putting secrets in env vars when possible; prefer volumes to reduce accidental logs exposure.
  • Rotate secrets and tokens regularly.

External secret stores:

  • Use CSI Secrets Store with providers for AWS/GCP/Azure/Vault.
  • Map cloud IAM roles to service accounts for least privilege.

Example: CSI Secret Store + external Vault secret:

apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: vault-db-creds
  namespace: prod
spec:
  provider: vault
  parameters:
    roleName: db-app
    objects: |
      - objectName: db-password
        secretPath: kv/data/prod/db
        secretKey: password
---
apiVersion: v1
kind: Pod
metadata:
  name: db-client
  namespace: prod
spec:
  serviceAccountName: prod-app
  volumes:
    - name: secrets
      csi:
        driver: secrets-store.csi.k8s.io
        readOnly: true
        volumeAttributes:
          secretProviderClass: "vault-db-creds"
  containers:
    - name: app
      image: ghcr.io/acme/db-client:1.0.0
      volumeMounts:
        - name: secrets
          mountPath: "/mnt/secrets"
          readOnly: true

Ensure token projection and workload identity mapping are correctly configured.


Supply Chain Security: From Code to Cluster

Prevent compromised images from ever reaching the cluster.

  • Build provenance: generate SBOMs (e.g., Syft), sign artifacts (Sigstore Cosign), record attestations (SLSA‑aligned).
  • Scan images for vulnerabilities (Trivy, Grype) in CI and registries; fail builds on critical CVEs with available fixes.
  • Pin images by digest in manifests; avoid floating tags (latest).
  • Enforce signature verification at admission (Kyverno or ValidatingAdmissionPolicy + image verification tools).
  • Restrict registries: only allow images from approved registries.

Cosign signing:

cosign generate-key-pair  # stores keys locally or use keyless OIDC with Fulcio
cosign sign ghcr.io/acme/secure-app@sha256:...
cosign verify ghcr.io/acme/secure-app@sha256:...

Kyverno deny non‑approved registries:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-registries
spec:
  validationFailureAction: Enforce
  rules:
    - name: only-ghcr
      match:
        any:
          - resources:
              kinds: ["Pod"]
      validate:
        message: "Only ghcr.io/acme images are allowed."
        pattern:
          spec:
            containers:
              - image: "ghcr.io/acme/*"

Runtime Security and Observability

Detect and respond to suspicious behavior.

  • Enable API audit logs and ship to a SIEM.
  • Collect container and node metrics/logs (Prometheus, Loki, CloudWatch/Stackdriver).
  • Runtime detection tools (Falco, Cilium Tetragon) to alert on abnormal syscalls (e.g., opening /etc/shadow).
  • Set resource limits to mitigate noisy neighbor / fork bombs.
  • Prepare for forensics: centralized logs, immutable buckets, and node image snapshots (where possible).

Falco example rule snippet:

- rule: Write below etc
  desc: Detect any write below /etc
  condition: >
    evt.type in (open,openat,creat,truncate,ftruncate) and
    fd.name startswith /etc and
    evt.arg.flags contains O_WRONLY
  output: "File below /etc opened for writing (user=%user.name command=%proc.cmdline file=%fd.name)"
  priority: WARNING

Practical Hardening: A Step‑By‑Step Plan

0–30 Days: Baseline Controls

  • Enforce RBAC; remove wildcard permissions.
  • Disable anonymous auth on API server and kubelet; enable audit logs.
  • Turn on encryption at rest for secrets.
  • Label namespaces with Pod Security Admission: enforce restricted where possible.
  • Apply default‑deny NetworkPolicies in all namespaces.
  • Harden workloads: runAsNonRoot, drop capabilities, readOnlyRootFilesystem.
  • Rotate and pin image tags by digest; scan images in CI.
  • Enable TLS for all ingress; deploy cert‑manager.
  • Centralize logs and alerts; basic Falco/Tetragon ruleset.

31–60 Days: Policy and Identity

  • Adopt Kyverno or Gatekeeper; codify key policies (no privileged pods, registry allowlist, resource limits, image signatures).
  • Introduce ValidatingAdmissionPolicy for simple CEL based constraints.
  • Integrate workload identity: IRSA (EKS), Workload Identity (GKE), Managed Identity (AKS).
  • Introduce a service mesh for mTLS between services (start with critical namespaces).
  • Set up CSI Secrets Store with cloud KMS or Vault.

61–90 Days: Advanced Isolation and Resilience

  • Segment tenants with namespaces, quotas, and dedicated nodes (taints/tolerations).
  • Adopt sandboxed runtime (gVisor/Kata) for untrusted or multi‑tenant workloads.
  • Implement egress control and DNS restrictions.
  • Back up etcd and critical resources; practice restore.
  • Build provenance (SBOMs, signatures, attestations) into CI/CD and admission.
  • Add periodic security tests: kube‑bench (CIS), kube‑hunter (network), KubeLinter (manifests).

Cloud‑Specific Tips

  • EKS: Use IRSA for IAM roles; restrict security group rules; enable control plane logging; use AWS KMS for secret encryption.
  • GKE: Use Workload Identity; private clusters; shielded nodes; Binary Authorization for image attestation.
  • AKS: Use Managed Identities; Azure Policy for Kubernetes (built on Gatekeeper); private clusters and Azure Key Vault provider.

Across clouds:

  • Prefer private control planes and nodes; use VPN/peering.
  • Control egress with NAT gateways and firewall rules.
  • Lock registry access to your VPC/VNet and enable content trust.

Useful Tools and References

  • Scanning: Trivy, Grype, Syft (SBOM).
  • Policy: Kyverno, OPA Gatekeeper, ValidatingAdmissionPolicy (CEL).
  • Runtime: Falco, Tetragon.
  • Benchmarks: kube‑bench (CIS), kube‑hunter.
  • Supply chain: Sigstore Cosign, Rekor, SLSA framework.
  • Hardening guides: CIS Kubernetes Benchmark; vendor hardening docs.

Common Pitfalls to Avoid

  • Relying on network perimeters only; ignoring pod‑level segmentation.
  • Granting cluster‑admin to CI/CD or developers “temporarily” and never revoking.
  • Using latest image tags; skipping digest pinning and signature verification.
  • Mounting hostPath volumes broadly or running with hostNetwork without need.
  • Treating secrets as non‑sensitive; leaving them in env vars or repos.
  • Forgetting the kubelet: leaving read‑only port enabled or anonymous auth on.
  • Assuming managed clusters are “secure enough” without your policies.

A Quick Security Checklist

  • API server: RBAC only, anonymous off, audit on, secrets encrypted at rest.
  • etcd: TLS, restricted network, regular encrypted backups.
  • Kubelet: anonymous off, webhook authz, read‑only port 0, TLS v1.2+, SeccompDefault.
  • Namespaces: PSA labels set; quotas and default limits.
  • Workloads: non‑root, no privilege escalation, read‑only FS, caps dropped, seccomp/AppArmor.
  • Network: default‑deny policies; explicit allow rules; egress/DNS controls.
  • Ingress: TLS everywhere; cert‑manager automation; HSTS at edge.
  • Identity: short‑lived tokens; workload identity for cloud APIs; no default SA tokens.
  • Supply chain: scan, sign, attest; registry allowlist; digest pinning; admission verification.
  • Observability: audit logs to SIEM; runtime detection; alerting runbooks.
  • DR: etcd and config backups; tested restore procedure.

Conclusion

Kubernetes security in 2025 is about layering practical, automatable controls across the entire lifecycle. Start by locking down the control plane and kubelet, enforce least privilege with RBAC, and turn on Pod Security Admission. Segment the network with default‑deny policies and mTLS, and harden pods with strong security contexts. Shift left with signed, scanned images and admission policies that block unsafe deployments. Finally, instrument everything: audit logs, runtime detection, and tested backups.

Security isn’t a destination; it’s a continuous feedback loop. With the guardrails in this guide—plus a bias for least privilege and verification—you can run Kubernetes with confidence and reduce the blast radius when incidents occur. Take the 90‑day plan, adapt it to your environment, and make security part of your platform’s DNA.

Explore with AI

Get AI insights on this article

📤 Share this article

Tip: AI will help you summarize key points and analyze technical details.

Ready to experience Sealos?

Develop, deploy, and scale in one seamless cloud platform

Get Started
Develop, deploy, and scale in one seamless cloud platform
Copyright © 2025 Sealos. All rights reserved.
YouTube