DevOps · 6 min read

5 Mistakes to Avoid When Deploying Kubernetes in Production

From our experience with 50+ Kubernetes clusters — the most common problems and how to prevent them before they affect your users.

At Devs.lv, we've managed 50+ Kubernetes clusters across industries — from fintech to e-commerce. These are the 5 most common mistakes we see again and again, with specific advice on how to fix them.

1. No Resource Limits — The "Noisy Neighbor" Problem

Without CPU and memory limits, one pod can consume an entire node's resources. The result: other pods start getting OOMKilled or starved of CPU, and your production service slows down with no obvious cause.

Solution:

  • Always define resources.requests and resources.limits for every container, as sketched after this list
  • Start with requests: cpu: 100m, memory: 128Mi and adjust based on actual usage data
  • Use LimitRange and ResourceQuota at the namespace level as a safety net
  • Set up monitoring with Prometheus + Grafana to see actual resource consumption
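
Putting the first three points together, here's a minimal sketch; the name, image, and limit values are placeholders you'd tune against real usage data:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api                # hypothetical service name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: registry.example.com/api:1.0.0   # placeholder image
          resources:
            requests:        # what the scheduler reserves for the pod
              cpu: 100m
              memory: 128Mi
            limits:          # hard ceiling enforced at runtime
              cpu: 500m
              memory: 256Mi
```

And the namespace-level safety net: a LimitRange that fills in defaults for containers that don't declare their own (the namespace name is an assumption):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-container-limits
  namespace: production      # hypothetical namespace
spec:
  limits:
    - type: Container
      defaultRequest:        # applied when a container omits requests
        cpu: 100m
        memory: 128Mi
      default:               # applied when a container omits limits
        cpu: 500m
        memory: 256Mi
```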

2. Single Replica — The "Hope It Doesn't Crash" Approach

A Deployment with replicas: 1 is not production-ready. If that single pod crashes or its node restarts, your service is offline. Even a few seconds of downtime can mean lost orders and frustrated customers.

Solution:

  • Minimum 3 replicas for every production deployment
  • Configure podAntiAffinity to spread replicas across different nodes
  • Use PodDisruptionBudget (PDB) to guarantee minimum replica count during node maintenance
  • Consider topologySpreadConstraints for distribution across availability zones
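
Here's what those points can look like on a hypothetical api Deployment, together with a PDB that keeps at least two pods running during node drains; labels and names are placeholders:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      affinity:
        podAntiAffinity:     # prefer not to co-locate replicas on one node
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: api
                topologyKey: kubernetes.io/hostname
      topologySpreadConstraints:   # spread evenly across availability zones
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: api
      containers:
        - name: api
          image: registry.example.com/api:1.0.0   # placeholder image
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 2            # never voluntarily evict below 2 pods
  selector:
    matchLabels:
      app: api
```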

3. No Health Checks — The "Schrödinger Pod"

Without liveness and readiness probes, Kubernetes can't tell if your application is actually working. A pod can be in "Running" status while the service inside is frozen or not handling requests.

Solution:

  • Readiness probe — determines when a pod is ready to receive traffic. Without it, Kubernetes sends traffic to pods that aren't ready yet.
  • Liveness probe — determines if a pod is "alive." If not, Kubernetes automatically restarts it.
  • Startup probe — use for applications with long startup times (Java, .NET) so the liveness probe doesn't kill slow-starting pods.
  • HTTP endpoints are better than TCP checks — /healthz or /ready can verify database connections and other dependencies.
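
All three probes together, as an excerpt from a pod template; the /ready and /healthz paths and port 8080 are assumptions about your application:

```yaml
containers:
  - name: api
    image: registry.example.com/api:1.0.0   # placeholder image
    ports:
      - containerPort: 8080
    readinessProbe:          # gate traffic until the app is ready
      httpGet:
        path: /ready         # assumed endpoint that checks dependencies
        port: 8080
      periodSeconds: 10
    livenessProbe:           # restart the container if it stops responding
      httpGet:
        path: /healthz       # assumed lightweight "am I alive" endpoint
        port: 8080
      periodSeconds: 15
      failureThreshold: 3
    startupProbe:            # hold off liveness checks while the app boots
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10
      failureThreshold: 30   # 30 × 10s = 5 min startup budget
```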

4. Secrets in YAML Files — "They Really Did Commit Credentials"

Kubernetes Secrets are base64 encoded, not encrypted. If you commit them to a Git repository, anyone with repo access can read them. We've seen database passwords, API keys, and even private certificates in public Git repos.

Solution:

  • External Secrets Operator — syncs secrets from AWS Secrets Manager, HashiCorp Vault, or Azure Key Vault directly into your Kubernetes cluster (sketched after this list)
  • Sealed Secrets — encrypts secrets with a public key so they can be safely stored in Git
  • SOPS — Mozilla's tool for encrypting YAML/JSON files with KMS or PGP
  • Add .gitignore entries and pre-commit hooks that block accidental secret commits
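
As one illustration, a sketch of an ExternalSecret for the External Secrets Operator. It assumes the operator is installed and a ClusterSecretStore named aws-secrets-manager is already configured; all names and paths here are hypothetical:

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-credentials
  namespace: production        # hypothetical namespace
spec:
  refreshInterval: 1h          # re-sync from the external store hourly
  secretStoreRef:
    name: aws-secrets-manager  # assumed pre-configured ClusterSecretStore
    kind: ClusterSecretStore
  target:
    name: db-credentials       # the Kubernetes Secret the operator creates
  data:
    - secretKey: password      # key inside the generated Secret
      remoteRef:
        key: prod/db/password  # assumed path in AWS Secrets Manager
```

Git then holds only this pointer; the actual password never leaves the secrets manager.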

5. No Network Policies — "Everyone Sees Everything"

By default, all pods can communicate with all other pods in the cluster. This means a compromised pod in your staging namespace can reach the production database. One vulnerability can become a full cluster breach.

Solution:

  • Start with a "deny all" default policy for each namespace (see the sketch after this list)
  • Open only necessary connections — e.g., frontend → backend → database
  • Use namespace isolation — production, staging, and development should never communicate with each other
  • Consider a service mesh (Istio, Linkerd) if you need mTLS and granular traffic control
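
A sketch of the first two points in practice: a default deny-all for the namespace, then an explicit frontend → backend opening (the labels and port are assumptions):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}            # applies to every pod in the namespace
  policyTypes:
    - Ingress
    - Egress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend           # hypothetical label
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend  # hypothetical label
      ports:
        - protocol: TCP
          port: 8080
```

Note that denying egress by default also blocks DNS, so in practice you'll add one more rule allowing traffic to kube-dns on port 53.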

Bonus: Lack of Monitoring

Many teams set up Kubernetes and forget about monitoring. Without metrics and alerts, you'll learn about problems only when a customer reports an error.

Minimum monitoring stack: Prometheus + Grafana + Alertmanager. Add Loki for log analysis and Jaeger or Tempo for distributed tracing.
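
To give a taste of the alerting side, a minimal PrometheusRule sketch; it assumes kube-prometheus-stack (Prometheus Operator plus kube-state-metrics) is installed, and the namespace and threshold are placeholders:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: pod-restart-alerts
  namespace: monitoring        # hypothetical namespace
spec:
  groups:
    - name: kubernetes-pods
      rules:
        - alert: PodCrashLooping
          # fires when a container restarted more than 3 times in 15 minutes
          expr: increase(kube_pod_container_status_restarts_total[15m]) > 3
          for: 5m              # must stay true for 5 minutes before firing
          labels:
            severity: critical
          annotations:
            summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is restarting repeatedly"
```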

Conclusion

These mistakes are simple to prevent, yet they cause 80% of the production incidents we see. Most of the fixes take only a few hours to put in place and save you nights of incident resolution.

Want a Kubernetes security audit? We can review your cluster in 2-3 days and provide a concrete action plan.
