At Devs.lv, we've managed 50+ Kubernetes clusters across industries — from fintech to e-commerce. These are the 5 most common mistakes we see again and again, with specific advice on how to fix them.
1. No Resource Limits — The "Noisy Neighbor" Problem
Without CPU/memory limits, one pod can consume the entire node's resources. The result — other pods start getting OOMKilled or CPU throttled, and your production service slows down with no obvious cause.
Solution:
- Always define `resources.requests` and `resources.limits` for every container (see the sketch after this list)
- Start with `requests: cpu: 100m, memory: 128Mi` and adjust based on actual usage data
- Use `LimitRange` and `ResourceQuota` at the namespace level as a safety net
- Set up monitoring with Prometheus + Grafana to see actual resource consumption
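A minimal sketch of what that looks like. All names and values here are illustrative assumptions (the `api` pod, the image path, the `production` namespace, and the limit ceilings), not recommendations for your workload:

```yaml
# Illustrative values: tune requests/limits from observed usage, not guesses.
apiVersion: v1
kind: Pod
metadata:
  name: api                # hypothetical workload name
spec:
  containers:
    - name: api
      image: registry.example.com/api:1.0.0   # placeholder image
      resources:
        requests:
          cpu: 100m        # the starting point suggested above
          memory: 128Mi
        limits:
          cpu: 500m        # assumed ceiling; pick one that fits your app
          memory: 256Mi
---
# Namespace-level safety net: default requests/limits for containers that omit them.
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
    - type: Container
      defaultRequest:
        cpu: 100m
        memory: 128Mi
      default:
        cpu: 500m
        memory: 256Mi
```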
2. Single Replica — The "Hope It Doesn't Crash" Approach
Deployment with replicas: 1 is not production-ready. If that single pod crashes or the node restarts, your service is offline. Even for a few seconds, that can mean lost orders or frustrated customers.
Solution:
- Minimum 3 replicas for every production deployment
- Configure `podAntiAffinity` to spread replicas across different nodes (see the sketch after this list)
- Use a `PodDisruptionBudget` (PDB) to guarantee a minimum replica count during node maintenance
- Consider `topologySpreadConstraints` for distribution across availability zones
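Putting those pieces together, a sketch could look like this. The `app: api` labels, the image, and `minAvailable: 2` are assumptions for illustration:

```yaml
# Sketch: 3 replicas spread across nodes and zones, plus a PDB for node maintenance.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api                          # hypothetical service name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: api
              topologyKey: kubernetes.io/hostname    # at most one replica per node
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone   # spread across availability zones
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: api
      containers:
        - name: api
          image: registry.example.com/api:1.0.0      # placeholder image
---
# Keep at least 2 replicas up during voluntary disruptions (drains, upgrades).
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: api
```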
3. No Health Checks — The "Schrödinger Pod"
Without liveness and readiness probes, Kubernetes can't tell if your application is actually working. A pod can be in "Running" status while the service inside is frozen or not handling requests.
Solution:
- Readiness probe — determines when a pod is ready to receive traffic. Without it, Kubernetes sends traffic to pods that aren't ready yet.
- Liveness probe — determines if a pod is "alive." If not, Kubernetes automatically restarts it.
- Startup probe — use for applications with long startup times (Java, .NET) so the liveness probe doesn't kill slow-starting pods.
- HTTP endpoints are better than TCP checks: `/healthz` or `/ready` can verify database connections and other dependencies (see the example after this list).
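Here is a sketch of all three probes, assuming the application listens on port 8080 and exposes `/healthz` and `/ready` endpoints; the snippet slots into a pod template's `containers` list, and the timing values are starting points rather than prescriptions:

```yaml
containers:
  - name: api
    image: registry.example.com/api:1.0.0   # placeholder image
    ports:
      - containerPort: 8080
    startupProbe:                  # gives slow starters (Java, .NET) up to ~2.5 minutes
      httpGet:
        path: /healthz
        port: 8080
      failureThreshold: 30
      periodSeconds: 5
    readinessProbe:                # gate traffic until dependencies are reachable
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 10
    livenessProbe:                 # restart the container if it stops responding
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10
      failureThreshold: 3
```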
4. Secrets in YAML Files — "They Really Did Commit Credentials"
Kubernetes Secrets are base64 encoded, not encrypted. If you commit them to a Git repository, anyone with repo access can read them. We've seen database passwords, API keys, and even private certificates in public Git repos.
Solution:
- External Secrets Operator — syncs secrets from AWS Secrets Manager, HashiCorp Vault, or Azure Key Vault directly into your Kubernetes cluster
- Sealed Secrets — encrypts secrets with a public key so they can be safely stored in Git
- SOPS — Mozilla's tool for encrypting YAML/JSON files with KMS or PGP
- Add `.gitignore` entries and pre-commit hooks that block accidental secret commits
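As an illustration of the External Secrets Operator option, a minimal sketch. It assumes the operator is installed and that a `ClusterSecretStore` named `aws-secrets-manager` already points at your backend; every name and key path is a placeholder:

```yaml
# API version may vary with your operator release.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-credentials
  namespace: production
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager      # assumed, pre-existing store
    kind: ClusterSecretStore
  target:
    name: db-credentials           # the Kubernetes Secret that gets created
  data:
    - secretKey: DB_PASSWORD
      remoteRef:
        key: prod/db               # path of the secret in the external store
        property: password
```

Nothing sensitive lives in Git with this approach; only the reference to the external store is committed.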
5. No Network Policies — "Everyone Sees Everything"
By default, all pods can communicate with all other pods in the cluster. This means a compromised pod in your staging namespace can reach the production database. One vulnerability can become a full cluster breach.
Solution:
- Start with a "deny all" default policy for each namespace
- Open only necessary connections — e.g., frontend → backend → database
- Use namespace isolation — production, staging, and development should never communicate with each other
- Consider a service mesh (Istio, Linkerd) if you need mTLS and granular traffic control
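A minimal sketch of the first two bullets: a namespace-wide default deny, followed by a policy that lets the frontend reach the backend. The labels, namespace, and port are assumptions for this example:

```yaml
# Default deny for every pod in the namespace: no ingress or egress until allowed explicitly.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}                  # empty selector = all pods in the namespace
  policyTypes:
    - Ingress
    - Egress
---
# Then open only what is needed, e.g. frontend -> backend on port 8080.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
```

Keep in mind that a default-deny egress policy also blocks DNS, so in practice you will need an additional rule allowing egress to kube-dns.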
Bonus: Lack of Monitoring
Many set up Kubernetes and forget about monitoring. Without metrics and alerts, you'll learn about problems only when a customer reports an error.
Minimum monitoring stack: Prometheus + Grafana + Alertmanager. Add Loki for log analysis and Jaeger or Tempo for distributed tracing.
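As a starting point for alerting, a sketch of a Prometheus rule file; it assumes kube-state-metrics is being scraped, and the thresholds are illustrative rather than SLO-driven:

```yaml
groups:
  - name: kubernetes-basics
    rules:
      - alert: PodRestartingTooOften
        expr: increase(kube_pod_container_status_restarts_total[15m]) > 3
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.namespace }}/{{ $labels.pod }} restarted more than 3 times in 15 minutes"
      - alert: ContainerOOMKilled
        expr: kube_pod_container_status_last_terminated_reason{reason="OOMKilled"} == 1
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.namespace }}/{{ $labels.pod }} was OOMKilled"
```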
Conclusion
These mistakes are simple to prevent, yet they cause 80% of the production incidents we see. Most of the fixes are preventive measures that take a few hours to implement but save you from nights spent on incident resolution.
Want a Kubernetes security audit? We can review your cluster in 2-3 days and provide a concrete action plan.
