Question 1

Explain what happens when you type a URL into your browser and press Enter.

Accepted Answer

DNS resolves the hostname to an IP, the OS opens a TCP connection to it (followed by a TLS handshake for HTTPS, usually on port 443), the browser sends an HTTP request, the server responds with HTML, and the browser parses it: fetching CSS/JS/images in parallel, building the DOM and CSSOM, then rendering. Caches (browser, DNS, CDN) short-circuit many of these steps.
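
The first few steps of that sequence can be sketched with Python's standard `socket` and `ssl` modules. This is a rough illustration rather than how a browser is implemented: `build_request` and `fetch` are hypothetical helper names, the sketch speaks plain HTTP/1.1 only, and it skips caching, redirects, and rendering entirely.

```python
import socket
import ssl


def build_request(host: str, path: str = "/") -> bytes:
    """Return a minimal HTTP/1.1 GET request as bytes."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def fetch(host: str, port: int = 443, path: str = "/", use_tls: bool = True) -> bytes:
    """Resolve, connect, optionally do TLS, send a GET, and return the raw response."""
    # Step 1: DNS resolution (hostname -> address). Take the first result.
    family, socktype, proto, _, sockaddr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    )[0]

    # Step 2: TCP connection to the resolved address.
    sock = socket.socket(family, socktype, proto)
    sock.settimeout(5)
    try:
        sock.connect(sockaddr)

        # Step 3: TLS handshake on top of the TCP connection (HTTPS only).
        if use_tls:
            context = ssl.create_default_context()
            sock = context.wrap_socket(sock, server_hostname=host)

        # Step 4: send the HTTP request and read until the server closes.
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
        return b"".join(chunks)
    finally:
        sock.close()
```

Calling `fetch("example.com")` would perform the DNS lookup, the TCP connect, and the TLS handshake before sending the request; passing `port=80, use_tls=False` skips the handshake for plain HTTP.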

Question 2

What's the difference between a process and a thread on Linux?

Accepted Answer

A process has its own memory space and file descriptors; a thread shares them with sibling threads inside the same process. Linux actually treats them similarly under the hood via clone(2); the difference is which flags are set (CLONE_VM, CLONE_FILES, etc.).
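
A small experiment makes the memory-sharing difference visible. This is a sketch that assumes a Linux or other Unix host (it uses `os.fork`); the function names are hypothetical. A thread mutates the same object the parent sees, while a forked child mutates only its own copy.

```python
import os
import threading


def thread_shares_memory() -> int:
    """A thread increments a counter in the memory it shares with its creator."""
    counter = {"value": 0}

    def bump() -> None:
        counter["value"] += 1

    worker = threading.Thread(target=bump)
    worker.start()
    worker.join()
    return counter["value"]  # the thread's write is visible here


def process_has_own_memory() -> int:
    """A forked child increments its own copy; the parent's copy is untouched."""
    counter = {"value": 0}
    pid = os.fork()
    if pid == 0:
        # Child process: this write lands in the child's address space only.
        try:
            counter["value"] += 1
        finally:
            os._exit(0)  # leave immediately without running parent cleanup
    os.waitpid(pid, 0)  # parent waits for the child to finish
    return counter["value"]  # still the parent's original value
```

The first function returns 1 and the second returns 0, which is the practical consequence of sharing or not sharing the address space.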

Question 3

How do you find which process is listening on port 8080?

Accepted Answer

`ss -ltnp 'sport = :8080'` or `lsof -iTCP:8080 -sTCP:LISTEN -n -P`. Older systems used netstat -tulpn.

Question 4

What does a load balancer do that DNS round-robin can't?

Accepted Answer

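Health checks, weighted routing, sticky sessions, TLS termination, layer-7 routing (paths/headers), and real-time failover. DNS round-robin only rotates the order of the IPs it returns, and resolvers and clients cache those answers for the TTL, so it has no awareness of backend health: a dead backend keeps receiving traffic until the record is changed and the caches expire.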
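
The health-awareness gap can be shown with a toy model. This is a simplified sketch, not any real load balancer's algorithm: `dns_round_robin`, `LoadBalancer`, and the IP addresses are made up for illustration, and the health check is a plain function you pass in.

```python
from itertools import cycle
from typing import Callable, Iterator


def dns_round_robin(backends: list[str]) -> Iterator[str]:
    """Rotate through every backend, healthy or not."""
    return cycle(backends)


class LoadBalancer:
    """Round-robin that skips backends failing the health check."""

    def __init__(self, backends: list[str], health_check: Callable[[str], bool]):
        self.backends = list(backends)
        self.health_check = health_check
        self._next = 0

    def pick(self) -> str:
        # Try each backend at most once per call, starting where we left off.
        for _ in range(len(self.backends)):
            backend = self.backends[self._next]
            self._next = (self._next + 1) % len(self.backends)
            if self.health_check(backend):
                return backend
        raise RuntimeError("no healthy backends")
```

With three backends and the middle one down, the DNS-style iterator still hands out the dead address every third time, while `LoadBalancer.pick()` alternates between the two healthy ones.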

Question 5

What's the difference between a Docker image and a container?

Accepted Answer

An image is an immutable, layered filesystem snapshot plus metadata (CMD, ENV, etc.). A container is an instance created from that image, running or stopped, with a thin writable layer on top and, while it runs, its own namespaces/cgroups.

Question 6

Why use a multi-stage Dockerfile?

Accepted Answer

To keep build tools (compilers, dev dependencies) out of the final image. You build in a heavy stage and COPY only the artifacts into a slim runtime stage. Result: smaller, more secure images.

Question 7

What happens to data written inside a container when it stops?

Accepted Answer

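Nothing is lost on stop: the writable layer persists while the container exists, so the data is still there after `docker start`. It is destroyed when the container is removed (`docker rm`, or automatically with `--rm`). Anything that must outlive the container belongs in a volume or bind mount.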

Question 8

How does Docker isolate containers from the host?

Accepted Answer

Linux namespaces (PID, NET, MNT, UTS, IPC, USER) isolate what the container can see, and cgroups limit what it can use (CPU, memory, I/O). Seccomp, AppArmor/SELinux, and capabilities further restrict what it can do.

Question 9

Explain the difference between a Deployment, a StatefulSet, and a DaemonSet.

Accepted Answer

Deployment manages stateless replicas with rolling updates. StatefulSet gives each pod a stable identity and persistent storage (databases, queues). DaemonSet runs one pod per node (log shippers, node exporters).

Question 10

What's the difference between a Service of type ClusterIP, NodePort, and LoadBalancer?

Accepted Answer

ClusterIP exposes the service inside the cluster only. NodePort opens a static port on every node. LoadBalancer asks the cloud provider for an external LB pointing at NodePorts. In modern setups, Ingress + ClusterIP is the usual pattern.

Question 11

How do liveness and readiness probes differ?

Accepted Answer

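Liveness: when the probe fails, the kubelet restarts the container. Readiness: when the probe fails, the pod is removed from Service endpoints so it receives no traffic, but nothing is restarted. Use readiness for warm-up or temporary dependency loss; for slow startup, a startup probe holds off the liveness and readiness probes until the app has started.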
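
On the application side, the two probes usually map to two endpoints that answer different questions. The sketch below is only an illustration: `AppState`, `liveness`, and `readiness` are hypothetical names, and the functions return the HTTP status a handler might send (Kubernetes HTTP probes treat 200 to 399 as success).

```python
from dataclasses import dataclass


@dataclass
class AppState:
    deadlocked: bool = False    # process is wedged; only a restart helps
    db_connected: bool = True   # a dependency that may drop temporarily


def liveness(state: AppState) -> int:
    """Answer 'should I be restarted?'. Fail only when a restart would help."""
    return 500 if state.deadlocked else 200


def readiness(state: AppState) -> int:
    """Answer 'can I take traffic right now?'. Fail on any reason not to serve."""
    if state.deadlocked or not state.db_connected:
        return 503
    return 200
```

Keeping dependency checks out of the liveness handler is the design point: a database outage should take the pod out of rotation, not trigger a restart loop.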

Question 12

What happens when you run `kubectl apply`?

Accepted Answer

kubectl sends the manifest to the API server, which validates and persists it to etcd. Controllers (e.g. Deployment controller) reconcile actual state toward desired state — creating ReplicaSets and Pods. The scheduler binds pods to nodes; kubelet pulls images and starts containers.
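
The reconcile step is easier to picture as a function from (desired state, actual state) to a corrected actual state. This is a deliberately tiny model, not the real controller code: `reconcile` and the `web-N` pod names are hypothetical, and real Deployments go through ReplicaSets and generate hashed pod names.

```python
def reconcile(desired: int, pods: list[str], prefix: str = "web") -> list[str]:
    """Return the pod list after one reconcile pass toward `desired` replicas."""
    pods = list(pods)  # work on a copy; the caller's list is the observed state

    # Too few pods: create missing ones, reusing the lowest free index.
    next_id = 0
    while len(pods) < desired:
        name = f"{prefix}-{next_id}"
        next_id += 1
        if name not in pods:
            pods.append(name)

    # Too many pods: delete from the end until the count matches.
    while len(pods) > desired:
        pods.pop()

    return pods
```

Running it again on its own output changes nothing, which is the property that lets controllers loop forever safely: they converge on the desired state instead of replaying a list of commands.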

Question 13

What's the difference between continuous delivery and continuous deployment?

Accepted Answer

Continuous delivery means every change that passes CI is releasable (deploy is a click). Continuous deployment removes the click — every passing change goes to production automatically.

Question 14

How would you secure secrets in a CI pipeline?

Accepted Answer

Use the platform's encrypted secret store (GitHub Actions secrets, GitLab CI variables marked masked+protected), inject them only into jobs that need them, never echo them, scope them to environments, and rotate them. For higher trust: OIDC federation to cloud IAM instead of long-lived keys.

Question 15

How do you keep a pipeline fast as the project grows?

Accepted Answer

Cache dependencies, parallelize jobs by test shard or package, use change detection (only run affected projects), pre-warm Docker layer cache, and use larger runners for the bottleneck step.
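
Two of those techniques reduce to a few lines each. The sketch below is an assumption-laden illustration, not any CI platform's API: `cache_key` derives a dependency-cache key from the lockfile contents so the cache is reused until dependencies change, and `shard` splits a test list deterministically across parallel jobs. Both function names are hypothetical.

```python
import hashlib


def cache_key(lockfile_bytes: bytes, prefix: str = "deps") -> str:
    """Same lockfile content gives the same key; any change gives a new key."""
    digest = hashlib.sha256(lockfile_bytes).hexdigest()[:16]
    return f"{prefix}-{digest}"


def shard(tests: list[str], index: int, total: int) -> list[str]:
    """Return the tests job `index` (0-based) should run out of `total` jobs."""
    if not 0 <= index < total:
        raise ValueError("index must satisfy 0 <= index < total")
    # Sort first so every job computes the same partition independently.
    return sorted(tests)[index::total]
```

Each parallel job would call `shard(all_tests, its_index, job_count)`; because the split is a pure function of the sorted list, the jobs need no coordination and together cover every test exactly once.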

Question 16

How do you roll back a bad deploy safely?

Accepted Answer

Keep the previous artifact addressable (image tag, commit SHA). For containers, redeploy the previous tag. For DB schema changes, use expand/contract migrations so old code still works against the new schema.
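
The expand phase can be demonstrated with an in-memory SQLite database. This is a minimal sketch under assumed names: the `users` table, the `display_name` column, and all function names are invented for illustration. The point is that the new column is added as nullable, so the old release's inserts keep working after the migration.

```python
import sqlite3


def create_v1(conn: sqlite3.Connection) -> None:
    """Original schema that the old release was written against."""
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")


def old_code_insert(conn: sqlite3.Connection, name: str) -> None:
    """Old release: knows nothing about display_name."""
    conn.execute("INSERT INTO users (name) VALUES (?)", (name,))


def expand(conn: sqlite3.Connection) -> None:
    """Expand: add the new column as nullable so old inserts still succeed."""
    conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")


def new_code_insert(conn: sqlite3.Connection, name: str) -> None:
    """New release: writes both the old and the new column."""
    conn.execute(
        "INSERT INTO users (name, display_name) VALUES (?, ?)",
        (name, name.title()),
    )


def backfill(conn: sqlite3.Connection) -> None:
    """Fill in rows written by old code before any contract step."""
    conn.execute("UPDATE users SET display_name = name WHERE display_name IS NULL")
```

The contract step (dropping or tightening the old column) is deliberately left out: it should only run once no deployed release, including the one you might roll back to, depends on the old shape.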

Question 17

What is Terraform state and why does it matter?

Accepted Answer

State maps Terraform resources to real-world IDs. Without it, Terraform can't tell what already exists, so it would try to recreate everything. Store it remotely (S3+DynamoDB, GCS, Terraform Cloud) with locking to prevent concurrent corruption.
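
A toy model shows why the mapping matters. This is not how Terraform computes a plan internally; `plan`, the resource addresses, and the IDs are hypothetical, and the sketch ignores in-place updates and drift. It only compares the resource addresses in configuration against those recorded in state.

```python
def plan(config: set[str], state: dict[str, str]) -> dict[str, list[str]]:
    """Diff configured resource addresses against addresses tracked in state.

    `state` maps a resource address to the real-world ID it was created as.
    """
    tracked = set(state)
    return {
        "create": sorted(config - tracked),      # in config, unknown to state
        "destroy": sorted(tracked - config),     # in state, removed from config
        "no_change": sorted(config & tracked),   # already mapped to a real ID
    }
```

With the state intact, only the genuinely new resource shows up under `create`. Pass an empty dict as `state` and every configured resource lands under `create`, which is the "recreate everything" failure mode a lost state file produces.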

Question 18

How do you handle secrets in Terraform?

Accepted Answer

Don't put them in .tf files. Pull from a secrets manager at apply time (Vault data source, AWS SSM/Secrets Manager), or pass via env vars / TF_VAR_*. Treat state as sensitive — it can contain secrets verbatim.

Question 19

What's the difference between `terraform plan` and `terraform apply`?

Accepted Answer

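Plan shows what will change without changing anything. Apply executes the changes. In CI, run plan on PRs for review, then apply on merge to main; saving the plan with `terraform plan -out=<file>` and applying that file ensures what runs is what was reviewed.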

Question 20

When would you choose Pulumi or CDK over Terraform?

Accepted Answer

When the team strongly prefers a general-purpose language (TypeScript/Python/Go) over HCL, when you need rich abstractions/loops, or when you're already deep in a single cloud (CDK is AWS-native). Terraform wins for multi-cloud, mature provider ecosystem, and operational simplicity.

Question 21

What are SLIs, SLOs, and error budgets?

Accepted Answer

SLI is a measured signal (e.g. % of requests under 300ms). SLO is the target for that signal (e.g. 99.5% over a rolling 30 days). Error budget is the allowed unreliability (1 - SLO, so 0.5% here); what you track day to day is how much of it remains in the window. When the budget is spent, you freeze risky changes.
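
The arithmetic is simple enough to write down. This sketch assumes a request-based SLI, an SLO strictly below 100%, and at least one request in the window; `error_budget` and its field names are hypothetical.

```python
def error_budget(slo: float, total_requests: int, bad_requests: int) -> dict[str, float]:
    """Compute the SLI and error-budget figures for one SLO window.

    Assumes 0 < slo < 1 and total_requests > 0.
    """
    allowed_bad = (1 - slo) * total_requests   # the whole budget, in requests
    return {
        "sli": 1 - bad_requests / total_requests,     # measured good fraction
        "allowed_bad": allowed_bad,
        "remaining": allowed_bad - bad_requests,      # negative means overspent
        "budget_spent": bad_requests / allowed_bad,   # 1.0 means fully spent
    }
```

For a 99.5% SLO over 1,000,000 requests the budget is about 5,000 bad requests; with 2,000 already failed, roughly 40% of the budget is spent and about 3,000 remain.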

Question 22

What's the difference between metrics, logs, and traces?

Accepted Answer

Metrics are cheap numeric time-series, great for alerting. Logs are discrete events with context, great for debugging. Traces follow a request across services, great for latency analysis. Modern observability uses all three together.

Question 23

How would you debug a sudden spike in p99 latency?

Accepted Answer

Start with a dashboard: is it one service or all? Check correlated deploys, infra events, and saturation (CPU, GC, DB connections). Use traces to find the slow span; logs around that timestamp on that instance. Roll back if the spike correlates with a recent change.

Question 24

Tell me about a time you caused a production incident.

Accepted Answer

Use STAR: Situation, Task, Action, Result. Be honest about the mistake, show what you did to mitigate, and emphasize the postmortem learnings and the guardrail you added afterwards (test, alert, runbook, automation).

Question 25

How do you balance shipping fast with reliability?

Accepted Answer

Frame it as alignment, not tension: small batches, feature flags, progressive rollouts, and error budgets give the team a shared language. When the budget is healthy, ship fast; when it's burning, slow down and pay back risk.

Question 26

Why DevOps and not pure backend / SRE?

Accepted Answer

Show a genuine reason — you love the systems thinking, the leverage of automation, the cross-team collaboration — backed by something concrete you've built (a pipeline, an IaC repo, a monitoring stack).

DevOps interview questions & answers

Linux & Networking