How to Prepare for a DevOps Interview
DevOps interviews test whether you can build, deploy, and maintain infrastructure in production. They are less about memorising definitions and more about demonstrating that you have actually worked with the tools and solved real problems. The candidates who succeed are the ones who can explain not just what a tool does, but why they chose it and how they debugged it when things went wrong.
This guide covers the full preparation strategy: what technical topics to study, common questions with strong sample answers, how to handle scenario-based challenges, and the soft skills that interviewers evaluate even when they do not explicitly ask about them.
What to expect in a DevOps interview
Most DevOps interview processes follow a similar structure:
- Recruiter screen (30 minutes): basic fit check, salary expectations, experience overview
- Technical phone screen (45-60 minutes): questions on core DevOps concepts, tools, and past experience
- Technical deep dive (60-90 minutes): hands-on problem solving, architecture design, or take-home assignment
- Scenario / behavioural round (45-60 minutes): incident response scenarios, collaboration examples, past project walkthroughs
- Team fit / culture round (30-45 minutes): working style, values alignment, questions for the team
Not every company uses all five stages. Startups might compress the process into two rounds, while large enterprises might add more. The technical topics covered, however, are remarkably consistent.
The technical topics you must know
Linux fundamentals
Linux is the foundation. Nearly every DevOps tool runs on Linux, and interviewers expect you to be comfortable on the command line.
What to know:
- File system navigation, permissions (chmod, chown), and ownership
- Process management (ps, top, kill, systemctl)
- Package management (apt, yum, dnf)
- Log analysis (journalctl, /var/log/, grep, awk)
- Disk and memory usage (df, du, free, vmstat)
- SSH configuration and key management
- Cron jobs and systemd services
- Basic shell scripting (loops, conditionals, functions, pipes)
Sample question: "A server is running slowly. Walk me through how you would diagnose the issue."
Strong answer: "I would start with top or htop to check CPU and memory usage and identify any processes consuming excessive resources. Then df -h to check disk space and iostat to check disk I/O. I would look at free -m for memory pressure and check if swap is being used heavily. I would examine recent logs with journalctl -xe and application-specific logs. If it is a network issue, I would use ss -tlnp to check listening ports and ping or traceroute to test connectivity. The goal is to narrow down whether the bottleneck is CPU, memory, disk, or network."
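The triage order in that answer (load, memory, swap, disk) can also be expressed as a small script. The following is a minimal sketch, not a monitoring tool: the function names and every threshold (load above core count, under 10% memory available, over half of swap in use, disk over 90% full) are illustrative assumptions rather than standard values.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value in kB}."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def triage(load_1m: float, cpus: int, mem: dict[str, int],
           disk_used_frac: float) -> list[str]:
    """Return the resources that look saturated (thresholds are assumptions)."""
    suspects = []
    # On Linux, load counts runnable tasks and tasks blocked on I/O,
    # so a high value points at CPU or disk contention.
    if load_1m > cpus:
        suspects.append("load")
    total = mem.get("MemTotal", 0)
    if total and mem.get("MemAvailable", total) < 0.10 * total:
        suspects.append("memory")
    swap_total = mem.get("SwapTotal", 0)
    if swap_total and mem.get("SwapFree", swap_total) < 0.5 * swap_total:
        suspects.append("swap")
    if disk_used_frac > 0.90:
        suspects.append("disk")
    return suspects
```

On a Linux host the inputs could come from os.getloadavg(), os.cpu_count(), shutil.disk_usage("/"), and the contents of /proc/meminfo. The point of the sketch is the same as the spoken answer: check each resource in turn and narrow down the bottleneck rather than guessing.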
For a comprehensive reference on essential commands, see our Linux commands for DevOps guide.
Networking
Networking knowledge separates strong DevOps candidates from average ones. You need to understand how traffic flows from a user's browser to your application.
What to know:
- TCP/IP model, DNS resolution, HTTP/HTTPS
- Load balancers (Layer 4 vs Layer 7, ALB vs NLB)
- VPCs, subnets, security groups, NACLs
- NAT gateways and internet gateways
- DNS records (A, CNAME, MX, TXT)
- TLS certificates and termination
- Common ports (22, 80, 443, 3306, 5432, 6379)
- Basic troubleshooting (curl, dig, nslookup, traceroute, netstat/ss)
Sample question: "A user reports they cannot reach your web application. How do you troubleshoot?"
Strong answer: "I would work from the outside in. First, check if DNS resolves correctly with dig or nslookup. Then check if the server responds with curl -v to see the HTTP response and any TLS errors. If DNS and the server are fine, I would check the load balancer health checks: are backend instances healthy? Then security groups and NACLs: is port 443 open? Then the application itself: is the web server process running? Are there error logs? The systematic approach is DNS, network path, load balancer, firewall rules, application."
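The first two steps of that outside-in approach, name resolution and then a TCP connection, can be scripted with the Python standard library. This is a rough sketch under stated assumptions: the function names and the returned labels are hypothetical, and the later layers (load balancer health, application logs) still need the provider console or the host itself.

```python
import socket


def resolves(host: str) -> bool:
    """Step 1: does the name resolve at all?"""
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        return False


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Step 2: can a TCP handshake complete on the port?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_failure(host: str, port: int = 443) -> str:
    """Return the first layer that fails, checked from the outside in."""
    if not resolves(host):
        return "dns"
    if not port_open(host, port):
        return "network-or-firewall"
    return "reachable: check load balancer health and application logs next"
```

A refused or timed-out connection does not say which control blocked it, so a "network-or-firewall" result is the cue to inspect security groups, NACLs, and whether anything is listening on the port.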
Docker and containers
Containerisation is expected knowledge for any DevOps role. You should be able to write Dockerfiles, troubleshoot container issues, and explain container concepts clearly.
What to know:
- Dockerfiles (multi-stage builds, layer caching, security best practices)
- Docker CLI (build, run, exec, logs, inspect, network, volume)
- Docker Compose for multi-service applications
- Container networking (bridge, host, overlay)
- Image registries (Docker Hub, ECR, GCR)
- Container security (non-root users, minimal base images, vulnerability scanning)
- Difference between containers and virtual machines
Sample question: "How would you reduce the size of a Docker image?"
Strong answer: "I would use a multi-stage build: one stage for building the application, and a minimal stage (like alpine or distroless) for the final image. I would combine RUN commands to reduce layers, use .dockerignore to exclude unnecessary files, and order Dockerfile instructions to maximise layer caching. I would also audit dependencies to remove anything not needed at runtime."
Kubernetes
For mid-level and senior roles, Kubernetes knowledge is typically required. Even for junior roles, basic familiarity is expected.
What to know:
- Core objects: Pods, Deployments, Services, Ingress, ConfigMaps, Secrets
- Scaling: Horizontal Pod Autoscaler, resource requests and limits
- Troubleshooting: kubectl get, describe, logs, exec, events
- Networking: ClusterIP, NodePort, LoadBalancer, Ingress controllers
- Storage: PersistentVolumes, PersistentVolumeClaims
- RBAC: Roles, ClusterRoles, RoleBindings
- Helm for package management
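Interviewers sometimes ask how the Horizontal Pod Autoscaler decides on a replica count. The core rule in the Kubernetes documentation is desired = ceil(current replicas × current metric / target metric), skipped when the ratio is within a tolerance of 1 (0.1 by default). The sketch below illustrates only that rule; the function name and the default bounds are assumptions, and the real controller also handles unready pods, missing metrics, and stabilisation windows.

```python
import math


def desired_replicas(current: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Simplified form of the HPA scaling rule."""
    ratio = current_metric / target_metric
    # Within tolerance of the target: leave the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current
    wanted = math.ceil(current * ratio)
    # Clamp to the bounds configured on the autoscaler.
    return max(min_replicas, min(max_replicas, wanted))
```

For example, four pods averaging 90% CPU against a 60% target give a ratio of 1.5, so the rule asks for six pods.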
For a beginner-friendly introduction, see our Kubernetes explained simply guide. For the full learning path, read our Kubernetes guide for beginners.
CI/CD pipelines
This is core DevOps territory. You should be able to design a pipeline from scratch and explain your choices.
What to know:
- Pipeline stages: build, test, scan, deploy
- Tools: GitHub Actions, GitLab CI, Jenkins, ArgoCD
- Deployment strategies: rolling, blue-green, canary
- Artefact management: container registries, versioning
- Testing in pipelines: unit, integration, security scanning
- GitOps principles: declarative configuration in Git, automated reconciliation
Sample question: "Design a CI/CD pipeline for a microservice."
Strong answer: "On push to a feature branch: run linting and unit tests. On PR merge to main: build the Docker image, tag it with the Git SHA, push to the container registry, run integration tests against a staging environment. For production deployment, I would use ArgoCD watching the main branch: when the image tag updates in the Kubernetes manifests, ArgoCD detects the change and performs a rolling deployment. I would include health checks so that a failed deployment rolls back automatically."
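The trigger-to-stage mapping in that answer is easy to lose when spoken aloud, so it can help to see it written down. The sketch below is purely illustrative: the stage names, the event labels, and the registry.example.com and payments values are hypothetical, and a real pipeline would be defined in the CI tool's own configuration format.

```python
def image_ref(registry: str, service: str, git_sha: str) -> str:
    """Tag images with the commit SHA so each deployment traces to a commit."""
    return f"{registry}/{service}:{git_sha}"


def stages_for(event: str, branch: str) -> list[str]:
    """Map a Git event to the pipeline stages described in the answer."""
    if event == "push" and branch != "main":
        return ["lint", "unit-tests"]
    if event == "merge" and branch == "main":
        # Updating the manifests is the change the GitOps tool reacts to.
        return ["build-image", "push-image", "integration-tests",
                "update-manifests"]
    return []


print(stages_for("push", "feature/login"))
print(image_ref("registry.example.com", "payments", "3f9c2ab"))
```

Tagging with the Git SHA rather than a moving tag such as latest means the manifest change is an explicit, reviewable commit that names exactly which build is deployed.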
For a hands-on walkthrough, see our CI/CD pipeline tutorial for beginners.
Infrastructure as Code
Terraform is the most commonly tested IaC tool. Interviewers expect you to understand state management, modules, and real-world patterns.
What to know:
- Terraform workflow: init, plan, apply, destroy
- State management: remote state, state locking, state file security
- Modules: reusable, versioned infrastructure components
- Variables, outputs, locals, data sources
- Terraform vs other tools (Pulumi, CloudFormation)
- Handling secrets in IaC (keep them out of source code, and treat state as sensitive because it can still contain secret values)
Sample question: "How do you manage Terraform state in a team?"
Strong answer: "I use remote state stored in S3 with a DynamoDB table for state locking. This prevents two engineers from applying changes simultaneously. State files contain sensitive data, so the S3 bucket has encryption enabled, versioning for rollback, and restricted IAM policies. I also use workspaces or separate state files per environment to isolate dev, staging, and production."
For a deeper dive, see our Terraform for beginners guide and our Ansible vs Terraform comparison.
Cloud platforms
Most roles focus on one cloud provider (usually AWS), but you should understand the core services that any cloud offers.
What to know:
- Compute: EC2, Lambda, ECS, EKS (or equivalents on Azure/GCP)
- Networking: VPC, subnets, security groups, load balancers, Route 53
- Storage: S3, EBS, EFS (or equivalents)
- IAM: roles, policies, least privilege principle
- Managed databases: RDS, DynamoDB (or equivalents)
- Cost management: reserved instances, spot instances, right-sizing
For a comparison of major cloud providers, see our AWS vs Azure vs GCP breakdown.
Monitoring and observability
Understanding how to detect and diagnose problems in production is critical.
What to know:
- Metrics: Prometheus, CloudWatch, Datadog
- Dashboards: Grafana
- Logging: ELK stack, CloudWatch Logs, Loki
- Alerting: PagerDuty, OpsGenie, alerting rules
- The four golden signals: latency, traffic, errors, saturation
- SLIs, SLOs, and error budgets
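Error budgets are easier to discuss with numbers in hand. The following is a small worked sketch of the arithmetic, assuming a request-based availability SLO; the function names and the 30-day window are assumptions chosen for illustration.

```python
def allowed_failures(slo: float, total_requests: int) -> float:
    """Requests that may fail in the window without breaching the SLO."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be between 0 and 1, exclusive")
    return (1.0 - slo) * total_requests


def budget_consumed(slo: float, total_requests: int, failed: int) -> float:
    """Fraction of the error budget already spent (1.0 means exhausted)."""
    return failed / allowed_failures(slo, total_requests)


def allowed_downtime_minutes(slo: float, days: int = 30) -> float:
    """Equivalent full-outage minutes for a time-based reading of the SLO."""
    return (1.0 - slo) * days * 24 * 60
```

With a 99.9% SLO over one million requests, roughly 1,000 failures are allowed; 250 failures so far means about a quarter of the budget is spent. Read as time, the same SLO allows about 43 minutes of full outage in 30 days.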
Scenario-based questions and how to handle them
Scenario questions test how you think under pressure and how you approach problems systematically. There is no single correct answer; interviewers want to see your reasoning process.
Incident response scenarios
"Production is down. Walk me through your response."
A strong answer follows a structured framework:
- Assess impact: how many users are affected? Is it a full outage or degraded service?
- Communicate: notify the team, update the status page, assign an incident commander
- Diagnose: check monitoring dashboards, review recent deployments, examine logs
- Mitigate: apply the fastest fix (rollback, restart, scale up, failover) even if it does not address the root cause
- Resolve: once stable, identify and fix the root cause
- Post-mortem: document what happened, why, and what changes prevent recurrence
Interviewers are looking for calm, systematic thinking, not panicked guessing.
Architecture design scenarios
"Design the infrastructure for a web application that serves 10,000 requests per second."
Structure your answer:
- Clarify requirements: what are the latency targets? What is the data model? Read-heavy or write-heavy?
- Draw the architecture: load balancer, application tier (Kubernetes or auto-scaling group), database (read replicas, caching), CDN for static assets
- Explain scaling: horizontal scaling for stateless services, connection pooling for databases, Redis for caching
- Address reliability: multi-AZ deployment, health checks, auto-scaling, automated failover
- Mention monitoring: what metrics you would track, what alerts you would set
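Putting rough numbers on the design makes the scaling discussion concrete. The back-of-the-envelope sketch below estimates how many stateless replicas the 10,000 requests per second target needs and how much traffic still reaches the database behind a cache. Every input (500 requests per second per replica, 30% headroom, three zones, a 90% cache hit ratio) is an assumed figure for illustration, as are the function names.

```python
import math


def replicas_needed(peak_rps: float, rps_per_replica: float,
                    headroom: float = 0.3, zones: int = 3) -> int:
    """Replicas required at peak, keeping spare capacity on each one."""
    usable = rps_per_replica * (1.0 - headroom)
    raw = math.ceil(peak_rps / usable)
    # Round up to a multiple of the zone count for an even multi-AZ spread.
    return math.ceil(raw / zones) * zones


def backend_rps(peak_rps: float, cache_hit_ratio: float) -> float:
    """Requests per second that miss the cache and reach the database."""
    return peak_rps * (1.0 - cache_hit_ratio)
```

Under those assumptions the application tier needs 30 replicas (29 rounded up to ten per zone), and the database sees about 1,000 requests per second. Stating the assumptions out loud and then doing this arithmetic is usually what the interviewer wants to see.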
Troubleshooting scenarios
"A Kubernetes pod keeps restarting. How do you investigate?"
"I would start with kubectl describe pod <name> to check events: is it an image pull error, OOM kill, or failed liveness probe? Then kubectl logs <pod> --previous to see logs from the crashed container. I would check resource limits: is the pod hitting memory limits? I would also check the liveness probe configuration: is the endpoint correct and responding within the timeout? If the issue is intermittent, I would check node resources with kubectl top nodes to see if the node is under memory pressure."
Portfolio projects that impress
Having projects to reference during interviews is a significant advantage. These are the types that demonstrate real competence:
- End-to-end CI/CD pipeline: a GitHub Actions or GitLab CI pipeline that builds a Docker image, runs tests, pushes to a registry, and deploys to Kubernetes. Include security scanning and automated rollback.
- Terraform infrastructure project: provision a complete environment on AWS with a VPC, subnets, security groups, EKS cluster, RDS database, and S3 bucket. Use modules, remote state, and output values. Include a README explaining design decisions.
- Monitoring and alerting stack: deploy Prometheus and Grafana on Kubernetes. Create dashboards for application metrics and node metrics. Configure alerting rules that notify via Slack or email.
- GitOps deployment: set up ArgoCD to watch a Git repository and automatically deploy changes to a Kubernetes cluster. Demonstrate a rolling update and a rollback.
Host everything on GitHub with clear documentation. During the interview, you can reference specific projects: "When I built my Terraform modules, I learned that..." This is far more convincing than abstract knowledge.
Soft skills that matter
Technical skills get you through the door. Soft skills determine the hiring decision when multiple candidates have similar technical ability.
Communication
DevOps engineers communicate with developers, product managers, security teams, and leadership. You need to explain technical decisions in language that non-technical stakeholders understand. In the interview, this means: clear, structured answers without unnecessary jargon.
Collaboration
DevOps is inherently cross-functional. Interviewers will ask about times you worked with development teams, resolved disagreements about infrastructure decisions, or helped a team adopt a new tool. Have specific examples ready.
Incident response under pressure
If the company has on-call expectations, they will assess how you handle pressure. Stay calm, describe a methodical approach, and emphasise learning from incidents rather than blaming.
Continuous learning
Technology changes fast. Interviewers want to know how you keep up. Mention specific blogs, communities, certifications you are pursuing, or side projects. Demonstrating curiosity signals long-term growth potential.
The preparation timeline
If you already have DevOps experience, a focused 2-4 week plan works well:
- Week 1: Review core concepts across all topics. Identify weak areas. Start a practice project.
- Week 2: Deep dive into weak areas. Practice scenario-based questions out loud. Finish the practice project.
- Week 3: Mock interviews with a friend or colleague. Refine your project walkthroughs. Polish your GitHub portfolio.
- Week 4: Company-specific preparation. Research their tech stack, read their engineering blog, and prepare questions for the interviewer.
If you are building skills from scratch, a structured programme like a DevOps bootcamp or a self-study plan covering the full DevOps career roadmap will take 3-6 months before you are interview-ready.
The most important advice: build things. Interviewers can tell the difference between someone who read about Terraform and someone who has written and debugged Terraform modules. Hands-on experience is impossible to fake and easy to demonstrate.
Ola
Founder, CloudPros
