Site Reliability Engineer
Very High Demand
Average market rate: 950-1200 SEK/h
Site Reliability Engineers ensure the reliability, availability, and performance of large-scale distributed systems. They apply software engineering principles to infrastructure and operations problems, automating processes and improving system resilience.
Skills
Linux/Unix systems, cloud platforms (AWS, GCP, Azure), container orchestration (Kubernetes), infrastructure as code (Terraform, Ansible), monitoring (Prometheus, Grafana), programming (Python, Go), incident management, and SLO/SLI definition.
Responsibilities
- Design and implement highly available and scalable systems
- Develop automation tools and infrastructure as code
- Monitor system performance and reliability metrics
- Respond to incidents and perform root cause analysis
- Define and track SLOs, SLIs, and error budgets
- Collaborate with development teams on reliability improvements
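To make the SLO, SLI, and error budget responsibility concrete, the arithmetic behind an error budget can be sketched as follows. This is a minimal illustration, not a standard tool: the function name `error_budget_report`, its parameters, and the request counts are all hypothetical, and it assumes a simple request-based availability SLI (successful requests divided by total requests).

```python
def error_budget_report(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Summarize error-budget usage for a request-based availability SLO.

    Hypothetical helper for illustration only.
    slo_target is a fraction, e.g. 0.999 for a 99.9% availability objective.
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= failed_requests <= total_requests:
        raise ValueError("failed_requests must be between 0 and total_requests")

    # SLI: the measured fraction of requests that succeeded.
    sli = 1 - failed_requests / total_requests

    # Error budget: the number of failures the SLO tolerates in this window.
    allowed_failures = (1 - slo_target) * total_requests

    # Fraction of the budget already spent (above 1.0 means overspent).
    budget_consumed = failed_requests / allowed_failures

    return {
        "sli": sli,
        "allowed_failures": allowed_failures,
        "budget_consumed": budget_consumed,
        "slo_met": sli >= slo_target,
    }


# Example with made-up numbers: a 99.9% SLO over one million requests.
report = error_budget_report(slo_target=0.999, total_requests=1_000_000, failed_requests=250)
```

With these made-up numbers the SLO tolerates about 1,000 failed requests, so 250 failures spend roughly a quarter of the budget. In practice the counts would come from a monitoring system such as Prometheus rather than being passed in by hand.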