Do you love creating reliable, scalable infrastructure and making sure systems run seamlessly every day?
Operations Engineer (High Availability & Incident Management)
Boost.ai
Deadline: 18.02.2026
Employment type: Permanent
Why You’ll Love Working Here
You’ll be part of a team that blends software engineering, systems operations, and reliability engineering at the cutting edge of cloud and AI technology. Your work will have a direct impact on how millions of users experience mission-critical conversational services.
We’re building something ambitious, and we’re looking for someone who thrives in an environment where uptime, resilience, and operational excellence truly matter.
What you will be doing
Design, build, and improve infrastructure and systems using DevOps best practices and Infrastructure as Code.
Automate and streamline processes to improve deployment speed, reliability, and scalability.
Collaborate from design to deployment to improve the full lifecycle of services.
Own and continuously improve service availability, with a clear goal of 99.99% uptime across our conversational AI platform.
Design and operate systems with fault tolerance, redundancy, graceful degradation, and fast recovery in mind.
Proactively identify and eliminate single points of failure across infrastructure, application, and operational processes.
Drive improvements in monitoring, alerting, and observability to detect issues before users are impacted.
Lead and evolve our incident management process, including:
Clear on-call structures and escalation paths.
Well-defined incident severity levels.
Fast triage, mitigation, and communication during incidents.
Act as a technical lead during major incidents, coordinating response and ensuring rapid restoration of service.
Run post-incident reviews and blameless retrospectives, turning incidents into concrete reliability improvements.
Define and track operational metrics such as SLAs, SLOs, error budgets, MTTR, and incident frequency.
Support and troubleshoot customer environments, ensuring stability during upgrades and integrations.
Work closely with product and engineering teams to ensure operational readiness for new features and releases.
What we are looking for
You’ll thrive in this role if you have:
BSc/MSc in Computer Science or equivalent hands-on experience.
Strong Linux system administration and optimization skills.
Experience with programming/scripting languages (e.g. Python, Go, Bash).
Cloud provider experience (preferably AWS).
Familiarity with container technologies such as Docker, Kubernetes & Helm.
Infrastructure as Code and configuration management expertise (Terraform, Ansible, etc.).
Experience with zero-downtime deployment of web applications.
Knowledge of CI/CD principles and tools.
Experience with relational databases (PostgreSQL, MySQL).
Solid understanding of monitoring, alerting, and observability tools.
A strong interest in DevOps, reliability engineering, and operational best practices.
Additionally, we believe you will succeed if you:
Are proactive, solution-oriented, and calm under pressure.
Have a strong sense of ownership — you care deeply about uptime and user impact.
Are comfortable making decisions during incidents and communicating clearly with stakeholders.
Are equally comfortable working independently and as part of a collaborative team.
Can prioritize effectively in environments where not all problems are equal.
What’s in it for you?
Impact: Operate AI infrastructure that must be available all the time, and make it better every day.
Growth: A steep career trajectory with opportunities to shape our reliability and incident management strategy.
Innovation: Freedom to improve processes, tooling, and architecture in pursuit of world-class availability.
People: A highly motivated team with a shared goal of operational excellence.
Environment: A supportive and dynamic workplace culture, both professionally and socially.
Rewards: Competitive salary and exciting benefits.
Sounds good?
Please submit your application using the application form. We look forward to hearing from you and learning what you can bring to our company!
We interview suitable candidates continuously throughout the recruitment process until we find the right person, so we recommend that you apply as soon as possible.
The position requires working on-site in Stavanger or Oslo, Norway, or in Copenhagen, Denmark.
About boost.ai
Boost.ai is the trusted leader in AI-powered customer experience solutions for regulated industries. Built for security, speed, and scale, the platform enables fast deployment, high resolution rates, and full hybrid control through seamless orchestration of traditional NLU and LLMs. With over 600 live virtual agents and more than 150 million automated conversations, boost.ai helps enterprises around the world resolve with confidence, automate at scale, and trust every conversation.
Proven performance and enterprise-grade reliability make boost.ai the partner of choice for leading brands across the world, including Nordea, Credit Union of Colorado, Sage, DNB, Trading 212, and more. Boost.ai is recognized as a Leader in Gartner’s 2025 Magic Quadrant™ for Conversational AI Platforms. Learn more at boost.ai.
Skills (AI-generated)
Active follow-up
Automation
CI/CD (Continuous Integration and Continuous Delivery)
Experience operating Linux servers and infrastructure
Infrastructure as code (IaC)
K8s
Configuration management
Programming languages
Keywords
AWS, PostgreSQL
Questions about the position
Contact person: Hadle Ropeid Selsås
Company location
4313 Sandnes
Listing information
FINN code: 447084866
Last modified: 20.1.2026, 19:58