Boost.ai, Sandnes

Do you love creating reliable, scalable infrastructure and making sure systems run seamlessly every day?

Description

Operations Engineer (High Availability & Incident Management)

Boost.ai

Application deadline: 18.02.2026
Employment type: Permanent

Why You’ll Love Working Here

You’ll be part of a team that blends software engineering, systems operations, and reliability engineering at the cutting edge of cloud and AI technology. Your work will have a direct impact on how millions of users experience mission-critical conversational services.

We’re building something ambitious, and we’re looking for someone who thrives in an environment where uptime, resilience, and operational excellence truly matter.

What you will be doing

Design, build, and improve infrastructure and systems using DevOps best practices and Infrastructure as Code.
Automate and streamline processes to improve deployment speed, reliability, and scalability.
Collaborate from design to deployment to improve the full lifecycle of services.
Own and continuously improve service availability, with a clear goal of 99.99% uptime across our conversational AI platform.
Design and operate systems with fault tolerance, redundancy, graceful degradation, and fast recovery in mind.
Proactively identify and eliminate single points of failure across infrastructure, application, and operational processes.
Drive improvements in monitoring, alerting, and observability to detect issues before users are impacted.
Lead and evolve our incident management process, including:
Clear on-call structures and escalation paths.
Well-defined incident severity levels.
Fast triage, mitigation, and communication during incidents.
Act as a technical lead during major incidents, coordinating response and ensuring rapid restoration of service.
Run post-incident reviews and blameless retrospectives, turning incidents into concrete reliability improvements.
Define and track operational metrics such as SLAs, SLOs, error budgets, MTTR, and incident frequency.
Support and troubleshoot customer environments, ensuring stability during upgrades and integrations.
Work closely with product and engineering teams to ensure operational readiness for new features and releases.

What we are looking for

You’ll thrive in this role if you have:

BSc/MSc in Computer Science or equivalent hands-on experience.
Strong Linux system administration and optimization skills.
Experience with programming/scripting languages (e.g. Python, Go, Bash).
Cloud provider experience (preferably AWS).
Familiarity with container technologies such as Docker, Kubernetes & Helm.
Configuration management expertise (Terraform, Ansible, etc.).
Experience with zero-downtime deployment of web applications.
Knowledge of CI/CD principles and tools.
Experience with relational databases (PostgreSQL, MySQL).
Solid understanding of monitoring, alerting, and observability tools.
A strong interest in DevOps, reliability engineering, and operational best practices.

Additionally, we believe you will succeed if you:

Are proactive, solution-oriented, and calm under pressure.
Have a strong sense of ownership — you care deeply about uptime and user impact.
Are comfortable making decisions during incidents and communicating clearly with stakeholders.
Are equally comfortable working independently and as part of a collaborative team.
Are able to prioritize effectively in environments where not all problems are equal.

What’s in it for you?

Impact: Operate AI infrastructure that must be available all the time, and make it better every day.
Growth: A steep career trajectory with opportunities to shape our reliability and incident management strategy.
Innovation: Freedom to improve processes, tooling, and architecture in pursuit of world-class availability.
People: A highly motivated team with a shared goal of operational excellence.
Environment: A supportive and dynamic workplace culture, both professionally and socially.
Rewards: Competitive salary and exciting benefits.

Sounds good?

Please submit your application using the appropriate form - we’re looking forward to hearing from you and what you can bring to our company!

During the recruitment process, we interview suitable candidates on a rolling basis until we find the right one. We recommend that you submit your application as soon as possible.
The position requires working on-site in Stavanger or Oslo, Norway, or Copenhagen, Denmark.

About boost.ai

Boost.ai is the trusted leader in AI-powered customer experience solutions for regulated industries. Built for security, speed, and scale, the platform enables fast deployment, high resolution rates, and full hybrid control through seamless orchestration of traditional NLU and LLMs. With over 600 live virtual agents and more than 150 million automated conversations, boost.ai helps enterprises around the world resolve with confidence, automate at scale, and trust every conversation.

Proven performance and enterprise-grade reliability make boost.ai the partner of choice for leading brands across the world, including Nordea, Credit Union of Colorado, Sage, DNB, Trading 212, and more. Boost.ai is recognized as a Leader in Gartner’s 2025 Magic Quadrant™ for Conversational AI Platforms. Learn more at boost.ai.

Skills (AI-generated)

Active follow-up
Automation
CI/CD (Continuous Integration and Continuous Delivery)
Experience operating Linux servers and infrastructure
Infrastructure as code (IaC)
K8s
Configuration management
Programming languages

Keywords

AWS, PostgreSQL

Questions about the position

Contact person: Hadle Ropeid Selsås

Company location

4313 Sandnes

Listing information

FINN code: 447084866
Last modified: 20.1.2026, 19:58

Skills

Docker, MySQL, CI/CD, DevOps, Ansible, Go, Kubernetes, PostgreSQL, Linux, Python, Terraform, AWS, Bash, Security, Helm