Arc · Multiple locations

Backend Engineer - Perm - US / UK / Western Europe

Description

Backend Engineer (APIs, Load Balancers, Schedulers) · Full-Time · Remote or Hybrid · High-Impact Role

About Odyn

Odyn is building transformative AI solutions powered by cutting-edge, high-performance infrastructure. We're seeking a Backend Engineer to architect high-performance APIs, load balancers, and workload schedulers for our GPU infrastructure platform.

What You'll Do

Build Core Backend Systems & Load Balancing

● Architect RESTful and GraphQL APIs for model inference, fine-tuning, and resource management.
● Build API gateways handling authentication, rate limiting, routing, and multi-tenant isolation.
● Develop intelligent load balancers distributing inference requests across GPU clusters based on latency, cost, and availability, with health checking and circuit breakers.
● Implement token streaming, batching, and request queuing for LLM inference workloads.

Design Workload Schedulers & Resource Orchestration

● Design GPU resource allocation schedulers with bin-packing algorithms optimizing for topology (NVLink, PCIe, NUMA), workload type, and cost.
● Implement preemption, checkpointing, and graceful eviction for multi-tenant environments.
● Integrate with Kubernetes schedulers, custom operators, or standalone orchestration systems.

Ensure Reliability, Performance & Collaboration

● Build observability systems (Prometheus, Grafana, OpenTelemetry) tracking API performance, load balancer health, and scheduler efficiency.
● Define and monitor SLOs for latency, throughput, and availability.
● Partner with infrastructure, ML, and product teams to optimize systems and shape developer-facing features.
● Participate in on-call rotation supporting production systems.

What We're Looking For

Must-Have

● 4–7+ years in backend/distributed systems or infrastructure engineering.
● Strong programming in Python, Go, or Rust; deep API development experience (REST/GraphQL).
● Proven distributed systems expertise: load balancing, service discovery, failover, microservices, message queues.
● Production cloud experience (AWS/GCP/Azure); Kubernetes and Docker proficiency.
● Strong networking fundamentals (TCP/IP, DNS, TLS, HTTP); SQL/NoSQL database skills.

Nice-to-Have

● GPU-accelerated workloads or HPC scheduling experience.
● LLM inference frameworks (vLLM, TensorRT-LLM) or Kubernetes scheduling internals.
● API gateways (Kong, NGINX) or service meshes (Istio); infrastructure-as-code (Terraform, Helm).
● Observability tools (Jaeger, Datadog); high-throughput systems (100k+ req/sec); AI infrastructure startup experience.

Why Join Us

● Work at the frontier of AI infrastructure, building systems that power next-generation AI applications.
● Own critical components from the ground up.
● Collaborate with world-class teams.
● Competitive compensation + remote flexibility.

If you have experience building high-performance APIs, distributed schedulers, or load balancers for cloud infrastructure, we strongly encourage you to apply.

Skills

SQL, RESTful APIs, GraphQL, Prometheus, Load Balancing, TLS, Jaeger, NGINX, Kubernetes, Go, Grafana, Terraform, Datadog, OpenTelemetry, Helm, LLM, Istio, Python, GCP, Docker, Azure, Microservices, DNS, Rust, AWS, AI/Machine Learning