A-LISTWARE Europe

ML-focused Backend Engineer (Realtime/Inference/Highload)

Description

Our Customer:

Our client is a technology-focused company building high-performance, real-time ML inference systems. The team develops ultra-low-latency engines that process billions of requests per day, integrating ML models with business-critical decision-making pipelines. They are looking for an experienced backend engineer to own and scale production-grade ML services with a strong focus on latency, reliability, and observability.

Your tasks:

- Lead the design and development of low-latency ML inference services handling massive request volumes.
- Build and scale real-time decision-making engines, integrating ML models with business logic under strict SLAs.
- Collaborate closely with data scientists to deploy ML models seamlessly and reliably in production.
- Design systems for model versioning, shadowing, and A/B testing at runtime.
- Ensure high availability, scalability, and observability of production systems.
- Continuously optimize latency, throughput, and cost-efficiency using modern tools and techniques.
- Work independently while collaborating with cross-functional teams, including Algo, Infrastructure, Product, Engineering, and Business stakeholders.

Required Experience and Skills:

- B.Sc. or M.Sc. in Computer Science, Software Engineering, or a related technical field.
- 5+ years of experience building high-performance backend or ML inference systems.
- Expertise in Python and experience with low-latency APIs and real-time serving frameworks (e.g., FastAPI, Triton Inference Server, TorchServe, BentoML).
- Experience with scalable service architectures, message queues (Kafka, Pub/Sub), and asynchronous processing.
- Strong understanding of model deployment, online/offline feature parity, and real-time monitoring.
- Experience with cloud environments (AWS, GCP, OCI) and container orchestration (Kubernetes).
- Familiarity with in-memory and NoSQL databases (Aerospike, Redis, Bigtable) for ultra-fast data access.
- Experience with observability stacks (Prometheus, Grafana, OpenTelemetry) and alerting/diagnostics best practices.
- Strong ownership mindset and the ability to deliver solutions end-to-end.
- Passion for performance, clean architecture, and impactful systems.

Would be a plus:

- Prior experience leading high-throughput, low-latency ML systems in production.
- Knowledge of real-time feature pipelines and streaming data platforms.
- Familiarity with advanced monitoring and profiling techniques for ML services.

Skills

ML, Kubernetes, Kafka, Prometheus, GCP, FastAPI, Python, Grafana, OpenTelemetry, Redis, AWS, Machine Learning
