Arc · Remote

Data Engineer

Overview

We are looking for a highly skilled Data Engineer with strong experience building AI-ready data pipelines, designing scalable cloud architectures, and implementing CI/CD for data and infrastructure workflows. This role sits at the intersection of data engineering, machine learning operations (MLOps), and cloud infrastructure, and will play a key role in turning raw data into production-grade AI systems.

Responsibilities

Data & AI Pipelines

  • Design, build, and maintain end-to-end data pipelines supporting analytics and AI/ML use cases
  • Develop feature pipelines and data transformations for model training, inference, and monitoring
  • Implement batch and streaming pipelines using modern data stacks
  • Ensure data quality, lineage, versioning, and observability across pipelines
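
For illustration only, the sketch below shows the general shape of this kind of work: a minimal batch pipeline with basic data-quality checks, written in Python with pandas/pyarrow. The file paths, column names, and helpers are hypothetical, not a prescription for the role.

```python
import pandas as pd

def extract(path: str) -> pd.DataFrame:
    """Load raw events from a landing zone (path is hypothetical)."""
    return pd.read_csv(path)

def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Apply basic quality checks and derive a partition column."""
    df = df.dropna(subset=["user_id", "event_ts"])     # completeness check
    df["event_ts"] = pd.to_datetime(df["event_ts"], utc=True)
    df["event_date"] = df["event_ts"].dt.date          # partition column
    return df

def load(df: pd.DataFrame, out_path: str) -> None:
    """Write curated, date-partitioned output for analytics and AI use."""
    df.to_parquet(out_path, partition_cols=["event_date"])

if __name__ == "__main__":
    load(transform(extract("raw/events.csv")), "curated/events")
```

In practice, logic like this would run under an orchestrator such as Airflow, Dagster, or Prefect, with lineage, versioning, and observability layered on top.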

AI / MLOps Enablement

  • Build and operationalize model training and inference pipelines
  • Integrate data pipelines with ML frameworks and orchestration tools
  • Support model versioning, experiment tracking, and reproducible training
  • Collaborate with data scientists and ML engineers to productionize models
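
As a hedged example of experiment tracking and model versioning, here is a minimal MLflow run (MLflow appears in the nice-to-haves below). It assumes scikit-learn and a local MLflow tracking store; the run name, parameters, and dataset are hypothetical.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a real feature pipeline's output.
X, y = make_classification(n_samples=1_000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run(run_name="baseline"):
    params = {"C": 1.0, "max_iter": 200}
    mlflow.log_params(params)                     # experiment tracking
    model = LogisticRegression(**params).fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    mlflow.log_metric("accuracy", acc)            # comparable across runs
    mlflow.sklearn.log_model(model, "model")      # versioned model artifact
```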

Cloud & Architecture

  • Design scalable, fault-tolerant cloud architectures for data and AI workloads
  • Work with cloud services (AWS, GCP, or Azure) for storage, compute, orchestration, and networking
  • Optimize cost, performance, and reliability of data infrastructure
  • Make architectural decisions around data lakes, warehouses, feature stores, and serving layers

Infrastructure as Code & CI/CD

  • Implement Infrastructure as Code (IaC) using tools like Terraform, Pulumi, or CloudFormation
  • Build CI/CD pipelines for data workflows, ML pipelines, and infrastructure changes
  • Automate testing, validation, and deployment of data and AI systems
  • Enforce best practices around security, secrets management, and access control
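
To give a concrete flavor of IaC in this stack, here is a minimal sketch using Pulumi's Python SDK against AWS; Terraform or CloudFormation would express the same resources declaratively. The resource names are hypothetical.

```python
import pulumi
import pulumi_aws as aws

# Versioned S3 bucket serving as a raw data landing zone.
raw_bucket = aws.s3.Bucket(
    "raw-data",
    versioning=aws.s3.BucketVersioningArgs(enabled=True),
)

# Block all public access: a baseline security control.
aws.s3.BucketPublicAccessBlock(
    "raw-data-public-access-block",
    bucket=raw_bucket.id,
    block_public_acls=True,
    block_public_policy=True,
    ignore_public_acls=True,
    restrict_public_buckets=True,
)

pulumi.export("raw_bucket_name", raw_bucket.id)
```

A CI/CD pipeline would typically run `pulumi preview` on pull requests and `pulumi up` on merge, applying the same review-and-deploy discipline to infrastructure as to application code.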

Collaboration & Ownership

  • Work cross-functionally with product, ML, and engineering teams
  • Own systems end-to-end, from design through production and monitoring
  • Document architectures, pipelines, and operational processes

Required Qualifications

  • 4+ years of experience in data engineering or platform engineering
  • Strong proficiency in Python (and/or Scala) for data and pipeline development
  • Experience building AI/ML pipelines (training, inference, or feature engineering)
  • Solid experience with cloud platforms (AWS, GCP, or Azure)
  • Hands-on experience with CI/CD pipelines and Infrastructure as Code
  • Strong understanding of data modeling, distributed systems, and pipeline orchestration

Preferred / Nice to Have

  • Experience with orchestration tools (Airflow, Dagster, Prefect, Argo)
  • Experience with data warehouses and lakes (Snowflake, BigQuery, Redshift, Delta/Iceberg)
  • Familiarity with MLOps tools (MLflow, SageMaker, Vertex AI, Kubeflow)
  • Experience with streaming systems (Kafka, Pub/Sub, Kinesis)
  • Background in building production AI systems, not just experimentation

What Success Looks Like

  • Reliable, scalable data pipelines powering AI and analytics use cases
  • Clean, automated deployments of data and ML infrastructure
  • Reduced time from data ingestion to model production
  • Clear, well-documented architectures that scale with the business

Skills

Platform Engineering, Node.js, Python, Airflow, CI/CD, AI, CloudFormation, BigQuery, Security, MLflow, Redshift, Azure, Terraform, SQL, Pulumi, AWS, Prefect, Scala, Kubeflow, Secrets Management, GCP, Snowflake, Data Engineering, ML, Kafka, JavaScript, Machine Learning