Arc · Remote

Data Engineer

Overview

We are looking for a highly skilled Data Engineer with strong experience building AI-ready data pipelines, designing scalable cloud architectures, and implementing CI/CD for data and infrastructure workflows. This role sits at the intersection of data engineering, machine learning operations (MLOps), and cloud infrastructure, and will play a key role in turning raw data into production-grade AI systems.

Responsibilities

Data & AI Pipelines

  • Design, build, and maintain end-to-end data pipelines supporting analytics and AI/ML use cases
  • Develop feature pipelines and data transformations for model training, inference, and monitoring
  • Implement batch and streaming pipelines using modern data stacks
  • Ensure data quality, lineage, versioning, and observability across pipelines
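
For illustration only, the sketch below shows the general shape of this kind of work: a minimal batch pipeline with basic data-quality checks, written in Python with pandas/pyarrow. The file paths, column names, and helpers are hypothetical, not a prescription for the role.

```python
import pandas as pd

def extract(path: str) -> pd.DataFrame:
    """Load raw events from a landing zone (path is hypothetical)."""
    return pd.read_csv(path)

def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Apply basic quality checks and derive a partition column."""
    df = df.dropna(subset=["user_id", "event_ts"])     # completeness check
    df["event_ts"] = pd.to_datetime(df["event_ts"], utc=True)
    df["event_date"] = df["event_ts"].dt.date          # partition column
    return df

def load(df: pd.DataFrame, out_path: str) -> None:
    """Write curated, date-partitioned output for analytics and AI use."""
    df.to_parquet(out_path, partition_cols=["event_date"])

if __name__ == "__main__":
    load(transform(extract("raw/events.csv")), "curated/events")
```

In practice, logic like this would run under an orchestrator such as Airflow, Dagster, or Prefect, with lineage, versioning, and observability layered on top.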

AI / MLOps Enablement

  • Build and operationalize model training and inference pipelines
  • Integrate data pipelines with ML frameworks and orchestration tools
  • Support model versioning, experiment tracking, and reproducible training
  • Collaborate with data scientists and ML engineers to productionize models
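
As a hedged example of experiment tracking and model versioning, here is a minimal MLflow run (MLflow appears in the nice-to-haves below). It assumes scikit-learn and a local MLflow tracking store; the run name, parameters, and dataset are hypothetical.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a real feature pipeline's output.
X, y = make_classification(n_samples=1_000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run(run_name="baseline"):
    params = {"C": 1.0, "max_iter": 200}
    mlflow.log_params(params)                     # experiment tracking
    model = LogisticRegression(**params).fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    mlflow.log_metric("accuracy", acc)            # comparable across runs
    mlflow.sklearn.log_model(model, "model")      # versioned model artifact
```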

Cloud & Architecture

  • Design scalable, fault-tolerant cloud architectures for data and AI workloads
  • Work with cloud services (AWS, GCP, or Azure) for storage, compute, orchestration, and networking
  • Optimize cost, performance, and reliability of data infrastructure
  • Make architectural decisions around data lakes, warehouses, feature stores, and serving layers

Infrastructure as Code & CI/CD

  • Implement Infrastructure as Code (IaC) using tools like Terraform, Pulumi, or CloudFormation
  • Build CI/CD pipelines for data workflows, ML pipelines, and infrastructure changes
  • Automate testing, validation, and deployment of data and AI systems
  • Enforce best practices around security, secrets management, and access control
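
To give a concrete flavor of IaC in this stack, here is a minimal sketch using Pulumi's Python SDK against AWS; Terraform or CloudFormation would express the same resources declaratively. The resource names are hypothetical.

```python
import pulumi
import pulumi_aws as aws

# Versioned S3 bucket serving as a raw data landing zone.
raw_bucket = aws.s3.Bucket(
    "raw-data",
    versioning=aws.s3.BucketVersioningArgs(enabled=True),
)

# Block all public access: a baseline security control.
aws.s3.BucketPublicAccessBlock(
    "raw-data-public-access-block",
    bucket=raw_bucket.id,
    block_public_acls=True,
    block_public_policy=True,
    ignore_public_acls=True,
    restrict_public_buckets=True,
)

pulumi.export("raw_bucket_name", raw_bucket.id)
```

A CI/CD pipeline would typically run `pulumi preview` on pull requests and `pulumi up` on merge, applying the same review-and-deploy discipline to infrastructure as to application code.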

Collaboration & Ownership

  • Work cross-functionally with product, ML, and engineering teams
  • Own systems end-to-end, from design through production and monitoring
  • Document architectures, pipelines, and operational processes

Required Qualifications

  • 4+ years of experience in data engineering or platform engineering
  • Strong proficiency in Python (and/or Scala) for data and pipeline development
  • Experience building AI/ML pipelines (training, inference, or feature engineering)
  • Solid experience with cloud platforms (AWS, GCP, or Azure)
  • Hands-on experience with CI/CD pipelines and Infrastructure as Code
  • Strong understanding of data modeling, distributed systems, and pipeline orchestration

Preferred / Nice to Have

  • Experience with orchestration tools (Airflow, Dagster, Prefect, Argo)
  • Experience with data warehouses and lakes (Snowflake, BigQuery, Redshift, Delta/Iceberg)
  • Familiarity with MLOps tools (MLflow, SageMaker, Vertex AI, Kubeflow)
  • Experience with streaming systems (Kafka, Pub/Sub, Kinesis)
  • Background in building production AI systems, not just experimentation

What Success Looks Like

  • Reliable, scalable data pipelines powering AI and analytics use cases
  • Clean, automated deployments of data and ML infrastructure
  • Reduced time from data ingestion to model production
  • Clear, well-documented architectures that scale with the business

Skills

Platform Engineering, Node.js, Python, Airflow, CI/CD, AI, CloudFormation, BigQuery, Security, MLflow, Redshift, Azure, Terraform, SQL, Pulumi, AWS, Prefect, Scala, Kubeflow, Secrets Management, GCP, Snowflake, Data Engineering, ML, Kafka, JavaScript, Machine Learning