CrunchCode

Remote

Middle+/Senior Data Engineer

Description

Responsibilities

- Building both batch and streaming pipelines in production environments

Requirements (Must-have)

- 4-5 years of experience in Data Engineering
- Strong experience with Apache Spark (including Structured Streaming)
- Experience building both batch and streaming pipelines in production environments
- Proven experience designing AWS-based data lake architectures (S3, EMR, Glue, Athena)
- Experience with event streaming platforms such as Apache Kafka or Amazon Kinesis
- Experience implementing lakehouse formats such as Delta Lake
- Strong understanding of partitioning strategies and schema evolution
- Experience using the Spark UI and AWS CloudWatch for profiling and optimization
- Strong understanding of Spark performance tuning (shuffle, skew, memory, partitioning)
- Proven track record of cost optimization in AWS environments
- Experience with Docker and CI/CD pipelines
- Experience with Infrastructure as Code (Terraform, AWS CDK, or similar)
- Familiarity with monitoring and observability practices

Nice to Have

- Experience in the financial domain
- Experience running Spark workloads on Kubernetes
- Exposure to or interest in Large Language Models (LLMs) and AI integration
- Experience implementing data quality frameworks or metadata/lineage systems

Hiring Process

1. Intro call
2. Technical discussion (focused on real experience)
3. Offer

Start: ASAP

Skills

Spark, Apache Spark, Terraform, CI/CD, Apache, Kubernetes, Kafka, AI, Data Engineering, Docker, AWS
