Company (Remote)

Java Spark Data Pipeline Engineer

Deadline: 2026-04-01

Description

Budget: ₹12,500 - ₹37,500

I need an experienced data engineer who can take full ownership of end-to-end pipeline work on GCP. The core stack is Java, Apache Spark, Google Cloud Storage, and Airflow.

Your primary focus will be to design and build new pipelines that ingest data from both our transactional databases and files already landing in Cloud Storage, then transform and load it into our analytical layer. Along the way, I expect you to monitor, tune and refactor the existing Spark jobs so they keep performing efficiently as volumes grow.
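The ingest, transform, and load flow described above can be sketched in miniature. This is a dependency-free Python stand-in, not actual Spark code; every function and value here is hypothetical and only illustrates the shape of the pipeline:

```python
def extract(rows):
    """Stand-in for reading from a transactional DB or Cloud Storage files."""
    return [r for r in rows if r is not None]  # drop unreadable records

def transform(rows):
    """Stand-in for a Spark transformation: aggregate amounts per user."""
    totals = {}
    for user, amount in rows:
        totals[user] = totals.get(user, 0) + amount
    return totals

def load(totals, sink):
    """Stand-in for writing to the analytical layer (e.g. BigQuery)."""
    sink.update(totals)
    return sink

# Example run: two sources' rows flow through all three stages.
sink = {}
load(transform(extract([("a", 10), ("a", 5), ("b", 3), None])), sink)
# sink == {"a": 15, "b": 3}
```

In the real stack each stage would be a Spark job in Java, but the contract is the same: each step takes the previous step's output and produces a deterministic result.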

Because all orchestration happens in Airflow, you should be comfortable creating clear, idempotent DAGs, setting up alerts and handling retries gracefully. Strong knowledge of GCP services such as Cloud Composer, BigQuery and IAM roles will make your life—and mine—easier.

I am based in Bangalore; if you are too, that’s a plus for occasional whiteboard sessions. Telugu fluency is another nice extra, though not mandatory.

Deliverables

• Production-ready Spark jobs written in Java and deployed on GCP
• Airflow DAGs with parameterised configs, logging and alerting enabled
• Documentation covering pipeline design, a run-book for maintenance, and optimization decisions
• Handover session to walk through code and deployment steps

I’m ready to move quickly once I find the right fit, so please highlight similar projects you’ve delivered with this toolchain and any performance gains you achieved.

Skills

BigQuery, IAM, J2EE, Airflow, Apache, Java, GCP, Spark, Google Cloud Platform, Data Pipeline, ETL, Linux, Hadoop, Apache Spark
