GCP Data Engineer
Description
£700 - £750 per day, inside IR35
6-month contract
Hybrid working in London

We're working with a global healthcare and AI research organisation at the forefront of applying data engineering and machine learning to accelerate scientific discovery. Their work supports large-scale, domain-specific datasets that power research into life-changing treatments.

They're now looking for a GCP Data Engineer to join a multidisciplinary team responsible for building and operating robust, cloud-native data infrastructure that supports ML workloads, particularly PyTorch-based pipelines.

The Role
You'll focus on designing, building, and maintaining scalable data pipelines and storage systems in Google Cloud, supporting ML teams by enabling efficient data loading, dataset management, and cloud-based training workflows. You'll work closely with ML engineers and researchers, ensuring that large volumes of unstructured and structured data can be reliably accessed, processed, and consumed by PyTorch-based systems.

Key Responsibilities
- Design and build cloud-native data pipelines using Python on GCP
- Manage large-scale object storage for unstructured data (Google Cloud Storage preferred)
- Support PyTorch-based workflows, particularly around data loading and dataset management in the cloud
- Build and optimise data integrations with BigQuery and SQL databases
- Ensure efficient memory usage and performance when handling large datasets
- Collaborate with ML engineers to support training and experimentation pipelines (without owning model development)
- Implement monitoring, testing, and documentation to ensure production-grade reliability
- Participate in agile ceremonies, code reviews, and technical design discussions

Tech Stack & Experience

Must Have
- Strong Python development experience
- Hands-on experience with cloud object storage for unstructured data (Google Cloud Storage preferred; AWS S3 also acceptable)
- PyTorch experience, particularly:
  - Dataset management
  - Data loading pipelines
  - Running PyTorch workloads in cloud environments
  We are not looking for years of PyTorch experience; one or two substantial 6-12 month projects is ideal.
- 5+ years of cloud experience, ideally working with large numbers of files in cloud buckets

Nice to Have
- Experience with additional GCP services, such as Cloud Run, Cloud SQL, and Cloud Scheduler
- Exposure to machine learning workflows (not ML engineering)
- Some pharma or life sciences experience, or a genuine interest in working with domain-specific scientific data

Please send your CV.