Company: Remote

Clinical Trial Data Architecture Build

Project-Based

Description

I am putting together an end-to-end data architecture that can reliably ingest, store, and serve a broad range of clinical-trial assets: patient demographics, clinical trial results, genomic data, COA (Clinical Outcomes Assessment) records, and a growing rater database.

What I need from you

Design the target architecture and implement the core pipelines, ideally using a modern cloud stack (Snowflake, Databricks, BigQuery, Redshift, or a similar platform; feel free to propose the best fit). Your work should cover raw-to-curated layers, automated metadata capture, and role-based access controls that satisfy typical GxP and HIPAA expectations.
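To illustrate the kind of rule set the role-based access controls would need to encode, here is a minimal Python sketch of deny-by-default column masking. The role names, column names, and the ROLE_POLICY mapping are hypothetical; in a production build this would more likely be enforced by the warehouse's native controls (for example Snowflake masking policies or BigQuery policy tags) than in application code.

```python
from collections.abc import Mapping

MASK = "***MASKED***"

# Hypothetical role policy: the columns each role may see unmasked.
# Anything not listed for a role is masked (deny by default).
ROLE_POLICY: dict[str, frozenset[str]] = {
    "data_engineer": frozenset({"patient_id", "birth_date", "site_id", "age", "sex"}),
    "biostatistician": frozenset({"patient_id", "site_id", "age", "sex"}),
    "bi_viewer": frozenset({"site_id", "age", "sex"}),
}


def apply_role_policy(row: Mapping[str, object], role: str) -> dict[str, object]:
    """Return a copy of row with every column the role may not see replaced by MASK."""
    allowed = ROLE_POLICY.get(role)
    if allowed is None:
        # Unknown roles get nothing rather than everything.
        raise PermissionError(f"no access policy defined for role {role!r}")
    return {col: (val if col in allowed else MASK) for col, val in row.items()}


if __name__ == "__main__":
    record = {"patient_id": "P001", "birth_date": "1970-01-01", "age": 54, "site_id": "S01"}
    print(apply_role_policy(record, "bi_viewer"))
    # → {'patient_id': '***MASKED***', 'birth_date': '***MASKED***', 'age': 54, 'site_id': 'S01'}
```

Raising on an unknown role, rather than returning unmasked data, keeps a misconfigured role from silently exposing PHI.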

Key deliverables

• Reference architecture diagram with component rationale
• Reusable ingestion and transformation code (Python, SQL, or Spark) for each data domain listed above
• A unified analytical schema / data model ready for downstream BI, ML, and statistical analysis
• Brief runbook plus inline documentation so an internal team can extend or troubleshoot the solution
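For the unified analytical schema, one plausible shape is a small star-style model in which every domain hangs off a shared patient dimension. The sketch below uses Python's built-in sqlite3 purely as a stand-in for the chosen warehouse; all table and column names are hypothetical, and the provenance columns assume a file name and content hash are captured for each row at ingestion.

```python
import sqlite3

# Hypothetical curated-layer model: shared dimensions for patients and raters,
# one fact table per measurement domain, and provenance columns on every fact row.
# SQLite stands in for the chosen warehouse; types would be adapted there.
CURATED_DDL = """
CREATE TABLE dim_patient (
    patient_id  TEXT PRIMARY KEY,
    sex         TEXT,
    birth_year  INTEGER,
    site_id     TEXT
);
CREATE TABLE dim_rater (
    rater_id       TEXT PRIMARY KEY,
    certification  TEXT
);
CREATE TABLE fact_trial_result (
    patient_id      TEXT NOT NULL REFERENCES dim_patient (patient_id),
    endpoint        TEXT NOT NULL,
    visit           TEXT NOT NULL,
    value           REAL,
    _source_file    TEXT NOT NULL,
    _source_sha256  TEXT NOT NULL
);
CREATE TABLE fact_coa_score (
    patient_id      TEXT NOT NULL REFERENCES dim_patient (patient_id),
    rater_id        TEXT REFERENCES dim_rater (rater_id),
    instrument      TEXT NOT NULL,
    visit           TEXT NOT NULL,
    score           REAL,
    _source_file    TEXT NOT NULL,
    _source_sha256  TEXT NOT NULL
);
CREATE TABLE fact_variant (
    patient_id      TEXT NOT NULL REFERENCES dim_patient (patient_id),
    chrom           TEXT NOT NULL,
    pos             INTEGER NOT NULL,
    ref             TEXT NOT NULL,
    alt             TEXT NOT NULL,
    _source_file    TEXT NOT NULL,
    _source_sha256  TEXT NOT NULL
);
"""


def build_curated_schema(conn: sqlite3.Connection) -> None:
    """Create the curated tables; SQLite only enforces REFERENCES once foreign_keys is on."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(CURATED_DDL)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_curated_schema(conn)
    conn.execute("INSERT INTO dim_patient VALUES ('P001', 'F', 1970, 'S01')")
    conn.execute("INSERT INTO dim_rater VALUES ('R01', 'certified')")
    conn.execute(
        "INSERT INTO fact_coa_score VALUES ('P001', 'R01', 'INSTR_A', 'W4', 12.0, 'coa.csv', 'hash')"
    )
    # A typical downstream query: COA scores joined to demographics by site.
    query = """
        SELECT p.site_id, f.instrument, AVG(f.score)
        FROM fact_coa_score AS f JOIN dim_patient AS p USING (patient_id)
        GROUP BY p.site_id, f.instrument
    """
    print(conn.execute(query).fetchall())  # → [('S01', 'INSTR_A', 12.0)]
```

Keeping raters as a dimension lets the growing rater database be joined to COA scores without copying rater details onto every score row.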

Acceptance criteria

The pipelines must load a small sample (I will supply CSV/JSON/VCF files) end-to-end, land the data in the curated layer with provenance preserved, and let me query it in under five minutes. All code should be version-controlled and container-ready.
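The acceptance test (CSV, JSON, and VCF samples landed with provenance preserved) could be approached along these lines. This is a minimal Python sketch, assuming uncompressed UTF-8 files, JSON supplied as a top-level array of objects, and provenance held as hypothetical underscore-prefixed columns (_source_file, _source_sha256, _domain, _ingested_at, _source_row). A real build would write these rows to raw and curated tables rather than return them, and a production VCF reader (bgzipped files, per-sample genotype parsing) would normally rely on a dedicated library.

```python
import csv
import hashlib
import io
import json
from datetime import datetime, timezone
from pathlib import Path


def _parse(text: str, suffix: str) -> list[dict]:
    """Turn file text into a list of row dicts for the three sample formats."""
    if suffix == ".csv":
        return list(csv.DictReader(io.StringIO(text, newline="")))
    if suffix == ".json":
        # Assumes a top-level JSON array of objects (not JSON Lines).
        return json.loads(text)
    if suffix == ".vcf":
        # Uncompressed VCF only: skip '##' meta lines, take column names from the '#CHROM' line.
        header, rows = None, []
        for line in text.splitlines():
            if not line or line.startswith("##"):
                continue
            if line.startswith("#"):
                header = line[1:].split("\t")
            elif header is None:
                raise ValueError("VCF data line found before the #CHROM header")
            else:
                rows.append(dict(zip(header, line.split("\t"))))
        return rows
    raise ValueError(f"unsupported file type: {suffix!r}")


def load_with_provenance(path: Path, domain: str) -> list[dict]:
    """Parse one landed file and stamp every row with where it came from."""
    raw = path.read_bytes()  # hash and parse the same bytes, so the hash matches the rows
    provenance = {
        "_source_file": path.name,
        "_source_sha256": hashlib.sha256(raw).hexdigest(),
        "_domain": domain,
        "_ingested_at": datetime.now(timezone.utc).isoformat(),
    }
    # "utf-8-sig" also strips a byte-order mark, which spreadsheet CSV exports often add.
    rows = _parse(raw.decode("utf-8-sig"), path.suffix.lower())
    # _source_row is the 1-based position among parsed records, not the physical line number.
    return [{**row, **provenance, "_source_row": i} for i, row in enumerate(rows, start=1)]
```

Stamping the content hash on each row means any curated value can be traced back to the exact file version it came from, which is the kind of audit trail GxP reviews tend to expect.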

If you have direct experience designing data platforms for clinical research, or have handled similarly sensitive datasets, this should be a quick but impactful engagement. Looking forward to seeing how you’d approach it.

Budget: GBP 250–750

Skills: Python, Cloud Computing, Hadoop, PostgreSQL, Elasticsearch, Redshift, Data Architecture, BigQuery

Skills

Python, Databricks, Spark, SQL, Machine Learning, ML, Hadoop, BigQuery, Apache Spark, PostgreSQL, Redshift, Elasticsearch, Cloud Computing, Data Architecture, Snowflake
