Clinical Trial Data Architecture Build
Description
I am putting together an end-to-end data architecture that can reliably ingest, store, and serve a broad range of clinical-trial assets: patient demographics, clinical trial results, genomic data, COA (Clinical Outcomes Assessment) records, and a growing rater database.
What I need from you
Design the target architecture and implement the core pipelines, ideally on a modern cloud stack (Snowflake, Databricks, BigQuery, Redshift, or a similar platform; feel free to propose the best fit). Your work should cover raw-to-curated layers, automated metadata capture, and role-based access controls that satisfy typical GxP and HIPAA expectations.
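To make the access-control requirement concrete, here is a minimal Python sketch of domain-level, role-based access with PHI column masking and an access audit trail. The role names, domain grants, PHI column list, and the `read_rows` function are all hypothetical. On Snowflake, BigQuery, or Redshift this would normally be expressed with the warehouse's own grants and column-masking features rather than application code, so treat it as an illustration of the policy shape only.

```python
from datetime import datetime, timezone

# Hypothetical role -> data-domain grants (domains mirror the five listed above).
ROLE_GRANTS = {
    "data_engineer": {"demographics", "trial_results", "genomics", "coa", "raters"},
    "biostatistician": {"demographics", "trial_results", "coa"},
    "rater_admin": {"raters"},
}

# Hypothetical columns treated as protected health information (PHI).
PHI_COLUMNS = {"patient_name", "date_of_birth", "postcode"}

# Hypothetical roles permitted to see PHI unmasked.
PHI_READERS = {"data_engineer"}

# In-memory audit trail; a real system would write to an append-only store.
AUDIT_LOG = []


def read_rows(role, domain, rows):
    """Return `rows` from `domain` as seen by `role`, masking PHI where required.

    Every request, allowed or denied, is appended to AUDIT_LOG.
    """
    allowed = domain in ROLE_GRANTS.get(role, set())
    AUDIT_LOG.append({
        "at": datetime.now(timezone.utc).isoformat(),
        "role": role,
        "domain": domain,
        "allowed": allowed,
    })
    if not allowed:
        raise PermissionError(f"role {role!r} has no access to domain {domain!r}")
    if role in PHI_READERS:
        # Copy rows so callers cannot mutate the source records.
        return [dict(r) for r in rows]
    # Replace PHI values with a fixed mask; other columns pass through unchanged.
    return [
        {k: ("***MASKED***" if k in PHI_COLUMNS else v) for k, v in r.items()}
        for r in rows
    ]
```

For example, `read_rows("biostatistician", "demographics", [{"subject_id": "S001", "patient_name": "A. Smith", "age": 54}])` returns the row with `patient_name` replaced by `"***MASKED***"`, whereas `read_rows("rater_admin", "genomics", ...)` raises `PermissionError`. Both requests are recorded in `AUDIT_LOG`, since denied access attempts matter for an audit trail as much as granted ones.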
Key deliverables
• Reference architecture diagram with component rationale
• Reusable ingestion and transformation code (Python, SQL, or Spark) for each data domain listed above
• A unified analytical schema / data model ready for downstream BI, ML, and statistical analysis
• A brief runbook plus inline documentation so an internal team can extend or troubleshoot the solution
Acceptance criteria
The pipelines must load a small sample (I will supply CSV/JSON/VCF files) end-to-end, land the data in the curated layer with provenance preserved, and let me query it in under five minutes. All code should be version-controlled and container-ready.
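One way to meet the "provenance preserved" criterion is to stamp each curated row with its source file, a content hash, the ingestion time, and its original row number. The standard-library-only Python sketch below assumes UTF-8 input (a BOM is tolerated), treats a JSON file as either one object or a list of objects, and reads only uncompressed VCF text, keeping every column (including INFO and sample columns) as a raw string. The function names and the `_provenance` field layout are hypothetical. A production genomics pipeline would more likely rely on a dedicated VCF library such as pysam or cyvcf2.

```python
import csv
import hashlib
import io
import json
from datetime import datetime, timezone
from pathlib import Path


def _parse_vcf(text):
    """Minimal VCF reader: skips ## meta lines and keys data rows by the #CHROM header."""
    header = None
    records = []
    for line in text.splitlines():
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#"):
            # "#CHROM\tPOS\t..." -> ["CHROM", "POS", ...]
            header = [f.lstrip("#") for f in fields]
            continue
        if header is None:
            raise ValueError("VCF data line found before #CHROM header")
        records.append(dict(zip(header, fields)))
    return records


def parse_records(name, data):
    """Dispatch on file extension and return a list of dict records."""
    text = data.decode("utf-8-sig")  # also strips a leading BOM if present
    suffix = Path(name).suffix.lower()
    if suffix == ".csv":
        return list(csv.DictReader(io.StringIO(text)))
    if suffix == ".json":
        payload = json.loads(text)
        return payload if isinstance(payload, list) else [payload]
    if suffix == ".vcf":
        return _parse_vcf(text)
    raise ValueError(f"unsupported file type: {suffix!r}")


def ingest_bytes(name, data):
    """Parse raw bytes and attach a provenance block to every curated row."""
    digest = hashlib.sha256(data).hexdigest()
    ingested_at = datetime.now(timezone.utc).isoformat()
    curated = []
    for i, rec in enumerate(parse_records(name, data), start=1):
        row = dict(rec)
        row["_provenance"] = {
            "source_file": name,
            "sha256": digest,          # hash of the raw file as received
            "ingested_at": ingested_at,
            "source_row": i,           # 1-based position in the parsed file
        }
        curated.append(row)
    return curated


def ingest_file(path):
    """Convenience wrapper: read a file from disk and ingest it."""
    p = Path(path)
    return ingest_bytes(p.name, p.read_bytes())
```

A call such as `ingest_file("samples/demographics.csv")` (hypothetical path) returns rows ready to be written to a curated table, with the `_provenance` fields becoming ordinary columns. Hashing the raw bytes rather than the parsed rows means that a re-supplied file with identical content produces the same `sha256`, which makes duplicate loads easy to detect and any curated row traceable to the exact input file.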
If you have direct experience designing data platforms for clinical research, or have handled similarly sensitive data sets, this should be a quick but impactful engagement. Looking forward to seeing how you'd approach it.
Budget: GBP 250–750
Skills: Python, Cloud Computing, Hadoop, PostgreSQL, Elasticsearch, Redshift, Data Architecture, BigQuery