Azure and Databricks End-to-End Pipeline
Description
Budget: $250 - $750
I want a complete, production-style data engineering project built on Azure and Databricks that I can showcase as a real-world reference preferably in healthcare or retail domain. The solution should ingest at-scale public data sets (think TPC transaction benchmarks for retail, Synthea synthetic EHR data for healthcare, or a clever blend of both) and drive two to three meaningful business use cases from raw landing all the way to presentation. I am purposely leaving the exact scenarios open so you can propose what will demonstrate the most insight, but they must be substantial enough to feel like something a modern data team would support in production.
Data quality is my top priority. I need lineage, validation rules, automated tests, and observable metrics baked in from day one—Great Expectations, Delta Live Tables expectations, or comparable frameworks are welcome, as long as quality gates are visible in the monitoring layer.
Scope to cover: • Architecture design diagram with clear component rationale (Azure Data Lake, Databricks, Delta, Unity Catalog, etc.). • Reproducible code (Python / PySpark, notebooks or repos) with CI/CD instructions. • Ingestion pipelines (batch or streaming), curated layers, and serving tier (SQL endpoints, Power BI, or dashboards of your choice). • Integrated monitoring, alerting, and cost-aware observability using native Azure tools or open-source add-ons. • End-to-end test suite: unit, integration, and data quality tests triggered via pipelines. • Comprehensive markdown that walks through setup, architecture, and business logic.
Acceptance criteria:
- Pipelines run on my Azure subscription with a simple deploy script.
- Data quality reports surface failed expectations in a dashboard or log analytics workspace.
- Each business use case produces a consumable output (table, visualization, or API) that confirms value.
Skills
Want AI to find more roles like this?
Upload your CV once. Get matched to relevant assignments automatically.