"Production ETL systems supporting live operational teams at scale"
Four years of production data engineering inside a large healthcare organisation — replacing fragile spreadsheet processes with automated ETL pipelines, governed reporting, and compliance systems that save time, reduce error, and give operational teams far better visibility.
This work was delivered inside a large UK healthcare organisation serving a population of over 750,000. Within this environment, the Learning & Development team was responsible for ensuring that tens of thousands of staff maintained compliance with mandatory training — safeguarding, infection control, fire safety, and dozens more.
My role — officially titled Learning Experience Specialist — evolved into something much closer to a data engineer because the operational need was obvious. Reporting depended on brittle spreadsheets, manual copying between systems, and formulas that broke as soon as staffing structures changed. The data existed, but turning it into something trustworthy was slow, repetitive, and high risk.
Over four years, I systematically replaced those weak points with Python ETL pipelines, PostgreSQL-backed reporting layers, Power BI dashboards, and internal tools that gave the team a far more reliable view of training compliance across the organisation.
This dashboard was the capstone project of my Level 4 Data Analyst apprenticeship — and the highest-impact system I built. Safeguarding compliance is tightly regulated in healthcare settings; every staff member must complete specific training based on their role, patient contact level, and department.
Previously, generating a safeguarding compliance report for a single directorate took multiple hours of manual spreadsheet work. The dashboard replaced this with a real-time Power BI interface backed by an automated data pipeline.
Multi-page interactive dashboard with drill-through from organisation-level overview to individual staff compliance. Role-based access ensures managers only see their own teams.
Scheduled Python ETL that extracts from the ESR (Electronic Staff Record) system, transforms role mappings, and loads into the reporting layer. Runs daily without manual intervention.
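The transform step of that pipeline can be sketched as follows. This is a minimal illustration, not the production code: `ROLE_MAP`, the field names, and the `UNMAPPED` sentinel are all assumptions made for the example.

```python
# Illustrative transform step for the daily ESR pipeline.
# ROLE_MAP and the field names are hypothetical, not production identifiers.

ROLE_MAP = {
    "NUR01": "Nursing",
    "ADM03": "Administrative",
}

def transform_roles(rows):
    """Map raw ESR role codes to reporting role groups.

    Unknown codes are tagged UNMAPPED so the load step can quarantine
    them for review rather than silently loading bad data.
    """
    return [
        {**row, "role_group": ROLE_MAP.get(row["role_code"], "UNMAPPED")}
        for row in rows
    ]
```

Keeping the mapping explicit and flagging unknowns at transform time is what makes a scheduled job safe to run unattended: surprises surface in the quarantine report, not in the dashboard.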
This project earned a Merit grade for the Work-Based Project component of the Level 4 Data Analyst apprenticeship — the assessor specifically highlighted the dashboard's real-world impact and the rigour of the data pipeline design.
The backbone of everything else: a suite of Python jobs that extract data from ESR and the LMS (Kallidus), transform it into clean reporting models, validate it against business rules, and load it into reporting databases and Power BI datasets.
Across the wider reporting estate, that means 850,000+ records processed in daily operational pipelines — staff records, training completions, assignment mappings, and compliance calculations. These pipelines replaced a fragile web of Power Query connections and manual Excel manipulation that broke regularly and offered little auditability.
Challenge: The team depended on existing Power Query data flows for daily reporting. A migration to Python couldn't break existing dashboards or introduce data inconsistencies during the transition.
Solution: Ran both systems in parallel for two months, comparing outputs daily. Built automated reconciliation scripts that flagged any discrepancies between the Power Query and Python outputs. Migrated one data source at a time, validating each before decommissioning the old flow.
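A reconciliation check of that kind can be sketched as below. This is a simplified stand-in for the actual scripts; the key column name and the mismatch format are assumptions for the example.

```python
# Illustrative reconciliation between the legacy (Power Query) output and
# the new (Python) output. Field names and key are hypothetical.

def reconcile(legacy_rows, new_rows, key="staff_id"):
    """Return a list of (key, field, legacy_value, new_value) mismatches.

    Rows present in one output but missing from the other are reported
    too, since a silently dropped row is the worst kind of discrepancy.
    """
    legacy = {r[key]: r for r in legacy_rows}
    new = {r[key]: r for r in new_rows}
    diffs = []
    # Symmetric difference: keys that exist in only one of the two outputs.
    for missing in legacy.keys() ^ new.keys():
        diffs.append((missing, "<row missing>", missing in legacy, missing in new))
    # For shared keys, compare field by field.
    for k in legacy.keys() & new.keys():
        for field in legacy[k]:
            if legacy[k].get(field) != new[k].get(field):
                diffs.append((k, field, legacy[k][field], new[k].get(field)))
    return diffs
```

Running a check like this daily during the parallel period means each source can be cut over only once its diff list has stayed empty.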
The organisation works with affiliated organisations that need access to specific data files — training records, compliance reports, staff lists. Managing which affiliates had access to which files was handled through a confusing shared drive structure with inconsistent permissions.
Built a Flask web application that provides a clean interface for uploading, categorising, and distributing files to specific affiliates. Role-based access ensures each affiliate only sees their own files, and an audit log tracks every download.
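The two rules at the core of that application — an affiliate may only download files tagged for them, and every download is recorded — can be shown framework-free. In the real system these sit behind Flask routes; the ownership map, field names, and in-memory log here are illustrative assumptions.

```python
# Framework-agnostic sketch of the access-control and audit-log core.
# FILE_OWNERS stands in for the database table mapping files to affiliates.
import datetime

FILE_OWNERS = {"q1_training.csv": "affiliate_a", "staff_list.csv": "affiliate_b"}
AUDIT_LOG = []  # in production this would be an append-only database table

def authorise_download(affiliate, filename):
    """Allow the download only if the file belongs to this affiliate,
    and append an audit record for every permitted download."""
    if FILE_OWNERS.get(filename) != affiliate:
        return False
    AUDIT_LOG.append({
        "affiliate": affiliate,
        "file": filename,
        "at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    })
    return True
```

Centralising the check in one function means every route enforces the same rule, and the audit trail cannot be bypassed by a new endpoint that forgets to log.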
This is not simulated portfolio work — it's 4+ years of production delivery inside a live operational environment. The systems run daily, serve real stakeholders, and demonstrate that I can build reliable data infrastructure where accuracy, trust, and operational continuity genuinely matter.
Designing, building, and maintaining data pipelines that process hundreds of thousands of records daily with zero manual intervention. Error handling, logging, and recovery built in.
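The "error handling, logging, and recovery built in" pattern can be sketched as a retry wrapper around each pipeline stage. The attempt count, delay, and stage interface are assumptions for illustration, not the production values.

```python
# Illustrative retry-with-logging wrapper of the kind used around each
# pipeline stage. Attempt counts and delay are hypothetical defaults.
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

def run_with_retries(stage, attempts=3, delay=0.1):
    """Run a pipeline stage, logging every failure with a traceback and
    retrying transient errors; re-raise once attempts are exhausted."""
    for attempt in range(1, attempts + 1):
        try:
            return stage()
        except Exception:
            log.exception("stage failed (attempt %d/%d)", attempt, attempts)
            if attempt == attempts:
                raise
            time.sleep(delay)
```

The point is that a transient failure (a locked file, a dropped connection) recovers on its own, while a persistent failure still fails loudly with a full traceback in the log rather than dying silently overnight.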
Translating complex compliance requirements into clear, actionable Power BI dashboards. Understanding what managers actually need to see — not just what's technically possible.
Production-grade Python for data processing — pandas, scheduled tasks, API integration, file handling, email automation, and database operations at scale.
PostgreSQL schema design, query optimisation, and data modelling for reporting workloads. Understanding how to structure data for both operational and analytical use.
Automated SFTP download and upload pipelines that run daily without manual intervention. File selection by timestamp, footer stripping, validation, and stage-swap deployment patterns.
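The local half of those patterns can be sketched as below — selecting the newest extract by the timestamp in its filename, stripping the trailer row, and deploying with an atomic stage-swap. The filename pattern, trailer prefix, and function names are assumptions; the SFTP transfer itself is omitted.

```python
# Sketch of the post-download steps in the SFTP pipeline. The filename
# pattern ('*_YYYYMMDD.csv') and TRAILER prefix are hypothetical.
import os
import re

def latest_extract(filenames):
    """Pick the newest file from names like 'esr_extract_20240315.csv'.

    YYYYMMDD strings sort chronologically, so max() on the captured
    date group selects the most recent extract.
    """
    pattern = re.compile(r"_(\d{8})\.csv$")
    dated = [f for f in filenames if pattern.search(f)]
    return max(dated, key=lambda f: pattern.search(f).group(1))

def strip_footer(lines, footer_prefix="TRAILER"):
    """Drop the summary footer row appended to each extract."""
    return [ln for ln in lines if not ln.startswith(footer_prefix)]

def stage_swap(staged_path, live_path):
    """Atomically replace the live file with the fully written staging
    file, so downstream readers never see a half-written file."""
    os.replace(staged_path, live_path)
```

Writing to a staging path and swapping only after validation passes is what lets the pipeline run unattended: a failed run leaves the previous good file untouched.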
Working within a large organisation means translating technical solutions into language that managers, operational leads, and non-technical staff can understand and trust. Every system I built needed buy-in before it could go live.