Handling Active Status in the Data Warehouse ETL

Problem Statement

Our data warehouse currently does not include all employees due to licensing limitations in the transactional system. Leadership, senior leadership, and operations staff are sometimes excluded. However, another system (a PowerApp) provides comprehensive employment history, including start and end dates for all staff, even those who leave and return.

We face a critical decision: how and when to integrate this data to ensure accurate, consistent reporting and avoid issues such as:

  • Sending notifications to former employees
  • Excluding key leadership from reports
  • Creating confusion or distress among users not marked as active

Options and Evaluation

Option 1 (Recommended)

Option 2

Option 3

Maintain Active Status Early in ETL

Omit or ignore Non-Transactional Employees

Defer Activeness Logic to Reports or Late ETL

Integrate PowerApp data early in the ETL pipeline to create a unified employee dimension, determining “Active” status centrally.

Exclude employees not present in the transactional system; possibly flag as disabled if present.

Load all employee data, but determine active/inactive status in views or reports.

Pros:

  • Provides a complete and unified view of all employees
  • Ensures consistent logic for “Active” status across all reporting
  • Reduces risk of accidentally contacting former employees
  • Simplifies reporting and improves user trust

  • Scalable and maintainable within current ADF/T-SQL architecture

Pros:

  • Simplifies ETL by only using one source
  • Minimizes immediate development effort

Pros:

  • Offers flexibility in defining “Active” in different contexts
  • Minimizes changes to early-stage ETL logic

Cons:

  • Requires initial development effort to merge sources and calculate status
  • Adds slight ETL complexity, especially around data dependencies

Cons:

  • Produces incomplete employee data
  • Confuses users whose data is missing
  • Leads to inconsistent and potentially erroneous reporting
  • Cannot support enterprise-wide headcount or turnover metrics

Cons:

  • Requires more technical discipline across teams
  • Increases complexity in reports and BI models
  • Encourages inconsistent implementations across tools
  • Difficult to enforce business rules centrally

Proposal

Option 1 (Maintain Active Status Early in ETL) approach ensures:

  • A consistent and trustworthy view of all employees
  • Simplified, accurate, and efficient reporting and notifications
  • Minimal risk of excluding or misidentifying team members

This solution involves integrating the PowerApp data feed, updating the Employee dimension with employment history and status, and establishing a reliable process to maintain this data daily via Azure Data Factory and T-SQL.

This is a forward-looking solution that balances accuracy, user experience, and maintainability, supporting both current needs and future scalability.

Sources

  • SoftwareOne – Master data management in a data warehouse context (combining multiple systems’ employee records into one master set).
  • ClearPeaks – Explanation of SCD Type-2 with an Active flag for current record identification.
  • Reddit (r/dataengineering) – Advice referencing Kimball’s approach to add an “is active” flag in dimension tables for the current version of a record.
  • Stack Overflow Q&A – Discussion on doing calculations in ETL vs in the database; recommendation to perform transformations (like computing elapsed times or flags) as part of the ETL/ELT using SQL (for consistency and performance) .
  • DBA Stack Exchange – Argument in favor of storing business logic in the DW to avoid duplicating it in many reports and to ensure one consistent definition (example of lead qualification logic, analogous to our active employee rule).
  • LinkedIn Article – “Never Use Reports to Manage Business Logic,” stressing that the data platform should handle logic, which makes reports simpler and data more consistent.