

Case Study 5: Ensuring Data Integrity through Rigorous Cleansing Before Integration

Project Type

Data Analysis

Date

December 2023

Problem

As part of a broader initiative to integrate datasets from two overlapping regions, Couchiching (COU) and Orillia (OLA), a utility encountered significant discrepancies in meter data attributes. These inconsistencies needed to be addressed to ensure data integrity for effective analysis and integration. The utility required a detailed day-by-day, month-by-month review of meter read units (MRUs) for 2023 and 2024 to prepare the datasets for integration into Power BI.

Solution

To tackle this challenge, our team implemented a structured approach to cleanse the datasets for each region separately. This process was critical to maintaining the accuracy and consistency of the data before any integration could occur. We established a procedure where each region’s data was first aligned and formatted according to a standardized Mass Deploy master dataset. This included sorting by MRU and adding a next meter read date (next_MRD) column to streamline subsequent data merging.
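The sort-and-annotate step described above can be sketched in plain Python. This is only an illustration under stated assumptions: the field names `mru` and `mrd`, the sample MRU codes, and the interpretation of next_MRD as "the following scheduled read date for the same MRU" are all hypothetical, since the case study does not spell out the Mass Deploy master layout.

```python
from datetime import date
from itertools import groupby
from operator import itemgetter


def add_next_mrd(rows):
    """Sort rows by MRU and read date, then attach each row's next read date.

    Assumption: next_MRD is the next read date for the same MRU,
    or None when no later read is scheduled.
    """
    ordered = sorted(rows, key=itemgetter("mru", "mrd"))
    result = []
    for _, group in groupby(ordered, key=itemgetter("mru")):
        group = list(group)
        # Pair every row with the row that follows it inside the same MRU.
        for current, following in zip(group, group[1:] + [None]):
            next_mrd = following["mrd"] if following else None
            result.append({**current, "next_MRD": next_mrd})
    return result


# Hypothetical sample rows; real MRU codes and dates will differ.
rows = [
    {"mru": "COU00002", "mrd": date(2023, 11, 3)},
    {"mru": "COU00001", "mrd": date(2023, 11, 2)},
    {"mru": "COU00001", "mrd": date(2023, 10, 2)},
]
cleaned = add_next_mrd(rows)
```

Keeping the sort and the next_MRD derivation in one function means both regions can be passed through the same routine before any merge is attempted.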

We also conducted a detailed review of device locations within the datasets, particularly identifying and labeling any MRUs lacking precise device location data before they were imported into Power BI. To further ensure data quality, we compared latitude and longitude data for the OLA region using KMZ files in Google Earth, which allowed us to directly compare the geographic data provided by different sources (PEC vs. PIQ).
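The project did the coordinate comparison visually with KMZ files in Google Earth. As a complementary, scripted check (not the method used in the project), the sketch below labels records that lack a device location and flags devices whose two coordinate sources disagree by more than a tolerance. The field names, device IDs, sample coordinates, and the 50 m tolerance are assumptions chosen for illustration.

```python
import math


def label_missing_location(records):
    """Add a location_status field: 'missing' when lat or lon is absent."""
    labeled = []
    for rec in records:
        present = rec.get("lat") not in (None, "") and rec.get("lon") not in (None, "")
        labeled.append({**rec, "location_status": "ok" if present else "missing"})
    return labeled


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    earth_radius_m = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


def compare_sources(pec, piq, tolerance_m=50.0):
    """Return device IDs present in both sources whose positions differ by more than tolerance_m."""
    mismatched = []
    for device_id, (lat1, lon1) in pec.items():
        if device_id in piq:
            lat2, lon2 = piq[device_id]
            if haversine_m(lat1, lon1, lat2, lon2) > tolerance_m:
                mismatched.append(device_id)
    return mismatched


# Hypothetical records and coordinates, roughly in the Orillia area.
records = [
    {"mru": "OLA00001", "lat": 44.6087, "lon": -79.4207},
    {"mru": "OLA00002", "lat": None, "lon": None},
]
labeled = label_missing_location(records)

pec = {"D1": (44.6087, -79.4207), "D2": (44.6000, -79.4200)}
piq = {"D1": (44.6087, -79.4207), "D2": (44.6100, -79.4200)}
mismatched = compare_sources(pec, piq)
```

A scripted pass like this would only narrow down which devices deserve a closer look on the map; the tolerance should be set from what counts as a meaningful discrepancy for the utility.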

Result

The rigorous data cleansing process enabled the creation of precise and informative visualizations for each region, handled separately to maintain clarity and focus. These visuals were generated in Power BI and displayed MRU counts by day, month, and year, with a filter applied to highlight counts of 20 MRUs or more.
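The same daily count and threshold can be reproduced outside Power BI as a sanity check on the visuals. The sketch below assumes each read is a hypothetical `(mru, read_date)` pair and counts distinct MRUs per day; only the threshold of 20 comes from the case study.

```python
from collections import Counter
from datetime import date


def daily_mru_counts(reads, threshold=20):
    """Count distinct MRUs per read date and keep days at or above the threshold."""
    # set() removes duplicate (mru, date) pairs so each MRU counts once per day.
    counts = Counter(read_date for _, read_date in set(reads))
    return {day: n for day, n in sorted(counts.items()) if n >= threshold}


# Hypothetical data: 20 MRUs read on one day, 3 on another.
reads = [(f"COU{i:05d}", date(2023, 10, 2)) for i in range(20)]
reads += [(f"COU{i:05d}", date(2023, 10, 3)) for i in range(3)]
busy_days = daily_mru_counts(reads)
```

Monthly and yearly totals follow the same pattern by counting on `(read_date.year, read_date.month)` or `read_date.year` instead of the full date.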

Each visualization provided a clear, detailed view of the data, organized chronologically and segmented by specific areas within each region, as indicated by the first eight letters of the region code. These visuals not only showcased the total meter reads required per day but also highlighted which MRU groups within the regions were most active, using color coding to distinguish between them.
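Segmenting by code prefix can be sketched as a grouping on the first eight characters. The code format is not given in the case study, so the sample codes below are invented, and `prefix_len` is exposed as a parameter in case the real grouping key differs.

```python
from collections import defaultdict


def counts_by_group(reads, prefix_len=8):
    """Count distinct codes per (read date, code prefix) pair."""
    groups = defaultdict(set)
    for code, read_date in reads:
        groups[(read_date, code[:prefix_len])].add(code)
    return {key: len(codes) for key, codes in groups.items()}


def most_active_groups(group_counts):
    """Rank code prefixes by their total count across all dates, highest first."""
    totals = defaultdict(int)
    for (_, prefix), n in group_counts.items():
        totals[prefix] += n
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical codes: the first eight characters name the area, the rest the unit.
reads = [
    ("COUNORTH-001", "2023-10-02"),
    ("COUNORTH-002", "2023-10-02"),
    ("COUSOUTH-001", "2023-10-02"),
    ("COUNORTH-001", "2023-10-03"),
]
group_counts = counts_by_group(reads)
ranking = most_active_groups(group_counts)
```

The ranking gives a stable order for assigning colors, so the most active groups keep the same color across daily, monthly, and yearly views.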

The final presentations of these visuals for October 2023 through August 2024 were compiled into an Excel file with clear labels and screenshots, prepared for executive review. This approach not only facilitated a more informed discussion among utility executives but also ensured that the data integration phase could proceed with confidence in the accuracy and completeness of the datasets.

© 2022 ObjectHouse
