Case Study 5: Ensuring Data Integrity through Rigorous Cleansing Before Integration
Project Type
Data Analysis
Date
December 2023
Problem
As part of a broader initiative to integrate datasets from two overlapping regions, Couchiching (COU) and Orillia (OLA), a utility encountered significant discrepancies in meter data attributes. These inconsistencies needed to be addressed to ensure data integrity for effective analysis and integration. The utility required a detailed day-by-day, month-by-month review of meter read units (MRUs) for 2023 and 2024 to prepare the datasets for integration into Power BI.
Solution
To tackle this challenge, our team cleansed the datasets for each region separately, so that accuracy and consistency were established before any integration took place. Each region’s data was first aligned and formatted to match a standardized Mass Deploy master dataset. The records were then sorted by MRU, and a next meter read date (next_MRD) column was added to streamline subsequent data merging.
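The sort-and-extend step can be illustrated with a minimal Python sketch. It is not the procedure the team used; the field names `mru` and `mrd`, the sample MRU codes, and the rule that next_MRD is the following read date for the same MRU are all assumptions made for illustration.

```python
from datetime import date
from itertools import groupby
from operator import itemgetter


def add_next_mrd(rows):
    """Sort rows by MRU, then read date, and attach the following read date.

    Assumption: next_MRD is the next read date recorded for the same MRU,
    and is None for the last read of each MRU.
    """
    ordered = sorted(rows, key=itemgetter("mru", "mrd"))
    result = []
    for _, group in groupby(ordered, key=itemgetter("mru")):
        group = list(group)
        # Pair every read with the one that follows it within the same MRU.
        for current, following in zip(group, group[1:] + [None]):
            next_mrd = following["mrd"] if following else None
            result.append({**current, "next_MRD": next_mrd})
    return result


# Hypothetical sample records, not taken from the utility's data.
rows = [
    {"mru": "COU00002", "mrd": date(2023, 10, 3)},
    {"mru": "COU00001", "mrd": date(2023, 11, 2)},
    {"mru": "COU00001", "mrd": date(2023, 10, 2)},
]
cleaned = add_next_mrd(rows)
```

Keeping the sort and the next_MRD derivation in one function means both regions can be passed through the same routine while still being processed separately.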
We also reviewed device locations within the datasets, identifying and labeling any MRUs that lacked precise device location data before they were imported into Power BI. To further ensure data quality, we loaded the latitude and longitude data for the OLA region into Google Earth as KMZ files, which let us compare the geographic data provided by the two sources (PEC vs. PIQ) side by side.
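The review described above was done visually in Google Earth. As a hedged complement, the same two checks could be approximated in code: labeling records without a device location, and flagging devices whose coordinates differ between the two sources. The field name `device_location`, the device IDs, the coordinates, and the 50 m tolerance are all hypothetical; the distance uses the standard haversine formula.

```python
import math


def label_missing_location(rows):
    """Add a location_status label; blank or absent locations are 'missing'."""
    labeled = []
    for row in rows:
        has_location = bool((row.get("device_location") or "").strip())
        labeled.append({**row, "location_status": "ok" if has_location else "missing"})
    return labeled


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    radius_m = 6371000.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


def coordinate_mismatches(pec, piq, tolerance_m=50.0):
    """Return {device_id: distance_m} for devices whose two positions disagree.

    pec and piq map a device ID to a (latitude, longitude) tuple.
    The tolerance is an illustrative assumption, not a documented threshold.
    """
    mismatches = {}
    for device_id in pec.keys() & piq.keys():
        distance = haversine_m(*pec[device_id], *piq[device_id])
        if distance > tolerance_m:
            mismatches[device_id] = round(distance, 1)
    return mismatches


# Hypothetical sample data.
labeled = label_missing_location([
    {"mru": "OLA00001", "device_location": "Basement, north wall"},
    {"mru": "OLA00002", "device_location": ""},
])
pec = {"D1": (44.6083, -79.4197), "D2": (44.6100, -79.4200)}
piq = {"D1": (44.6083, -79.4197), "D2": (44.6200, -79.4200)}
mismatches = coordinate_mismatches(pec, piq)
```

A numeric check like this does not replace the visual comparison, but it can narrow down which devices are worth inspecting on the map.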
Result
The rigorous data cleansing process enabled the creation of precise and informative visualizations for each region, handled separately to maintain clarity and focus. These visuals were generated in Power BI and displayed MRU counts by day, month, and year, with a filter applied to highlight counts of 20 or more MRUs.
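The counting and filtering were done in Power BI. The logic behind such a visual can be sketched in Python as follows, assuming each record carries a hypothetical `mru` code and `mrd` read date and that a "count" means the number of distinct MRUs in a period.

```python
from collections import defaultdict
from datetime import date


def mru_counts(rows, period="day", minimum=20):
    """Count distinct MRUs per day, month, or year; keep counts >= minimum."""
    def period_key(d):
        if period == "day":
            return d.isoformat()
        if period == "month":
            return d.strftime("%Y-%m")
        if period == "year":
            return str(d.year)
        raise ValueError(f"unsupported period: {period}")

    seen = defaultdict(set)
    for row in rows:
        seen[period_key(row["mrd"])].add(row["mru"])
    # Mirror the visual's filter: only periods with at least `minimum` MRUs.
    return {key: len(mrus) for key, mrus in sorted(seen.items()) if len(mrus) >= minimum}


# Hypothetical sample: 25 MRUs read on one day, 5 of them again the next day.
sample = [{"mru": f"COU{n:05d}", "mrd": date(2023, 10, 2)} for n in range(25)]
sample += [{"mru": f"COU{n:05d}", "mrd": date(2023, 10, 3)} for n in range(5)]
busy_days = mru_counts(sample, period="day")
busy_months = mru_counts(sample, period="month")
```

In this sample only 2 October passes the threshold at the daily level, because the second day has just five MRUs.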
Each visualization provided a clear, detailed view of the data, organized chronologically and segmented by specific areas within each region, as indicated by the first eight letters of the region code. These visuals not only showcased the total meter reads required per day but also highlighted which MRU groups within the regions were most active, using color coding to distinguish between them.
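The area segmentation can be sketched the same way. This example assumes a hypothetical `region_code` field whose first eight characters identify the area, and it ranks areas by number of reads; the codes shown are invented for illustration.

```python
from collections import Counter


def reads_by_area(rows, prefix_length=8):
    """Group reads by the leading characters of the region code, busiest first."""
    counts = Counter(row["region_code"][:prefix_length] for row in rows)
    return counts.most_common()


# Hypothetical region codes; the real code format is not documented here.
reads = [
    {"region_code": "OLANORTH01"},
    {"region_code": "OLANORTH02"},
    {"region_code": "OLASOUTH01"},
]
ranking = reads_by_area(reads)
```

The ranked list corresponds to what the color coding conveys in the visuals: which MRU groups account for the most reads.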
The final presentations of these visuals for October 2023 through August 2024 were compiled into an Excel file with clear labels and screenshots, prepared for executive review. This approach not only facilitated a more informed discussion among utility executives but also ensured that the data integration phase could proceed with confidence in the accuracy and completeness of the datasets.










