
Sponsor


 

Team Members

William Ma, Ali Bassiouni, Anthony Bigler, Michael Boliek, Dharani Chowdary, Yiyang Wang, Tejas Desale

  

Project Poster



Project Video


 

Project Summary

Overview

Volvo Group Trucks routinely receives software data gathered from field tests. To make better use of this data, we employ Python libraries for database organization and dashboard tools such as Grafana and Power BI to display the field data succinctly and effectively. The data is assessed on four quality dimensions: uniqueness, completeness, validity, and accuracy.

Objectives

To analyze the data, we wrote a Python library that inspects it for uniqueness, completeness, validity, and accuracy. The test data provided by the sponsor contains far fewer data points than the real data Volvo engineers will work with, so the library must scale to data sets significantly larger than those we test it against. After analysis, the data is loaded into a database that the dashboard queries. The dashboard displays date, truck name, project name, and channel name using bar charts, time series, and pie charts.
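The analyze-then-query flow above can be sketched as follows. This is a minimal illustration using an in-memory SQLite database; the table name (measurements) and column names are illustrative assumptions, not the sponsor's actual schema.

```python
import sqlite3
import pandas as pd

# Analyzed field-test records staged for the dashboard.
# The schema here (date, truck_name, project_name, channel_name) is an
# assumption mirroring the fields the dashboard is required to display.
df = pd.DataFrame({
    "date": ["2022-03-01", "2022-03-02"],
    "truck_name": ["VNL-01", "VNL-02"],
    "project_name": ["FieldTestA", "FieldTestA"],
    "channel_name": ["engine_speed", "engine_speed"],
})

conn = sqlite3.connect(":memory:")
df.to_sql("measurements", conn, index=False)

# A dashboard panel (Grafana or Power BI) would issue a query along
# these lines to populate a bar chart or time series for one project.
rows = conn.execute(
    "SELECT date, truck_name, channel_name FROM measurements "
    "WHERE project_name = ?",
    ("FieldTestA",),
).fetchall()
```

In production the staging store would be a server-grade database rather than SQLite, but the query pattern the dashboard relies on is the same.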

Approach

Data Measurement Functions
Completeness
Accuracy
Validity
Uniqueness
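The four measurement functions listed above can be sketched as simple column- and row-level checks. This is a hedged illustration of one plausible design using pandas; the function signatures, thresholds, and the idea of scoring each dimension as a 0-to-1 fraction are our assumptions, not the library's actual API.

```python
import pandas as pd

def completeness(df: pd.DataFrame) -> float:
    """Fraction of cells that are non-null."""
    return float(df.notna().to_numpy().mean())

def uniqueness(df: pd.DataFrame) -> float:
    """Fraction of rows that are not duplicates of an earlier row."""
    return float(1.0 - df.duplicated().mean())

def validity(df: pd.DataFrame, column: str, low: float, high: float) -> float:
    """Fraction of values in `column` inside an allowed physical range."""
    return float(df[column].between(low, high).mean())

def accuracy(df: pd.DataFrame, column: str, reference: pd.Series) -> float:
    """Fraction of values in `column` that match a trusted reference signal."""
    return float((df[column] == reference.to_numpy()).mean())
```

Each function returns a score in [0, 1], so the dashboard can display all four dimensions on a common scale and flag a channel whenever any score drops below a chosen threshold.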

Outcomes

The data quality metric will be included in the standard Power BI dashboard to give end users an overview of the project's data quality.
The Data Quality Library will be integrated into the existing library at Volvo Trucks, helping validate that the data being viewed is of good quality.
The completed functions will help improve understanding of the events that trigger poor data quality and improve overall data quality over time.