The objective of this project is to use Python and Docker to create and train a machine learning model, simulate new incoming data, detect drift, and adapt the model to that drift.

Sponsor


 

Team Members

Venkata Sai Renusree Bandaru, William Delaney, Harsh Gupta, Christopher Pesta, Haobo Yang, Kevin Zhu

  

Project Poster



Project Video


 

Project Summary

Overview

Models and algorithms derive information from data to support decision-making. A model's prediction accuracy can decay over time because of concept drift, which is a change in the statistical properties of the target variable and, in particular, in its relationship to the input attributes. When concept drift occurs, the system needs to detect it and adapt to it automatically.
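
One common way to state concept drift formally is sketched below. It is not taken from the project materials, and the symbols are introduced here only for illustration: $X$ denotes the input attributes, $y$ the target class, and $P_t$ the joint distribution of the data at time $t$. Drift between two times $t_0$ and $t_1$ means the joint distribution has changed for at least one input.

```latex
\exists\, X :\quad P_{t_0}(X, y) \neq P_{t_1}(X, y),
\qquad \text{where } P_t(X, y) = P_t(X)\, P_t(y \mid X).
```

Under this reading, a trained model loses accuracy mainly when the conditional term $P_t(y \mid X)$ changes, because the same attribute values then map to a different class than they did at training time.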

Objectives

The objectives of the project are to use Python and Docker to create and train a machine learning model, simulate new incoming data, detect drift, and adapt the model to that drift.

Approach

-The Zoo Data Set from the UCI Machine Learning Repository was chosen to create and train our machine learning model.
-The dataset contains 101 animals that are classified into 7 different classes by their attributes.

-GitHub is used for version control.

-A decision tree was implemented in Python as our machine learning model.

-Concept drift was introduced to the original dataset by modifying the attributes of some classes.
-For example, some fish are now recorded as producing milk, so the model erroneously classifies them as mammals.

-Concept drift is detected when the accuracy of our model falls below 85%.

-When concept drift is detected, the model is adapted by retraining the whole decision tree on the concatenation of the original and new datasets.

-All project files are packaged with Docker so that the project runs on any machine regardless of environment.
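
The detect-and-adapt steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the team's implementation. The three-attribute toy data (milk, fins, feathers) is a hypothetical stand-in for the Zoo Data Set, the small Gini-based tree is one possible decision tree, and the names `build_tree`, `accuracy`, and `detect_and_adapt` are invented for this example. Only the 85% threshold and the retrain-on-concatenation rule come from the approach described above.

```python
from collections import Counter

DRIFT_THRESHOLD = 0.85  # accuracy threshold stated in the approach


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(rows, labels, features):
    """Build a tree over binary features; returns a nested dict or a leaf label."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority

    def split_cost(f):
        # Weighted Gini impurity after splitting on feature f (inf if no real split).
        left = [y for x, y in zip(rows, labels) if x[f] == 0]
        right = [y for x, y in zip(rows, labels) if x[f] == 1]
        if not left or not right:
            return float("inf")
        return (len(left) * gini(left) + len(right) * gini(right)) / len(labels)

    best = min(features, key=split_cost)
    if split_cost(best) == float("inf"):
        return majority
    rest = [f for f in features if f != best]
    branches = {}
    for value in (0, 1):
        subset = [(x, y) for x, y in zip(rows, labels) if x[best] == value]
        branches[value] = build_tree(
            [x for x, _ in subset], [y for _, y in subset], rest
        )
    return {"feature": best, "branches": branches}


def predict(tree, row):
    """Walk the tree until a leaf label is reached."""
    while isinstance(tree, dict):
        tree = tree["branches"][row[tree["feature"]]]
    return tree


def accuracy(tree, rows, labels):
    """Fraction of rows whose predicted class matches the true label."""
    hits = sum(predict(tree, x) == y for x, y in zip(rows, labels))
    return hits / len(labels)


def detect_and_adapt(tree, old_rows, old_labels, new_rows, new_labels,
                     threshold=DRIFT_THRESHOLD):
    """Flag drift when accuracy on the new batch is below the threshold,
    then retrain the whole tree on the original and new data combined."""
    drifted = accuracy(tree, new_rows, new_labels) < threshold
    if drifted:
        rows = old_rows + new_rows
        labels = old_labels + new_labels
        tree = build_tree(rows, labels, list(range(len(rows[0]))))
    return tree, drifted


# Hypothetical toy data, NOT the real Zoo records: (milk, fins, feathers) -> class.
old_rows = [(1, 0, 0)] * 4 + [(0, 1, 0)] * 4 + [(0, 0, 1)] * 4
old_labels = ["mammal"] * 4 + ["fish"] * 4 + ["bird"] * 4
model = build_tree(old_rows, old_labels, [0, 1, 2])

# Simulated drift: the fish in the new batch now have milk = 1.
new_rows = [(1, 1, 0)] * 4 + [(1, 0, 0)] * 2 + [(0, 0, 1)] * 2
new_labels = ["fish"] * 4 + ["mammal"] * 2 + ["bird"] * 2

acc_before = accuracy(model, new_rows, new_labels)
model, drift_detected = detect_and_adapt(
    model, old_rows, old_labels, new_rows, new_labels
)
acc_after = accuracy(model, new_rows, new_labels)
print(f"before={acc_before:.2f} drift={drift_detected} after={acc_after:.2f}")
```

In this toy run the original tree splits on milk first, so the milk-producing fish land in the mammal leaf and accuracy on the new batch falls below the threshold. The retrained tree is scored here on the same batch it was retrained on, so the final figure is a training accuracy. The numbers will not match the project's reported results, which come from the real dataset.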

Outcomes

-The model has 91% prediction accuracy on datasets without concept drift.

-On the dataset with induced concept drift, the model's accuracy dropped to 73%.

-After retraining, the model reaches 97% accuracy.