Course Summary
(3 credits) This course provides both the theoretical foundations and the practical applications of data mining, covering methods such as classification, clustering, association analysis, dimension reduction, and anomaly detection, with emphasis on interpreting results and applying them to real-world datasets.
Overview
This course will introduce popular data mining methods for extracting knowledge from data. We will discuss the principles of data mining methods that students will apply to develop data mining solutions for scientific and business problems. Topics and related methods discussed in this course include: data preprocessing, association mining, classification and prediction, cluster analysis, and mining complex data types. Readings will consist of book chapters or articles relevant to each topic discussed.
Course Objectives
Students will learn to:
- Understand basic data mining techniques, how to apply them, and when they are applicable.
- Use a data mining package.
- Apply data mining techniques to solve problems.
Course Materials
- There is no textbook required for this course. Each lesson will assign readings in electronic format. Each reading will be available through the Course Schedule, Library E-Reserves, and/or in the Canvas course content.
Required Software
- This course requires the most recent version of the Cursor software.
Proctored Exams
There will be two proctored exams for this course, administered using the Honorlock software. Each exam will open seven days before it is due. You will be given 120 minutes once you begin, so be sure to give yourself enough uninterrupted time to complete the exam. The last exam will be due on the last official scheduled day of class by 11:59 PM Eastern Time. No make-up exams will be given, except in cases of emergencies and/or with prior approval from the instructor. Any questions on exams should be directed to the instructor.
Grading and Examinations
Assignment | Quantity | Percentage of Final Grade |
Coding Exercises | 4 | 20% |
Discussions | 4 | 10% |
Quizzes | 2 | 4% |
Midterm Exam | 1 | 15% |
Final Exam | 1 | 15% |
Project | multiple parts | 36% |
Student work will be graded according to the following grading scheme:
A = 93-100, A- = 90-92, B+ = 87-89, B = 84-86, B- = 80-83, C+ = 77-79, C = 70-76, D = 60-69, F = Below 60
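As an illustration of how the weights in the table and the letter-grade ranges fit together, the following is a minimal sketch in Python, not an official grade calculator. It assumes each category is scored on a 0-100 scale and that the unrounded final percentage is compared against the lower bound of each range; the syllabus does not state a rounding policy, so that part is an assumption. The function names and the example scores are hypothetical.

```python
# Weights from the grading table, expressed as fractions of the final grade.
WEIGHTS = {
    "Coding Exercises": 0.20,
    "Discussions": 0.10,
    "Quizzes": 0.04,
    "Midterm Exam": 0.15,
    "Final Exam": 0.15,
    "Project": 0.36,
}

# Lower bound of each letter grade, highest first (from the grading scheme).
CUTOFFS = [
    (93, "A"), (90, "A-"), (87, "B+"), (84, "B"),
    (80, "B-"), (77, "C+"), (70, "C"), (60, "D"),
]


def final_percentage(category_scores):
    """Weighted average of per-category scores, each on a 0-100 scale."""
    return sum(WEIGHTS[name] * category_scores[name] for name in WEIGHTS)


def letter_grade(percentage):
    """Map a final percentage to a letter grade.

    Assumption: no rounding is applied, so 92.9 falls in the A- range.
    """
    for lower_bound, letter in CUTOFFS:
        if percentage >= lower_bound:
            return letter
    return "F"


# Hypothetical category averages for one student.
example_scores = {
    "Coding Exercises": 90,
    "Discussions": 100,
    "Quizzes": 80,
    "Midterm Exam": 85,
    "Final Exam": 88,
    "Project": 92,
}

pct = final_percentage(example_scores)
print(f"{pct:.2f}", letter_grade(pct))  # prints "90.27 A-"
```

Because the project carries 36% of the final grade, a change of ten points in the project average moves the final percentage by 3.6 points, more than the same change in any other category.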
Coding Exercises
Students will complete four individual coding exercises throughout the semester. Each exercise is designed to reinforce key data mining techniques such as preprocessing, visualization, clustering, dimension reduction, and supervised learning. Each exercise requires code and a separate written report, to promote both technical proficiency and interpretive clarity. AI-assisted tools like Cursor are encouraged, but students must understand any code they submit. These exercises serve as hands-on practice to bridge theory and real-world data analysis.
Please see the Course Schedule for the specific due dates of each assignment.
Class Participation (Discussion Forums)
There will be discussion boards for students to discuss different aspects of the course. The instructor will participate in the discussions when it is appropriate. Use the discussion board to post your questions and to read the responses from your classmates.
There will be four discussions over the term of this class.
- Your initial responses to the discussion questions should be submitted by 11:59 PM ET on Sunday of the lesson week.
- You are expected to respond to at least two of the initial discussion questions.
- Every student is expected to provide at least two replies to their peers’ responses by the discussion due date/time noted in the Course Schedule. This will be the Sunday after the initial response is due.
Team Project
Over the course of the semester, students will work in teams to complete a comprehensive data mining project. The goal is to apply the concepts and techniques covered in class—from data preprocessing and visualization to pattern mining, clustering, and predictive modeling—on a real-world dataset of the team’s choosing. Teams will submit milestone reports throughout the semester and conclude with a final report that synthesizes their findings. Projects will be evaluated based on originality, analytical rigor, and clarity of communication. This is a collaborative, cumulative project designed to simulate real-world data science workflows.
Overview of Course Topics
- Introduction to Data Mining
- Getting Started with Python and Cursor
- Data Preprocessing
- Summary Statistics and Visualization
- Frequent Pattern Mining
- Clustering
- Dimension Reduction
- Outlier Detection
- Supervised Learning
Prospective Students
For more information on our programs, please check out our Great Valley Program website.