Basic Information
Class Location: Westgate Bldg E208
Class Time: Tue/Thu, 10:35 AM – 11:50 AM
Instructor: Lu Lin
- Contact: lulin[at]psu.edu
- Office hours: Wed, 2:00 PM – 3:00 PM or By Appointment
- Office: Westgate Bldg E373
TA: Tianrong Zhang
- Contact: tbz5156[at]psu.edu
- Office hours: Fri, 1:00 PM – 2:00 PM
- Office: Westgate Bldg E301
Course Overview
Objective: The course will cover a broad range of topics in data mining, including machine learning foundations (regression, classification, and clustering) and recent trends in computer vision (image/video data mining), natural language processing (text mining), and graph learning (structured data mining). This course is designed for graduate students who are interested in using machine learning techniques to discover patterns in data and gain knowledge from it.
Prerequisites: Students are expected to have a programming background in C, Java, Python (recommended), or another programming language in order to complete the course projects. However, the course will not require students to implement everything from scratch: Python has many machine learning libraries that already implement most of the models covered, so using a model is often a matter of importing a library and calling its functions. Sample code showing how to call each model will be provided when the model is introduced in class. Students are also expected to have a math background in linear algebra and probability in order to understand the underlying machine learning principles.
Course Material:
- Deep Learning (by Ian Goodfellow, Yoshua Bengio and Aaron Courville)
- Pattern Recognition and Machine Learning (by Christopher Bishop)
Tentative Schedule and Readings
Slides will be posted before each class.
Week | Date | Lectures |
---|---|---|
1 | 08/22 | Introduction |
1 | 08/24 | Review of Linear Algebra and Probability I |
1 | | Team sign up for paper presentation [form] Due Friday 09/01, 11:59pm (ET) |
2 | 08/29 | Review of Linear Algebra and Probability II |
2 | 08/31 | Data Preprocessing and Representation |
2 | | 09/01: Paper presentation team sign up due |
2 | | Group project team sign up [form] and proposal [template] Due Friday 10/06, 11:59pm (ET) |
3 | 09/05 | Machine Learning Foundations I: Linear Regression |
3 | 09/07 | Machine Learning Foundations II: Linear Classification |
3 | | Individual Project I on Heart Attack Prediction Due Friday 09/22, 11:59pm (ET) |
4 | 09/12 | Machine Learning Foundations III: Perceptron and Evaluation |
4 | 09/14 | Machine Learning Foundations IV: Naive Bayes |
5 | 09/19 | Machine Learning Foundations V: Decision Tree |
5 | 09/21 | Machine Learning Foundations VI: Ensemble Method |
5 | | 09/22: Individual Project I due |
6 | 09/26 | Class canceled |
6 | 09/28 | Machine Learning Foundations VII: Clustering |
7 | 10/03 | Machine Learning Foundations VIII: Clustering |
7 | 10/05 | Machine Learning Foundations IX: Review and Application; in-class quiz on Machine Learning |
7 | | 10/06: Group project proposal due |
8 | 10/10 | Multi-layer Perceptrons and Back-propagation |
8 | 10/12 | Image Mining I: Convolutional Neural Networks |
8 | | Individual Project II on Image Classification Due Friday 10/27, 11:59pm (ET) |
9 | 10/17 | Image Mining II: Review and Application; in-class quiz on Computer Vision |
9 | 10/19 | Text Mining I: Word Embedding |
10 | 10/24 | Text Mining II: Language Model |
10 | 10/26 | Text Mining III: Recurrent Neural Networks |
10 | | 10/27: Individual Project II due |
11 | 10/31 | Text Mining IV: Transformer |
11 | 11/02 | Text Mining V: Review and Application; in-class quiz on Text Mining |
11 | | Individual Project III on Text Classification Due Mon 11/27, 11:59pm (ET) |
12 | 11/07 | Graph Mining I: Node Embedding |
12 | 11/09 | Graph Mining II: Graph Neural Networks |
13 | 11/14 | Graph Mining III: Review and Application; in-class quiz on Graph Mining |
13 | 11/16 | Lab: Project Discussion |
14 | 11/21 | Thanksgiving, No Class |
14 | 11/23 | Thanksgiving, No Class |
14 | | 11/27: Individual Project III due |
15 | 11/28 | Advanced Topic I: Interpretability |
15 | 11/30 | Advanced Topic II: Robustness |
16 | 12/05 | Advanced Topic III: Fairness |
16 | 12/07 | Group Project Expo (Lightning talk + Demo) |
16 | | 12/08: Group project report due |
Grading
- Paper Presentation (10 points)
  - 10-min presentation about the chosen paper
- In-class Quiz (10 points) – 2.5 points * 4
- Individual Project (60 points) – 20 points * 3
  - Predefined data mining problem on Kaggle
  - Each project is graded based on the evaluation metric on Kaggle and the quality of the report
  - Top-ranked teams will be awarded bonus points
- Group Project (20 points)
  - Exploratory data mining problem defined by you
  - The project is graded based on the quality of the proposal (5 points) and the final report (15 points)
  - In the last class (the project expo), selected teams will present their work (10-min lightning talk) and be awarded bonus points; the remaining time will be a workshop among all the teams
- Bonus points can be earned
- Cutoff:
  - A: [93, 100]
  - A-: [90, 93)
  - B+: [87, 90)
  - B: [83, 87)
  - B-: [80, 83)
  - C+: [77, 80)
  - C: [70, 77)
  - D: [60, 70)
  - F: [0, 60)
The instructor reserves the right to curve grades so as to improve letter grades if warranted by unpredictable circumstances (e.g., an assignment turning out to be too difficult).
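As a rough illustration of how the components and cutoffs above combine, the following Python sketch sums the four graded components and looks up the letter grade. The function names and the treatment of bonus points (simply added to the total, with totals above 100 still mapping to A) are assumptions made for illustration, not an official grading script, and no curve is applied.

```python
# Cutoffs copied from the list above: (inclusive lower bound, letter), highest first.
CUTOFFS = [
    (93, "A"), (90, "A-"), (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (70, "C"), (60, "D"), (0, "F"),
]


def total_points(presentation, quizzes, individual_projects, group_project, bonus=0.0):
    """Sum the graded components (10 + 4 * 2.5 + 3 * 20 + 20 = 100 points max).

    `bonus` is a hypothetical parameter: the syllabus says bonus points can be
    earned but does not say how they enter the total.
    """
    return presentation + sum(quizzes) + sum(individual_projects) + group_project + bonus


def letter_grade(points):
    """Map a point total to a letter grade using the published intervals."""
    if points < 0:
        raise ValueError("points must be non-negative")
    for lower_bound, letter in CUTOFFS:
        if points >= lower_bound:
            return letter
    return "F"


# Example: 9 (presentation) + 10 (quizzes) + 54 (projects) + 17 (group) = 90 points.
example_total = total_points(9, [2.5, 2.5, 2.5, 2.5], [18, 18, 18], 17)
print(example_total, letter_grade(example_total))
```

Because each interval is closed on the left and open on the right, a total of exactly 90 earns an A-, while 89.9 earns a B+.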
Grading criteria for paper presentation
At the beginning of each lecture starting 09/26, there will be a paper reading session. Students are required to form a team (of 1-3 members), select one paper from the list (or propose another paper with the instructor’s approval), and prepare a 10-min presentation for the class, followed by a Q&A of at most 5 min, so each session takes 15 min in total. Students must prepare the slides themselves (the original authors’ slides may not be used for this presentation). Presenters must present the selected/assigned paper on the scheduled date. No extensions will be given due to the tight schedule of this course. The purpose of the paper presentation is to help students practice giving talks in public, at conferences or in other settings.
Both the instructor and the other students will grade the presentation (no self-grading). The detailed grading criteria are as follows. The rubric totals 50 pts, which counts for 10 pts in the final grade.
Aspect | Score range |
---|---|
Slides quality — Slides content was clearly visible and self-explainable | [1, 5] |
Idea delivery — Important messages of the paper were properly highlighted | [1, 5] |
Organization — structure and logic of the presentation were well organized | [1, 5] |
Clarity — Explained approaches/methods clearly | [1, 5] |
Pace — Moderate pace for the audience to follow | [1, 5] |
Engagement — Presenter(s) did not just read off of the slides | [1, 5] |
Team Work — All students in the team well understood the paper | [1, 5] |
Timing — Perfect timing | [1, 5] |
Q&A — Responded to audience’s questions well | [1, 5] |
Inspiration — I have learned something and was inspired by this presentation, and would like to read the paper in future | [1, 5] |
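The scaling from the 50-pt rubric to the 10 pts in the final grade can be sketched as below. This is only an illustration for a single grader's sheet: the syllabus does not specify how the instructor's and the students' sheets are combined, so that step is left out, and the function name is hypothetical.

```python
# The ten rubric aspects from the table above, each scored in [1, 5].
ASPECTS = [
    "Slides quality", "Idea delivery", "Organization", "Clarity", "Pace",
    "Engagement", "Team Work", "Timing", "Q&A", "Inspiration",
]
RUBRIC_MAX = 5 * len(ASPECTS)  # 50 pts in total
FINAL_WEIGHT = 10              # pts contributed to the final grade


def presentation_points(scores):
    """Scale one grader's rubric sheet (aspect -> score) to the 10-pt component."""
    missing = [aspect for aspect in ASPECTS if aspect not in scores]
    if missing:
        raise ValueError(f"missing aspects: {missing}")
    for aspect in ASPECTS:
        if not 1 <= scores[aspect] <= 5:
            raise ValueError(f"{aspect} must be scored in [1, 5]")
    raw_total = sum(scores[aspect] for aspect in ASPECTS)
    return raw_total * FINAL_WEIGHT / RUBRIC_MAX


# Example: a sheet with 4 on every aspect gives 40 / 50, i.e. 8 of the 10 pts.
print(presentation_points({aspect: 4 for aspect in ASPECTS}))
```

Since every aspect has a minimum score of 1, the lowest possible result under this scaling is 2 pts rather than 0.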
Grading criteria for group project
The purpose of the course project is to give students hands-on experience in solving novel data mining problems. The project thus emphasizes either research-oriented problems or “deliverables.” Ideally, the outcome of your project is publishable or tangible, typically a novel research problem or a prototype system that can be demonstrated. Group work is strongly encouraged, and each team can have 2-3 members. The group project topics are flexible:
- You can bring your own research project related to data mining, preferably one that integrates data mining techniques well;
- You can define and solve a data mining problem in a specific application that poses a novel challenge;
- You can explore and identify interesting weaknesses, failures, or behaviors of trending techniques (e.g., ChatGPT, diffusion models), reason about why they occur, and propose possible solutions that you will try on open-source models;
- You can do a literature survey, but please be advised that it needs to be up-to-date and novel (i.e., it should not be similar to existing survey papers). A good survey paper is also expected to cover the following: a summary of and reflection on existing works, your own understanding of their pros and cons, unique challenges, your proposed methods, preliminary results supporting the motivation/design, and future directions.
The grade consists of two major parts: the proposal report (50 pts) and the final report (150 pts), which together count for 20 pts in the final grade. The detailed grading criteria are as follows. Three teams will be selected to give a 10-min lightning talk in the last lecture, with bonus points applied.
Proposal report grading criteria:
Aspect | Score range |
---|---|
Strictly follow the provided template and page limit | [0, 10] |
Background and studied problem were clearly stated in the introduction | [0, 10] |
Sufficient discussion of state-of-the-art in related work section | [0, 10] |
The proposed solution is reasonable and not too trivial | [0, 10] |
Detailed and reasonable schedule for deliverables | [0, 10] |
Final project report grading criteria:
Aspect | Score range |
---|---|
Strictly follow the provided template and page limit | [0, 10] |
Background, studied problem and motivation were clearly stated in the introduction, and the logic and argument were reasonable | [0, 15] |
Contribution of the work was properly articulated in the introduction | [0, 15] |
Sufficient discussion of state-of-the-art and how this work differentiates from existing works in related work section | [0, 15] |
Description of the proposed method was clear, comprehensive, coherent and consistent with the claim in the introduction | [0, 35] |
Clear and precise description of evaluation design and dataset | [0, 10] |
Thorough evaluation of the proposed method and detailed analysis of the results | [0, 35] |
Summarization of the work, reasonable discussion of limitation of the proposed solution and possible future work | [0, 15] |
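The two rubrics above can be checked and scaled with a short Python sketch. It assumes the proposal's 50 pts map linearly onto the 5 points and the final report's 150 pts onto the 15 points stated in the Grading section; the function name is hypothetical and bonus points for lightning talks are not modeled.

```python
# Maximum score per rubric row, copied from the two tables above.
PROPOSAL_ROW_MAX = [10, 10, 10, 10, 10]
FINAL_ROW_MAX = [10, 15, 15, 15, 35, 10, 35, 15]

PROPOSAL_WEIGHT = 5   # points in the final grade
FINAL_WEIGHT = 15     # points in the final grade


def group_project_points(proposal_rows, final_rows):
    """Scale per-row rubric scores to the 20-point group project component."""
    for scores, maxima in ((proposal_rows, PROPOSAL_ROW_MAX), (final_rows, FINAL_ROW_MAX)):
        if len(scores) != len(maxima):
            raise ValueError("one score is required per rubric row")
        for score, row_max in zip(scores, maxima):
            if not 0 <= score <= row_max:
                raise ValueError(f"score {score} is outside [0, {row_max}]")
    proposal = sum(proposal_rows) * PROPOSAL_WEIGHT / sum(PROPOSAL_ROW_MAX)
    final = sum(final_rows) * FINAL_WEIGHT / sum(FINAL_ROW_MAX)
    return proposal + final


# Example: 40/50 on the proposal and 120/150 on the final report.
print(group_project_points([8, 8, 8, 8, 8], [8, 12, 12, 12, 28, 8, 28, 12]))
```

Both rubrics scale by the same factor (1 pt in the final grade per 10 rubric pts), so a rubric point on the proposal is worth exactly as much as one on the final report.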
Assignment Submission Policy
- Assignments must be TYPED and submitted to the appropriate Canvas drop boxes
- Students can submit late with a penalty of a 25% deduction for every 12 hours late (up to 2 days)
- After 2 days, late submissions are no longer accepted
- Unless otherwise noted in the schedule, all deadlines are Friday at 11:59 PM (ET)
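The late penalty can be worked through with a small Python sketch. The policy text leaves some details open, so this is one possible reading rather than the official rule: it assumes every started 12-hour period counts as a full period and that the deduction is a percentage of the score the submission would otherwise earn. The function name is hypothetical.

```python
import math

PENALTY_PER_PERIOD = 0.25   # 25% deduction per period
PERIOD_HOURS = 12
MAX_LATE_HOURS = 48         # 2 days; later submissions are not accepted


def score_after_late_penalty(raw_score, hours_late):
    """Apply the late deduction to a score, under the assumptions stated above."""
    if hours_late <= 0:
        return raw_score            # on time: no deduction
    if hours_late > MAX_LATE_HOURS:
        return 0.0                  # past the 2-day window
    # ASSUMPTION: a partially elapsed 12-hour period counts as a whole period.
    periods = math.ceil(hours_late / PERIOD_HOURS)
    remaining = max(0.0, 1.0 - PENALTY_PER_PERIOD * periods)
    return raw_score * remaining


# Example: a 20-point project submitted 13 hours late keeps 50% of its score.
print(score_after_late_penalty(20, 13))
```

Under this reading, a submission more than 36 hours late already loses 100% of its score, so the instructor should be consulted if the exact rounding matters.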
Academic Integrity
According to the Penn State Principles and University Code of Conduct: Academic integrity is a basic guiding principle for all academic activity at Penn State University, allowing the pursuit of scholarly activity in an open, honest, and responsible manner. In accordance with the University’s Code of Conduct, you must not engage in or tolerate academic dishonesty. This includes, but is not limited to cheating, plagiarism, fabrication of information or citations, facilitating acts of academic dishonesty by others, unauthorized possession of examinations, submitting work of another person, or work previously used without informing the instructor, or tampering with the academic work of other students. Any violation of academic integrity will be investigated, and where warranted, punitive action will be taken. For every incident when a penalty of any kind is assessed, a report must be filed.
Plagiarism (Cheating): Talking over your ideas and getting comments on your writing from friends are NOT examples of plagiarism. Taking someone else’s words (published or not) and calling them your own IS plagiarism. Plagiarism has dire consequences, including flunking the paper in question, flunking the course, and university disciplinary action, depending on the circumstances of the offense. The simplest way to avoid plagiarism is to document the sources of your information carefully.
Projects: When discussing projects and paper presentations, you may:
- Discuss the material presented in class or included in assigned readings, documentation, user manual, etc.
- Assist another student in understanding the statement of the problem (e.g., you may assist a non-native speaker by translating some English phrases unfamiliar to that student)
- Discuss high-level ideas about how to complete the lab assignment, including problem specification, general strategies for the solution, strategies for debugging and testing code, etc. without examining code written by other students, or sharing code written by you with other students.
It is expected that you have independently arrived at solutions that you turn in for laboratory assignments. The following are examples of activities that are PROHIBITED:
- Examining or copying code or code fragments from someone else (including online sources), other than the code that is provided to you by the instructor or included in the reference books.
- Sharing code or code fragments (via email, discussion groups, social media, whiteboard, handwritten or printed copies, etc.)
! Warning
- A violation of the Academic Integrity policy will result in an automatic F for the submission concerned.
- Two violations will result in a failing grade in the course.
Student Disability
Americans with Disabilities Act: The School of Information Sciences and Technology welcomes persons with disabilities to all of its classes, programs, and events. If you need accommodations or have questions about access to buildings where IST activities are held, please contact us in advance of your participation or visit. If you need assistance during a class, program, or event, please contact the member of our staff or faculty in charge. Access to IST courses should be arranged by contacting the Office of Human Resources, 332 IST Building: (814) 865-8949.
Students with Disabilities: It is Penn State’s policy to not discriminate against qualified students with documented disabilities in its educational programs. (You may refer to the Nondiscrimination Policy in the Student Guide to University Policies and Rules.) If you have a disability-related need for reasonable academic adjustments in this course, contact the Office for Disability Services (ODS) at 814-863-1807 (V/TTY). For further information regarding ODS, please visit the Office for Disability Services Web site at http://equity.psu.edu/ods/.
In order to receive consideration for course accommodations, you must contact ODS and provide documentation (see documentation guidelines at http://equity.psu.edu/ods/guidelines/documentation-guidelines). If the documentation supports the need for academic adjustments, ODS will provide a letter identifying appropriate academic adjustments. Please share this letter and discuss the adjustments with your instructor as early in the course as possible. You must contact ODS and request academic adjustment letters at the beginning of each semester.