Course Summary
(3 credits) Examination of large-scale data storage technologies including NoSQL database systems for loosely structured data, and warehouses for dimensional data.
Prerequisites:
- INSC 521
Overview
This course provides a broad exploration of current and emerging practices for handling large quantities of data using large-scale database systems. Data is being generated at an exponential rate, and storing and analyzing it requires highly customized tools and processes built for data-intensive tasks. In particular, this course investigates methods to effectively design, develop, and implement the two dominant types of large-scale databases: data warehouses for dimensional data and NoSQL databases for loosely structured data. Students will learn to design a wide variety of large database solutions, apply extract-transform-load (ETL) strategies, maintain and evolve large-scale databases, explore the fundamentals of NoSQL systems, and understand the properties of different database technologies.
Course Objectives
The objective of this course is to introduce the students to various issues related to managing large-scale databases for successfully storing and retrieving business-related knowledge. Several systems for managing large-scale data will be demonstrated. The students will:
- Develop an understanding of technologies used to develop, optimize, and deploy large databases.
- Learn how large-scale data management systems work for storing, organizing, and querying large amounts of data.
- Critically assess properties of high-performance database architectures.
- Explore the fundamentals of NoSQL systems and big data analytics.
- Integrate heterogeneous data into a single large-scale database.
- Apply the acquired knowledge to business requirements and customize the database systems to business needs through hands-on projects.
Course Materials
Required Textbooks
There is no textbook required for this course.
- Each lesson will assign readings in electronic format.
- Each reading will be available through the Course Schedule and/or Library eReserves.
Required Software
Students need the most recent version of Docker with the WSL 2 backend, which can be run on every Windows machine. Users of Windows 10 Enterprise, Windows 10 Pro, or Linux can download and use the latest version of Docker with the Hyper-V backend. Docker will be used to install other packages such as PostgreSQL, Apache Hadoop, Apache Hive, Apache Pig, or Apache HBase. Students should also download and install KNIME 4 and DBeaver.
Proctored Exams
The Final Exam is proctored via Honorlock.
Grading and Examinations
Additional information about assignments and related topics will be posted on the course site when appropriate.
Assignment | % of Final Grade |
Exam | 30% |
Projects | 50% |
Homework | 10% |
Discussions | 10% |
Grades will be based on the following scale:
A = 94 – 100, A- = 90 – 93, B+ = 87 – 89, B = 84 – 86, B- = 80 – 83, C+ = 77 – 79, C = 70 – 76, D = 60 – 69, and F = below 60.
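As an illustration of how the weights and the scale combine, here is a minimal Python sketch. It is not an official grade calculator: the function names and the sample scores are hypothetical, each category score is assumed to be on a 0 to 100 scale, and the lower bound of each letter range is assumed to be inclusive (so a fractional score such as 93.5 is treated as an A-, and exactly 60 as a D). The instructor's actual rounding policy may differ.

```python
# Weights from the grading table, as whole percentages of the final grade.
WEIGHTS = {"Exam": 30, "Projects": 50, "Homework": 10, "Discussions": 10}

# Lower bound of each letter grade, taken from the grading scale.
# Assumption: bounds are inclusive and anything under 60 is an F.
SCALE = [
    (94, "A"), (90, "A-"), (87, "B+"), (84, "B"),
    (80, "B-"), (77, "C+"), (70, "C"), (60, "D"),
]


def final_score(scores):
    """Return the weighted average of per-category scores (0-100 each)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def letter_grade(score):
    """Map a numeric score to a letter using the first bound it reaches."""
    for lower_bound, letter in SCALE:
        if score >= lower_bound:
            return letter
    return "F"


# Hypothetical student scores, for illustration only.
example = {"Exam": 85, "Projects": 92, "Homework": 100, "Discussions": 90}
total = final_score(example)
print(total, letter_grade(total))  # prints "90.5 A-"
```

Because projects carry half of the final grade, a change of two points in the project score moves the total by one full point, while the same change in homework or discussions moves it by only 0.2.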
Class Participation (Discussion Forums)
There will be discussion boards for students to discuss among themselves different aspects of the course, and I will participate in the discussions when it is appropriate. Use the discussion board to post your questions and to read the responses from your classmates. Your answers to discussion questions should be submitted by Friday at midnight. You are expected to provide at least two answers to the discussion questions. Further discussions, comments, and feedback will be accepted up until the due date listed on the Course Schedule. Every student is expected to provide at least two feedback postings.
Homework Assignments
There will be several homework assignments, two projects, and one exam. Unless otherwise specified, you are free to use any material or software package to solve the problems, provided you reference it adequately.
For these assignments, you are responsible for all the material covered in class as well as in the assigned readings. Assignments should be completed without collaboration with other students or individuals. Refer to the Course Schedule for lesson timeframes, due dates, and times. Your responses to each assignment must be submitted in the specified file format (PDF, DOC, DOCX, XLS, or XLSX) and must be placed in the appropriate assignment.
Students are free to write their responses by hand and then scan them into a PDF file. Late submissions will not be accepted unless there are mitigating circumstances, and the instructor has given permission prior to the due date.
Exams
Exams in this class are summative, not formative. They are designed for the purpose of assessment as well as benchmarking students’ performance in the course and are not intended for student learning. Answer keys to individual questions, therefore, will not be provided. However, a narrative of ways to improve will be provided upon request from the student.
*NOTE: The last exam will be due on the last official scheduled day of class. No make-up exams will be given, except in cases of emergencies or with prior approval. Any questions on exams should be directed to your instructor.
Course Topics
- Introduction to Docker
- Models for Big Data
- PostgreSQL Architecture and Installation
- Dimensional Modeling
- Extract Transform Load (ETL)
- Data Warehouse Reporting
- Hadoop Architecture
- Retrieving Data with MapReduce
- MapReduce Examples
- Apache Pig
- Apache Hive
- Big Data Use
Prospective Students
For more information on this program, check out the Master of Data Analytics website!