by William Aiken, wva5029@psu.edu
Have you ever taken any online courses on big data analytics? Since the aim of this project is to bring big-data skills to students who may not necessarily have a strong background in hard computer sciences, it is very important to collect and analyze the best possible resources in the field of big data. During our time spent researching many of the courses already available, we have come to find that a plethora of high-quality learning resources are already available, and many of them are free to access. In this blog entry, we will highlight some of the most helpful and accessible online courses we have come across so far.
Coursera
Course Title | Offered by | Difficulty Level | Cost |
Building a Data Science Team | Johns Hopkins via Coursera | Beginner | Free |
Data Analysis and Statistical Inference | Duke via Coursera | Medium | Free |
Mining Massive Datasets | Stanford via Coursera | Advanced | Free |
Coursera’s motto is to give students “universal access to the world’s best education, partnering with top universities and organizations to offer courses online.” Coursera is a powerful learning platform that allows users from anywhere in the world to take a number of (usually) free, yet extremely high-quality courses on their own schedule. Frequently, these courses are taught by professors in a style very similar to how they would teach in a traditional classroom.
Coursera offers a certificate for completing a course, but earning one involves a rigorous evaluation. For example, students are commonly required to keep pace with the class and to submit homework before set deadlines in order to receive the certification. Even for highly motivated students, this may be a drawback, but when demonstrating skills to a prospective employer, it may make all the difference. Let’s take a look at some of the Big Data courses that Coursera has to offer.
Being able to understand the building blocks of Big Data is very important, but in a company setting, even the best analysts and developers do not work alone. As a result, it is just as valuable to have a comprehensive view of how and where Big Data fits into the structure of a larger organization. Through Johns Hopkins University, Coursera offers a course that provides a little reprieve from number crunching and complex analyses in order to focus on organizing and empowering a Big Data team.
The course Building a Data Science Team [1] would be an excellent building block for someone who already has a decent understanding of Big Data. In fact, it fits into a larger 5-course series that Coursera offers, the “Executive Data Science Specialization”. Taught by Bloomberg School of Public Health PhDs Jeff Leek, Brian Caffo, and Roger Peng, this course provides a different perspective from other Big Data courses in that how a team grows is treated as just as important as how the data grows. The course includes topics that relate a Big Data team to the rest of the organization and covers the questions involved in hiring a good data scientist.
After gaining valuable insight into where Big Data teams fit into the bigger scheme of a company, students can take a look at the Data Analysis and Statistical Inference [2] class spearheaded by Duke University. It may not seem like the most predictable next step in the development of a Big Data scientist, but getting a student’s feet wet at both extremes isn’t such a bad idea. In the Building a Data Science Team course, the student learns the human-interaction aspects of Big Data analysis as part of the organization; in this course, the student dives into the world of R, ANOVAs, and Chi-square tests for the statistical analysis of data. No prior programming is needed for this course!
Data Analysis and Statistical Inference is taught by Dr. Mine Çetinkaya-Rundel, a PhD in Statistics whose recent work has focused on “developing student-centered learning tools for introductory statistics courses”. There is no doubt that being a strong member of a data science team also requires a thorough understanding of even the nitty-gritty concepts. Where better to achieve proficiency than in a class of this quality?
After getting a taste of both ends of data analysis, let’s jump right into the complex tasks that any Big Data scientist will face. Mining Massive Datasets [3] is taught by Stanford University’s Jure Leskovec, Anand Rajaraman, and Jeff Ullman, and it reflects the content of Stanford’s CS246 class of the same name. The class spans a wide variety of content, starting with an introduction to MapReduce and nearest neighbor searches and eventually reaching concepts involved in web advertising and MapReduce algorithms.
Be warned though! This course is not for the fainthearted. Jumping right into big data analytics is probably not the best idea. It is recommended not only to have knowledge of database systems but also to be familiar with basic algorithms and data structures in general. The instructors also recommend that “you should also understand mathematics up to multivariable calculus and linear algebra”. Data Analysis and Statistical Inference may not cover all of the math required for this course, but it will definitely prepare the student for the next steps.
Miscellaneous Courses
Course Title | Offered by | Difficulty Level | Cost |
Big Data Fundamentals | Big Data University | Beginner | Free |
Intro to Data Science | Udacity | Medium | $199.00 / mo.* |
Massively Parallel Computing | Harvard Extension School via iTunes | Advanced | Free |
*Udacity offers a 14-day trial, and a 50% tuition return if you complete a “Nanodegree” within a year
There are many different platforms for online courses, and in this section we can take a look at other options available for students. Again, we limit the review to only three courses, from Big Data University, Udacity, and iTunes, in that order. While both the Big Data University course, an initiative by IBM to build a smarter planet, and the iTunes course, provided by the Harvard Extension School, are free, Udacity does require a fee per month. However, Udacity does allow for a 14-day trial where students could get started and try out the service before committing to it.
Big Data University offers a large number of courses and a great deal of information completely free of charge as a part of an IBM community initiative. The Big Data Fundamentals course [4] is aimed squarely at beginners and is only 1.5 hours long. The primary goal of the course is to provide a well-rounded approach to Big Data, especially in the context of getting “people throughout the enterprise to run the business better and to provide better service to customers.”
It is a self-paced course that offers a certificate through BigDataUniversity.com. Students become familiar with Big Data and the roles it plays in a corporate environment, and they learn about specific examples of its application to sensor data, social media, etc.
Udacity’s Intro to Data Science: Learn What It Takes to Become a Data Scientist [5] seems like a logical place to continue for students interested in furthering their Big Data skills by applying some big data techniques. Udacity comes at the cost of $199.00 per month, a steep price for some students, but considering that Udacity does offer “Nanodegrees” in various subjects, it’s not totally unreasonable. For those not interested in enrolling with Udacity, the instructor videos and the instructions for exercises and projects are still available for free.
Intro to Data Science is a course of approximately two months that assumes 6 hours of work per week, but it is up to the student to work at his or her own pace. It is a part of the larger “Data Analyst Nanodegree.” Despite the name, this is not necessarily a beginner course, as it expects experience with Python and with some programming and statistics concepts. Those who do take the course will cover nearly all the fundamental concepts of data science, from data manipulation up to and including working with Big Data via MapReduce concepts.
Finally, there is the Harvard Extension School’s Massively Parallel Computing [6] course available through iTunes. While the videos were released in 2010 and the quality is not fantastic, the course does offer students hands-on experience with a very wide variety of resources that Big Data ultimately takes advantage of.
Some programming knowledge is pretty much mandatory, but for a more advanced course, this is to be expected. The lectures cover topics as varied as multi-threaded programming, GPU programming, and an application of MapReduce using Hadoop on Amazon’s EC2 platform. A student could easily pick and choose concepts from the course to get a solid understanding of more advanced topics straight from the Harvard School of Engineering Computer Science 264 course.
lynda.com
Course Title | Offered by | Difficulty Level | Cost |
Techniques and Concepts of Big Data | lynda.com | Beginner | $19.99 / mo.** |
Hadoop Fundamentals | lynda.com | Medium | $19.99 / mo.** |
Up and Running with Public Data Sets | lynda.com | Beginner | $19.99 / mo.** |
**Cost of the basic annual billing plan billed as a one-time payment of $239.88
The website lynda.com is committed to providing high-quality courses taught by industry experts across a wide variety of topics. Several of these courses touch on big data, both directly and indirectly. And fortunately, many university students have access to all lynda.com courses for free!
To get started, lynda.com offers an introductory course entitled “Techniques and Concepts of Big Data” [7] taught by Barton Poulson, who “has a deep love for data analysis and data visualization” according to his lynda.com profile. This course is for any beginner in big data and spends a little more time describing the basics of big data in general and where it fits into both computer science and business. It takes some time to explore how big data is “big”; how big data is used by consumers, businesses, and researchers; and how big data is stored, prepared, and analyzed. Overall, it’s fairly comprehensive and would suit anyone who has never gotten their feet wet with big data at all. “Techniques and Concepts of Big Data” also includes a discussion of both Excel and Hadoop, so it leads well into the other big data courses that lynda.com offers.
A good follow-up course to “Techniques and Concepts of Big Data”, or a good course for anyone looking to get into the application of big data languages and manipulation, is “Hadoop Fundamentals” [8]. This course is taught by Lynn Langit, who founded her own consulting firm and works directly with many big data platforms including Hadoop, AWS, Azure, and many others. She is also a cofounder of “Teaching Kids Programming” and does an excellent job in this course of introducing a wide variety of languages and concepts.
This course offers an introduction to most big data concepts in general as well as how to apply them in Hadoop and the languages that this involves. Langit covers the concepts of Volume, Velocity, and Variety as well as HDFS (the Hadoop Distributed File System) and big data database structure. She also walks the student through setting up a Cloudera VM and running MapReduce programs in Java, and she demonstrates the use of scripting languages in Hive, Pig, etc. Most of the examples are the Big Data/Hadoop equivalent of “Hello World”, which is essentially a word count application. As a result, the course can be used as a starting point from which students discover which aspect of big data manipulation is most interesting to them.
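For readers who have not seen it before, the word count idea can be sketched in a few lines of plain Python. This is only an illustration of the map, shuffle, and reduce steps under our own assumptions; it does not use Hadoop, it is not taken from the course, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical names we chose for the example.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one line of text."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group the emitted counts by word, standing in for the framework's shuffle step."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(word, counts):
    """Sum the counts collected for a single word."""
    return word, sum(counts)


def word_count(lines):
    """Run the three steps in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())


if __name__ == "__main__":
    # Made-up sample input, purely for illustration.
    sample = ["big data is big", "data is everywhere"]
    print(word_count(sample))
```

On a real cluster, the map and reduce steps would run in parallel on many machines and the framework would handle the grouping in between; here everything runs in one process so that the structure is easy to follow.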
Another course offered by lynda.com isn’t inherently about big data, but after getting hands-on with big data concepts and applications in the other courses, “Up and Running with Public Data Sets” [9] with Curt Frye seems like the perfect direction to go for any student. This is especially true because Frye not only explains what the public data sources offer but also shows how to download individual data sets as necessary.
The course covers a broad range of sources, starting with the U.S. census and eventually moving on to worldwide sources such as the U.N. or Google’s Ngram Viewer. The course does some visualization of the data via Excel; however, using any one of the individual lessons to gather data would no doubt fit into a larger big data curriculum.
Conclusion
We asked if you have ever taken any online courses on big data analytics. Before setting off on this Big Data project, many of the student research assistants in this group definitely had not. It is our goal to collect and analyze the best possible resources in the field of big data. The courses here are all high-quality learning resources. Many of them are free to access, and those that are not still provide a fair amount of their material at no cost. We hope that at least a couple of these courses can inspire you to take the plunge into the world of Big Data!
References
[1] https://www.coursera.org/learn/build-data-science-team
[2] https://www.coursera.org/course/statistics
[3] https://www.coursera.org/course/mmds
[4] https://bigdatauniversity.com/courses/big-data-fundamentals/
[5] https://www.udacity.com/course/intro-to-data-science--ud359
[6] https://itunes.apple.com/us/itunes-u/csci-e-292-massively-parallel/id429428651?mt=10
[7] http://www.lynda.com/Hadoop-tutorials/Techniques-Concepts-Big-Data/158656-2.html
[8] http://www.lynda.com/Hadoop-tutorials/Hadoop-Fundamentals/191942-2.html
[9] http://www.lynda.com/Tableau-tutorials/Up-Running-Public-Data-Sets/368761-2.html