iSchool students are looking into a variety of technical fields. Big data careers are often discussed, but starting a career in big data analytics can be a confusing process, and many current or future students of a big data discipline may wonder where to begin. It is important for a potential data scientist to understand which personal qualities and characteristics are needed to develop a professional career in big data analytics. In this section, our goal is to introduce desirable personality traits for a big data professional.
2.3.1 Versatility
The first major question facing many students who seek careers in big data is "What do I have to major in?" Earning a degree in information technology or a related field is a good starting point. In fact, many IT programs are starting to offer majors specializing in big data analytics. Statistics and physics are examples of related fields that students may choose [4] [5]. Strong mathematical skills are also key in a big data career [4].
Along with education, there are many characteristics a person must possess in order to become a successful data scientist. Earning a degree in a technology or math-related field is important; however, learning how to create and use big data often happens outside the classroom, by taking online courses, conducting research, or working with experienced professionals [6]. Potential big data professionals must also be able to learn quickly and on their own time. After all, education is an ongoing process in any field, including big data.
Aspiring data analysts must understand the general concepts of big data and also develop in-depth knowledge of the field [4]. This means that you should be familiar with concepts such as volume, velocity, and variety, as well as the need for big data in today's world [4]. A big data scientist's knowledge should range from practical implementations to the more theoretical concepts that drive them.
2.3.2 Entrepreneurial spirit
To successfully use data to better a company, a big data scientist should thoroughly understand the concept of a cost-benefit analysis [4]. Knowing how a company earns and spends its money is critical for many big data professionals. Data scientists and analysts must be able to calculate the cost of collecting data and the cost of using it [4].
To illustrate this point, consider a new medical insurance company that collects the number of walking steps taken by its policyholders. The data is collected through a newly designed phone application given to each customer. Designing and producing this application costs X dollars, and collecting the data costs Y. The purpose of the data is to calculate the average number of steps taken by people who represent different age groups and genders. This information will allow the company to lower insurance rates for qualified individuals and raise them for less mobile members. As a data analyst, you must ask whether collecting the data through the application is worth the cost and the risk. Will the value of the information offset the expense of gathering it and the risk of legal trouble due to privacy concerns?
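The trade-off in the insurance example can be sketched as a small calculation. The following is a minimal illustration only, not a method taken from the cited sources: the class name, its fields, and every dollar figure are hypothetical, and the legal risk is modeled in the simplest possible way, as a probability multiplied by a cost.

```python
from dataclasses import dataclass


@dataclass
class DataCollectionProject:
    """Hypothetical cost-benefit model for a data collection effort."""
    build_cost: float              # X: design and production of the application
    collection_cost: float         # Y: cost of collecting the data
    expected_benefit: float        # assumed value gained from using the data
    legal_risk_probability: float  # assumed chance of a privacy-related dispute
    legal_risk_cost: float         # assumed cost if that dispute occurs

    def total_cost(self) -> float:
        # Direct spending: building the application plus collecting the data.
        return self.build_cost + self.collection_cost

    def expected_risk_cost(self) -> float:
        # Expected loss from legal trouble: probability times cost.
        return self.legal_risk_probability * self.legal_risk_cost

    def net_benefit(self) -> float:
        # Benefit left over after direct costs and expected risk.
        return self.expected_benefit - self.total_cost() - self.expected_risk_cost()

    def is_worthwhile(self) -> bool:
        # The project pays off only if the net benefit is positive.
        return self.net_benefit() > 0


# All figures below are invented purely for illustration.
project = DataCollectionProject(
    build_cost=200_000,
    collection_cost=50_000,
    expected_benefit=400_000,
    legal_risk_probability=0.25,
    legal_risk_cost=400_000,
)
print(project.net_benefit(), project.is_worthwhile())
```

With these invented figures the project still comes out ahead, but raising the assumed probability of a privacy dispute or lowering the expected benefit quickly changes the answer, which is exactly the kind of judgment the analyst is asked to make.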
It is crucial for a professional in big data to be able to bridge the gap between business and information.
2.3.3 Agility
In the context of big data, the amount of data produced grows exponentially, and the ways data is collected and used are constantly evolving. At the same time, IT is a fluid industry, and employees move between employers quickly. This gives IT professionals a flexible and rich career path, but it also requires them to adapt to many different technical environments. According to the U.S. Bureau of Labor Statistics, the turnover rate in information technology was 44 percent in 2013 [7]. This is because IT professionals have a variety of jobs to choose from and very marketable skills [8]. This seems to be particularly true for younger IT professionals, whose lifestyles typically allow them to "job hop" as they please [8].
As seen in Figure 4, given the remarkable growth of the big data market, demand and pay for big data professionals are expected to keep increasing over the next decade [9].
Figure 4: Big data market forecast, 2011-2026 [9]
For these reasons, it is crucial for big data professionals to be able to acquire skills across a variety of platforms. Two striking examples of large shifts in emphasis can be found in the past couple of years. According to IEEE Spectrum's ranking of the top ten programming languages of 2015, the R statistical programming language has risen from ninth to sixth place [10], while Python's position actually fell from third to fourth, a change many professionals saw as a surprise [11].
Finally, the discussion of agility would not be complete without mentioning the jump from Hadoop 1.X to 2.X. Hadoop is a programming framework designed to process big data in a distributed computing environment. In light of the growth of big data-centric programming languages, it is easy to see the potential of Hadoop 2, since it is no longer restricted to Java. By 2014, Yahoo! and eBay had already moved to Hadoop 2 [12].