by Whitney Hernandez, woh5152@psu.edu
What is ‘big data’? The world increasingly relies on the availability of data. Big data is a term that describes the growth and increasing variety of that data. The importance of using and understanding big data is growing continuously. Using big data in decision making demands accurate, well-founded analyses, which in turn require ‘big data analytics’ skills. Big data analytics gives an organization the opportunity to reduce cost and risk and to improve its operations. What, then, are the characteristics of big data, and how is it used in big data analytics?
How do we define big data?
Doug Laney, an industry analyst, defines big data by the three Vs: volume, velocity, and variety. Volume is ordinarily defined as the amount of space that a substance or object occupies. In the case of big data, the substance occupying space is data, and big data volume is the growing amount of storage that accumulating data takes up. Velocity refers to the speed at which data streams in. Lastly, variety means that data is available in many different types, which must be organized and managed successfully. Big data comes from a multitude of sources and depends on the increased computing power, mobility, storage space, and digital content being created [1]. All of these factors combine to create the variety seen in big data analyses. Big data centers on factors such as patterns, predictions, and probability [2].
Big data is becoming a very common topic as the demand for data grows. Understanding the concept of big data is not difficult, but applying it to real-life problems can be very complex. The current generation is paving the way for big data without any precedent for guidance [2]. Advances in big data research may make it possible to influence how we live our lives without our knowledge [2]. This possibility brings to light a very fine line between manipulation and assessment [2], which is why managing extensive amounts of data correctly is extremely important.
Understand big data characteristics
According to Géczy, data sensitivity is one of the most important big data characteristics [3]. Other valuable characteristics include diversity, quality, volume, speed, and structure [3]. Let’s first focus on sensitivity, which is critical because it shapes an organization’s data policies and procedures [3]. Data sensitivity indicates whether the data contains delicate, confidential, or personal information about an organization [3]. When an organization is able to identify its sensitive information, it can be more vigilant about how it uses and protects that data.
Diversity represents the variety of data types. For instance, cellular phones produce audio, video, location data, text messages, and so on [3]. Different data types must be handled in different ways [3], which is one reason big data is so complicated to understand. Data comes in hundreds of different formats, which makes the variety very large [5].
Quality of data concerns the completeness and accuracy of the data [3]. To illustrate, one organization may record the time of a birth down to the second, while another records only the day [3]. The former organization has the higher-quality data because it provides more precise information. Obtaining more complete, higher-quality data may be easy, or it may be extremely complex [3].
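The two sides of quality described above, precision and completeness, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, field names, and the `completeness` helper are hypothetical and do not come from Géczy’s paper.

```python
from datetime import datetime

# Hypothetical example: the same birth stored at two levels of precision.
to_the_second = datetime(2015, 10, 15, 14, 32, 7)
day_only = to_the_second.date()  # the time of day is lost

def completeness(records, required_fields):
    """Share of required fields that are actually filled in across all records."""
    total = len(records) * len(required_fields)
    filled = sum(
        1 for r in records for f in required_fields if r.get(f) not in (None, "")
    )
    return filled / total if total else 0.0

# Hypothetical records: the second one is missing its birth timestamp.
records = [
    {"name": "A", "birth": to_the_second},
    {"name": "B", "birth": None},
]
score = completeness(records, ["name", "birth"])
```

Here three of the four required values are present, so `score` works out to 0.75, while `day_only` shows how much detail the day-level record gives up.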
Volume, as mentioned earlier, refers to the amount of space data takes up. It can be expressed as the size of the data in bits and bytes [3]. One major issue is that although storage and computing power are rising, they are not keeping up with the rapidly growing volume of data [3]. The volume of big data continues to grow because data is no longer generated only by an organization’s employees [5]. Customers and business partners now add enormous amounts of data to a company, and some organizational data is machine-generated by many different mobile devices [5]. The growing number of mobile device users results in a further influx of data.
Speed, also termed velocity, represents how quickly or slowly data is acquired or released [3]. Inflow speed refers to the speed an organization requires when taking in different types of data [3]. Outflow speed, on the other hand, refers to how long it takes for data to flow out of an organization [3]. The speed of either inflow or outflow depends on the data type [3]. For example, downloading audio files is much faster than downloading video files [3]. Many companies process incoming data in batches [5]. This approach only works if the rate of incoming data is slower than the processing rate.
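The condition on batch processing can be made concrete with a small simulation. The sketch below is a simplified model, not a description of any real system: the function name and the rates are assumptions chosen for illustration.

```python
def backlog_over_time(arrival_rate, processing_rate, steps):
    """Return the number of unprocessed records left after each time step."""
    backlog = 0
    history = []
    for _ in range(steps):
        backlog += arrival_rate                  # new records arrive
        backlog -= min(backlog, processing_rate) # one batch is processed
        history.append(backlog)
    return history

# Hypothetical rates, in records per time step.
keeps_up = backlog_over_time(arrival_rate=80, processing_rate=100, steps=5)
falls_behind = backlog_over_time(arrival_rate=120, processing_rate=100, steps=5)
```

When data arrives more slowly than it is processed, the backlog stays at zero (`[0, 0, 0, 0, 0]`). When it arrives faster, the backlog grows without bound (`[20, 40, 60, 80, 100]`), which is why batch processing stops working at that point.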
Lastly, structure refers to how big data is organized: data may be structured or unstructured [3]. Structured data has an organized form, while unstructured data does not [3]. Structured data is easily processed, so it can be stored in common databases [3]. Unstructured data is more difficult to handle and must be pre-processed before a conventional database can work with it [3].
This leaves us with the question: how do we use these big data characteristics? Big data analytics is the process that puts all of the above characteristics into action. It is necessary to understand the difference between big data and big data analytics. It is equally important to understand how these two terms intertwine and relate to one another.
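As a rough illustration of that pre-processing step, the following Python sketch turns free-text messages into rows that a conventional relational database can store. The messages, the regular expression, and the `shipments` table are all hypothetical; real unstructured data usually needs far more involved handling.

```python
import re
import sqlite3

# Hypothetical free-text messages (unstructured input).
raw_messages = [
    "2015-10-15 order 1042 shipped to Boston",
    "2015-10-16 order 1043 shipped to Denver",
    "customer called, no order number given",
]

PATTERN = re.compile(r"(\d{4}-\d{2}-\d{2}) order (\d+) shipped to (\w+)")

def preprocess(message):
    """Extract (date, order_id, city) from a message, or None if it does not fit."""
    match = PATTERN.search(message)
    if match is None:
        return None
    date, order_id, city = match.groups()
    return date, int(order_id), city

# Keep only the messages that could be given a structure.
rows = [row for row in map(preprocess, raw_messages) if row is not None]

# Structured rows fit directly into an ordinary relational table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shipments (ship_date TEXT, order_id INTEGER, city TEXT)")
conn.executemany("INSERT INTO shipments VALUES (?, ?, ?)", rows)
cities = [c for (c,) in conn.execute("SELECT city FROM shipments ORDER BY order_id")]
```

Two of the three messages match the pattern and end up in the table. The third cannot be structured by this simple rule, which hints at why unstructured data is harder to process.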
What makes big data important?
Big data does not just describe large amounts of information held by a company; it also covers the analysis of that data [4]. As the amount of readily available information increases, so does the need to understand it. Big data allows information to become predictive of things such as customer trends [4]. Because of the enormous amount of information being created, big data has become an extremely popular field of study [4]. Big data is not just a new trend for companies; it is a lasting way to overcome many different issues in an organization [4]. Big data tools have become, and continue to become, more affordable and more accessible for all types of organizations [6]. Big data is not something that can be avoided, because it is integrated into countless aspects of daily human life [6].
What is big data analytics?
The overall purpose of big data analytics is to analyze large masses of data in a way that aids an organization in its decision making. To reiterate, big data is a large amount of different types of data that can be measured by its volume, velocity, or variety. Big data analytics is used to uncover unknown patterns, market trends, customer preferences, and much more [7]. It is the actual process of understanding and using data to benefit an organization. Data science is the term that refers to the tools used to observe and analyze big data [11].
There are several models for big data analytics, including predictive, descriptive, and decision models [10]. The purpose of the predictive model is to find the likelihood that different samples will perform in a specific way. The predictive model is typically calculated during live transactions, multiple times, to help evaluate the benefit of a customer transaction [10]. The descriptive model uses the acquired data to illustrate relationships between the customer and a product or service [10]. This model can be used, for example, to group customers by their personal preferences [10]. Lastly, the decision model incorporates the predictive model and all of the elements that lead to a decision. This model is usually used to create business rules or logic [10].
Big data analytics can be reactive or proactive, and the distinction helps companies understand different aspects of big data [8]. Reactive analytics happens when information is pulled from large data sets and decisions are made from the pulled information [8]. Proactive analytics tracks only the relevant information within large volumes of data [8]. To review: proactive analytics obtains the relevant data, and reactive analytics occurs when a decision is made based on the acquired information [8].
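The relationship between the three models can be sketched in plain Python. This is a toy illustration only: the data, the segment names, the 0.5 threshold, and every function name are assumptions, and a real predictive model would be far richer than a per-segment purchase rate.

```python
from collections import defaultdict

# Hypothetical past transactions: (customer segment, whether a purchase followed).
history = [
    ("student", True), ("student", False), ("student", False), ("student", False),
    ("parent", True), ("parent", True), ("parent", True), ("parent", False),
]

def fit_purchase_rates(rows):
    """Predictive step: estimate purchase likelihood per segment from history."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [purchases, total]
    for segment, purchased in rows:
        counts[segment][0] += int(purchased)
        counts[segment][1] += 1
    return {seg: bought / total for seg, (bought, total) in counts.items()}

def group_by_preference(customers):
    """Descriptive step: organize customers by their stated preference."""
    groups = defaultdict(list)
    for name, preference in customers:
        groups[preference].append(name)
    return dict(groups)

def decide_offer(segment, rates, threshold=0.5):
    """Decision step: a business rule applied on top of the prediction."""
    return "send coupon" if rates.get(segment, 0.0) >= threshold else "no action"

rates = fit_purchase_rates(history)
groups = group_by_preference([("Ana", "outdoor"), ("Ben", "books"), ("Cy", "outdoor")])
```

With these made-up numbers, `decide_offer("parent", rates)` returns `"send coupon"` and `decide_offer("student", rates)` returns `"no action"`. The decision rule does not make a prediction of its own; it wraps the predictive model’s output in business logic.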
Big data analytics can only be done successfully if the data is managed correctly [9]. Data scientists have become even more valuable due to the lower cost of big data [9]. It is important for companies to hire professionals such as data scientists because of their experience working with large amounts of information [9]. The problem is that data scientists are in high demand and short supply [9]. Data scientists should have overlapping skills in hacking, statistics, and substantive expertise [12]. They need all of these skill sets to fully understand how to work with big data.
The main goal of big data analytics is to aid organizations in the decision-making process [7]. The analyses can be done with software; however, the structure of the data may not allow it to be used or stored in traditional data warehouses [7]. This is why companies use newer technologies such as Hadoop, YARN, MapReduce, Spark, Hive, Pig, and NoSQL [7]. These technologies form the core of an open source framework that is able to process big data sets. Connecting frameworks such as Hadoop with relational databases has been made easier by software connectors [7]. Supporting big data in the correct manner is of utmost importance when it comes to confidentiality and integrity [1]. When analysts use data models or analytics to understand big data, they must be confident that the information has not been altered or made accessible to unauthorized personnel. As a final review, big data analytics is the process that adds value to the data collected by an organization.
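Of the technologies named above, MapReduce is the easiest to illustrate in a few lines. The sketch below imitates the map, shuffle, and reduce phases on a single machine in plain Python. It is not the Hadoop API, and the function names and sample documents are assumptions for illustration.

```python
from collections import defaultdict
from functools import reduce

def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group emitted values by key, as a framework would between the two phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Combine the values for each key into a single count."""
    return {key: reduce(lambda a, b: a + b, values) for key, values in grouped.items()}

# Hypothetical input; in a real cluster each document could live on a different node.
documents = ["big data", "big data analytics", "data science"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
```

Because each call to `map_phase` looks at only one document and each reduction looks at only one key, the work can in principle be spread across many machines, which is what makes the pattern suitable for very large data sets.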
What fields are using big data?
Big data analytics can be used in many different fields to help accomplish various tasks. For example, big data analytics can benefit scientific discovery, industry, living environments, and healthcare, and the list is growing [1]. Let’s look at social network big data for a more in-depth example of how to put big data into action. Typically, social networks collect personal information about their users in order to produce a more personalized environment. This is how businesses are able to create targeted advertisements for a particular user [1]. Retailers are a common example of organizations that use big data to predict which products will sell and how to prepare properly for customers [6]. For example, a store can now predict that it will sell more blue sweaters than usual during a certain month. The retailer then knows to order more blue sweaters for that month to prepare for the increase in sales.
Many companies and organizations have used big data to their advantage. Major companies taking advantage of big data include Amazon, Walmart, and IBM [4]. To illustrate, Walmart uses data mining to understand product sales patterns [13]. This is how shopping recommendations are created for online shoppers [13]. A more detailed example of successful data mining at Walmart involves strawberry Pop-Tarts [13]. Walmart found that strawberry Pop-Tart sales went up significantly before hurricanes [13]. This led Walmart to place the Pop-Tarts at checkout counters before hurricanes [13]. Car insurance companies are another example of organizations that gather data to better understand how their customers drive [6].
In addition, the government and NASA have also benefited from big data technology [4]. It is important to remember that big data is also a major part of consumerism [6]. Cities have been using big data to become ‘smart cities’ [6]. For example, in a smart city, traffic lights would know when traffic is busiest and respond accordingly [6]. Police have been able to use collected information to anticipate future criminal activity [6]. Many companies and organizations are capitalizing on big data in a multitude of ways.
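A sales pattern like the one in the Walmart example comes down to comparing average sales under two conditions. The Python sketch below shows that comparison on a handful of rows. The numbers are made up purely for illustration and are not Walmart data, and the field names and the `sales_lift` helper are hypothetical.

```python
from statistics import mean

# Hypothetical daily unit sales of one product; all numbers are invented.
daily_sales = [
    {"units": 100, "storm_warning": False},
    {"units": 110, "storm_warning": False},
    {"units": 90, "storm_warning": False},
    {"units": 300, "storm_warning": True},
    {"units": 400, "storm_warning": True},
]

def sales_lift(rows, flag):
    """Ratio of average sales on flagged days to average sales on all other days."""
    flagged = [r["units"] for r in rows if r[flag]]
    normal = [r["units"] for r in rows if not r[flag]]
    return mean(flagged) / mean(normal)

lift = sales_lift(daily_sales, "storm_warning")
```

In this invented data set the lift is 3.5, meaning the product sells three and a half times as well on flagged days. Real data mining runs the same kind of comparison across many products and conditions at once, which is where the volume of big data comes in.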
Conclusion
To summarize, big data can seem like an extremely large concept to fully comprehend. The applications for turning data into useful information are endless. Big data is measured by a variety of characteristics, and it takes very skilled individuals to put it into action. As big data becomes even more relevant in today’s world, it is necessary to understand its basic definition and the terminology associated with it.
References
[1] Demchenko, Y.; de Laat, C.; Membrey, P., “Defining architecture components of the Big Data Ecosystem,” in Collaboration Technologies and Systems (CTS), 2014 International Conference on, pp. 104-112, 19-23 May 2014.
[2] Swan, M., “Philosophy of Big Data: Expanding the Human-Data Relation with Big Data Science Services,” in Big Data Computing Service and Applications (BigDataService), 2015 IEEE First International Conference on, pp. 468-477, March 30 2015-April 2 2015.
[3] Géczy, Peter. “Big data characteristics.” The Macrotheme Review 3, no. 6 (2014): 94-104.
[4] Matteson, Scott. “Big Data Basic Concepts and Benefits Explained – TechRepublic.” TechRepublic. N.p., 23 Sept. 2013. Web. 15 Oct. 2015.
[5] Soubra, Diya. “The 3Vs That Define Big Data.” – Data Science Central. N.p., 5 July 2012. Web. 15 Oct. 2015.
[6] Marr, Bernard. “Big Data Explained in Less Than 2 Minutes – To Absolutely Anyone.” Data Science Central. N.p., 18 Apr. 2015. Web. 15 Oct. 2015.
[7] Martinek, Lisa, and Craig Stedman. “What Is Big Data Analytics? – Definition from WhatIs.com.” SearchBusinessAnalytics. N.p., Oct. 2014. Web. 15 Oct. 2015.
[8] “Big Data Analytics: What It Is and Why It Matters.” Big Data Analytics: What It Is and Why It Matters. SAS, n.d. Web. 15 Oct. 2015.
[9] McAfee, Andrew, and Erik Brynjolfsson. “Big Data: The Management Revolution.” Harvard Business Review. N.p., 01 Oct. 2012. Web. 15 Oct. 2015.
[10] Strickland, Jeffrey, Ph.D. “What Is Predictive Analytics?” Linkedin. N.p., 25 Jan. 2015. Web. 15 Oct. 2015.
[11] Conway, Drew. “What Is Data Science?FAQ.” Drew Conway’s Answer to What Is Data Science? N.p., 22 Aug. 2010. Web. 15 Oct. 2015.
[12] Conway, Drew. “The Data Science Venn Diagram.” Drewconway.com. N.p., n.d. Web. 15 Oct. 2015.
[13] “How Big Data Analysis Helped Increase Walmart’s Sales Turnover?” DeZyre. N.p., 23 May 2015. Web. 19 Oct. 2015.