1.2 The Concept of Big Data

Big data has three defining characteristics: volume, variety, and velocity. Volume refers to the size of the data. Variety refers to its diversity in type and format. Velocity refers to the speed at which the data is generated and must be processed. The variety and velocity demands of big data go well beyond the capabilities of conventional computing infrastructure. Data generated on the Internet, for example, clearly exhibits both variety and velocity.

Beyond the three V’s, other aspects also distinguish big data from ‘small’ data. For instance, the utility or value of big data is often unclear at first. It may emerge only as analysts gain insight into the data and develop tools to exploit it. Search engines are an example from the history of the Internet: their importance was not immediately obvious, but they eventually became a critical part of the Internet experience. Finally, a further V of big data is veracity, which refers to the quality and trustworthiness of the collected data. Inaccurate or false data leads to poor decisions, no matter how well the data measures up on the other V’s.

Small data typically resides in a single file or on a user’s local computer. Big data, by contrast, is often dispersed both logically and geographically. It also tends to be more heterogeneous: it arrives in many different formats and units and is often unstructured. Because of its sheer size, processing big data also requires more staff, often organized into separate groups that specialize in tasks such as collection, analysis, and use. The effort invested in preparing big data sets makes them hard to simply throw away, so they tend to be kept much longer.

So what do we do with all this (big) data? Simply collecting the data and holding on to it does no good. The important question, therefore, is how to leverage it. Several conventional approaches for exploiting data of any size already exist. In this chapter, we briefly introduce each of these well-established data processing methods so that you gain a basic understanding of their concepts and their significance in the context of big data. We also discuss the enabling technologies that make it possible to apply these classic data management techniques to big data.