4.4 Applying Big Data Analytics to Computational Physics

4.4.1 Big Data with Computational Physics

Big data analytics can serve as a middle ground between conventional computer simulation and real world modeling. It can approach the accuracy of real world modeling while remaining simpler and cheaper, like conventional computer simulation. When applying big data analytics to computational physics, however, it is important to distinguish the type of data being examined. Data sets in computational physics are almost always accurate, because they are created with the sole intent of being examined. Most other big data sets are created for a different purpose and examined only afterward. For example, text message data is created first to communicate and is only later examined for trends, whereas a set of global climate data is created for the sole purpose of being analyzed and is assumed to be accurate.

Pros:

  • Accurate
  • Inexpensive
  • Able to predict
  • Universally Applicable

Cons:

  • Difficult to implement

Accurate:

Using big data sets allows computer models to achieve a much higher degree of accuracy than conventional computer simulation. Larger data sets provide more node points in soft-body physics, more test points in climate science, and more accurate models in aerodynamics. As the amount of data available to process increases, so does the accuracy of the models.

Inexpensive:

Big data analytics with computational physics is quite cheap compared to real world modeling. Although a one petabyte Hadoop cluster costs around one million dollars, the cluster can be reused and refilled with new data frequently [30]. At first, implementing a big data alternative to real world modeling will be more expensive. Eventually, however, the costs of real world modeling, detailed in the real world modeling section, will significantly surpass the cost of big data analytics.
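The break-even reasoning above can be illustrated with a short calculation. The sketch below is only an illustration: the one million dollar cluster cost comes from [30], but the function name break_even_tests and both per-test costs are hypothetical placeholders chosen for the example, not figures from this chapter.

```python
def break_even_tests(cluster_cost, physical_cost_per_test, simulated_cost_per_test):
    """Return the smallest number of tests at which cumulative real world
    testing costs more than the cluster plus the simulated tests run on it."""
    saving_per_test = physical_cost_per_test - simulated_cost_per_test
    if saving_per_test <= 0:
        raise ValueError("physical tests must cost more than simulated tests")
    # Smallest integer n with n * physical > cluster + n * simulated.
    return cluster_cost // saving_per_test + 1


CLUSTER_COST = 1_000_000        # one petabyte Hadoop cluster, from [30]
PHYSICAL_COST_PER_TEST = 50_000  # hypothetical cost of one real world test
SIMULATED_COST_PER_TEST = 1_000  # hypothetical running cost of one simulated test

n = break_even_tests(CLUSTER_COST, PHYSICAL_COST_PER_TEST, SIMULATED_COST_PER_TEST)
print(f"Real world modeling becomes more expensive after {n} tests")
```

With these placeholder costs the cluster pays for itself after 21 tests. The result depends entirely on the assumed per-test costs, so the numbers should be replaced with real figures before drawing any conclusion.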

Able to predict:

Much larger data sets make it possible to make predictions in computational physics. This is most apparent in climatology, a field that is entirely about predicting future weather patterns. With larger and more complete climate models, more accurate predictions can be made about future weather patterns [31]. Some big data handling techniques, such as high performance computing systems, are already used to do this.

Universally Applicable:

This advantage is not specific to big data analytics within physics. Once a system such as Hadoop has been deployed, the platform can be used for many of the other typical big data analytics applications detailed in other chapters.

Difficult to implement:

Implementing big data analytics within the field of physics requires engineers with a strong understanding of both physics and data handling. Such engineers are already in short supply, especially in the United States. This shortage creates a barrier to setting up the systems, which makes big data analytics methods hard to adopt at first [32].

4.4.2 Useful Techniques of Big Data Analytics for Computational Physics

Big data is commonly defined by three V’s: volume, velocity, and variety [35]. Within computational physics the data is predictable, so there is seldom much variety. This is because the data is generated by instruments placed specifically to collect scientific measurements. For example, a climate model draws on predictable pressure, temperature, and wind speed data. The only factors left are volume and velocity. The three techniques discussed here (predictive analytics, high performance computing, and machine learning) all tackle the high volume and high velocity of data within computational physics.

Predictive Analytics

As the name suggests, predictive analytics is used to predict outcomes in different situations. It is often used to help businesses make decisions based on analyzed statistics. For example, IBM offers many software products that use predictive analytics to help in marketing, sales, and other business sectors [36]. However, the method could also be put to use in simulated physics. One possible real world use is predicting global catastrophes. For example, a detailed model of the global climate has been used to predict massive storms and tsunamis [37]. Creating such detailed models requires predictive analytics combined with big data. Further uses of predictive analytics and big data are discussed in the continuing examples.

High Performance Computing Systems

Creating a high performance computing system may seem straightforward, since one could simply add more computing power, but such systems are actually quite complicated. The largest issue is input and output (I/O) time. An average CPU in a modern computer can, although it is not likely in practice, produce 0.2 terabits of data per second [38]. In comparison, many modern hard drives max out at 200 megabytes per second. There is an obvious gap between the rate at which the CPU can produce data and the rate at which that data can be stored. This issue can be addressed with tools like Hadoop that localize the analysis of data [39]. Using high performance computing systems allows significantly more accurate models to be created.
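The size of the I/O gap described above can be worked out directly from the two quoted figures. The following is a minimal sketch that assumes decimal units (1 terabit = 10^12 bits, 1 megabyte = 10^6 bytes); the function name io_gap is chosen for this example only.

```python
import math


def io_gap(cpu_bits_per_s, disk_bytes_per_s):
    """Return how many times faster the CPU can produce data than one drive can store it."""
    cpu_bytes_per_s = cpu_bits_per_s / 8  # 8 bits per byte
    return cpu_bytes_per_s / disk_bytes_per_s


CPU_OUTPUT_BITS_PER_S = 0.2e12  # 0.2 terabits per second, from [38]
DISK_WRITE_BYTES_PER_S = 200e6  # 200 megabytes per second (decimal units assumed)

ratio = io_gap(CPU_OUTPUT_BITS_PER_S, DISK_WRITE_BYTES_PER_S)
drives_needed = math.ceil(ratio)

print(f"CPU peak output: {CPU_OUTPUT_BITS_PER_S / 8 / 1e9:.0f} GB/s")  # prints 25 GB/s
print(f"Drives needed to keep up: {drives_needed}")                   # prints 125
```

Under these assumptions a single CPU at peak output would saturate 125 such drives. This is one way to see why moving the analysis to where the data is stored, rather than writing everything out first, is attractive.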

Machine Learning

Perhaps the most interesting technique that can be applied to physics problems is machine learning. This strategy is used in systems like Watson, which was developed by IBM. Watson won Jeopardy! when matched against some of the game's strongest human players [40]. Watson can do many other things, from helping veterinarians to acting as a museum guide. This type of technology could make the physicist's job easier and could be applied to real world situations. Machine learning, more than predictive analytics alone, can be used to predict situations that no human could have seen coming, because it can absorb all of the information about a situation and arrive at conclusions more quickly. For example, a plane crash could be avoided if the machine identified turbulent weather ahead of the plane and warned the aircraft in time. In a way, machine learning is the culmination of the different big data concepts.

4.4.3 Continuing Examples

The strategies described above are now examined in the context of the continuing examples of aerodynamics, soft-body physics, and climate science. This section shows the effectiveness of big data concepts compared with conventional computer simulation and real world simulation.

Aerodynamics

Predictive analytics is not very useful in the field of aerodynamics; high performance computing, however, is extremely useful. High performance computing is used to achieve a much higher degree of accuracy when generating aerodynamic or even hydrodynamic models. Marussia, a successful Formula 1 (F1) racing team, uses a supercomputer to help design its cars before building them. By doing this the team can operate on a 30 million pound budget rather than the 150 million pound budget of most F1 teams [41]. Savings of that size are likely to lead other F1 teams to adopt the same approach. Machine learning can also help produce better aerodynamic models by absorbing real world data and finding solutions that a human would not be able to. The Computational Aerosciences Laboratory at the University of Michigan is already doing this [42].

Soft-Body Physics

Predictive analytics can be very useful as a tool for calculating soft-body physics. A computer science research team at Brown University used computer simulation of soft-body physics together with predictive analytics to determine the behavior of diarthrodial joints (joints that have a wide range of motion) in the human body [43]. This team was only able to achieve an accuracy of 74%. By combining predictive analytics with big data, the team could have attained a much higher degree of accuracy. A paper written at the Universidade Federal de Pernambuco addresses the need for high performance computing in order to make accurate reproductions at an interactive rate [44]. A simulation runs at an interactive rate when it can be interacted with in real time. Maintaining accuracy at an interactive rate is an important goal for making simulations more relevant to the real world. For example, surgical simulators can be used to train future surgeons, but those simulations must be both accurate and interactive or they are useless. One of the ways this is made possible is through big data and high performance computing. Machine learning has already been used to reduce computing time for galaxy models and N-body simulations [45]. Because soft-body simulations are significantly less complex than galaxy models, it is highly likely that machine learning in conjunction with big data could be used to reduce the computing time of complex soft-body simulations.

Climate Science

Climate science and predictive analytics have led to an entirely new field that some call weather analytics. It starts with collecting big data on weather and climate and then uses predictive analytics to predict future weather patterns or crop outcomes [46]. Although the field is in its infancy, businesses are already very interested because one third of US commerce is sensitive to weather, which makes weather prediction very important. To work with all the climate data collected by groups such as NASA and companies like The Weather Company, organizations often must take advantage of high performance computing. NOAA (the National Oceanic and Atmospheric Administration) has just upgraded its supercomputers to a combined 5.78 petaflops of computing power [47]. NOAA uses this power to create more accurate predictions of the weather and climate. These predictions are more accurate now than ever, and their accuracy will continue to improve if computing power continues to increase. When neither predictive analytics nor high performance computing is enough to create accurate climate models, machine learning is used. Historical data for climate models is often sparse and has massive holes in it. This kind of data is well suited to machine learning [48]. By applying this concept, the shortcomings of other methods may be overcome, and natural disasters like droughts, fires, and tsunamis could be predicted more accurately.
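To make the idea of filling holes in a sparse climate record concrete, the sketch below applies k-nearest-neighbor regression, one of the simplest machine learning methods, to a synthetic monthly temperature series. It is not the method of [48]. The data, the function name knn_fill, and all parameters are assumptions made for this illustration.

```python
import math
import random


def knn_fill(months, temps, k=5):
    """Fill None entries with the mean of the k observed values closest in season."""
    observed = [(m, v) for m, v in zip(months, temps) if v is not None]

    def season_distance(a, b):
        # Circular distance between calendar months, in the range 0..6.
        d = abs(a % 12 - b % 12)
        return min(d, 12 - d)

    filled = []
    for m, v in zip(months, temps):
        if v is None:
            nearest = sorted(observed, key=lambda ov: season_distance(m, ov[0]))[:k]
            v = sum(val for _, val in nearest) / len(nearest)
        filled.append(v)
    return filled


# Synthetic record: ten years of monthly temperatures with a seasonal cycle and noise.
rng = random.Random(0)
months = list(range(120))
truth = [10.0 + 8.0 * math.sin(2 * math.pi * m / 12) for m in months]
temps = [t + rng.gauss(0.0, 0.5) for t in truth]

# Simulate a large hole in the historical record (months 30 through 59).
for m in range(30, 60):
    temps[m] = None

filled = knn_fill(months, temps)
gap_rmse = math.sqrt(sum((filled[m] - truth[m]) ** 2 for m in range(30, 60)) / 30)
print(f"RMSE inside the gap: {gap_rmse:.2f} degrees")
```

The neighbor search uses only the calendar month, so this sketch captures a seasonal cycle and nothing else. A real climate application would use many more predictors and a far more capable model.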