Don’t Study Large Earthquakes with Mathematica’s EarthquakeData

Mathematica contains information on earthquakes that is accessible using the built-in EarthquakeData function. The function includes useful search approaches that allow users to select earthquakes by magnitude, depth, date, etc. The geographic selection tools are particularly useful – you can select earthquakes by geographic regions, polygons, distance from a reference location, etc.

Unfortunately, something is wrong with the data. The earthquake information is not what most earthquake seismologists would consider accurate. I have not combed through the entire data set, but the Wolfram data have problems with, at the least, earthquake magnitudes. Although magnitude is the most commonly quoted estimate of an earthquake’s size, it is not an easy quantity for nonspecialists to understand or to package for general use.

The development of earthquake magnitude scales has a complex history because observational seismology developed before the firm theoretical basis of faulting and earthquakes was established. Many magnitudes may be associated with a large earthquake and not all are of equal value or even accurately reflect the size of the earthquake. Seismologists continue to measure and use a suite of magnitudes because that’s how we can compare old and new earthquakes.

The data for large earthquakes returned by the EarthquakeData command are inaccurate. I emailed Wolfram several years ago to describe problems I saw on Wolfram Alpha and they responded, but serious, obvious problems remain. This is not a big deal to me: I often use Mathematica to analyze and display earthquake data, but I never use Mathematica’s data. Two things concern me, though: (1) Do all the data included with Mathematica suffer from such substantial errors? (I don’t think so, but I wouldn’t know the details.) (2) I worry about students exploring earthquakes on their own who perform calculations with Mathematica’s data and are then disappointed and discouraged to learn that the effort was wasted because the input was flawed.

In the following, I evaluate some of the EarthquakeData information for large earthquakes. Problems appear quickly.

A simple search for a list of great earthquakes

Great earthquakes are earthquakes with magnitudes greater than or equal to 8.0. Mathematica makes searching for earthquakes with a minimum magnitude convenient. The search result is a list of earthquake entities, each of which carries properties describing that earthquake. Since we know much more about recent earthquakes than about historical events, some properties for the older events may be missing.

Here is an example query to get a list of all great earthquakes in the Wolfram database.

(* Select earthquakes with magnitudes greater than or equal to 8.0 *)
eqs = EarthquakeData[All, 8.0];
Print["Returned " <> ToString[Length[eqs]] <> " earthquake entities."]

The returned list includes 253 earthquakes. The first earthquake in the list occurred in 1096, and was located along the coast of Honshu, Japan (presumably the Enshunada Earthquake).

Wolfram’s EarthquakeData’s first great earthquake.

I am not sure of the source of the information on this earthquake. It looks like the Centennial Earthquake Catalog, but I cannot confirm that, since the online version of that catalog starts in 1900. You can find the event listed in the NOAA Tsunami Source Database. The Wolfram latitude and longitude differ from the values in the NOAA database, but that is not unusual; no one knows the precise location of that earthquake.

The last great earthquake returned by the great-earthquake search (I performed the search on 29 April, 2017) occurred on 8 December, 2016 at 21:38:48 UTC.

Wolfram’s most recent great earthquake (as of 01 May, 2017).

Trouble begins. The information for this earthquake is not the same as what I find in the US Geological Survey (USGS), International Seismological Centre (ISC), or Global Centroid Moment Tensor (GCMT) catalogs. Focusing on the “magnitude”, the USGS and the GCMT provide moment-magnitude estimates of 7.8. The ISC includes 22 magnitude entries measured by different research groups using different parts of the seismic wavefield. The tabulated magnitude values for this event, listed below, range from 6.0 to 8.4. You need to know something about these estimates to choose the most appropriate value. In fact, you also need to know when these measurements were made and how the various authorities revise these values. The ISC provides that information by linking each value to an origin ID (OrigID). View the ISC earthquake entry to appreciate the complexity of the information.

Type    Mag   Err   Nsta   Author   OrigID
mb      6.4   0.1     47   IDC      08072814
mbtmp   6.4   0.1     50   IDC      08072814
ML      6.0   0.4      3   IDC      08072814
MS      7.9   0.1     56   IDC      08072814
MS      8.0   -        -   MOS      07864532
mb      8.0   -       39   GFZ      07864497
Mwpd    7.8   0.1     69   ROM      07864503
Mw      7.8   -        -   CSEM     07864683
Mb      6.8   -        -   MOS      07864683
Mb      7.1   -        -   BUC      07864683
Mw      7.7   -        -   GTFE     07864683
Mw      7.8   -        -   ROM      07864683
Mw      7.8   -        -   NEIC     07864683
Mw      8.0   -        -   JMA      07864683
Mw      8.0   -        -   PTWC     07864683
Mw      8.2   -        -   ISK      07864683
Mw      7.9   -        -   IPGP     07864510
MW      7.8   -      174   GCMT     08416526
mb      7.2   0.0    534   NEIC     08399223
Ms_20   8.4   0.0    443   NEIC     08399223
Mwb     7.7   0.1     47   NEIC     08399223
Mww     7.8   -        -   NEIC     08399223

Given this many magnitude estimates for one large earthquake (and these are tentative), you can easily see why problems arise for a group like Wolfram, trying to produce generally useful data. The earthquake information is confusing. But for Wolfram to pass along some sample of this information as “computable data” is a serious mistake. I can’t even figure out where Mathematica acquired the information it provided me, and I work with these data all the time.
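As a rough illustration of how much the choice of entry matters, the sketch below transcribes the type, value, and author columns of the ISC table above into a Mathematica list and summarizes them. The variable names are illustrative, and treating Mw, MW, Mwb, Mww, and Mwpd together as one “moment-magnitude family” is a simplifying assumption made only for this example.

```mathematica
(* ISC magnitude entries transcribed from the table above: {type, value, author} *)
iscMags = {
   {"mb", 6.4, "IDC"}, {"mbtmp", 6.4, "IDC"}, {"ML", 6.0, "IDC"},
   {"MS", 7.9, "IDC"}, {"MS", 8.0, "MOS"}, {"mb", 8.0, "GFZ"},
   {"Mwpd", 7.8, "ROM"}, {"Mw", 7.8, "CSEM"}, {"Mb", 6.8, "MOS"},
   {"Mb", 7.1, "BUC"}, {"Mw", 7.7, "GTFE"}, {"Mw", 7.8, "ROM"},
   {"Mw", 7.8, "NEIC"}, {"Mw", 8.0, "JMA"}, {"Mw", 8.0, "PTWC"},
   {"Mw", 8.2, "ISK"}, {"Mw", 7.9, "IPGP"}, {"MW", 7.8, "GCMT"},
   {"mb", 7.2, "NEIC"}, {"Ms_20", 8.4, "NEIC"}, {"Mwb", 7.7, "NEIC"},
   {"Mww", 7.8, "NEIC"}};

allValues = iscMags[[All, 2]];

(* Spread of all 22 entries, and how many would pass a "magnitude >= 8" search *)
spreadAll = MinMax[allValues];
nGreatAll = Length[Select[allValues, # >= 8.0 &]];   (* 6 of 22 *)

(* Assumption for this sketch: these type labels form the moment-magnitude family *)
momentTypes = {"mw", "mwb", "mww", "mwpd"};
mwValues = Select[iscMags,
    MemberQ[momentTypes, ToLowerCase[First[#]]] &][[All, 2]];

spreadMw = MinMax[mwValues];
medianMw = Median[mwValues];
nGreatMw = Length[Select[mwValues, # >= 8.0 &]];     (* 3 of 12 *)

Print["All entries: range ", spreadAll, ", ", nGreatAll, " of ",
  Length[allValues], " at or above 8.0"]
Print["Moment-magnitude entries: range ", spreadMw, ", median ",
  medianMw, ", ", nGreatMw, " of ", Length[mwValues], " at or above 8.0"]
```

The lowest value is the ML 6.0 entry, which rests on only three stations. Under this grouping, 9 of the 12 moment-magnitude entries fall below 8.0, so whether this event counts as “great” depends entirely on which entry a catalog keeps.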

Earthquakes with Magnitude ≥ 9.0

We can do another quick test of the largest earthquakes, those with magnitudes greater than or equal to 9.0.

Wolfram earthquakes with magnitudes greater than or equal to 9.0.

Again, the Wolfram information has serious problems. The first event, the 1687 Southern Peru earthquake, can be found in the NOAA database. The second event, located near Caracas, Venezuela, was an important and well-known earthquake, but its magnitude is unlikely to have been close to 9.6. According to the NOAA catalog, that extreme magnitude value likely originated in Ocola, L., “Catálogos sísmicos, República de Colombia,” Proyecto de Sismicidad Andina-SISAN (1984). I have to think this is a mixture of magnitude and intensity for that earthquake. The other two historical magnitude 9+ earthquakes, from Colombia (1827) and near Arica, Chile (1868), have NOAA magnitudes of 7.0 and 8.5, respectively. Abe estimated a tsunami magnitude of 9.0 for the 1868 event. The Wolfram magnitudes all appear to arise from Ocola (1984).

We can argue about the older events, since everything about them is difficult to estimate (although I believe that most seismologists would not choose the values from EarthquakeData). But we also have problems with the more recent giant earthquakes in the list. Only one of the post-1950, better-constrained magnitude 9+ earthquakes is in the list: the 1952 Kamchatka, 1960 Chile, 1964 Alaska, and 2004 Sumatra earthquakes are missing, and only the 2011 Tohoku earthquake is included. Wolfram’s earthquake information has serious problems.

A Comparison with the USGS Catalog Data

As a final illustration of the problems, we can compare the great earthquakes since 1900 in the Wolfram database with those from the USGS online catalog. The USGS has thought much about how to select a reliable representative magnitude for an earthquake. I downloaded a CSV file containing the list of great earthquakes since 1900 included by the USGS (search page). Slight differences might be understandable if you pull information from other catalogs, but the Wolfram EarthquakeData results differ substantially from those of the USGS. That’s a serious problem.

(Top) Timeline of great earthquakes from the US Geological Survey’s online earthquake catalog compared with (bottom) the timeline from the Wolfram (EarthquakeData) catalog. The Wolfram Catalog is full of misleading magnitudes for these events and differs substantially from the more generally accepted USGS data.
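One way to make the USGS side of this comparison reproducible is to pull the list directly into Mathematica rather than downloading the file by hand. The sketch below is not the code used for the figure. It assumes the USGS FDSN event web service, its query parameters (format, starttime, minmagnitude, orderby), and its CSV column names (time, mag, magType, place); the names usgsURL, parseCatalog, usgs, and timeline are illustrative, and the query needs a network connection.

```mathematica
(* USGS FDSN event service: M >= 8 since 1900, oldest first, returned as CSV.
   Assumption: endpoint and parameter names as documented by the USGS. *)
usgsURL = "https://earthquake.usgs.gov/fdsnws/event/1/query" <>
   "?format=csv&starttime=1900-01-01&minmagnitude=8&orderby=time-asc";

(* Turn imported CSV rows into associations keyed by the header row,
   skipping any row whose length does not match the header *)
parseCatalog[raw_List] := With[{header = First[raw]},
   AssociationThread[header, #] & /@
    Select[Rest[raw], Length[#] == Length[header] &]];

usgs = parseCatalog[Import[usgsURL, "CSV"]];
Print["USGS returned " <> ToString[Length[usgs]] <> " great earthquakes."]

(* {date, magnitude} pairs; assumes the time column is imported as a
   string that begins with yyyy-mm-dd *)
timeline = {DateList[{StringTake[#["time"], 10],
       {"Year", "-", "Month", "-", "Day"}}], #["mag"]} & /@ usgs;

DateListPlot[timeline, Joined -> False, Filling -> Bottom,
 FrameLabel -> {"Year", "USGS magnitude"}]

(* The giant (M >= 9) events according to the USGS *)
TableForm[{#["time"], #["mag"], #["magType"], #["place"]} & /@
  Select[usgs, #["mag"] >= 9 &]]
```

The final table gives a quick check on the magnitude 9+ discussion above: the post-1950 giant earthquakes missing from the Wolfram list should appear in the USGS result. Keeping the magType column alongside each value also records which kind of magnitude the USGS chose as its preferred estimate.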

Summary

No one should use Wolfram’s EarthquakeData to investigate large earthquakes, even casually.

I have only looked at the larger events; I have no idea whether smaller earthquakes share similar issues. Wolfram’s confusion is understandable. Earthquake magnitude is a complicated empirical measure of earthquake size (despite its popularity). For most large earthquakes more than one magnitude estimate exists because, over the years, seismologists have used different data to make measurements. I worry that providing erroneous data as part of the Mathematica (and Wolfram Alpha) resources does more harm than good for the field. Frankly, I wish they would remove the command.

I recommend that anyone working with earthquake data forgo any use of Wolfram’s data and go to the source of the data, such as the US Geological Survey, the International Seismological Centre, the Global Centroid Moment Tensor Project, or other local authorities (I created an earlier post on how to search the USGS catalog directly). Mathematica is a superb tool to import and to analyze the information from these more authoritative sources.