The USGS provides earthquake lists in a number of formats including kml for Google Earth, easy-to-read comma-separated-value (csv) files that are quite simple to parse with any spreadsheet, matlab, Mathematica, and many unix tools. Some more information is available in the JSON formatted files, which are the preferred methods for downloading the data feeds. The XML formatted data, stored in the community-standard quakeml dialect, contain the most information. XML is not a human-readable format, but is compatible with the database centric workflows often used in modern earthquake monitoring activities. Accessing the data directly from the databases makes it more likely to have fewer translation errors (or at least consistent data).
My goal in these notes is not to explain the QuakeML (qml) format (I couldn’t if I wanted). You can read about QML at https://quake.ethz.ch/quakeml/, it’s a complex beast of a data format that can be used to store many of the different types of information used in earthquake analyses. Information items range from seismic event hypocenters and origin times (which are collectively called origins) to individual seismic wave arrival time and amplitude measurements. QML contains many schema and allows much of the seismic data to be encoded as they are measured. For example an arrival time pick can be created and stored before the earthquake location associated with that arrival is estimated. The same is true for the seismic wave amplitudes. Storing the information as it is measured is more robust and more conducive for near-real time monitoring processing flows. This helps produce quick locations and magnitudes, which lead in turn to quick earthquake alerts.
At the analysis end of the flow, the information organization can be less convenient than many old formats. For example, information that a seismologist might like to list together may be split apart in theQML file. An arrival time pick and/or measured wave amplitude may be stored as part of the event object, but a travel time residual must be associated with an event origin object. And a single event may have multiple origins – in the simplest case, one origin could be the hypocenter location estimated from body-wave arrival times, another origin could be a rupture centroid estimated using a moment-tensor inversion of long-period waveforms. These files can be complicated. Another issue with USGS QML is that sometimes you get more information, other times you get less. For example, a recent earthquake file might Include information on the station, network, channel, and location code for the waveform used for a measurement, but a file corresponding to an event from a few years older may just provide the station ID. Changing content can have you pulling your hair out when trying to extract information.
QuakeML can also simply be confusing, look at this encoding of an arrival time pick
. . . <arrival publicID="quakeml:us.anss.org/arrival/200081v8/us_3395_bpid-2001275574"> <pickID>quakeml:us.anss.org/pick/200081v8/us_3395_bpid-2001275574</pickID> <phase>Pn</phase> <azimuth>226.495</azimuth> <distance>13.8121</distance> <takeoffAngle> <value>71.6</value> </takeoffAngle> <timeResidual>1.30</timeResidual> <timeWeight>0.6300</timeWeight> <creationInfo> <agencyID>us</agencyID> <agencyURI>smi:anss.org/metadata/agencyid/us</agencyURI> <author>manual</author> <creationTime>2016-12-17T10:59:33Z</creationTime> </creationInfo> </arrival> . . .
Look carefully and you will see that the azimuth, distance, and time items all are specific values, the takeoffAngle item has a sublevel to store a value. I don’t know why. Variability in data storage practice adds flexibility, but also makes codes hard to maintain.