Understanding FFT Scaling with Matlab (or Python, or…)

The Fourier Transform is one of the most frequently used computational tools in earthquake seismology. Using an FFT requires some understanding of the way the information is encoded (frequency ordering, complex values, real values, etc) and these are generally well documented in the various software packages used in the field. I’ll assume that you have a good handle on these issues and that you know the basic idea about using forward and inverse transforms. I want to focus on an aspect of FFT use that often confuses students, the scaling provided by the transforms. Scaling can be confusing because to make the functions more efficient and flexible, the scaling is often omitted, or it is implicitly included by assuming that you plan to forward and inverse transform the same signal.

In seismology we usually start with a signal that has a specified sample rate, \(\delta t\), and a length, \(N\). I’ll assume that we are dealing with a signal for which \(N\) is a power of two, which used to be the case whenever we used FFT’s but is less required now that computers are much faster. FFT routines don’t use a specific value of sample rate, \(\delta t\), at all, but they don’t always completely ignore it – they sometimes include \(\delta t\) implicitly in their scaling. For most cases, when you compute a forward transform, you have to scale the result by the sample rate to get the correct values (this \(\delta t\) arises from the \(\delta t\) in the Fourier Transform that the DFT is approximating).

Consider the case in Matlab. We’ll construct a unit area “spike” in Matlab by creating a signal with a single non-zero value. So we can keep track of the physical units, which are also ignored in the computation but are essential to keep track of when doing science, I’ll assume that our signal is a displacement seismogram with units of meters and that the units of \(\delta t\) are seconds. To create a signal with unit area, we must set the amplitude equal to \(1 / \delta t ~\mbox{meters}\). That is the spike amplitude has the same numerical value as \(1/\delta t\), but units of length. For a spike with an amplitude \(1/\delta t\), the area under the curve in the time domain is (using a the formula for the area of a triangle with width 2 dt):

\[Area = \frac{1}{2} \cdot 2 \times \delta t~\mbox{[s]} \cdot \frac{1}{\delta t}~\mbox{[m]}= 1 \mbox{ [m-s] .} \]

The Fourier Transform of a unit area spike should have a value of unity. Let’s see what we get in Matlab.

 >> dt = 0.1;
 >> s = [1/dt,0,0,0,0,0,0,0];
 >> shat = fft(s) % amplitudes are wrong without the correct scaling
 shat = 10 10 10 10 10 10 10 10

The spectrum is a factor of \(1/\delta t\) too large. We get the correct result  by multiplying by \(\delta t\).

>> shat = fft(s) * dt % you must scale by dt to get the correct values shat = 1 1 1 1 1 1 1 1

When we inverse transform, we have to account for the \(\delta f\) in the Fourier Transform Integral. That means we multiply by \(\delta f\). For a signal of length \(N\) and sample rate \(\delta t\), the frequency sampling rate, \(\delta f\) is

\[\delta f = \frac{1}{N \cdot \delta t}~. \]

Here’s the tricky part, Matlab multiplies the inverse transform by \(1/N = \delta t \cdot \delta f\). So we have to correct for that by either dividing by \(\delta t\) or simply remove the \(1/N\) factor and apply the \(\delta f\), which makes things easier to understand. Here’s a summary, I explicitly multiply the results of the inverse transform (ifft) by \(N\) to remove Matlab’s scaling, then I scale by \(\delta f\).

dt = 0.1;
N = 8;
df = 1/(N*dt);
% define a unit area signal
s = [1/dt,0,0,0,0,0,0,0]
% forward transform - scale by dt
shat = fft(s) * dt
% inverse transform - remove 1/N factor, then scale by df
ifs = (N * ifft(shat)) * df

Summary

The first thing that I do when I start using a new FFT, is to test the amplitudes using a unit-area delta-function approximation (which is really a triangle). The unit area time domain signal must have a value of unity at the zero frequency, and if it’s a spike, the spectrum should be flat. If you think this is not that critical, then you need to think about deconvolutions, earthquake-source spectra calculations, etc. To get the correct amplitudes, you have to apply your scaling (unless you do a forward and then inverse transform without applying any operations using physical quantities to the original signal).

There’s nothing special about Matlab here – you can use the same tests to understand the scaling in Mathematica, Python, Fortran, C, JavaScript, …

A Day-of-Year Calculator with Python

Many seismic data are stored using a numerical day of the year in place of a month and day. This is efficient for the data storage, but it leads to the need to convert back and forth from command month-day to day-of-year values. Over the years, I have seen printed tables of values that some seismologists kept nearby to do the conversion back and forth. Now, of course, we work at computers much of the day and we generally have an interactive terminal available, so I created a simple python utility to perform the conversion.

Modern python has extensive packages to deal with time quantities, but I use a simple, lightweight package that I wrote to handle the conversions (back in the mid 90’s when python had many fewer packages and most everyone was using PERL).

The date to day-of-year conversion is relatively straight forward, so there’s not much code to write – here is what I use.

doy.py

!/usr/bin/python
import sys,string
import pytutil
#
if len(sys.argv) == 4:
	
	month = string.atoi(sys.argv[1])
	day   = string.atoi(sys.argv[2])
	year  = string.atoi(sys.argv[3])
	
	print "   %02d/%02d/%04d is the %03d of the year." \
		  % (month,day,year,cja_pytutil.doy(year,month,day))
elif len(sys.argv) == 3:
	doy   = string.atoi(sys.argv[1])
	year  = string.atoi(sys.argv[2])
	output = cja_pytutil.monthdate(year,doy)
	print "   Day %03d of %4d is %02d/%02d/%04d " \
		  % (doy,year,output[0],output[1],year,)
else:
	print " "
	print "  Usage: month day year"
	print "      or"
	print "  Usage: doy year"
	print " "	

Continue reading

Parsing USGS QuakeML files with Python

The USGS provides earthquake lists in a number of formats including kml for Google Earth, easy-to-read comma-separated-value (csv) files that are quite simple to parse with any spreadsheet, matlab, Mathematica, and many unix tools. Some more information is available in the JSON formatted files, which are the preferred methods for downloading the data feeds. The XML formatted data, stored in the community-standard quakeml dialect, contain the most information. XML is not a human-readable format, but is compatible with the database centric workflows often used in modern earthquake monitoring activities. Accessing the data directly from the databases makes it more likely to have fewer translation errors (or at least consistent data).

My goal in these notes is not to explain the QuakeML (qml) format (I couldn’t if I wanted). You can read about QML at https://quake.ethz.ch/quakeml/, it’s a complex beast of a data format that can be used to store many of the different types of information used in earthquake analyses. Information items range from seismic event hypocenters and origin times (which are collectively called origins) to individual seismic wave arrival time and amplitude measurements. QML contains many schema and allows much of the seismic data to be encoded as they are measured. For example an arrival time pick can be created and stored before the earthquake location associated with that arrival is estimated. The same is true for the seismic wave amplitudes. Storing the information as it is measured is more robust and more conducive for near-real time monitoring processing flows. This helps produce quick locations and magnitudes, which lead in turn to quick earthquake alerts.

At the analysis end of the flow, the information organization can be less convenient than many old formats. For example, information that a seismologist might like to list together may be split apart in theQML file. An arrival time pick and/or measured wave amplitude may be stored as part of the event object, but a travel time residual must be associated with an event origin object. And a single event may have multiple origins – in the simplest case, one origin could be the hypocenter location estimated from body-wave arrival times, another origin could be a rupture centroid estimated using a moment-tensor inversion of long-period waveforms. These files can be complicated. Another issue with USGS QML is that sometimes you get more information, other times you get less. For example, a recent earthquake file might Include information on the station, network, channel, and location code for the waveform used for a measurement, but a file corresponding to an event from a few years older may just provide the station ID. Changing content can have you pulling your hair out when trying to extract information.

QuakeML can also simply be confusing, look at this encoding of an arrival time pick

.
.
.
        <arrival publicID="quakeml:us.anss.org/arrival/200081v8/us_3395_bpid-2001275574">
          <pickID>quakeml:us.anss.org/pick/200081v8/us_3395_bpid-2001275574</pickID>
          <phase>Pn</phase>
          <azimuth>226.495</azimuth>
          <distance>13.8121</distance>
          <takeoffAngle>
            <value>71.6</value>
          </takeoffAngle>
          <timeResidual>1.30</timeResidual>
          <timeWeight>0.6300</timeWeight>
          <creationInfo>
            <agencyID>us</agencyID>
            <agencyURI>smi:anss.org/metadata/agencyid/us</agencyURI>
            <author>manual</author>
            <creationTime>2016-12-17T10:59:33Z</creationTime>
          </creationInfo>
        </arrival>
.
.
.

Look carefully and you will see that the azimuth, distance, and time items all are specific values, the takeoffAngle item has a sublevel to store a value. I don’t know why. Variability in data storage practice adds flexibility, but also makes codes hard to maintain.

Continue reading

Converting Between Moment Magnitude and Seismic Moment with Python

Introduction

The historical tradition of using magnitude to communicate the size of an earthquake is deeply entrenched both within seismology and throughout the public. So it makes sense to have a way of mapping seismic moment into a magnitude. Kanamori (1977) developed just such a conversion, which we call moment magnitude and represent as \(M_W\). The relationship was defined in CGS units as:

$$M_w=\frac{2}{3} \log _{10}\left(M_0\right)-10.7~,$$

where \(M_0\) is the moment in dyne-cm. To convert a moment from dyne-cm to N-m (newton meters), divide by \(10^7\). We can invert the equation to convert \(M_W\) to seismic moment:

$$M_0=10^{1.5 \left(M_w+10.7\right)}~.$$

Converting back and forth from the two measures is a common task, and long ago I wrote a simple command-line driven fortran code to perform the conversion. In this note I describe two python scripts that will do the job. Like the original Fortran, both use command-line arguments, and part of the reason for this note is to provide an example of using command-line arguments in python. I use the convenient argparse package to handle all the details.

Python Code

The first version uses two command line arguments -mw and -m0 to indicate whether you want to go from \(M_W\) to \(M_0\) or from \(M_0\) to \(M_W\). Here is the code:

#!/usr/local/anaconda/bin/python
#
#====================================================================
#   MAIN PROGRAM
#====================================================================
if __name__ == '__main__':
    #
    import argparse, math
    #
    parser = argparse.ArgumentParser( \
        description='Convert between seismic moment and moment magnitude.')
    #
    parser.add_argument('floats', metavar='Value', type=float, nargs='+',\
        help='A seismic moment in dyne cm or an Mw value.')
    #
    args = parser.parse_args()
    #
    for value in args.floats:
        if value &lt;= 12:
            print "Mw = %.2f, M0 = %9.3e dyne cm" % (value,math.pow(10,1.5*(value+10.7)))
        else:
            print "M0 = %9.3e dyne cm, Mw = %.2f" % (value,0.66667*math.log10(value)-10.7)
#

Continue reading

Quick Lists of Recent Large Earthquakes with Python

Introduction

I am often interested in a quick summary of recent global seismic activity. A number of good web sites and apps can provide such information, but I often want a simple, quick summary. I wrote the following python script to execute from a command line (as in the OS X Terminal App) and to provide basic information on recent large earthuake activity. I place the script in ~/Dropbox/bin and can run it from all my Macs. I have also installed it on my iPads using the Pythonista App.

The script does nothing fancy, it simply performs a few searches on the U.S. Geological Survey’s earthquake feeds and summarizes information in three categories – great (M ≥ 8) earthquake occurrence in the last three years, major and great (M ≥ 7) earthquake activity in the last year, and strong, major, and great (M ≥ 6) earthquakes in the last 30 days.

Continue reading