This is a summary of the ideas behind and the work put into a talk given by Zane Rusk and Matthew Neal to the Penn State Student Chapter of the Audio Engineering Society. Connect with PSU AES via their Facebook, AES Section site, and PSU website.
The basic idea behind the talk and project was:
- To see how a headphone affects (or ‘colors’) the audio that is played through it
- To create an inverse filter that would allow us to listen to the headphones with a ‘flat’ response
Introduction:
In reality, no headphone (or speaker, or other audio-range transducer) plays audio through it without affecting it. This concept is summarized succinctly in the following diagram:
The frequency response, transfer function, or simply the response refers to how a system (in our case, a headphone) reacts/responds to a stimulus (in our case, an audio signal) as a function of frequency. If you played a perfect single-frequency tone at a given frequency through a headphone, a plot of the frequency response would tell you whether that tone would be boosted or attenuated – you could look for that frequency on the horizontal axis, and see whether the curve is a positive dB value (corresponding to a boost), a negative dB value (corresponding to an attenuation), or 0 dB (which corresponds to no change from input to output). Fourier theory tells us that any audio signal can be represented by a sum of pure tones, and thus each frequency component of a broadband signal (like music) will be affected by the frequency response of the system in the same way.
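To make the dB reading concrete, here is a minimal sketch of how a boost or attenuation of a single tone maps to a dB value. The helper name `gain_db` and the example amplitudes are made up for illustration and are not taken from the talk.

```python
import numpy as np

def gain_db(input_amplitude, output_amplitude):
    """Gain of a pure tone in decibels: 20 * log10(out / in)."""
    return 20.0 * np.log10(output_amplitude / input_amplitude)

# Hypothetical amplitudes of one tone before and after the headphone
doubled = gain_db(1.0, 2.0)     # positive dB value: a boost
halved = gain_db(1.0, 0.5)      # negative dB value: an attenuation
unchanged = gain_db(1.0, 1.0)   # 0 dB: no change from input to output
```

Doubling the amplitude comes out at roughly +6 dB and halving it at roughly -6 dB, which is a handy rule of thumb when reading a response plot.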
Referring to a headphone response as ‘flat’ means that the headphones are not changing, affecting, or ‘coloring’ the audio signal that is played through them – or that these changes are minimal. As mentioned, no headphones are perfectly flat (but you can get very close).
In the following plot of the headphone response, arrows show where the response of the headphone is not approximately flat – there is a boost above 4 kHz, and some attenuation in the low frequencies.
(Note also that the left and right headphones may not have exactly the same response, as in this diagram! If they are different enough, you might notice that the left stereo field is brighter than the right, or whatever the case may be.)
Ideally, for the application of mixing audio over headphones or for enjoying a particular audio recording without alteration, we would want a completely flat response headphone (at least over the audible frequency range, generously from 20 Hz to 20 kHz), and we would like both the left and right headphones to be flat (and the same).
Note also that these are only two aspects by which you could judge a pair of headphones – this says nothing about how comfy they are on your ears, or how fatiguing they are to wear for long periods; says nothing about how fashionable they are; says nothing about their wireless or Bluetooth capabilities. It also does not include any benefit or detriment to the sound that may occur due to the quality of the device you plug the headphones into, the amount of compression on the audio files you play, etc. – this is simply about the signal going into the headphone matching the signal coming out.
If you’re very keen, you might ask – if I pay more for headphones, will they be flatter? – you might be interested in this JASA article. Or – how flat does it have to be? How big do the alterations in sound (attenuations or boosts in the response) have to be for me to detect them? You might be interested in the psychophysical (and more specifically, psychoacoustical) concept of a just-noticeable-difference (JND)…
How did we measure the headphone response?
LTI system theory tells us that if we assume the headphone represents a linear system, and represent it with the following diagram…
… we can get the headphone response, H, if we know both audio in (X) and audio out (Y). (Assuming X, Y and H are frequency domain signals, the relationship is simply H = Y/X). We picked an audio signal for X that had content across the entire frequency range of interest – the audible frequency range. We used a chirp going from 20 Hz to 20 kHz. (Other considerations in choosing an input signal can get very advanced… ***)
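As a rough illustration of H = Y/X, the sketch below builds a 20 Hz to 20 kHz sweep and recovers the response of a stand-in system. This is not the code used for the talk: the sample rate, the sweep length, the exponential sweep shape, and the three-tap `h_true` "headphone" are all assumptions chosen to keep the example self-contained.

```python
import numpy as np

fs = 48000                 # sample rate in Hz (assumed)
duration = 2.0             # sweep length in seconds (assumed)
f0, f1 = 20.0, 20000.0     # sweep limits from the text

# Exponential (logarithmic) sine sweep from f0 to f1
t = np.arange(int(fs * duration)) / fs
k = np.log(f1 / f0)
x = np.sin(2 * np.pi * f0 * duration / k * (np.exp(t * k / duration) - 1.0))

# Stand-in for the headphone: a short, made-up impulse response
h_true = np.array([1.0, 0.5, 0.25])
y = np.convolve(x, h_true)          # what the "microphone" would record

# FFT length at least len(y), so zero-padding avoids wrap-around
n_fft = 2 ** 17
X = np.fft.rfft(x, n_fft)
Y = np.fft.rfft(y, n_fft)
H = Y / X                           # estimated response

# Only trust the estimate where the sweep actually has energy
freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
band = (freqs >= f0) & (freqs <= f1)
H_expected = np.fft.rfft(h_true, n_fft)
max_error = np.max(np.abs(H[band] - H_expected[band]))
```

Outside the swept band X carries very little energy, so the division there is unreliable. That is one reason the estimate is only inspected between 20 Hz and 20 kHz.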
So, X is the chirp signal we play through the headphones, and Y is what the headphones play. How did we measure Y? Using a binaural head (we have one from B&K – see the Equipment page). The binaural head is a dummy head with microphones placed in its ears, and allows us to measure the output of the headphones conveniently & in a way that gives a realistic representation of the application (a person wearing headphones, and what they would hear).
S.A.M. (SPRAL Acoustic Mannequin) in action at the talk
With these ideas, we’ve basically accomplished our first goal – to measure the response of the headphones. Play a chirp through the headphones, measure what comes out, and calculate H.
Technical details, for those so inclined: Signal generation and processing were done on a MacBook Pro laptop. Audio input and output were handled via a MOTU audio interface, and Python was used to create the chirp signal and to simultaneously play it through the headphones and record the output from the binaural head mics. The signal was played and the output recorded multiple times, and the recordings were averaged to improve the signal-to-noise ratio (averaging out random noise in the measurement).
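The benefit of averaging can be illustrated with simulated data. In this sketch the 1 kHz test tone, the noise level, and the number of repeats are assumptions, and the takes are perfectly time-aligned, which real recordings would need to be as well.

```python
import numpy as np

rng = np.random.default_rng(0)
fs, n_samples, n_repeats = 48000, 4800, 16   # assumed values

# The "true" recording plus independent random noise on every take
clean = np.sin(2 * np.pi * 1000 * np.arange(n_samples) / fs)
takes = clean + 0.1 * rng.standard_normal((n_repeats, n_samples))

# Average the time-aligned takes sample by sample
averaged = takes.mean(axis=0)

def snr_db(signal, noisy):
    """Signal-to-noise ratio in dB, given the known clean signal."""
    noise = noisy - signal
    return 10.0 * np.log10(np.mean(signal ** 2) / np.mean(noise ** 2))

snr_single = snr_db(clean, takes[0])
snr_averaged = snr_db(clean, averaged)
improvement_db = snr_averaged - snr_single
```

For uncorrelated noise, averaging N takes should improve the SNR by about 10·log10(N) dB, so 16 takes buy roughly 12 dB.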
Plots of measured headphone responses
Once you have H (in the frequency domain), you can plot it as a function of frequency, to get a common representation of the headphone response like the one we’ve shown earlier. Here are some responses we measured on 4/11/19:
(Disclaimer: these were amateur measurements made by acoustics students… please don’t go knocking on the companies’ doors to show them how our data may differ from theirs…)
Links to the headphones measured: Audio Technica ATH-M30X, Audio Technica ATH-M40X, Audio Technica ATH-M50X, Corsair HS60, Sennheiser HD 6xx (which should be very similar to the HD 650)
S.A.M. looking handsome while wearing 5 of the 6 headphones tested at the talk
Basic idea behind inverting the headphone response
As mentioned, at the talk given for the PSU AES chapter on 4/11/19, we inverted the headphone response, creating an inverse filter. (When working in the frequency domain and ignoring some practicalities, the inverse filter can be thought of simply as 1/H). When filtering an audio signal with this filter BEFORE it is sent to the headphone, we can compensate for the non-flatness of the headphone response, and get a “flat response” on the output of the headphone – i.e., we get the original audio signal back without coloration due to the headphone. The idea is summarized in the diagram:
Here’s an example plot of the theoretical output to our processing (note the flatness in the audio range):
“Theoretical” because this is the response of H after combining it with 1/H, all within software – a measurement of the output of a headphone with the inverse filter applied was not taken.
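The same "all within software" check can be sketched in a few lines. The impulse response below is a made-up stand-in, not a measured headphone, and the filtering is done with circular FFT multiplication purely to keep the example short.

```python
import numpy as np

n = 1024
h = np.zeros(n)
h[:3] = [1.0, 0.5, 0.25]            # hypothetical "headphone" IR

H = np.fft.rfft(h)                  # headphone response
H_inv = 1.0 / H                     # idealized inverse filter

# Combined response of inverse filter followed by headphone, in dB
combined_db = 20.0 * np.log10(np.abs(H * H_inv))

# Pre-filter some audio, then pass it through the "headphone"
rng = np.random.default_rng(1)
x = rng.standard_normal(n)
pre_filtered = np.fft.irfft(np.fft.rfft(x) * H_inv, n)
y = np.fft.irfft(np.fft.rfft(pre_filtered) * H, n)
```

Here `combined_db` is 0 dB at every frequency and `y` matches `x`, but only because this toy H has no noise and no near-zero values.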
Technical details, for those so inclined: Those in science and engineering (or, maybe just the scientists…) might be screaming right now at the reckless use of equal signs. The signal processing described here has limitations, some of which are discussed below, but let’s mention a few things that prevent Y from ever really equalling X:
- All physical systems add noise. Even if you have a highly protected system, you will have things like Johnson-Nyquist noise, for instance.
- The roll-off of the headphones, which are not engineered to reproduce frequencies outside of the audible range, produces very low dB values in the frequency response H; when you invert H, those very small numbers become very large numbers. You will run into trouble if you try to apply a boost that approaches infinity (see algorithm below).
- The author of this page uses block diagrams because of his background in signal processing (he really likes them, and he thinks they lend themselves to an intuitive understanding of the system at hand), but in reality X is an electrical signal flowing through an audio cable, while the output of a headphone is a signal carried by pressure waves. For Y to really equal X, we must assume that the output of the headphones is recorded by a microphone, so that it is returned to an electrical signal. But an additional transducer will introduce additional noise as described above, and another inverse filter would be required to undo the response of the microphone. In addition, the pressure wave would need to propagate through some medium from the headphone to the microphone, and that channel would have its own transfer function that would require undoing…
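The runaway-boost problem from the roll-off bullet can be seen with three numbers. The sketch below shows one common workaround, capping the maximum boost of the inverse. That is a swapped-in technique for illustration only (the filter built for the talk was band-pass filtered instead). The function name `limited_inverse`, the 20 dB cap, and the sample values of H are assumptions, and H is assumed to be normalized around 0 dB.

```python
import numpy as np

def limited_inverse(H, max_boost_db=20.0):
    """Invert H, but never boost by more than max_boost_db."""
    H = np.asarray(H, dtype=complex)
    floor = 10.0 ** (-max_boost_db / 20.0)     # smallest |H| inverted as-is
    magnitude = np.maximum(np.abs(H), floor)   # clamp tiny magnitudes
    phase = np.angle(H)
    return np.exp(-1j * phase) / magnitude     # inverse with original phase negated

# Hypothetical response values: flat, -6 dB, and deep in the roll-off
H = np.array([1.0, 0.5, 1e-6])
naive_boost_db = 20.0 * np.log10(np.abs(1.0 / H))
limited_boost_db = 20.0 * np.log10(np.abs(limited_inverse(H)))
```

The naive inverse asks for a 120 dB boost at the rolled-off bin, while the capped version stops at 20 dB and leaves the other two bins untouched.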
More Technical Nitty-Gritty
As mentioned earlier, Python was used to calculate the headphone response H. Due to time constraints and unfamiliarity with some signal processing tools in Python, H was brought into Matlab to create the inverse filter. When reading below, note that the frequency response of the headphone might be referred to as the transfer function (TF) of the headphone, and that the time-domain version of the TF is the impulse response (IR) of the headphone. Switching between the IR and TF is accomplished by the Fourier Transform, which is implemented in Matlab through the Fast Fourier Transform algorithm.
The algorithm to create the inverse filter, designed by Matthew Neal, was as follows:
- Import the headphone IR into Matlab
- Shift the IR in time so that its maximum occurs at 0.01 seconds
- Zero the IR after a certain point (tweaking parameter; cleans up low frequency noise which persists in the later parts of the IR)
- Center the corresponding TF, or normalize it to its mean value; makes the mean value of the TF correspond to 0 dB (no boost or attenuation)
- Invert that TF (i.e., 1/H)
- Make sure the negative frequency components of the TF are complex conjugate symmetric with the positive frequency components, to ensure that the IR is real
- Band-pass filter that TF; you make a mess when you invert the low- and high-frequency roll-off of the headphones, and you want to clean that up
- Make the corresponding IR minimum-phase
- Truncate the IR to a certain size (tweaking parameter)
- The resulting IR is the filter IR
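The steps above can be sketched in Python with NumPy. This is a loose illustration, not Matthew Neal's Matlab implementation. The function name `design_inverse_filter`, every default parameter, the circular shift, the normalization by the mean in-band dB level, the crude -60 dB clamp used in place of a proper band-pass filter, and the cepstral method for the minimum-phase step are all assumptions. Only the magnitude is inverted, since the minimum-phase step rebuilds the phase from the magnitude anyway.

```python
import numpy as np

def design_inverse_filter(ir, fs, peak_time=0.01, keep_after_peak=0.02,
                          f_lo=20.0, f_hi=20000.0, floor_db=-60.0,
                          n_taps=512):
    """Sketch of the listed steps; all defaults are assumed values."""
    n_fft = len(ir)                                  # assumed to be even
    # Shift (circularly) so the largest peak sits at peak_time
    shift = int(round(peak_time * fs)) - int(np.argmax(np.abs(ir)))
    ir = np.roll(ir, shift)                          # returns a new array
    # Zero everything later than keep_after_peak seconds past the peak
    ir[int(round((peak_time + keep_after_peak) * fs)):] = 0.0

    # rfft/irfft keep only non-negative frequencies and imply conjugate
    # symmetry, so the resulting filter IR is real
    H = np.fft.rfft(ir)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    band = (freqs >= f_lo) & (freqs <= f_hi)

    # Normalize so the mean in-band level is 0 dB, then invert
    mag = np.maximum(np.abs(H), 1e-12)
    mag = mag / 10.0 ** (np.mean(20.0 * np.log10(mag[band])) / 20.0)
    inv_mag = 1.0 / mag

    # Crude band-limiting: clamp the out-of-band inverse to a low floor
    inv_mag[~band] = 10.0 ** (floor_db / 20.0)

    # Minimum phase via the real cepstrum (homomorphic method)
    cepstrum = np.fft.irfft(np.log(inv_mag), n_fft)
    fold = np.zeros(n_fft)
    fold[0] = fold[n_fft // 2] = 1.0
    fold[1:n_fft // 2] = 2.0
    ir_min = np.fft.irfft(np.exp(np.fft.rfft(cepstrum * fold)), n_fft)

    # Truncate to the requested filter length
    return ir_min[:n_taps]

# Usage with a made-up IR: a direct sound plus one small reflection
fs = 48000
ir = np.zeros(4096)
ir[100], ir[103] = 1.0, 0.5
filter_ir = design_inverse_filter(ir, fs)
```

The truncation length and the point after which the IR is zeroed are the "tweaking parameters" from the list. A sharp out-of-band clamp like this one rings in the time domain, so a real design would likely use a smoother band-pass shape.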
Real-time switching between inverse-filtered and unfiltered input audio was accomplished using Max 7 (see the Software section of the Equipment page). This allowed attendees at the talk to hear the difference between their normal headphone response and a response that approaches an ideal flat curve within the audio range.
Thanks for reading! Any questions, comments or suggestions about the above can be directed to Zane at ztr4@psu.edu!