Is the 9-dimensional haystack enough?

In this post I shall build upon some of the discussion in Paul Davies (2013). The article describes a search effort that uses data from the Lunar Reconnaissance Orbiter, at a resolution of about 50 cm/pixel, to look for ET artifacts on the Moon.

An important consideration in this search is the size of the data set that needs to be sifted through. The complete data set is expected to contain about a million frames of 500 megabytes each, which translates to about 500 terabytes in all. The search is for something left behind, accidentally or on purpose, by alien civilizations, a la Transformers: Dark of the Moon (but mostly smaller?).

The challenge here is that the data are simply too numerous for a human, or groups of humans, to go over manually, and the computer algorithms being used are not necessarily primed to look for signatures or anomalies that point to an artificial origin. Another example of this is the Kepler mission, which has looked at more than 100,000 stars. Its transit photometry is a great search technique for finding Dyson swarms, or other hallmarks of an advanced ET civilization, in orbit around a star. If the periodic dip were caused not by a planet but by an irregular (non-spherical) structure, the residuals would encode information about that structure. As discussed in Wright et al. (2016), there are a number of anomalies in exoplanet science which might arise from astrophysical phenomena or possibly from an advanced ET civilization. If we found more than one such anomaly in a system, it would be difficult to attribute them to natural sources.

Therefore, from a SETI point of view, there is a LOT of information in these giant data sets from missions like Kepler, TESS and LSST, among others. Citizen science initiatives help here by using human pattern recognition to pick out these anomalies and outliers, arguably better than any computer can. However, when it comes to automated pipelines, we should quantify their efficacy.

The point I would like to make here is that, in an era where we are moving beyond the radio region of the electromagnetic spectrum into the optical and infrared, we must make use of these existing big astrophysical missions and include them in our quantification of the search volume probed for ET. However, I propose that we must add a 10th dimension to this haystack, one which quantifies the ability of our data pipeline to retrieve these signals IF we were to receive them. By this I mean that if a pipeline is analyzing an existing database to find anomalies, the completeness fraction of that pipeline should also be quantified. This can be done by inserting artificial signals into the data and counting the ones we retrieve. However, that is an overly simplistic view of the problem. It is not an easy task, since we do not know the nature of these signals and can hence only hypothesize and, to a certain extent, guess.
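
To make the injection-and-recovery idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the light curve is plain white noise, the artificial signal is a simple box-shaped dip, and toy_pipeline is a hypothetical stand-in for a real detection pipeline, not the method of any actual mission.

```python
import math
import random
import statistics


def make_noise(n, sigma, rng):
    """White Gaussian noise standing in for a real light curve (an assumption)."""
    return [rng.gauss(0.0, sigma) for _ in range(n)]


def inject_dip(flux, start, width, depth):
    """Return a copy of flux with a box-shaped dip of the given depth inserted."""
    out = list(flux)
    for i in range(start, start + width):
        out[i] -= depth
    return out


def toy_pipeline(flux, width, threshold):
    """Hypothetical detector: find the lowest boxcar mean and flag it if it lies
    more than `threshold` standard errors below the median baseline.
    Returns the start index of the flagged window, or None."""
    baseline = statistics.median(flux)
    sigma = statistics.pstdev(flux)
    window = sum(flux[:width])
    best_sum, best_start = window, 0
    for start in range(1, len(flux) - width + 1):
        # Slide the window one sample to the right.
        window += flux[start + width - 1] - flux[start - 1]
        if window < best_sum:
            best_sum, best_start = window, start
    best_mean = best_sum / width - baseline
    if best_mean < -threshold * sigma / math.sqrt(width):
        return best_start
    return None


def completeness(depth, n_trials=200, n=200, width=10,
                 sigma=1.0, threshold=5.0, seed=0):
    """Fraction of injected dips that the toy pipeline recovers."""
    rng = random.Random(seed)
    recovered = 0
    for _ in range(n_trials):
        start = rng.randrange(0, n - width + 1)
        flux = inject_dip(make_noise(n, sigma, rng), start, width, depth)
        found = toy_pipeline(flux, width, threshold)
        # Count a recovery only if the flagged window overlaps the injection.
        if found is not None and abs(found - start) <= width:
            recovered += 1
    return recovered / n_trials


if __name__ == "__main__":
    for depth in (0.0, 0.5, 1.0, 2.0, 5.0):
        print(f"depth = {depth:.1f} sigma -> completeness = {completeness(depth):.2f}")
```

The number that comes out depends entirely on what we choose to inject. A pipeline that is nearly complete for box-shaped dips says nothing about its completeness for a signal shape we never thought to test, which is exactly the difficulty described above.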

This way we could include Kepler and other such missions in our search volume (the volume searched by all SETI projects so far) using their actual efficiency and not a merely theoretical one. This 10th-dimension fraction should ideally be close to unity for most searches. However, as mentioned in the paragraph above, in the absence of knowledge about the nature of the signal we can only hypothesize using our current understanding of physics.
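
One simple way to fold such a fraction into the bookkeeping is to scale each survey's nominal search volume by its measured completeness. The sketch below assumes that simple multiplication, and assumes that surveys can be summed without overlap. Neither assumption comes from the haystack literature, and the survey names and numbers are made-up placeholders, not real measurements.

```python
def effective_volume(nominal_volume, completeness):
    """Scale a survey's nominal search volume by its pipeline completeness.
    Multiplying the two is an assumption made for illustration."""
    if not 0.0 <= completeness <= 1.0:
        raise ValueError("completeness must lie between 0 and 1")
    return nominal_volume * completeness


def total_effective_volume(surveys):
    """Sum effective volumes over (nominal_volume, completeness) pairs,
    assuming the surveys do not overlap."""
    return sum(effective_volume(v, c) for v, c in surveys.values())


if __name__ == "__main__":
    # Placeholder values in arbitrary units, purely hypothetical.
    surveys = {
        "survey_A": (1.0, 0.9),
        "survey_B": (4.0, 0.25),
    }
    print(total_effective_volume(surveys))
```

With a completeness close to unity the correction barely matters. With a low one, a survey that looks large on paper contributes much less to the volume actually searched.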


Addendum: 2018-04-30

After further research and work on the 9-dimensional haystack, I realize that in the original haystack proposed by Jill Tarter in 2010, this ability to retrieve potential signals from the data is exactly what she meant by the modulation axis.

9 suffices. Phew!