Sonification: Listening to Big Data

Peter Merritt

Introduction to Sonification

First off, this has nothing to do with certain weird old Hedgehog-oriented video games (for more details on that, see here). The term actually refers to translating often complex, massed data streams from a variety of sources into sound. Like the associated use of colour, it relies on the specialised sensory areas of our brains to pick out, at least approximately, the key areas of interest in bulk data.

Put simply, the translation (into either colour or sound) associates data values with colours or musical notes. The notes can then be exported as WAV or other audio files and played through any media player. In a sense, it’s also what Ray does with his fractal images – they are the visual expression of literally millions of complex, mutating number sequences, using a colour palette for ranges.
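To make the "values become notes" idea concrete, here is a minimal sketch in Python using only the standard library. Everything in it is an illustrative assumption rather than anything a particular tool does: the function names, the three-octave note range, the plain sine-wave tone and the 0.2-second note length are all made up for the example.

```python
import math
import struct
import wave

SAMPLE_RATE = 44100  # samples per second (CD quality)


def value_to_midi(value, lo, hi, low_note=48, high_note=84):
    """Linearly map a value in [lo, hi] onto a MIDI note number."""
    if hi == lo:
        return low_note  # flat data: everything plays the same note
    fraction = (value - lo) / (hi - lo)
    return low_note + round(fraction * (high_note - low_note))


def midi_to_hz(note):
    """Equal-tempered pitch, with MIDI note 69 tuned to A = 440 Hz."""
    return 440.0 * 2 ** ((note - 69) / 12)


def sonify(values, path, note_seconds=0.2):
    """Write one sine-wave tone per data value to a 16-bit mono WAV file."""
    lo, hi = min(values), max(values)
    notes = [value_to_midi(v, lo, hi) for v in values]
    samples_per_note = int(SAMPLE_RATE * note_seconds)
    frames = bytearray()
    for note in notes:
        freq = midi_to_hz(note)
        for i in range(samples_per_note):
            # Half amplitude leaves headroom and avoids clipping.
            amplitude = 0.5 * math.sin(2 * math.pi * freq * i / SAMPLE_RATE)
            frames += struct.pack("<h", int(amplitude * 32767))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes = 16-bit samples
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(bytes(frames))
    return notes
```

Calling something like sonify([3, 7, 2, 9, 4], "data.wav") should leave a short tune in data.wav that any media player can open. The notes start and stop abruptly, so expect small clicks; a real tool would fade each note in and out.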

FYI, this is not a new phenomenon – actually listening to data and program tapes has been around almost since they were introduced, certainly among the nerdier end of the population, albeit without the full musical treatment.

Indeed, it goes back further than this to the start of commercial computing – an older, retired colleague told me that IBM (no less) had recommended this as a rough method of checking program and data storage structures back in the 60s. Essentially, in this case the smoother the sound transitions the better, as spikes and jarring notes would coincide with sudden ‘intensities’ or ‘hotspots’ in the code or data streams; potentially Rather Bad News. These factors are as true today as they ever were.
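The "jarring note" test can also be roughed out in code. The sketch below assumes the data has already been turned into a list of note numbers (semitones), and it simply flags any step bigger than a chosen interval. The function name and the threshold of seven semitones are my own assumptions, not anything IBM specified.

```python
def jarring_leaps(notes, max_interval=7):
    """Return (index, interval) for every step larger than max_interval semitones."""
    leaps = []
    for i in range(1, len(notes)):
        interval = abs(notes[i] - notes[i - 1])
        if interval > max_interval:
            leaps.append((i, interval))
    return leaps


# A smooth run of notes with one sudden spike in the middle.
print(jarring_leaps([60, 62, 61, 63, 84, 62, 60]))  # [(4, 21), (5, 22)]
```

A single spike shows up twice, once on the way up and once on the way back down, which is roughly how the ear hears it too.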

Modern Example

The most famous example to date comes from NASA. They now regularly make some of their deep-space data streams (from satellites, electrical storms on Jupiter and even the galactic background) available to outside ‘listeners’, and the results have been strangely pleasant.

Regularly oscillating wave-forms were one of the first clues which astronomers had when looking for pulsars, then later on planets. Remember, this is all achieved solely by associating ranges of data numbers with notes; no other ‘massaging’ has taken place. As one scientist said, it has made the natural ‘music of the spheres’ into a reality!

Then again, those at the older end of the population may remember listening to the audio output of the old plug-in analogue phone-line modems.

So, how can we use this?

In terms of data analysis, this is a potentially powerful tool in association with more traditional statistical number-crunching. For bulk data in things like our own Cosmos, it can easily highlight not only targeted areas but also more subtle transitions (the wave effect), which are not so easy to spot using mathematics alone.

I don’t think we can use this to check our coding – directly, anyway. However, as we are supposed to be doing more stress- and load-testing of our main systems (in conjunction with Dyalog), there is no reason why the numeric output of their built-in []PROFILE tool should not be fed through a simple conversion program.
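As a rough idea of what such a conversion program might look like, here is a sketch in Python. It makes no assumption about the real layout of the profiler output. Instead it assumes, purely for illustration, that the timings have already been exported as plain text with one number per line. The function names and the log scaling are my own choices; the log scale is there because a handful of very slow entries would otherwise squash everything else onto a single note.

```python
import math


def parse_timings(text):
    """Read one non-negative number per line, skipping blank lines.

    The one-number-per-line layout is a hypothetical export format.
    """
    return [float(line) for line in text.splitlines() if line.strip()]


def timings_to_notes(timings, low_note=36, high_note=96):
    """Map timings onto MIDI note numbers using a logarithmic scale."""
    logs = [math.log1p(t) for t in timings]  # log1p copes with zero timings
    lo, hi = min(logs), max(logs)
    if hi == lo:
        return [low_note] * len(timings)
    span = high_note - low_note
    return [low_note + round((x - lo) / (hi - lo) * span) for x in logs]
```

The resulting note numbers could then be rendered to sound by any tone generator, with the slowest parts of the system singing out at the top of the range.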

Sounds interesting to me…

Appendix A – Further Reading/Viewing

For the more academic in the audience:

And not forgetting the geeks:

Plus a toolkit: