The networked shape of future audio
Radio is a hot medium. In the history and theory of communications, that means it mostly engages one of our senses, hearing, and it doesn’t require a great deal of conscious effort to understand or enjoy. Some people now call it background media, or peripheral media, in the sense that it can be consumed without monopolizing our attention. Peripheral media is the one truly contemporary communications mode because it matches our splintered attention span and our parallel multitasking. It enhances our day without intruding too much on the commute, the jog, the meal, the useless wait, the small talk, the ride home, the waking hours at dawn.
“Audio” is what we now call the broad category of music, podcasts, books, lectures and many more variations of digital content. Radio became all that, diluted and subsumed by the Californian tech wave of on-demand and streaming business models. Whatever is left of the traditional industry and its century-long rise is now being progressively dismantled by three megatrends that will play out over the course of the next decade. All of them computational in nature, networked to the core.
While the past 40 years tell a frustrating story in audio, the 100-year history is very different. Throughout the 20th and 21st century, audio has continually discovered new delivery mediums, formats and monetization models. This began with the launch of the radio broadcast in 1927, which blanketed the country in audio, extended with the transistor radio of the 1950s, which made audio truly portable and private, through to satellite, digital stores, and Spotify streams. Today, the audio category is 40 times bigger in real terms than it was exactly a century ago, two times as big as it was 50 years ago, and up 30% since 1994.
Matthew Ball: https://www.matthewball.vc/all/audiotech
These trends will transform our perception and consumption of audio, in ways that can be traced to a combination of current technical innovations, artistic impulses, and corporate road maps. Like the ones before it, this paradigm shift in our relationship with recorded sound will disrupt norms and behaviors, politics and culture, business and pleasure.
Spatial audio: towards an aesthetics of full immersion
“Modern stereophonic technology was invented in the 1930's by British engineer Alan Blumlein at EMI, who patented stereo records, stereo films, and also surround sound. Like all communications media, stereo is an illusion, a simulation of the location for various sound sources — voices, instruments, etc - within the original recording.” The technology went through a multitude of iterations and adaptations over the course of the next forty-odd years before finally becoming the standard in motion pictures and in music around the 1970s. Commercial and social challenges stood in the way of mainstream adoption of this new capability to experience sound. Like so many other successful products in history, stereo was plagued by changing agendas, unpredictable trends, and unbearable production and distribution costs. Then there was the fact that nobody had really asked for it.
In a classic example of how long and hard the entertainment industry is able to push when it really wants to change the media landscape around us, and through the sheer will of countless gifted individuals and the generational might of corporate financing, stereo eventually became the default experience in movies, music and radio for anyone born after 1980, and along the way changed pop culture forever. For most of us over 30, stereo is how the world sounds.
A similar story is unfolding with spatial audio, the so-called 3D version of stereo. This new miraculous illusion starts with a method of recording sound that uses two microphones, arranged with the intent to simulate the three-dimensional stereo experience of being present in the room with the performers or instruments. Formerly known as binaural sound and revered mostly by engineers, artists and niche enthusiasts, spatial audio has become a beacon of marketing light in the emergent markets of immersive ambient computing.
Take AirPods. Analysts estimate Apple sold between 14 million and 16 million AirPods in 2017. By 2021, that number was 120 million. Based on current market data, if AirPods were an autonomous business with an estimated market capitalization of around $32 billion, it would be bigger than the Bank of China or Fujitsu. This ubiquity of audio distribution channels is undisputed. Apple is at the forefront of the large-scale deployment of spatial audio capabilities, in both software and hardware, on both the supply and the demand side of the market. Its advertising departments are tasked with producing the accompanying pop culture discourse.
Out of the box, a growing catalog of spatialized audio tools and experiences will become available on the iUniverse and beyond. The creative possibilities of a new hot medium will define much of the narrative and discursive novelty over the next five to ten years, with radically immersive genres and transmedia perspectives being discovered and experimented with. Presence, that holy grail of mixed reality designers, is the promise at the heart of this new cultural diffusion, and the immersive capabilities of spatial audio shall bring about its deliverance in the most unexpected ways.
You know the story: every media revolution creates a serious generational clash, then an unavoidable gap. No different this time. Pornography, video games and young men are leading the way as the doors of perception crack open again onto a century of dreamlike stories. For many of us, all of these new sounds will be intolerable, superfluous and gimmicky, made for and by ignorant noisy kids. But, as usual, it’s also likely that by mid-century the virtues of stereo will be the new rage again, and the nostalgia wheel will keep on turning for the added benefit of those who shall inherit the earth.
The kinetic beauty of audio interfaces
The second trend follows logically from the ubiquity of AirPod-like devices and has already expanded the crowded vocabulary of user experience design. Tap once for pause, swipe gently for volume control, double tap to skip or resume: mass adoption is a powerful thing and it will take care of all kinds of minute details for you, like effectively teaching millions of global users a whole new set of gestures used to control and manipulate invisible media objects. To take calls with a tap and be immediately transported to the cyberspace of the phone conversation while you walk, drive, work out. To react to a notification ping with a subtle emotional nudge, then transcribe a mental note with a whisper. To locate at will, in the fuzzy cloud of personal space, the specific tones associated with every important folder, to act upon them while your gaze drifts unconcerned across the blue horizon.
Waiting for a defiant startup or a wary incumbent lies this unfulfilled promise: our eyes should rest outside the screens when we’re in the matrix. Interaction with a flow of texts, emails or calls does not require vision to be successful. Calendar dialogues and journal updates can now be maintained with competence by and with your chatbot assistant of choice, which obviously also handles reservations of any sort. You talk your way through the metaverse, proud and clear-eyed.
The masters of video game design have long understood the power of augmenting reality without headsets. “It’s easy to believe in ghosts because they are invisible”, and thus the evolution of playable narratives will reward those experiences that best align with the peripheral affordances of spatial audio. The creative potential is mouthwatering. Game characters call you up unprompted. Automated workflows in the background fill in the details with actual real-time data, making the conversation the best fiction you ever had. Tap twice to allow the system to clone your voice and use it to generate more story dialogues. Would you like to listen to a custom recap playlist tonight?
Soundbite streams and audio collages
Digital media can sometimes be thought of as a liquid substance like water, ever flowing and transmutable when subjected to temperature changes. The internet is now old enough to be accurately characterized, at least regarding some of its basic properties: network effects, the Pareto principle, bundling and unbundling, the freemium model, “whatever happens to musicians will happen to everyone else”. Digital audio has so far remained incredibly basic, monolithic even, when it comes to its technical production and distribution possibilities. RSS feeds are no longer enough.
Low-cost, ubiquitous RSS-based distribution of podcasts is doubtlessly responsible for the medium’s growth to date. Without it, it would have been too hard to find Pod Save America or Serial, to share episodes and to build podcasting habits. But RSS is also a limiter. The RSS standard allows for only a single version of a file to be distributed (which cannot be updated) and almost no audience-side data is returned. This means there’s no detailed listener or listening data (where the audience skipped, whether they completed a file, etc.), no potential for dynamic ad insertion or programmatic advertising and no interactivity.
Matthew Ball: https://www.matthewball.vc/all/audiotech
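To make the single-file constraint concrete, here is a minimal sketch in Python of how a podcast client might read an RSS feed. The feed text, the show and episode titles, the example.com URL and the `list_episodes` helper are all invented for illustration; only the `enclosure` element with its `url`, `length` and `type` attributes comes from the RSS 2.0 format.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed, invented for illustration only.
FEED = """<rss version="2.0">
  <channel>
    <title>Example Show</title>
    <item>
      <title>Episode 1</title>
      <enclosure url="https://example.com/audio/ep1.mp3"
                 length="12345678" type="audio/mpeg"/>
    </item>
  </channel>
</rss>"""


def list_episodes(feed_xml):
    """Return (title, audio_url) pairs, one per <item> that has an enclosure."""
    root = ET.fromstring(feed_xml)
    episodes = []
    for item in root.iter("item"):
        title = item.findtext("title", default="(untitled)")
        enclosure = item.find("enclosure")
        if enclosure is not None:
            # The item points at one static audio file and nothing more.
            episodes.append((title, enclosure.get("url")))
    return episodes


episodes = list_episodes(FEED)
for title, url in episodes:
    print(title, url)
```

All the client receives is a title and a link to one fixed file. Nothing in this structure describes alternative versions of the episode or a way to report back how it was played, which is the limitation the quote above points to.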
Digital media wants to be malleable, fragmented, tracked, accessible, composable. It wants to flow uninterrupted and in real time, and so it will be with audio when it finally surrenders to the trend. We don’t want to share an entire episode of that podcast, just that thing she said. And it had better come in a neat animated sound wave so that it looks great both in my WhatsApp groups and on IG stories. And I should be able to craft my own little playlist of snippets, and then share that. Metrics. Microsubscription plans embedded in the player itself that can enable a creator economy of remix and aggregation of soundbites into thicker or longer streams. Endless recombination at any scale.
Whatever the smallest possible unit of content is, it will thrive on the internet if it can be both atomized and socially networked. It happened with text, images and video — it will happen to sound.
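One way to picture such an atomized unit of sound is a clip defined as a pointer into an existing file rather than as a new file. The sketch below is a hypothetical Python model, not an existing product or API: the `Clip` class, its fields, the `total_duration` helper and the example.com URLs are all assumptions. The `#t=start,end` suffix follows the temporal fragment syntax of the W3C Media Fragments URI recommendation, which a given player may or may not honor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    """A soundbite: a time range inside an existing audio file (hypothetical model)."""
    source_url: str
    start: int  # seconds from the beginning of the file
    end: int    # seconds from the beginning of the file

    def __post_init__(self):
        if not 0 <= self.start < self.end:
            raise ValueError("expected 0 <= start < end")

    @property
    def duration(self):
        return self.end - self.start

    def share_url(self):
        # Temporal media fragment: '#t=<start>,<end>' in seconds.
        return f"{self.source_url}#t={self.start},{self.end}"


def total_duration(playlist):
    """Length in seconds of a playlist made of clips."""
    return sum(clip.duration for clip in playlist)


# A personal playlist recombining snippets from two different episodes.
playlist = [
    Clip("https://example.com/audio/ep1.mp3", 754, 781),
    Clip("https://example.com/audio/ep7.mp3", 60, 90),
]
links = [clip.share_url() for clip in playlist]
print(links, total_duration(playlist))
```

Because each clip is only a reference, a playlist of clips is itself a small piece of data that can be shared, counted and recombined without copying any audio.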
The open questions remain the same: who will step forward and precipitate the change? Is this a decentralized open-source adventure or a corporate device? How interoperable will these tools and end-experiences be, and how much will that cost? What does it mean to have LLMs access and digest the sound of the world? What are the terms of service for games that remix reality, and what happens when we all wear earbuds to connect to the web and each other? Is there an off button?
Radio is the source of all this, the hot archetype of a new world enhanced by an endless custom soundtrack.
Sources and recommended reading