1. Two ears - two loudspeakers? 

Many stereo enthusiasts devoutly believe that if every component in the transmission chain is perfect, the audio reproduction will be perfect too. If two solid-state microphones capture the sound field at the listener's position in the recording room, two loudspeakers should deliver a faultless reproduction. After all, we only have two ears. However, this view leaves important physical relationships out of consideration. In traditional audio, all we can achieve is tonal accuracy. True spatial reproduction will never happen, because the spatial information has already been lost during the recording process.

1.1 The spatiality of recording

In the recording room, we are hit by the direct wave front and by its reflections from all possible directions, depending on where each wave front or reflection originates. Interaural time differences (ITD) arise between the listener's ears in the azimuth plane. This is an important cue, yet only one part of the spatial information. ITDs are evaluated very accurately between about 100 Hz and 3.6 kHz. Below that range the phase differences are too small; above it the result becomes ambiguous, because more than one wavelength fits between the ears. A further ambiguity concerns front and rear: the same ITDs occur whether the source is in front of or behind the listener. The ITDs are also independent of the elevation of the source, so time detection is unusable in that plane. For this reason, we need additional cues. We use the interaural level differences (ILD), which arise from wave diffraction at the head and shoulders in the upper frequency range as well as from resonance effects at the outer ear. The individual shape of the pinna causes resonances that change with the elevation angle of the source. Our individual listening experience, shaped by the form of head and pinna, provides an excellent determination of the source position in all three dimensions.
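The front/rear ambiguity of the ITD can be illustrated with a minimal sketch. It assumes Woodworth's spherical-head approximation, which is not part of this text, together with a hypothetical head radius of 8.75 cm and a speed of sound of 343 m/s. In this simplified model a rear source is mirrored onto the front, so both positions yield the same time difference.

```python
import math

HEAD_RADIUS_M = 0.0875    # assumed average head radius (illustrative value)
SPEED_OF_SOUND = 343.0    # m/s, assumed for air at about 20 degrees Celsius


def woodworth_itd(azimuth_deg):
    """Approximate ITD in seconds for a distant source (spherical head model).

    azimuth_deg: 0 = straight ahead, 90 = fully to one side, 180 = behind.
    """
    az = abs(azimuth_deg)
    if az > 90.0:
        # A rear source has the same path difference as its front mirror image.
        az = 180.0 - az
    theta = math.radians(az)
    return HEAD_RADIUS_M / SPEED_OF_SOUND * (theta + math.sin(theta))


if __name__ == "__main__":
    for az in (0, 30, 90, 150):
        print(f"azimuth {az:3d} deg -> ITD {woodworth_itd(az) * 1000:.3f} ms")
```

Under these assumptions a source at 30 degrees and one at 150 degrees give identical values, which is why the time difference alone cannot separate front from rear.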

Our recording technology is far from that excellent. Two microphones at the listener's position capture the correct time differences, but the correct amplitude response is irrecoverably lost during recording. Omnidirectional microphones work largely independently of the source direction. Cardioids lose amplitude in the upper frequency range off their axis. In any case, their directivity cannot reproduce the notches and peaks in the frequency response that arise in the human hearing system, depending on the angle of incidence, and that allow accurate spatial detection of the sound source. Later, during playback, we spend a lot of money on the last two dB of linearity in the frequency response. Nevertheless, during the recording process, we tolerate amplitude deviations of 20 dB and more relative to the signals at the human eardrums.

Without correct interaural level differences (ILD), the recording is unavoidably reduced to the horizontal plane. In addition, further important information about the source direction is lost, because the strongly selective effect of the head-related transfer function (HRTF) is never included in the transmission chain. This matters less for the position of the source itself: in the frontal range, ILD values remain small, at least in the horizontal plane. However, the listener is also struck by the strong first reflections from above, from behind, and from all other possible directions. Without the correct notches and peaks in the frequency response, caused by the directional filtering of the HRTF, the reproduction generates wrong cues. Its amplitude response is sometimes in direct contradiction to the time-based cues. Listening fatigue is only one of the results. The spatial distribution of the first strong reflections is one of the most important factors in audio perception. The subjectively perceived loudness, the speech intelligibility and the estimate of the source distance are strongly disturbed by wrong cues. The attempt to produce a spatial impression with late reverberation remains unconvincing. Such late reflections provide important cues about the fine structure and reflective behavior of the surfaces in the recording room, yet they reach the listener from all directions. Thus, we can hardly assign any concrete direction by means of the reverberation tail.

In practice, studio productions are not normally recorded with a single pair of microphones. Each signal is given its position in space during mixdown, according to the intention of the sound engineer. Nevertheless, the problems remain. All signals are distributed between the speakers by pan pots. Not a single one of the important first reflections is placed at its correct point of origin in such a process. The spatial impression is generated mainly by adding artificial reverb. Such processing can sometimes produce well-matched early reflections, but unfortunately they are emitted from the direction of the loudspeakers. We cannot distribute those reflections over all the directions from which they arrive in the recording room.

1.2 Perception of phantom acoustic sources

All conventional audio procedures, including the surround formats, produce phantom acoustic sources, which we perceive between individual loudspeakers. These are not real sound sources. The perception is established in the brain through the psychoacoustic combination of both ear signals. Unfortunately, such sources do not behave like real sound sources. We cannot put our ear close to them. Unlike a real sound source, the phantom source moves with the listener's position. In fact, we don't perceive a sound source, but only the illusion of a sound source.

To evoke the same perceived direction, two loudspeakers must produce much larger differences between their signals than a real sound source produces between the ears. For example: at a 30 degree azimuth angle, a genuine acoustic source causes an interaural time difference of 0.3 milliseconds and an interaural level difference of five dB. Radiated from two loudspeakers, such values yield a phantom acoustic source at only approximately 10 degrees in azimuth. For the original 30 degrees we require a time difference five times as large, 1.5 milliseconds, and a level difference of 18 dB.

The reason for that loss of spatial impression is crosstalk between the ears. The signal of the left speaker does not arrive at the left ear alone. With a detour around the head, its wave fronts also reach the right ear, and vice versa. That raises the interaural cross-correlation (IACC), the most important measure of our spatial impression. We sense a sound event as spatial if the difference between the signals at our ears is as large as possible. When both signals are completely different, the IACC is zero: there is no correlation between the signals. If both signals are the same, for example with mono headphone playback, the IACC is one. In a free field, a sound source reaches the largest possible difference, an IACC of about 0.3, when its wave fronts arrive from an azimuth angle of about 55 degrees. In acoustically renowned venues, many of the first reflections come from that range. Arriving at large angles relative to the direction of the direct sound, such reflections are an “acoustic attraction” that gives us goosebumps, for example at the horn passages during a Brahms concert.
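The two limiting cases of the IACC can be made concrete with a small sketch. It assumes the common definition of the IACC as the maximum absolute value of the normalized cross-correlation of both ear signals within a lag window of about ±1 ms. The function name `iacc`, the window length and the noise test signals are illustrative assumptions, not taken from this text.

```python
import math
import random


def iacc(left, right, sample_rate, max_lag_ms=1.0):
    """Maximum absolute normalized cross-correlation within +/- max_lag_ms."""
    n = min(len(left), len(right))
    max_lag = int(round(sample_rate * max_lag_ms / 1000.0))
    energy = math.sqrt(sum(x * x for x in left[:n]) *
                       sum(y * y for y in right[:n]))
    if energy == 0.0:
        return 0.0  # silent input: define the correlation as zero
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        # Sum only over indices where both shifted samples exist.
        for i in range(max(0, -lag), min(n, n - lag)):
            acc += left[i] * right[i + lag]
        best = max(best, abs(acc) / energy)
    return best


# Illustrative test signals: two independent noise bursts.
rng = random.Random(0)
noise_a = [rng.gauss(0.0, 1.0) for _ in range(2000)]
noise_b = [rng.gauss(0.0, 1.0) for _ in range(2000)]

if __name__ == "__main__":
    print("identical ear signals:  ", iacc(noise_a, noise_a, 8000))
    print("independent ear signals:", iacc(noise_a, noise_b, 8000))
```

Identical signals, as in mono headphone playback, give a value of one, while independent noise at both ears gives a value close to zero. Crosstalk from both loudspeakers mixes part of each channel into the opposite ear and therefore pushes the result toward one.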

Conventional loudspeaker reproduction cannot create such an experience, because each loudspeaker is only about 30 degrees off the median axis. Tighter spacing causes even more correlation and is consequently less attractive. In the center of the concert hall, the ceiling reflections contribute hardly anything to the spatial impression. Such reflections in line with the center are normally counterproductive. Well-trained architects know this and try to steer such wave fronts sideways. Because of the crosstalk between the ears, our stereo loudspeakers cannot reach an IACC below approximately 0.6. Therefore, the realistic concert hall experience remains out of reach in traditional audio. All attempts to produce sound sources outside the loudspeaker base by phase inversion and other tricks are in vain and amateurish. During playback, only the playback room itself can create sound sources in that range, through its own reflections. In most cases, however, such reflections are very distracting, because their path-length differences are very different from those in the recording room.

Without the correct spatial distribution of the first reflections, important cues for estimating the distance of the direct sound source are missing. In any case, the phantom acoustic source resides between the loudspeakers, neither in front of nor behind that line. That becomes noticeable when we change the listening position. For example, if the violin is positioned directly in front of the timpani in the center of the concert hall, we hear both instruments from the same direction. If we then move close to the right wall of the concert hall, we perceive the violin clearly to the left of the timpani. With phantom source reproduction at home, however, both instruments remain in congruent positions, independent of the listener's position. Both sound sources share the same point of origin. This demonstrates that the phantom sources are lined up between the speakers, never behind them and never in front.

It must be conceded that in the real world we cannot estimate the distance of a sound source directly either. The signal differences between both receptors, which we use in hearing to determine direction, are not used to estimate distance, as they are with the eyes. In hearing, the most important indication is the loudness; loud sources are normally nearer. However, even loud phantom acoustic sources cannot appear closer to the listener than the omnidirectionally radiating loudspeakers themselves, because they cannot create a better ratio of direct sound to diffuse field than the loudspeaker itself does in the playback room. Realistic proximity effects are thus unreachable. This restriction is widely underrated: nearby sources, or reflections, are supremely important for the spatial impression as well as for the emotional impact of the perception.

1.3 Two rooms in one recording

The acoustics of the concert hall are stored in the recording. Nevertheless, during reproduction the playback room superimposes its own reflections on the recording. In the main audio range, loudspeakers normally radiate with little directivity. This causes strong playback room reflections. The level of those unwanted signals depends on the reverberation time of the playback room and on the directivity of the loudspeakers. In normal living rooms, the diffuse sound exceeds the direct sound at a distance of less than one meter from the loudspeakers. The predominant part of the sound that reaches us from the loudspeakers is caused by the playback room acoustics!
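The distance at which the diffuse sound starts to exceed the direct sound can be estimated with a short sketch. It assumes the common textbook approximation for the critical distance, d_c ≈ 0.057 · sqrt(Q · V / RT60), which is not taken from this text. The room volume, the reverberation time and the directivity factors below are hypothetical example values.

```python
import math


def critical_distance_m(volume_m3, rt60_s, directivity_q=1.0):
    """Approximate distance where direct and diffuse sound levels are equal.

    Assumes the textbook approximation d_c = 0.057 * sqrt(Q * V / RT60),
    with V in cubic meters and RT60 in seconds.
    """
    return 0.057 * math.sqrt(directivity_q * volume_m3 / rt60_s)


if __name__ == "__main__":
    # Hypothetical living room: 60 cubic meters, 0.5 s reverberation time.
    for q in (1.0, 4.0):
        d = critical_distance_m(60.0, 0.5, q)
        print(f"directivity factor {q:.0f}: critical distance {d:.2f} m")
```

With these assumed values, a loudspeaker without directivity (Q = 1) reaches the critical distance well below one meter. Quadrupling the directivity factor only doubles that distance.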

The reverberation tail is the lesser problem here. We can tolerate additional reverberation as long as its level remains lower than the reverberation level of the recording room. More distracting are the strong early reflections from the playback room. Because it is usually much smaller than the concert hall, the path-length differences are incomparably shorter. Smooth surfaces in particular cause strong reflections, whose magnitude is often hardly below the level of the direct wave. The superposition of both signals causes comb filter effects of up to 20 dB in the frequency response.
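The size of such comb filter effects can be estimated with a minimal sketch. It assumes a single reflection added to the direct wave, modeled as a delayed copy with a relative amplitude `g`; the value g = 0.8, the extra path length of one meter and the speed of sound of 343 m/s are hypothetical example values.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def comb_magnitude_db(freq_hz, extra_path_m, g):
    """Level of direct wave plus one reflection, relative to the direct wave."""
    delay_s = extra_path_m / SPEED_OF_SOUND
    phase = 2.0 * math.pi * freq_hz * delay_s
    # |1 + g * exp(-j * phase)| written out with real numbers
    magnitude = math.sqrt(1.0 + g * g + 2.0 * g * math.cos(phase))
    return 20.0 * math.log10(magnitude)


def first_notch_hz(extra_path_m):
    """Lowest frequency at which the reflection arrives in opposite phase."""
    return SPEED_OF_SOUND / (2.0 * extra_path_m)


def ripple_db(g):
    """Level difference between the comb filter peaks and notches (g < 1)."""
    return 20.0 * math.log10((1.0 + g) / (1.0 - g))


if __name__ == "__main__":
    g, path = 0.8, 1.0  # hypothetical reflection gain and extra path length
    f0 = first_notch_hz(path)
    print(f"first notch at {f0:.1f} Hz")
    print(f"level at notch: {comb_magnitude_db(f0, path, g):.1f} dB")
    print(f"level at peak:  {comb_magnitude_db(2 * f0, path, g):.1f} dB")
    print(f"peak-to-notch ripple: {ripple_db(g):.1f} dB")
```

Under these assumptions, a reflection only about 2 dB weaker than the direct wave already produces a peak-to-notch difference of roughly 19 dB, in line with the order of magnitude stated above.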

Certainly, such comb filter effects arise in recording rooms too, by the same principles. The frequencies of the notches and peaks depend on the time difference between the two signal components. Because our perception relies significantly on learned stimulus patterns, we associate the resulting frequency response with the correct room impression. In the same way, the playback room also causes notches and peaks. Sometimes, if the playback room is not too different from the recording room, its reflections actually create a very authentic sound field with a superb spatial impression. On the other hand, if the playback room is very different from the recording venue, we have to damp its misleading cues completely. In that situation, the reproduction becomes boring and musically less attractive, and the reduction to the horizontal plane becomes very annoying. The other approach, reducing the room's impact by more directional radiation from the loudspeakers, is also wrong. In that way, the advantage of improved tonal accuracy is achieved at the cost of spatial impression.

In this context, the experiments carried out by Acoustic Research in Carnegie Hall in the late 1980s are very interesting. Good loudspeakers were placed on the stage where the artists are normally positioned. Each loudspeaker was driven by a dry recording of its artist. For the audience, it was hardly possible to distinguish the reproduction from the genuine event. Obviously, the spatial distribution of the sound reflections is the most important requirement for true spatial audio, at least much more important than the last 2 dB of linearity in the frequency response. Improved loudspeaker directivity or damping of the playback room avoids misleading cues, but cannot create correct reflections. Yet those are an essential condition for true spatial audio. The phantom acoustic sources of conventional procedures are not able to restore the correct spatial behavior. Simply increasing the number of separate transmission channels cannot be the solution to the fundamental problems described. Only new approaches, like Ambisonics or Wave Field Synthesis, would be able to overcome those problems. The virtual acoustic sources of Wave Field Synthesis are independent of the listener's position. In principle, such an approach makes it possible to restore the genuine sound field physically. Nevertheless, we need new ideas on how to make use of these advantages. The next chapter will describe a model for the correct distribution of wave fronts.



last update 2013-11-15