The blueprint of the spatial sonic field/ chapter 3
3. Wave Field Synthesis
Wave Field Synthesis, also written wavefield synthesis or WFS, is a spatial audio reproduction procedure. Unlike conventional audio procedures, its perception no longer depends on psychoacoustic phantom sound sources. Instead, the sound field is reconstructed physically. For this purpose, the synthesis emulates natural wave fronts by assembling them from elementary waves, according to Huygens' principle. A computer drives each individual loudspeaker membrane, mostly arranged as an array around the listener, at exactly the moment when the wave front of a virtual point source would reach its point in space.
3.1 Mathematical Base
The WFS procedure was developed in the late 1980s by Prof. Berkhout at the Delft University of Technology. Its mathematical basis is the Kirchhoff-Helmholtz integral (KHI). It states that if the sound pressure and particle velocity are known at every point on the surface of a source-free volume, the sound pressure at any point within this volume is determined. According to Rayleigh II, the sound pressure at a point A within a half-space is already determined if only the pressure distribution on a plane is known. An acoustic field arises on both sides of this plane. If the rearward sound is suppressed, half-space radiation results.
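For orientation, the two integrals can be written out as follows. This is a sketch in one common notation and is not taken from the original text: it assumes the time convention e^{jωt}, a surface normal n pointing into the volume (or into the listening half-space), the wave number k = ω/c, Δr as the distance between the surface point r_S and the listening point r_A, ρ as the density of air and V_n as the particle velocity component along n.

```latex
% Kirchhoff-Helmholtz integral for a source-free volume bounded by the surface S
P(\mathbf{r}_A,\omega) = \frac{1}{4\pi}\oint_S \left[
    P(\mathbf{r}_S,\omega)\,\frac{\partial}{\partial n}\!\left(\frac{e^{-jk\Delta r}}{\Delta r}\right)
    - \frac{\partial P(\mathbf{r}_S,\omega)}{\partial n}\,\frac{e^{-jk\Delta r}}{\Delta r}
\right] dS,
\qquad
\frac{\partial P}{\partial n} = -j\omega\rho\,V_n

% Rayleigh II integral for a half-space bounded by the plane S
P(\mathbf{r}_A,\omega) = \frac{1}{2\pi}\int_S
    P(\mathbf{r}_S,\omega)\,\frac{\partial}{\partial n}\!\left(\frac{e^{-jk\Delta r}}{\Delta r}\right) dS
```

The second term of the first integral is where the particle velocity enters; in Rayleigh II only the pressure distribution on the plane remains.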
3.2 Physical principle
3.2.1 Virtual acoustic sources
For true spatial audio, the sound source must not migrate with the listener's position. As mentioned above, phantom acoustic sources cannot produce true spatial audio for this reason. For a consistent acoustic perspective, proper Doppler effects and a realistic impression of depth, we need virtual sound sources that remain fixed at their starting points, like real sound sources. One way to establish such a virtual acoustic source would be to mount a large number of loudspeakers on a spherical surface. Obviously, we would always perceive the origin of the radiated wave front at the centre of such a sphere, independently of our own position.
However, it is impossible to place such spheres at every source and reflection starting point of the “snowy hayfield model” mentioned in the second chapter. Fortunately, the Wave Field Synthesis principle can produce an unlimited number of virtual acoustic sources from the elementary waves of a loudspeaker line.
3.2.2 Elementary waves
Christiaan Huygens discovered that each point of a wave front can be considered the starting point of an elementary wave. More than 300 years ago, the Dutch mathematician explained diffraction effects by this principle. It applies to every kind of wave propagation, to light waves as well as sound waves. Huygens' principle is one of the most important insights in physics. In acoustics today, it offers the possibility of restoring true-to-original sound waves from such elementary waves.
In this animation, we consider the holes in the baffle, or respectively the loudspeakers, as the starting points of such elementary waves. As long as the dimensions and spacing of the holes remain small compared with the sound wavelength, the sound pressure will not differ between the two sides of a hole. The superposition of a sufficient number of such elementary waves completely restores the original wave front. All we need is the dry recorded source signal and the distance from the source to each starting point of the elementary waves.
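The role of the distances can be illustrated with a minimal Python sketch. It only computes a delay and a simple 1/r level per loudspeaker for a virtual point source behind the array; complete WFS driving functions contain further filter and weighting terms that are left out here. The function name, the 2D coordinates in metres and the speed of sound of 343 m/s are assumptions made for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees Celsius


def driving_delays_and_gains(source, speakers, c=SPEED_OF_SOUND):
    """Return one (delay in seconds, gain) pair per loudspeaker.

    source and speakers are (x, y) positions in metres. Each loudspeaker
    starts radiating when the wave front of the virtual source would
    reach it, so its delay is simply distance / c.
    """
    result = []
    for speaker in speakers:
        distance = math.dist(source, speaker)
        delay = distance / c
        # Simplified 1/r attenuation (assumption); guarded against r = 0.
        gain = 1.0 / max(distance, 1e-6)
        result.append((delay, gain))
    return result


# Virtual source 3.43 m behind the middle of a three-speaker line.
speakers = [(-1.0, 0.0), (0.0, 0.0), (1.0, 0.0)]
pairs = driving_delays_and_gains((0.0, -3.43), speakers)
```

The middle loudspeaker is nearest to the virtual source and therefore receives the shortest delay; the outer ones follow slightly later, which is what bends the radiated wave front.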
Unfortunately, the sound field in the recording room is not made up of the direct wave front alone. The major fraction of the sound energy is contained in reflections. For true spatial audio, we cannot radiate the reflections from the main source direction alone, as is often done in conventional audio. The difference between the directions of the first reflections and the starting point of the direct wave delivers the most important cues about the source distance and the distances of the recording room walls. The huge number of later reflections, which form the reverberation tail, are less important with regard to direction, but provide information about the fine structure and properties of the recording room surfaces.
A WFS loudspeaker arrangement is able to create more than one virtual sound source. Their signal content is independent and may originate from different sources. If identical signal content is radiated from different positions, we perceive the signal as a reflection of the main source signal. As described in the second chapter, the original sound field in the recording room is made up of a huge number of such starting points with the same signal content. If we reconstructed all those positions, the spatial sound field would be completely recreated from a single, dry recorded mono signal. The recording room acoustics do not work any differently. The main difficulty in restoring the original sound field is determining the starting points of all the reflections in the recording room.
3.2.3 The model based approach
Wave Field Synthesis offers two different ways to do this. The simpler method is the model-based approach. According to the mirror source model, the starting points are calculated from the recording room geometry alone. The calculated distance from each virtual sound source position to each loudspeaker position determines the delay and level. The reflection factors of the walls are included in this calculation, as is the directional radiation of the primary source. This procedure is practicable for restoring the direct wave and the first reflections in the recording room, but in practice the huge number of discrete reflections in the reverberation tail makes a correct reconstruction of the complete sound field impossible with the model-based approach.
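A minimal sketch of the mirror source idea, under strong simplifications: a rectangular room seen from above, with its walls at x = 0, x = lx, y = 0 and y = ly, first-order reflections only, one frequency-independent reflection factor for all walls and an omnidirectional source. All names and values are hypothetical and serve only to illustrate the calculation described above.

```python
import math


def first_order_mirror_sources(source, room_size, reflection_factor):
    """Mirror the source once at each of the four walls.

    Returns a list of ((x, y), factor) pairs, one per wall.
    """
    sx, sy = source
    lx, ly = room_size
    return [
        ((-sx, sy), reflection_factor),          # wall at x = 0
        ((2 * lx - sx, sy), reflection_factor),  # wall at x = lx
        ((sx, -sy), reflection_factor),          # wall at y = 0
        ((sx, 2 * ly - sy), reflection_factor),  # wall at y = ly
    ]


def delay_and_level(virtual_source, speaker, factor=1.0, c=343.0):
    """Delay (s) and relative level of one virtual source at one speaker.

    Assumes the speaker does not coincide with the virtual source.
    """
    distance = math.dist(virtual_source, speaker)
    return distance / c, factor / distance


mirrors = first_order_mirror_sources((1.0, 1.0), (5.0, 4.0), 0.8)
delay, level = delay_and_level(mirrors[0][0], (2.0, 1.0), mirrors[0][1])
```

Each mirror source is then treated like any other virtual source carrying the same signal content. Higher reflection orders would be obtained by mirroring the mirror sources again, which is where the number of sources grows quickly.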
3.2.4 Impulse response based approach
For that reason, the impulse response based approach is common practice in the scientific institutes that are refining Berkhout's idea. Before transmission, the spatial impulse response of the recording room is captured. For this purpose, a line array of microphones is arranged in the recording room in a way comparable to the loudspeakers in the playback room. To capture the spatial impulse response, a short impulse is emitted at the later position of the primary sound source and picked up by the microphone array. The impulse hits the nearest microphone first. During playback, the loudspeaker at the corresponding position will radiate the audio signal ahead of all other loudspeakers. The other microphones in the recording room are then struck by the impulse in turn.
Convolving the signal for each loudspeaker with the recorded impulse response of its assigned microphone recreates the direct wave and all its reflections in the recording room from the correct starting points. [1]
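The convolution step can be sketched in plain Python as follows. This is a direct (slow) time-domain convolution chosen for readability; a real-time system would more likely use a fast, block-based method. The function names and the tiny example impulse responses are hypothetical.

```python
def convolve(signal, impulse_response):
    """Direct time-domain convolution of two sample sequences."""
    if not signal or not impulse_response:
        return []
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def loudspeaker_feeds(dry_signal, impulse_responses):
    """One feed per loudspeaker: the dry signal convolved with the
    impulse response measured at the assigned microphone."""
    return [convolve(dry_signal, h) for h in impulse_responses]


dry = [1.0, 2.0]
# Two made-up impulse responses: the second microphone is hit one
# sample later and also picks up a weaker reflection.
responses = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.5]]
feeds = loudspeaker_feeds(dry, responses)
```

The leading zero in the second response delays that loudspeaker by one sample, and the trailing 0.5 adds a weaker repetition of the signal, just as a recorded reflection would.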
Nevertheless, in practice it is impossible to record the spatial impulse response at every microphone position in the recording room for all possible positions of the sound source. Thus, the measured results must be extrapolated and interpolated during playback for all the different positions. This calculation also needs to include the mirror source positions of the reflections. Beyond that, the acoustic path lengths of the microphone array do not match those of the loudspeakers if the temperature in the playback room differs. This would cause a loss in the upper frequency range unless the different propagation speed is included in the calculation. In addition, the loudspeaker positions normally differ from the microphone positions. Especially for moving sound sources, the number of calculations can hardly be handled in real time with the computing power currently available.
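How strongly the temperature affects the timing can be estimated with a short sketch. It uses the common ideal-gas approximation for the speed of sound in dry air; the function names and the example values (5 m path, 20 and 30 degrees Celsius) are assumptions for illustration only.

```python
import math


def speed_of_sound(temp_celsius):
    """Approximate speed of sound in dry air in m/s."""
    return 331.3 * math.sqrt(1.0 + temp_celsius / 273.15)


def delay_error(path_m, temp_assumed, temp_actual):
    """Arrival time error in seconds when the calculation assumes one
    temperature but the room actually has another."""
    actual = path_m / speed_of_sound(temp_actual)
    assumed = path_m / speed_of_sound(temp_assumed)
    return actual - assumed


error = delay_error(5.0, 20.0, 30.0)
```

For this example the wave front arrives roughly 0.24 ms earlier than calculated. That is more than two periods at 10 kHz, so the contributions of the individual loudspeakers no longer add up in phase at high frequencies, while low frequencies are barely affected.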
3.3 Procedure advantages
In principle, though, wave field synthesis is able to produce a virtual copy of the original sound field. All sound sources and all of their reflections in the recording room are restored virtually at their correct starting points, at least within the horizontal plane of the listener. This differs from conventional procedures, which attempt to transmit the whole room information as time and level differences between a few separate audio channels. Synthesis from the dry recorded source signal, in the same manner as the sound source creates the spatial impression in the recording room by producing distributed reflections, is the more faithful way to true spatial audio.
As described in the first chapter, we are not able to capture the true spatial distribution of the mirror sound sources in the recording room by means of time and level differences between a set of microphones. Moreover, even experts agree with the purists who consider mono the best audio because of its clarity. Separate channels are less substantial, more "ghostlike" by comparison.
The natural and coherent sound of mono recordings remains essentially preserved if we complete its reproduction with the correct reflections. Thus, wave field synthesis recreates the true spatial impression of the recording room from such mono tracks.
This volume-based solution is not bound to a narrow sweet spot, because wave field synthesis restores the sound field itself. Any change of the listener's position in the playback room causes the same change in perception as if the listener moved in the corresponding way across the recording room. That is the mark of true spatial audio reproduction! It would never be possible with psychoacoustic phantom source localisation, because there the source position migrates with the listener's position. The virtual acoustic sources, produced from a sufficient number of elementary waves, behave like real sound sources. The loudspeaker itself is no longer the reference point.
Apart from that, Wave Field Synthesis makes it possible to place the virtual sound source in front of the loudspeaker array. In the principle animation, the delay times and levels would not differ whether the virtual source is behind or in front of the microphone row. Thus, we would perceive the starting point behind the speakers in either case. However, if the delay times are inverted by the “Time Mirror Approach”, the loudspeakers produce concave wave fronts. In that case, the virtual source appears at the focus point inside the playback area. We can even walk around it to a certain degree.
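The inversion of the delay times can be sketched as follows. For a source behind the array the nearest loudspeaker fires first; for a focused source the order is reversed, so that all elementary waves arrive at the focus point at the same instant. The function name, the coordinates and the speed of sound of 343 m/s are assumptions for illustration, and level weighting is left out.

```python
import math


def focused_source_delays(focus, speakers, c=343.0):
    """Delays in seconds that make all loudspeaker wave fronts meet
    at the focus point: the farthest loudspeaker fires first."""
    distances = [math.dist(focus, speaker) for speaker in speakers]
    farthest = max(distances)
    return [(farthest - distance) / c for distance in distances]


speakers = [(-1.0, 0.0), (0.0, 0.0), (1.0, 0.0)]
delays = focused_source_delays((0.0, 1.0), speakers)
```

Here the two outer loudspeakers start at time zero and the middle one, which is nearest to the focus point, starts last. The result is a wave front that is concave towards the focus and converges there before it diverges again.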
But the most important advantage is hardly mentioned in the scientific publications. In conventional audio, all signal components, such as the direct wave, the first reflections and the reverberation, merge inseparably into a common signal. Thus, it is not possible to handle each of these components differently during playback. In Wave Field Synthesis, however, we synthesize all those components during playback. It therefore becomes possible to change the level and frequency response of each single component of any sound source separately. Much more important, however, is that we can determine the time at which each of these components is synthesized. This facility can hardly be exploited in the impulse response based approach. The breathtaking possibilities that arise from it in the model-based approach will be described in the fourth chapter.
3.4 Remaining problems
3.4.1 Horizontal restriction
In principle, Wave Field Synthesis is not limited to a plane; the procedure would be able to restore the sound field in all three room dimensions. But with the impulse response based solution, the computing power required for 3D audio has remained out of reach until today. Besides, covering all the playback room walls with loudspeakers is hardly a usable approach in practice.
Looking for a practicable solution, the developers abandoned the representation of elevation as a compromise. Reducing the speakers to a single line around the listener was an acceptable solution that was already realizable in the nineties. Our localisation in azimuth is based mainly on time differences, which are reconstructed perfectly by such horizontal loudspeaker lines.
Such solutions are possible today with hundreds of loudspeakers without insurmountable problems. Yet the horizontal limitation remains clearly audible, especially in damped environments, which are needed to suppress the playback room acoustics for such a WFS approach. Other procedures, like Ambisonics or Vector Base Amplitude Panning (VBAP), have shown that a truly three-dimensional reproduction of the sound event is essential.
3.4.2 Disturbing playback room acoustics
Besides the question of acceptance for such loudspeaker rows all around the room, the loudspeaker rows cannot solve the problem of the disturbing additional playback room acoustics in the transmission chain. In order to reproduce the acoustics of the recording room alone, the playback room acoustics must be completely suppressed.
Horizontal rows of loudspeakers do not really produce parallel wave fronts; they radiate cylindrical waves. Such wave fronts lose 3 dB of their level every time the distance is doubled. Since the listener is relatively near the speakers, the increasing level of the nearby speakers becomes disturbing. On top of that, the stray sound energy comes back from the playback room walls if the damping is insufficient. The strong reflection from the playback room ceiling in particular is hardly avoidable with horizontal cylindrical wave radiation.
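The 3 dB figure follows from the geometry of the radiation and can be checked with a few lines. The sketch compares an idealised line source (cylindrical wave, intensity proportional to 1/r) with an idealised point source (spherical wave, intensity proportional to 1/r²). The function name is hypothetical, and real loudspeaker rows of finite length only approximate the cylindrical case.

```python
import math


def level_change_db(r_near, r_far, cylindrical=True):
    """Level change in dB when moving from distance r_near to r_far.

    Cylindrical waves: 10 * log10 of the distance ratio.
    Spherical waves:   20 * log10 of the distance ratio.
    """
    factor = 10.0 if cylindrical else 20.0
    return -factor * math.log10(r_far / r_near)


line_drop = level_change_db(1.0, 2.0)                      # about -3 dB
point_drop = level_change_db(1.0, 2.0, cylindrical=False)  # about -6 dB
```

A real source in the recording room would lose about 6 dB per doubling of distance, so the level of the reproduced wave front changes differently with distance than that of the original.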
3.4.3 Aliasing Effects
The Kirchhoff-Helmholtz integral describes an unlimited number of elementary waves. In practice, however, the number of loudspeakers is limited. As with any quantisation in audio, this causes aliasing effects.
Inside the playback area, depending on the audio wavelength, points of higher level alternate with points of reduced level across the room. At any one point, the notches and peaks have a very small bandwidth. Fortunately, such effects are less disturbing in perception than the measured frequency response curve suggests.
The difference in magnitude between notches and peaks depends on the distance between the elementary wave sources and on the positions of the listener and the source relative to the radiating loudspeaker array. For aliasing-free reproduction, a loudspeaker spacing of less than one inch would be needed. Some improvement for a given number of loudspeakers is described in DE102009006762A1.
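The relation between loudspeaker spacing and aliasing can be estimated with the commonly quoted worst-case rule that the spacing must stay below half the shortest wavelength. This is only a rough sketch: the actual onset of aliasing also depends on the angles of source and listener, as stated above. The function names and the speed of sound of 343 m/s are assumptions.

```python
def aliasing_frequency(spacing_m, c=343.0):
    """Worst-case frequency in Hz above which spatial aliasing can
    occur for a given loudspeaker spacing in metres."""
    return c / (2.0 * spacing_m)


def max_spacing(frequency_hz, c=343.0):
    """Largest spacing in metres that stays alias-free up to the
    given frequency under the same worst-case rule."""
    return c / (2.0 * frequency_hz)


one_inch_limit = aliasing_frequency(0.0254)
spacing_for_20k = max_spacing(20000.0)
```

Under this estimate, a spacing of one inch is alias-free up to roughly 6.75 kHz, and covering the whole audio band up to 20 kHz would require a spacing of about 8.6 mm.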
3.4.4 Truncation effect
If the radiating loudspeaker arrangement is not closed around the listener, the ends of the radiating surface cause the “Truncation Effect”. As visible in the principle animation, at these ends there are suddenly no further elementary waves contributing sound pressure. This also changes the resulting superposition of all elementary waves abruptly, and a shadow wave arises.
To a certain degree, this effect can be avoided by reducing the level of the outer speakers. If the virtual acoustic source lies behind the loudspeakers, the shadow wave arrives at the listener later than the direct wave front. But if the source lies inside the playback room, the shadow wave arrives first, which is very disturbing for perception.
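Reducing the level of the outer speakers is usually called tapering. A minimal sketch with a raised-cosine ramp at both ends of the array is shown here; the shape of the ramp, the number of tapered speakers and the function name are assumptions, as the text does not specify a particular window.

```python
import math


def taper_gains(num_speakers, taper_count):
    """Gain per loudspeaker: 1.0 in the middle of the array, fading
    towards both ends over taper_count loudspeakers."""
    gains = [1.0] * num_speakers
    for i in range(min(taper_count, num_speakers // 2)):
        # Raised-cosine ramp rising from the edge towards the middle.
        weight = 0.5 * (1.0 - math.cos(math.pi * (i + 1) / (taper_count + 1)))
        gains[i] = weight
        gains[num_speakers - 1 - i] = weight
    return gains


gains = taper_gains(8, 2)
```

For eight loudspeakers with two tapered ones per side, the outermost speakers are driven at a quarter and their neighbours at three quarters of the full level. The radiating surface then fades out gradually instead of ending abruptly, at the cost of a somewhat smaller effective array.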
3.4.5 Concave wave fronts
Another problem for such virtual sound sources inside the playback area is the wrong ITDs (interaural time differences) of concave wave fronts. All surfaces of real sound sources in nature are convexly curved; we have no listening experience from inside a sound source. Thus, the time difference cues of such wave fronts produce an utterly odd perception.
If the listener is situated between the radiating loudspeakers, such misleading cues arise. Two different ways to solve the problem are described in the protected solution EP1637012 and in the application DE 10 2006 054 961 A1, which is no longer protected.
3.4.6 Parallax problems
The last-mentioned proposal can also be used to solve another problem of the acoustic blueprint. We are able to produce virtual sound sources inside the audience area, but we cannot produce the source of an associated picture at that point. The described way of combining the physical principle with a psychoacoustically faked source position reveals the breathtaking possibilities of the wave field synthesis principle.
3.5 Compatibility
The wave field synthesis principle is an object-based approach. We have to transmit the pure, dry recorded audio (content) and, in addition, the data about the recording room properties. In computing, such object-based transmission has long been standard because of its efficiency. The MPEG-4 standard, developed with the participation of the German Fraunhofer Institute, is suited to such object-based audio broadcast.
Unfortunately, many traditional components cannot play that standard at present. On the other hand, traditional audio can be played over WFS loudspeakers, but the fundamental advantage of producing a true spatial impression of the recording room is lost in such reproduction. We can feed the channels to virtual panning spots, that is, virtual loudspeakers far beyond the real playback room walls. This lessens the influence of the listener's position in the playback room, because the angles and levels relative to the distant loudspeakers hardly change between different points in the playback room. The enlarged sweet spot covers nearly the whole playback room. But the perception is still traditional phantom source based audio, including all the disadvantages of the traditional procedures described in the first chapter.
3.7 Subjective impression
Impressions are always subjective, but I want to describe, as neutrally as possible, my own impressions from several occasions of listening to wave field synthesis loudspeaker rows:
Until today, the installations remain clearly short of the goal of a perception congruent with the original sound event. Most notably audible was the reduction to the horizontal plane. The loudspeakers in the rows were spaced at least 20 cm apart, yet aliasing effects were not really disturbing. More audible was a tonal inaccuracy, especially a loss in the upper frequency range. In some cases the loss increased with the number of demonstrations; possibly a solvable problem with including the room temperature in the calculation was responsible.
On the other hand, the spatial impression is incomparably better than is possible with any traditional procedure. The positions of the sound sources are absolutely stable. Never in traditional audio is it possible to estimate the distance to the sound source so clearly. No loudspeakers remain audible; the sound seemingly moves independently of all loudspeakers, outside and inside the playback room.
The position remains constant in the playback room, even if the listener moves across the area. The source level changes according to the distance from the virtual sound source. Only very near a virtual sound source inside the room does the perception become indistinct, without a concrete source position.
Most of the problems seem solvable in the foreseeable future; the first installations of tightly packed two-dimensional loudspeaker arrays indicate promising results. Such speaker fields would be suitable for putting a WFS “Holophony” approach into practice, which should be the final goal of the procedure.