4. WFS-Holophony

If closely spaced loudspeakers completely covered all walls of the playback room, wave field synthesis could restore the sound pressure and particle velocity at any point inside that space. This follows from the Kirchhoff-Helmholtz integral. However, apart from the effort involved, such an installation would hardly be accepted in a home cinema.
This chapter describes a different way, one that overcomes the horizontal limitation of the WFS loudspeaker rows while still fitting well into a home cinema. The loudspeakers are arranged only behind the projection screen. Comparable to the well-known sound projectors, which simulate loudspeakers by means of reflections from the playback room walls, the Holophony loudspeaker screen imitates the reflections of the sound source in the recording room by means of reflections from the playback room walls, as shown in this animation:


4.1 Near field and the “acoustic curtain”

The acoustics of the playback room change perception in an undesirable way. Apart from wearing headphones or living in an anechoic chamber, the only way to avoid that impact is near field reproduction. Two possibilities exist for this: either the speakers are placed close to the listener position, or the radiating diaphragm is very large.
Depending on the directivity factor of the loudspeaker and the reverberation time of the playback room, the playback room reflections in normal dwellings exceed the level of the directly radiated sound at less than one meter's distance from the loudspeakers. To include a listener at a distance of three meters in the near field, a radiating surface at least 1.5 meters in diameter would be needed. That is impossible for one single speaker. However, with wave field synthesis a large number of loudspeakers work together as a unit. Such a loudspeaker screen realizes the known principle of the acoustic curtain. For more than half a century, this solution has been a dream for some inventors. Through the WFS principle, the approach becomes feasible today.
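The claim that reflections dominate at less than one meter can be checked roughly with the standard Sabine-based critical distance estimate. The following is a minimal sketch, assuming that the loudspeaker factor mentioned above corresponds to the directivity factor Q; the room volume, reverberation time and Q values are illustrative assumptions, not data from this article.

```python
import math

def critical_distance(volume_m3, rt60_s, directivity=1.0):
    """Distance (m) at which direct and reverberant sound levels are equal.

    Sabine-based rule of thumb: r_c ~ 0.057 * sqrt(Q * V / RT60).
    """
    return 0.057 * math.sqrt(directivity * volume_m3 / rt60_s)

# Assumed living room: 50 m^3 volume, 0.5 s reverberation time
r_omni = critical_distance(50.0, 0.5)            # omnidirectional source, Q = 1
r_directed = critical_distance(50.0, 0.5, 4.0)   # assumed directivity factor Q = 4

print(f"Q=1: {r_omni:.2f} m, Q=4: {r_directed:.2f} m")
```

Under these assumed values the omnidirectional case stays well below one meter, and even a moderately directive single loudspeaker reaches only slightly beyond it.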

Each membrane excursion in such a loudspeaker field can be calculated simply from the distance to the virtual starting point. The curvature of the resulting common diaphragm depends on that position and on the signal frequency. In the example, the virtual source is located 3 meters behind a screen of 48x27 loudspeakers and radiates 440 Hz.
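The distance-based calculation for the 48x27 example can be sketched as follows. This is only an illustration under stated assumptions: the two inch spacing, the speed of sound, the coordinate layout and the plain 1/r gain are assumptions, and a complete WFS driving function contains further terms that are omitted here.

```python
import math

C = 343.0                  # speed of sound in m/s (assumed, about 20 degrees C)
PITCH = 0.0508             # assumed loudspeaker spacing: two inches
COLS, ROWS = 48, 27        # loudspeaker grid from the example
SOURCE = (0.0, 0.0, -3.0)  # virtual source 3 m behind the screen centre
FREQ = 440.0               # signal frequency in Hz

def speaker_position(col, row):
    """Centre of one loudspeaker on the screen plane z = 0 (screen centred on the origin)."""
    x = (col - (COLS - 1) / 2) * PITCH
    y = (row - (ROWS - 1) / 2) * PITCH
    return (x, y, 0.0)

def drive_parameters(col, row):
    """Delay (s), simplified 1/r gain and phase lag (rad) at FREQ for one loudspeaker."""
    r = math.dist(speaker_position(col, row), SOURCE)
    delay = r / C
    phase = (2 * math.pi * FREQ * delay) % (2 * math.pi)
    return delay, 1.0 / r, phase

centre = drive_parameters(24, 13)   # a loudspeaker next to the screen centre
corner = drive_parameters(0, 0)     # a corner loudspeaker

# Extra path length to the corner; it determines how strongly the diaphragm "bends"
extra_path = (corner[0] - centre[0]) * C
print(f"extra path to corner: {extra_path:.3f} m, wavelength: {C / FREQ:.3f} m")
```

The corner loudspeaker is fed later and at a lower level than the central one; the ratio of the extra path to the wavelength indicates how much curvature the common diaphragm shows at that frequency.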

At decreasing frequencies, the curvature disappears and all the loudspeakers together move like a single piston. If the 1296 transducers in such a barrier are spaced two inches apart, the resulting diaphragm measures about 2.44 m (~8') x 1.37 m (~4.5'). Therefore, all listener positions in the home cinema lie within the near field. Unfortunately, we cannot increase the distance between the individual speakers, because spatial aliasing effects would occur. Depending on the incoming and outgoing angles with respect to the radiating surface, these effects arise above this frequency:

http://www.syntheticwave.de/WFS-Holophony_clip_image004.jpg

For instance, with the speakers spaced two inches apart, a 30 degree angle difference causes spatial aliasing above 13.5 kHz. That is an acceptable value, since our perception is known not to be very sensitive to spatial aliasing.
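The quoted figures can be reproduced with a short calculation. The formula itself appears only in the figure, so the form used here, f = c / (dx * sin(angle)), is an assumption; it is chosen because it yields the 13.5 kHz stated for two inches and 30 degrees. The speed of sound is likewise an assumed value.

```python
import math

INCH = 0.0254  # metres per inch

def aliasing_frequency(spacing_m, angle_deg, c=343.0):
    """Assumed aliasing limit in Hz: f = c / (spacing * sin(angle difference))."""
    return c / (spacing_m * math.sin(math.radians(angle_deg)))

spacing = 2 * INCH
f_alias = aliasing_frequency(spacing, 30.0)

# Size of the resulting diaphragm for 48 x 27 transducers at this spacing
width_m = 48 * spacing
height_m = 27 * spacing

print(f"aliasing above about {f_alias / 1000:.1f} kHz")
print(f"screen size about {width_m:.2f} m x {height_m:.2f} m")
```

Doubling the spacing in this sketch halves the aliasing limit, which illustrates why the distance between the individual speakers cannot simply be increased.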
From a technical point of view, such a field would be feasible today; some existing WFS setups already contain a comparable number of speakers. Development is in fact heading towards such “speaking silver screens”. Because of the extremely directional radiation of such a large resulting diaphragm, the mirror sound sources in the playback room are no longer supplied with significant sound energy. Because the first reflections constitute the source for the second, and so on, the reverberation will decrease substantially. All in all, the disturbing influence of the playback room acoustics is substantially lessened, and the need for special room treatment decreases.

4.2 Subtraction of playback room acoustics

As mentioned above, Wave Field Synthesis, unlike the traditional procedures, provides separate access to each component of the sound event. No other principle is able to treat the direct wave, the first reflections and the reverberation differently during playback. All conventional procedures merge these components inseparably at the recording stage.
Thanks to that advantage, Wave Field Synthesis opens up possibilities for eliminating the main problems of audio reproduction, which have been unsolvable so far. To exploit this benefit, we take a different view of the system. Traditionally, the transmission chain starts at the microphone and ends at the loudspeaker. However, the most significant signal change arises after the loudspeakers have done their work. For a true spatial audio perception, including the playback room properties in the system approach is indispensable. The described Holophony procedure does not aim at perfect signals in the loudspeakers. The goal is signals at the listener's ears that are congruent with the signals at a dedicated virtual listener position in the recording room.

To reach that goal, we have to introduce a dedicated listener position in the recording room and a default listener position in the playback room. With this new system design, we no longer need to suppress the playback room acoustics. The differing detours of the direct wave fronts and their reflections, as well as the differing reflection factors of the recording and playback rooms, are equalized during the synthesis. For each signal component and each reflective surface, the synthesis time and level of the component are adapted in advance.

As a result, congruent signals appear at the position of the virtual listener in the recording room and at the real listener position in the home cinema. The listener no longer perceives the additional playback room acoustics, which the procedure subtracts for the direct wave and the first reflections. For such a manageable number of starting points, we can calculate the delay times and levels for each of the loudspeakers with a simple model, almost in real time. Because of the independent access, each of these starting positions can be shifted without changing the perceived sound level at the listener's position; only the arrival time changes. If the first reflections are delayed relative to the direct wave, an enlarged recording room is perceived. Conversely, if the direct wave is synthesized later relative to its own reflections, the perceived room appears smaller.

The common model requires the geometry and the reflection factors of both the recording room and the playback room. The playback room properties are stored during the installation of the loudspeaker wall. The recording room data are transmitted along with the signal. For whatever is not available, a stored library provides suitable values. Essential to this procedure are the default listener position in the playback room and a virtual listener position in the recording room, which have to be set congruently. That establishes the common model. No other way exists to obtain coincident signals at the two positions. In the conventional approaches, including commonly known Wave Field Synthesis, no dedicated positions exist. This different approach is represented in the next sketch. The outer room is the recording room; the playback room, 3b, is depicted inside it. In the center is the common listener position in the recording and playback room, 3i:

The source, 3c, is restored almost perfectly by the frontal “acoustic curtain”. The mirror source of the recording room ceiling, for example, lies outside of that range, so we cannot radiate it directly. However, a possibility arises if that starting point is shifted into the range in which, seen from the default listener position, the loudspeaker field itself is mirrored at the ceiling. We can exploit the reflection at the playback room ceiling to imitate the mirror source of the recording room ceiling. To that end, the recording room's mirror source position is first shifted along a circular path around the listener position into this domain. Mirroring this position at the playback room ceiling then delivers the final starting point (4b) of the virtual source. Now, the calculation of the run times and levels for each speaker is easy in the common model: only the distance to the respective loudspeaker determines the delay time and level of the corresponding audio signal.
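The two geometric steps for the ceiling reflection, the shift along a circle around the listener and the mirroring at the playback room ceiling, can be sketched in a vertical cross-section. This is one possible reading of the procedure, not the author's implementation: the 2D simplification, the clamping of the elevation angle to the nearest edge of the mirrored screen, and all dimensions are assumptions. Points are (y, z) pairs, with y the horizontal distance towards the screen and z the height.

```python
import math

def mirrored_screen_elevations(listener, screen_y, screen_bottom, screen_top, ceiling_z):
    """Elevation angles (rad) under which the ceiling-mirrored screen is seen from the listener."""
    ly, lz = listener
    lo = math.atan2(2 * ceiling_z - screen_top - lz, screen_y - ly)
    hi = math.atan2(2 * ceiling_z - screen_bottom - lz, screen_y - ly)
    return lo, hi

def shift_on_circle(point, listener, lo, hi):
    """Move a point around the listener, keeping its distance, until its elevation lies in [lo, hi]."""
    ly, lz = listener
    dy, dz = point[0] - ly, point[1] - lz
    radius = math.hypot(dy, dz)
    elevation = min(max(math.atan2(dz, dy), lo), hi)
    return (ly + radius * math.cos(elevation), lz + radius * math.sin(elevation))

def mirror_at_ceiling(point, ceiling_z):
    """Mirror a point at a horizontal ceiling of height ceiling_z."""
    return (point[0], 2 * ceiling_z - point[1])

# Assumed playback room: listener ears at 1.2 m, screen 3 m ahead spanning 0.5..1.9 m, ceiling 2.5 m
listener = (0.0, 1.2)
lo, hi = mirrored_screen_elevations(listener, 3.0, 0.5, 1.9, 2.5)

# Assumed recording room: source 6 m ahead at 1.5 m height, ceiling at 8 m
mirror_source = mirror_at_ceiling((6.0, 1.5), 8.0)

shifted = shift_on_circle(mirror_source, listener, lo, hi)
virtual_start = mirror_at_ceiling(shifted, 2.5)   # final starting point for the synthesis
print(virtual_start)
```

Because the shift keeps the distance to the listener, the path length of the reflected wave front, and therefore its arrival time at the default listener position, stays the same; only its direction of arrival is constrained to what the screen can reach via the ceiling.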

The animation shows the restoration of the perceived height of the recording room in the playback room. Corresponding procedures for all main surfaces recreate the complete room dimensions that were originally perceived in the recording room.

4.3 Combination of the model based and data based approach

Such a common model seems usable only for the first main reflections. Restoring all the reflections correctly at the listener's ears overstrains the model-based approach. However, there is no need for this. The snowy meadow model in the second chapter clarified that only the wave fronts of the direct wave and its first reflections change considerably if the listener moves in the recording room. Combining the model-based and the data-based approach could potentially solve many of the remaining procedural problems. Such a combination rests on psychoacoustic considerations: The reverberation contains important cues about the recording room properties. The fine structure of the surfaces, which determines the timbre of the room, is reproduced by the reverberation tail. Convolution with the impulse response of the recording room is a proven method for reconstituting the reverb very authentically. The direction from which the wave fronts of the reverberation tail arrive at the listener is of subordinate relevance for perception, though. Just as in the recording room, the reverberation comes from all possible directions. We cannot detect the origin of second or later reflections. On the other hand, the direct wave front and its first reflections in the recording room hardly determine the timbre but contain the most important localization cues for the source position. Aided by visual relationships, we are skilled at determining the position of an acoustic source very accurately. Tiny differences in the arrival time or amplitude of these wave fronts permit its localization in all three room dimensions.

First reflections are sometimes very strong. Often, their sound pressure hardly differs from that of the direct wave at the listener's position. Thus, their superposition may result in deep comb filter effects. With the correct detours, though, the resulting notches in the frequency response are meaningful indicators of a valid room impression. In contrast, wrong detours, as normally arise in the playback room, cause a different pattern of peaks and notches. The resulting misleading cues sometimes produce level differences of up to 20 dB.
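The comb filter pattern described above follows from superposing the direct wave with one delayed copy. A minimal sketch, assuming a single reflection, a detour of 1 m and a reflection amplitude of 0.82 relative to the direct wave (all illustrative values):

```python
import math

C = 343.0  # speed of sound in m/s (assumed)

def notch_frequencies(detour_m, count=3, c=C):
    """First cancellation frequencies: the detour equals an odd number of half wavelengths."""
    return [(2 * k + 1) * c / (2 * detour_m) for k in range(count)]

def peak_to_notch_db(reflection_gain):
    """Level difference between comb filter peaks and notches for a relative reflection amplitude g < 1."""
    return 20 * math.log10((1 + reflection_gain) / (1 - reflection_gain))

notches = notch_frequencies(1.0)   # reflection travels 1 m further than the direct wave
depth = peak_to_notch_db(0.82)     # reflection only about 1.7 dB weaker than the direct wave

print([round(f, 1) for f in notches], f"{depth:.1f} dB")
```

A different detour moves every notch to a different frequency, which is why the playback room's own reflections produce a pattern that no longer matches the recording room.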

Later reflections do not cause such pronounced notches and peaks in the response. The statistically distributed detours even each other out. The result is a change in overall timbre that depends on the reflective behavior of the recording room. Given these different conditions in perception, it is not meaningful to handle the direct wave, its first reflections and the later reverberation all in the same manner, as is commonly practiced in WFS. The impulse response-based approach delivers perfect results but overstrains the currently available computing power, especially for moving sources or for establishing three-dimensional virtual environments. The calculations are much easier in the model-based approach. The next screenshot shows the model-based calculation for one selected loudspeaker in the speaker screen. Simple vector calculation delivers the number of delay frames and the levels for each distinct sound source and each of its first reflections:

Delay and level calculations for two of the loudspeakers
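A calculation of this kind can be sketched with first-order mirror sources in a box-shaped room. This is an illustration only: the box geometry, the single reflection factor for all walls, the sample rate and the 1/r level law are assumptions, and the playback room correction described in section 4.2 is left out.

```python
import math

C = 343.0    # speed of sound in m/s (assumed)
FS = 48000   # audio sample rate in Hz (assumed)

def first_order_images(source, room, reflection=0.8):
    """Direct source plus the six first-order mirror sources of a box room.

    room = (Lx, Ly, Lz) with walls at 0 and L on each axis.
    Returns a list of (position, gain) tuples; the direct source has gain 1.
    """
    images = [(tuple(source), 1.0)]
    for axis in range(3):
        for wall in (0.0, room[axis]):
            mirrored = list(source)
            mirrored[axis] = 2 * wall - mirrored[axis]
            images.append((tuple(mirrored), reflection))
    return images

def delay_and_level(start_point, gain, speaker):
    """Delay in whole sample frames and 1/r level for one starting point and one loudspeaker."""
    r = math.dist(start_point, speaker)
    return round(r / C * FS), gain / r

# Assumed common model: recording room 12 x 15 x 8 m, one source, one loudspeaker of the screen
room = (12.0, 15.0, 8.0)
source = (4.0, 9.0, 1.5)
speaker = (6.0, 5.0, 1.2)

table = [delay_and_level(pos, gain, speaker) for pos, gain in first_order_images(source, room)]
for frames, level in table:
    print(frames, round(level, 4))
```

Each loudspeaker thus receives seven delayed and scaled copies of the source signal, and the same vector distance is the only geometric input for both the delay and the level.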

Each speaker is driven by the sum of the direct wave and six reflections from the main walls. Thus, for an acoustic curtain of 1024 loudspeakers, 7168 calculation results change if the tenor takes a small step on the stage. If the virtual listener position in the recording room shifts, for example toward the conductor's podium, a 32 input channel system must perform 229376 new calculations, eight times per second. However, that is a bearable task for any normal PC today. Even performing all additions and multiplications within one single audio clock cycle seems possible now. Fortunately, according to [5], eight data updates per second are sufficient for a smooth movement of the source.
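The figures in this paragraph follow from simple multiplication. The sketch below reproduces them and adds an estimate of the per-sample signal processing load; the 48 kHz sample rate is an assumption that does not appear in the text.

```python
SPEAKERS = 1024
COMPONENTS = 7            # direct wave plus six first reflections
INPUT_CHANNELS = 32
UPDATES_PER_SECOND = 8
SAMPLE_RATE = 48000       # assumed audio clock in Hz

per_source = SPEAKERS * COMPONENTS                  # values that change when one source moves
per_update = per_source * INPUT_CHANNELS            # values that change when the listener moves
per_second = per_update * UPDATES_PER_SECOND        # parameter calculations per second

# One multiply-add per delayed copy and audio sample
multiply_adds_per_second = per_update * SAMPLE_RATE

print(per_source, per_update, per_second, multiply_adds_per_second)
```

The parameter updates are a small load compared with the roughly 11 billion multiply-adds per second that the actual mixing of all delayed copies would need at the assumed sample rate.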

The model-based calculation genuinely covers all three dimensions. The reverberation is created by convolving the sum signal of all input sources with the impulse response of the recording room. As shown by Wittek [6], rendering the reverberation as plane wave fronts from a few different, fixed positions is completely sufficient. As in the recording room, the reverb then comes from all around.
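The reverberation path can be sketched in two steps: summing the dry inputs and convolving the sum with an impulse response, then steering the result as a plane wave by per-loudspeaker delays. The short signals, the toy impulse response, the one-dimensional loudspeaker row and the function names are illustrative assumptions; a practical implementation would use fast (FFT-based) convolution.

```python
import math

def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        if s == 0.0:
            continue
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

def reverb_feed(dry_sources, impulse_response):
    """Sum all dry input signals, then convolve the sum with the room impulse response."""
    length = max(len(s) for s in dry_sources)
    total = [sum(s[n] for s in dry_sources if n < len(s)) for n in range(length)]
    return convolve(total, impulse_response)

def plane_wave_delays(direction_deg, speaker_xs, c=343.0):
    """Delays (s) along a loudspeaker row that steer a plane wave at an angle to the screen normal."""
    s = math.sin(math.radians(direction_deg))
    raw = [x * s / c for x in speaker_xs]
    offset = min(raw)
    return [d - offset for d in raw]

wet = reverb_feed([[1.0, 0.0, 0.0], [0.0, 0.5]], [1.0, 0.0, 0.25])
delays = plane_wave_delays(30.0, [0.0, 0.0508, 0.1016])
print(wet, delays)
```

Because the convolution runs once on the sum signal rather than once per source, its cost does not grow with the number of input channels.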

This procedure makes it unnecessary for a team of highly qualified technicians to capture a special spatial impulse response with an expensive microphone line array, as is done in the data-based scientific WFS approach. Any conventional impulse response, as exists today for all interesting environments, is sufficient for the procedure. Together with an approximate geometric model, each dry recorded signal can be rendered as if it had been recorded in the most attractive acoustic environments around the globe.

4.4 Interactivity

The defined listener positions in the recording and playback rooms allow an interactive approach. Because the procedure aims at producing congruent signals at the default listener position in the playback room and at the designated virtual listener position in the recording room, any change of that virtual position leads to a corresponding change in perception.

That simplifies the audio production process. For example, a camera might follow one of the dancers in a dancing pair. For the matching audio, all sound sources would have to rotate around the spectator in such a case. That is a huge effort in traditional production. In the Holophony approach, however, the producer has nothing more to do than change the angle of the virtual listener in the virtual recording room.

In addition, a spectator in a home cinema would be able to edit this value. He can turn around, and the music will then play behind him. Alternatively, he may want to choose a position different from the one the producer intended. No problem. All signals are calculated accordingly if he walks across the virtual recording room with the help of his remote control.
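Changing the virtual listener's position or angle amounts to a coordinate transformation of all source positions before the delays and levels are recalculated. A minimal 2D sketch, assuming x points to the listener's right, y points forward and a positive yaw is a turn to the left; the function name and conventions are illustrative:

```python
import math

def sources_in_listener_frame(sources, listener_xy, listener_yaw_deg):
    """Source positions relative to a virtual listener that has moved and turned.

    sources: list of (x, y) positions in the recording room.
    Returns (right, forward) coordinates as seen by the listener.
    """
    yaw = math.radians(listener_yaw_deg)
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    result = []
    for x, y in sources:
        dx, dy = x - listener_xy[0], y - listener_xy[1]
        # Rotate by -yaw: when the listener turns left, the scene moves to the right
        result.append((cos_y * dx + sin_y * dy, -sin_y * dx + cos_y * dy))
    return result

orchestra = [(0.0, 2.0)]  # one source 2 m straight ahead

turned_around = sources_in_listener_frame(orchestra, (0.0, 0.0), 180.0)
turned_left = sources_in_listener_frame(orchestra, (0.0, 0.0), 90.0)
print(turned_around, turned_left)
```

After a half turn the source lies behind the listener, and after a quarter turn to the left it lies to the right; the transformed positions can then be fed into the same delay and level calculation as before.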

4.5 Compatibility

The argument usually given for the compatibility of Wave Field Synthesis is that the sweet spot increases significantly when traditional audio recordings are reproduced. That is caused by the large distance between the listener and the so-called "virtual panning spots", or faked loudspeakers, beyond the real playback room walls. For a listener inside the sweet spot, that enlargement is not really an advantage. Much more important for perception is the near field reproduction within the range of the huge resulting diaphragm. Besides the nearly infinite dynamics resulting from the perfect matching to the air's radiation resistance, the influence of the playback room acoustics is significantly reduced.

Two wishes conflict with each other in conventional audio reproduction. On the one hand, we try to avoid disturbing reflections in the playback room. Some kinds of program material, for example a newscaster or human voices recorded in a free environment, are only convincing with very dry reproduction. In that situation, the radiation of the loudspeaker screen is directed purely at the listener. Few first reflections arise because hardly any energy is radiated in the direction of their mirror sources. Since these sources feed the second mirror sources, and so on, later reflections are weakened as well. Within the range of the loudspeaker screen, dry loudspeaker reproduction would be possible even in a cathedral.

On the other hand, in strongly damped home cinemas the perception of many kinds of program material becomes boring, reduced to only two loudspeakers as the starting points of all wave fronts. Often we would wish for additional reflections, but those should not arrive too early. Mostly, we would like our playback room to be moderately reflective but significantly enlarged. For traditional audio, the loudspeaker screen opens up the possibility of such adjusted, virtual playback room acoustics. We are able to produce delayed reflections. However, unlike in conventional audio, we are also able to avoid such reflections in normal, untreated dwellings. Besides, the synthesis provides separate access to those additional reflections. That is a fundamental advantage for the equalization of the frequency response. Usually, we equalize all signal components together. However, except for a small loss in the upper frequency range at increasing distances, the spectrum of the direct wave front never changes, neither through the recording room acoustics nor in the playback room. For this reason, collective equalizing is always wrong. Equalizing the reflections separately would adapt the timbre of the recording room much better.

The opposite direction seems more complicated. The dry audio of object-based recordings will not sound really convincing on conventional players. Convolution with an impulse response is indispensable for satisfactory playback. However, all audio will in any case move towards computer-adapted formats in the future, and the borders will disappear. Until then, it seems unavoidable to provide conventional tracks for the traditional hardware. For the final object-based format, 32 channels would be entirely sufficient; we cannot separate more primary sources from each other. Furthermore, if we reserve eight of them for traditional audio, we can still use all the existing recorded material. In addition, it becomes possible for the two treatments to complement each other.

Transmitting 32 audio tracks for normal home use seems pointless at first sight. However, in normal cases only a few sound sources, sometimes only a single one, provide the content. With a suitably adapted standard, the empty audio channels produce no traffic. Thus, the transfer will be more efficient than broadcasting the signal in 5.1, 7.1 or more transmission channels, which sometimes carry completely congruent content.

 

4.6 Stage of development

A three-dimensional WFS near field solution is currently in initial batch production at a German start-up. Unlike the WFS installations realized so far, which use horizontal loudspeaker rows and depend on the data-based model, this application makes full use of advanced, patented model-based technologies. In contrast to the restrictions of the data-based solutions applied so far, the new solution offers:

•  use of the room acoustics to its benefit instead of the need to suppress them

•  no need for loudspeaker installations around the room, no disfigurement of the room design

•  a plain solution that can even be placed invisibly behind a video screen or curtain

•  a breathtaking experience with a broadly enlarged acoustic near field throughout

•  a solution beyond the limitations of today's use-oriented product design: a modular and scalable solution that can be re-used in multiple configurations.

For more detailed information, contact the patent owner, linked below the picture.

5. Conclusion

From a technical point of view, establishing a virtual copy of a genuine sound event has become achievable. The question remains whether this is the goal of audio production and reproduction. Critical experts argue that conventional procedures are already able to create a nearly perfect reproduction, in some cases even better than the original. That is a valid point, especially if the original is presented under acoustically unfavorable conditions. On the other hand, the best audio equipment is still far from creating the emotional effect of the andante of the horns at a Brahms concert or the spatial impression of the hum of a bee close in front of our nose.

The described solution will serve both prospects. It will be a powerful tool for creating spatial impression, without forfeiting any of the artistic freedoms known from traditional audio. What changes the game is that the air in front of the loudspeaker screen becomes a material that can be shaped freely in three dimensions. We will have total control over all acoustics in its widely enlarged near field. Traditional audio reproduction is still a long way from the described solution.

 

Sources:

[1] Berkhout, A.J.: A holographic approach to acoustic control. Journal of the Audio Engineering Society, Vol. 36, No. 12, December 1988, pp. 977-995.

[2] Jens Blauert: Räumliches Hören. S. Hirzel Verlag, Stuttgart 1974. ISBN 3-7776-0250-7

[3] Andreas Franck, Karlheinz Brandenburg: Efficient Delay Interpolation for Wave Field Synthesis, AES Convention 125 (San Francisco, October 2008), Paper 7613

[4] Heinrich, Gregor; Jung, Christoph; Hahn, Volker; Leitner, Michael: A Platform for Audiovisual Telepresence Using Model- and Data-Based Wave-Field Synthesis, AES Convention 125 (San Francisco, October 2008), Paper 7608

[5] William Francis Wolcott IV: Wave Field Synthesis with Real-time Control, Project Report, University of California Santa Barbara 2007

[6] S. Spors, R. Rabenstein, J. Ahrens: The theory of wave field synthesis revisited, AES Convention 124 (Amsterdam, The Netherlands, May 2008)

 

last update 2012-08-10