Theory of Operation
The SRS 3D Stereo Process
SRS® 3D processes the signal so that the spatial cues lost during
the record/playback process are restored. Since the human hearing system
is involved, its transfer function is made a part of the system transfer
function. At the same time, SRS 3D system processing avoids an objectionable
buildup of frequencies of increased phase sensitivity and is effective
over a wide area so that the listener is not restricted to a favorable
listening position between two speakers.
In a stereophonic signal, frontal sounds produce equal amplitudes in
the left and right channels and are therefore present in the "sum" or
L+R signal. Ambient sounds, which include reflected and side sounds, produce
a complex sound field and do not appear equally in the left and right
channels. They are therefore present in the "difference" or L-R signal.
Although these two signals are normally heard as a composite signal, it
is possible to separate and process them independently and then remix
them again into a new composite signal which contains the required spatial
cues that the stereo recording process did not provide. The directional
cues are mostly contained in the difference signals, so these can be processed
to bring the missing directional cues back to their normal levels. The
processed difference signals can then be increased in amplitude in order
to increase apparent image width.
The stereo signal elements left and right can be broken down into sum
(L+R) and difference (L-R and R-L) components:
L = 1/2(L+R) + 1/2(L-R)
R = 1/2(L+R) + 1/2(R-L)
Note that L-R and R-L are identical except for a 180 degree phase reversal.
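The decomposition above can be checked numerically. The following is a minimal sketch, assuming each channel is a plain list of samples; the function names `to_sum_diff` and `from_sum_diff` are illustrative and not part of any SRS product.

```python
def to_sum_diff(left, right):
    """Split left/right sample lists into sum (L+R) and difference (L-R)."""
    total = [l + r for l, r in zip(left, right)]
    diff = [l - r for l, r in zip(left, right)]
    return total, diff


def from_sum_diff(total, diff):
    """Rebuild the channels: L = 1/2(L+R) + 1/2(L-R), R = 1/2(L+R) + 1/2(R-L)."""
    left = [0.5 * s + 0.5 * d for s, d in zip(total, diff)]
    # R-L is L-R with its sign flipped (the 180 degree phase reversal).
    right = [0.5 * s - 0.5 * d for s, d in zip(total, diff)]
    return left, right


# Illustrative samples: the first two are centered (equal in both
# channels), the last appears in the left channel only.
left = [1.0, 0.5, 0.25]
right = [1.0, 0.5, 0.0]
total, diff = to_sum_diff(left, right)
```

Here the centered samples show up only in `total`, the left-only sample shows up in `diff`, and `from_sum_diff(total, diff)` returns the original channels.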
After the stereo signal has been processed, the final SRS output equations are:
SRSL = K0L + K1(L+R) + K2(L-R)p
SRSR = K0R + K1(L+R) + K2(R-L)p
where:
K0 is the L and R feedthrough gain coefficient
K1 is the L+R or center gain coefficient
K2 is the L-R or space gain coefficient
(L-R or R-L)p indicates a difference signal processed by the SRS perspective
curve (selective emphasis)
Referring back to the transfer function of the ear in Fig. 2,
notice that the 90 degree azimuth transfer function shows increased sensitivity
at both the 500 to 1500 Hz area and the 8 kHz area over the 0 degree azimuth
transfer function. The SRS 3D Stereo system applies a corrective transfer
function, known as the SRS perspective curve or selective emphasis, to
the difference signal to compensate for these different spectral characteristics
of sounds originating in front of the listener and sounds originating
to the sides. The corrective transfer function needed to properly reproduce
sounds originating at the sides of a listener and being reproduced by
sources located in the front of a listener is shown in Fig. 3.

Figure 3 - Corrective transfer function to properly reproduce sounds
originating at 90° azimuth which are being reproduced by frontally located
sources.
The SRS perspective curve is based on the corrective transfer function
illustrated in Fig. 3 as well as compensations for other factors, such
as ear canal resonance and stereo bass compensation.
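The SRS output equations given earlier in this section can be sketched as a simple mixer. This is an illustration only: the actual perspective curve is not specified numerically here, so the `perspective` argument is a hypothetical placeholder that defaults to passing the difference signal through unchanged, and the coefficient values in the usage note are made up.

```python
def srs_mix(left, right, k0, k1, k2, perspective=None):
    """Apply the SRSL / SRSR mixing equations to equal-length sample lists."""
    if perspective is None:
        # Placeholder (assumption): no spectral shaping of the difference.
        perspective = lambda diff: list(diff)
    total = [l + r for l, r in zip(left, right)]                # L+R
    diff_p = perspective([l - r for l, r in zip(left, right)])  # (L-R)p
    srs_left = [k0 * l + k1 * s + k2 * d
                for l, s, d in zip(left, total, diff_p)]
    # (R-L)p equals -(L-R)p as long as the perspective filter is linear,
    # which this sketch assumes.
    srs_right = [k0 * r + k1 * s - k2 * d
                 for r, s, d in zip(right, total, diff_p)]
    return srs_left, srs_right
```

With identical channels the difference term vanishes, so K2 has no effect: `srs_mix([1.0, 0.5], [1.0, 0.5], 1.0, 0.5, 2.0)` gives `[2.0, 1.0]` on both outputs. With a left-only sample, `srs_mix([1.0], [0.0], 1.0, 0.0, 1.0)` gives `[2.0]` on the left and `[-1.0]` on the right, showing how the space coefficient feeds an inverted copy of the difference into the opposite channel.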
Benefits of SRS 3D
Most difference signals have a great deal of midrange information and,
since the ear has increased sensitivity to midrange frequencies and we
tend to perceive these sounds at an increased level, the difference signals
cannot be increased indiscriminately. To prevent harshness in the processed
signal, the amplitude of reproduced signals at these frequencies must
be restricted and the lower and higher frequencies increased around them.
The selective emphasis provided by the SRS 3D Stereo system controls the
processed signal's spectral content, resulting in a wider perceived stereo
image with none of the harshness or image shifting problems associated
with indiscriminate increase of the difference signal.
In addition to reducing harshness, SRS 3D processing offers other advantages
over conventional 3D processing. In a live performance, ambient reflections
and reverberant fields are readily perceived and are not masked by the
direct sounds. In a recorded performance, however, ambient sounds are
masked by the direct sounds, and are not perceived at the same level as
at a live performance. The ambient sounds generally tend to be in the
quieter frequency ranges of the difference signal. Appropriate boosting
of these quieter frequency ranges of the difference signal unmasks the
ambient sounds, thereby simulating the perception of a live performance.
The selective emphasis of the difference signal also provides for a wider
listening area. The louder frequency components of the difference signal
tend to be in the mid-range, which includes frequencies having wavelengths
comparable to the ear-to-ear distance around the head of the listener.
As a result of the selective emphasis provided by SRS, the stereo image
shifting problem resulting from indiscriminate increase of the difference
signal is substantially reduced and the listener is not limited to being
equidistant from the speakers.
Mono-to-Stereo Synthesis
In addition to creating 3D images from stereo program material, it is
often desired to expand monaural signals to wider image formats.
The first step in the conversion of a monaural audio signal to 3D sound
is the creation of a synthetic stereo signal. This is accomplished in
the SRS 3D Mono system through a technique that makes use of constant
phase filters. The original mono signal is applied to two banks of filters
which create two outputs with one shifted 90 degrees relative to the other.
This phase shift is consistent from 100 Hz to 20 kHz. Due to the precedence
effect, the ear will perceive the leading signal as the direct sound and
the lagging signal as ambience information. The leading signal is thus
analogous to the L+R component of a standard stereo signal, while the
lagging signal is analogous to the difference or L-R component.
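The idea of a lead/lag pair held 90 degrees apart can be illustrated offline. The sketch below does not reproduce the two constant phase filter banks described above; it swaps in a discrete Fourier transform (a Hilbert transform) to build the lagging copy, leaves the leading copy unshifted, and applies the shift at all frequencies rather than only from 100 Hz to 20 kHz. The name `quadrature_pair` is hypothetical.

```python
import cmath
import math


def quadrature_pair(mono):
    """Return (lead, lag), where lag is mono delayed by 90 degrees in phase."""
    n = len(mono)
    # Forward DFT, written out directly (O(n^2), fine for a short example).
    spectrum = [sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(mono))
                for k in range(n)]
    shifted = []
    for k, value in enumerate(spectrum):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            shifted.append(0j)            # DC and Nyquist have no quadrature part
        elif k < n / 2:
            shifted.append(-1j * value)   # positive frequencies: -90 degrees
        else:
            shifted.append(1j * value)    # negative frequencies: +90 degrees
    # Inverse DFT; the result is real up to rounding error.
    lag = [(sum(v * cmath.exp(2j * math.pi * k * t / n)
                for k, v in enumerate(shifted)) / n).real
           for t in range(n)]
    return list(mono), lag
```

Feeding in a cosine yields a sine of the same frequency as the lagging output, that is, the same tone a quarter cycle later.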
Carrying this analogy still further, the lead and lag signals are dematrixed,
using conventional sum and differencing techniques, into synthetic left
and right stereo signals. These signals are then applied to the SRS 3D
process, whose operation has already been described. Because the synthetic
L, R, L+R and processed L-R signals are all derived from a
mono input, their relationships remain constant, and user control of the
L+R and L-R signal levels ("Center" and "Space") is not required. In addition
to the left and right signals, the L+R signal is output for further processing.
The 90 degree phase relationship between outputs of the constant phase
filters is maintained only down to about 100 Hz; below this frequency
the outputs begin to converge in phase. To eliminate imbalance and diffusion
of the bass frequencies in the near field, the 3D Stereo processor left
and right outputs are high-pass filtered with a second order filter so
that synthetic stereo is produced primarily above approximately 150 Hz.
The synthetic L+R output signal of the 3D Stereo processor is then low-pass
filtered to produce a monophonic bass signal, and phase inverted to ensure
consistency with the high-pass response. Finally, the high-pass filtered
synthetic left and right signals and the low-pass filtered L+R bass signal
are combined in the output summers to produce the synthetic left and right
output signals. This results in tightly focused and centered bass in the
output.
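The bass handling described above can be sketched with a pair of second-order filters. Several details here are assumptions rather than facts from the text: the filters are digital Butterworth biquads, the low-pass uses the same 150 Hz corner as the high-pass, the sample rate is 48 kHz, and the bass signal is scaled by a hypothetical `bass_gain` of 0.5 before summing.

```python
import math


def butterworth_biquad(kind, cutoff_hz, sample_rate_hz):
    """Second-order Butterworth low-pass or high-pass coefficients (b, a)."""
    w0 = 2.0 * math.pi * cutoff_hz / sample_rate_hz
    alpha = math.sin(w0) / math.sqrt(2.0)   # sin(w0) / (2Q) with Q = 1/sqrt(2)
    cos_w0 = math.cos(w0)
    if kind == "lowpass":
        b = [(1.0 - cos_w0) / 2.0, 1.0 - cos_w0, (1.0 - cos_w0) / 2.0]
    elif kind == "highpass":
        b = [(1.0 + cos_w0) / 2.0, -(1.0 + cos_w0), (1.0 + cos_w0) / 2.0]
    else:
        raise ValueError("kind must be 'lowpass' or 'highpass'")
    a = [1.0 + alpha, -2.0 * cos_w0, 1.0 - alpha]
    return b, a


def biquad_filter(coeffs, samples):
    """Run samples through one biquad section (direct form I)."""
    (b0, b1, b2), (a0, a1, a2) = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = (b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def combine_with_mono_bass(syn_left, syn_right, syn_sum,
                           sample_rate_hz=48000.0, crossover_hz=150.0,
                           bass_gain=0.5):
    """High-pass the synthetic L/R, then add inverted low-passed L+R to both."""
    hp = butterworth_biquad("highpass", crossover_hz, sample_rate_hz)
    lp = butterworth_biquad("lowpass", crossover_hz, sample_rate_hz)
    high_left = biquad_filter(hp, syn_left)
    high_right = biquad_filter(hp, syn_right)
    # Low-pass the sum, then invert its phase before it reaches the summers.
    bass = [-bass_gain * s for s in biquad_filter(lp, syn_sum)]
    out_left = [h + b for h, b in zip(high_left, bass)]
    out_right = [h + b for h, b in zip(high_right, bass)]
    return out_left, out_right
```

Because the same bass signal is added to both outputs, everything below the crossover is identical in the two channels, which is what keeps the bass centered. With matched second-order sections like these, the low-pass and high-pass outputs sit 180 degrees apart, so inverting one of them avoids a cancellation notch at the crossover when they are summed.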