
Modern Digital Recording & Playback
David Hyre, PhD

About the author
Dr. Hyre is a research molecular biologist at the University of Washington. In addition to his abundant knowledge of molecules, he has a profound passion for music and the highest-quality reproduction of it. Outside his efforts for the University of Washington, David enjoys designing and building loudspeakers in the time he is not trying to unlock other mysteries of the universe.

INTRODUCTION
In the consumer market, a majority of discussion about digital audio focuses on the playback of the signal encoded on digital media such as CD, DAT, and MiniDisc, while little time is spent discussing how the signal got there in the first place. However, many of the concepts that apply to digital playback also apply to recording. As the following discussion will show, the process of playback is essentially the reverse of the recording process. In fact, the relevant concepts can be easier to comprehend when applied to digital recording and can create a greater appreciation of the playback process. Therefore, this first part will focus on the process of sampling an analog signal and some of the related theory behind the process. The second will focus on the quantization of these samples and some of the mathematical manipulation that goes into the process. Together, sampling and quantization comprise digital recording. The third and final part will focus on the process of reconstructing a faithful reproduction of the original analog signal from the digital recording "made" in the previous two articles. Much of digital signal theory applies to all three processes, so that the theory presented will facilitate the discussion of playback.

DIGITAL RECORDING
The digital recording process can be separated into two steps, sampling and quantization, which occur sequentially. The term sampling refers to the capture of a signal at discrete points in time and assumes perfect representation of the signal at each point, independent of how that sample is represented. Quantization refers to the representation of those individual samples by a finite set of numbers and assumes perfect sampling, with no error in sample timing such as jitter from irregular sampling intervals or aliasing of out-of-band signals from an inadequate sampling rate. The sampling of a signal does not necessarily imply quantization of that signal. For example, "bucket-brigade"-type delays are available which temporarily store signal levels as analog voltages in buffered sample-and-hold capacitor circuits, thus avoiding digital quantization. These stored voltage samples are then played back by sequentially passing them to the output. Sampling theory applies to these circuits as well, because they chop the signal into discrete pieces. Therefore, the following discussion will treat sampling and quantization separately, although it will be shown that in reality the two processes are linked by physical limits related to the limited performance of the electronics employed in digital recording.

SAMPLING
The concept that an analog signal can be faithfully represented at a finite sampling rate is based on one of Fourier's theorems. This theorem states that any waveform can be broken into a combination of sine and/or cosine components. The Nyquist sampling theorem adds to this by stating that a fixed sine wave can be fully represented by two samples per cycle, for example at the peak and trough of the waveform. The values of the samples represent the amplitude of the wave, while their separation represents the frequency of the wave. Two samples per cycle are merely a minimum; the same sine wave is just as well represented by more samples per period. The result of combining both theorems is that any time-varying waveform can be fully represented by samples taken at a rate 2N, equal to twice the frequency of the highest-frequency sine component N. This rate, 2N, is known as the Nyquist frequency. Sampling a signal of bandwidth F (containing frequencies from 0 to F Hz) at a rate 2N less than 2F results in the "folding" of frequencies between N and F back into the spectrum, creating "aliases" of the true frequencies at false locations within the sampled spectrum. This can be demonstrated easily, as shown in Figure 1.

[Figure 1: Signal amplitude versus time for (a) the sample points taken at 3 Hz, (b) the apparent 1 Hz wave, and (c) the actual 2 Hz wave.]

The 2 Hz wave ("c") appears to be a 1 Hz wave ("b") when sampled at 3 Hz ("a") due to the sampling rate being less than 2*2 = 4 Hz. Folding, or aliasing, becomes important when one considers that there is random noise outside the audio spectrum, stretching practically to infinity, which would be folded back into the sampled spectrum by sampling at a finite rate. This noise can be removed by a traditional analog filter. However, in order to remove all noise outside the audio spectrum from an analog signal to be sampled at 44.1 kHz, this filter must have less than ~1 dB attenuation at 20 kHz but at least 96 dB attenuation at 24.1 kHz (44.1 kHz minus 20 kHz; the noise between 22.05 and 24.10 kHz will be folded back into the 20.00-22.05 kHz range). It is apparent that this requires a complex filter built to exacting standards with narrow-tolerance components. Even the best analog filter displays phase and/or gain anomalies near the cutoff frequency, or "knee". Thanks to advancements in electronics, there is a better way.
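To make the aliasing in Figure 1 concrete, here is a minimal numerical sketch (Python with NumPy; the 2 Hz wave, the 3 Hz sampling rate, and the 1 Hz alias come from the figure, while the cosine phase and the two-second span are arbitrary choices):

```python
import numpy as np

fs = 3.0          # sampling rate in Hz (below the 4 Hz needed for a 2 Hz tone)
f_signal = 2.0    # frequency of the wave actually being sampled
f_alias = 1.0     # frequency it "folds" to: |fs - f_signal| = 1 Hz

t = np.arange(0.0, 2.0, 1.0 / fs)               # the sample instants
samples_2hz = np.cos(2 * np.pi * f_signal * t)
samples_1hz = np.cos(2 * np.pi * f_alias * t)

# The two sets of samples are numerically identical, so a 2 Hz wave
# sampled at 3 Hz is indistinguishable from a 1 Hz wave.
print(np.allclose(samples_2hz, samples_1hz))    # True
```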

The availability of extremely fast and relatively inexpensive analog-to-digital converters (ADC's) and computer chips allows a different approach to be taken in removing unwanted noise from the spectrum: oversampling. This approach, diagrammed in Figure 2, employs a simple low-order analog filter, a sampler running at a frequency greater than 2F, and a digital filter. In this approach, the sampling rate is set far above the original Nyquist frequency of 44.1 kHz, usually to an even multiple of 44.1 kHz. The ratio of the actual sampling rate to the Nyquist frequency is referred to as the oversampling factor. 4x oversampling at 176.4 kHz is common in ADC's, although there are some that run at many megahertz. As in the previous method of sampling at 44.1 kHz, high-frequency noise in the incoming signal is first attenuated by an analog filter. However, due to the high sampling rate, the filter does not have to reach -96 dB until (2N) - (20 kHz), or 156.4 kHz in a 4x oversampling system. Thus, there is a large window between 20 kHz and 156.4 kHz for the transition region of the filter, so the filter can be relatively simple. A simple filter is less expensive and causes fewer glitches in the signal. In addition, the large window between 20 kHz and 156.4 kHz means that the filter "knee" can be moved well away from the audio frequencies, thus eliminating the effect of filter phase and gain anomalies from the audio portion of the spectrum. Once the spectrum has been bandwidth-limited to 0-156.4 kHz, the ADC samples the resulting waveform at 176.4 kHz, recording the audio signals as well as all the noise between 0 and 156.4 kHz. The ADC must then convert these analog samples into digital numbers by quantizing them; this process is covered later in this text. The resulting numerical samples are then passed through a digital, or computational, filter.

[Figure 2: Oversampling, shown as filter amplitude versus frequency (roughly 10 Hz to 1 MHz), with the analog filter response and the out-of-band noise. (1) A simple analog filter passes this portion. (2) Oversample and digitize this portion; the digital filter passes only this portion. (3) Remaining noise removed.]

This filter removes the noise and noise aliases from the spectrum above 22.05 kHz (N) without introducing any errors into the audio spectrum. This digital signal can then be "resampled" at 44.1 kHz (2N) by "decimating" the samples, or taking only the appropriate points, such as every fourth sample in a 4x system. This can be done because three of every four samples in the signal are entirely redundant after filtering. Consequently, all the audio information acquired by the fancy oversampling ADC is retained in the "common" 44.1 kHz CD medium. Often, the processes of filtering and decimation are accomplished at the same time by computing only every fourth output sample. The digital filters employed in this process have many benefits, some of which are discussed below.
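As a rough sketch of how filtering and decimation can be fused by computing only every fourth output sample, consider the following (hypothetical code; the 9-tap moving-average kernel is only a stand-in for a properly designed low-pass FIR, and the 1 kHz test tone is arbitrary):

```python
import numpy as np

def filter_and_decimate(x, coeffs, factor=4):
    """Convolve with an FIR kernel, but compute only every `factor`-th
    output sample, so filtering and decimation happen in one pass."""
    half = len(coeffs) // 2
    padded = np.pad(x, half)                    # zero-pad the edges
    out = []
    for n in range(0, len(x), factor):          # every 4th output only
        window = padded[n : n + len(coeffs)]    # samples around the "current" one
        out.append(np.dot(window, coeffs))      # weighted average
    return np.array(out)

# Stand-in low-pass kernel: a simple normalized moving average.
coeffs = np.ones(9) / 9.0
oversampled = np.sin(2 * np.pi * 1000 * np.arange(0, 0.001, 1 / 176_400))
decimated = filter_and_decimate(oversampled, coeffs, factor=4)  # now at 44.1 kHz
```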

Digital filters, sometimes known as interpolating filters, are computational devices which remove unwanted frequencies from the digital spectrum in a way roughly analogous to the way analog filters work. In fact, both can be viewed as devices which create frequency-dependent time and phase delays in the signal and then add the modified signal back to the original. Analog filters modify the signal using capacitors (and inductors in high-level systems such as speakers), while digital filters alter the signal by computing weighted averages of multiple samples extending both forward and backward in time, relative to the center or "current" sample. Thus the digital filter multiplies each sample in a range about the center by certain weighting coefficients, which are calculated to provide the required response. These modified samples are then added together to create a new sample representing the output signal. This process, also known as convolution, is diagrammed in Figure 3. The use of samples which are forward in time gives rise to the "pre"-ringing observed before each signal in the step response of digitally filtered signals. Likewise, the use of past samples (backward in time) gives rise to the more traditional "post"-ringing in the same signals. Pre-ringing in a square-wave step response can be considered a type of pre-emphasis added into the signal to create the desired response. It is actually caused by the removal of high-frequency components from a signal that naturally contains infinite frequency components. As Fourier's theorem states, a waveform of any shape can be created from sine-wave components of various frequencies. To see the spectral distribution of the sine-wave frequencies comprising a given signal, one only has to perform a Fourier transform (FT) on the signal to convert the time-domain signal into a frequency-domain spectrum.

[Figure 3: Convolution. Each raw sample in a range about the current one (S-5 ... S4, with S0 the current sample, S-1 the next, and S1 the previous) is multiplied by a filter coefficient (C-4 ... C4); the weighted sum forms each filtered output sample (O0, O1, O2, ...).]
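The weighted sum in Figure 3 can be written out in a few lines (a sketch; the coefficient values below are arbitrary placeholders that sum to one, not a designed filter):

```python
def convolve_one_output(samples, coeffs, center):
    """One output sample O = sum of S[center + k] * C[k] for k = -4..+4,
    i.e. a weighted average reaching both forward and backward in time."""
    half = len(coeffs) // 2
    total = 0.0
    for k in range(-half, half + 1):
        total += samples[center + k] * coeffs[k + half]
    return total

# Arbitrary symmetric placeholder coefficients C-4 .. C4 (they sum to 1).
C = [0.02, 0.05, 0.12, 0.20, 0.22, 0.20, 0.12, 0.05, 0.02]
S = [0.0] * 4 + [1.0] * 12          # a step in the input signal
O = [convolve_one_output(S, C, n) for n in range(4, len(S) - 4)]
print(O)   # the step is smoothed; samples "ahead" of the edge already rise
```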

If one performs an FT on a square wave, the resulting spectrum shows frequency components extending to infinity in the shape of a "sinc" function, (sin x)/x (Figure 4b).

[Figure 4: (a) a square wave in the time domain; (b) its Fourier spectrum, following the "sinc" shape; (c) the same spectrum with the high-frequency components removed; (d) the resulting time-domain waveform after an inverse FT, showing the pre-ringing.]

Fourier's theorem also works in reverse, as does the FT, so that one may remove the high-frequency components from the frequency-domain spectrum (Figure 4c) and then perform an inverse FT to see the resulting waveform in the time domain. Figure 4d presents the results of this process. It is readily apparent that simply removing the upper frequency components of the square wave results in the pre-ringing phenomenon, entirely without the use of any digital filtering as described above.
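The Figure 4 experiment is straightforward to reproduce numerically; a sketch using NumPy's FFT (the 1024-point length and the 20-bin cutoff are arbitrary illustrative choices):

```python
import numpy as np

n = 1024
square = np.where(np.arange(n) < n // 2, 1.0, -1.0)   # one cycle of a square wave

spectrum = np.fft.rfft(square)            # forward FT: time domain -> frequency domain
cutoff = 20                               # keep only the lowest 20 bins (arbitrary)
spectrum[cutoff:] = 0.0                   # remove the high-frequency components
band_limited = np.fft.irfft(spectrum, n)  # inverse FT back to the time domain

# band_limited now overshoots and rings on *both* sides of each edge,
# even though no time-domain filter was ever applied.
print(band_limited[n // 2 - 8 : n // 2 + 8].round(3))
```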

Mathematically, however, the two processes are identical; removing spectral regions in the frequency domain is computationally equivalent to convolution (weighted averaging) with a filter function in the time domain. Figure 4 shows this equivalence in graphic form. Unlike the 1-10% tolerance found in the components comprising analog filters, the weighting coefficients in a digital filter can in principle be computed with infinite precision. In truth, the precision of the coefficients is limited by the number of computer bits used to represent them, but even the 16 bits found on today's CD's allow a "tolerance" of less than 0.0015% (2^-16). Such high precision is never found in electronic parts such as capacitors and resistors. Often, more than 16 bits are used to describe the filter coefficients because it is common to use numerical processors which compute data with more bits, up to 24 in some cases (0.000006% tolerance).

As in analog filters, there are a number of different digital filter types, each described by a mathematical transfer function which specifies its response characteristics. In fact, the transfer functions of equivalent analog and digital filters are the same, allowing the digital coefficients to be computed from the analog transfer function. Filter coefficients can also be calculated from frequency response curves using Fourier transform methods, creating new types of filters that are either difficult or impossible to implement in the analog realm, even with perfect components. Filters derived from both analog transfer functions and Fourier methods are known as finite impulse response (FIR), or non-recursive, filters. FIR digital filters compute their output based solely on the samples of the input signal. The term "finite" describes the fact that their response is not perfect, requiring a transition region between pass-band and stop-band, just like analog filters. This transition region can be made extremely narrow, depending on the number of filter coefficients and their values. Analogous to the higher frequencies sharpening the "knee" in the square wave described above, a larger number of coefficients in a digital filter can result in a steeper transition between pass-band and stop-band.

There exists a second class of digital filters called infinite impulse response (IIR), or recursive, filters.

These filters use not only the input samples but also the calculated output samples created by previous computational cycles, and thus have no direct equivalent in the analog domain. As their name implies, the step response of IIR filters can be made near perfect, but these filters are far more difficult to design and tend to be for highly specialized purposes. Most of the digital filters used for audio are of the FIR type.

To summarize, the process of sampling an analog signal is composed of five separate steps (see Figure 2). The signal is first passed through an analog filter to remove high-frequency noise, so that it does not "fold" back into the audio spectrum. The filtered signal is then sampled at a frequency many times higher than the top of the audio spectrum, at least 20 kHz higher than the analog filter cutoff. The resulting samples are then quantized. The numbers representing the samples are passed through a computational filter that removes all frequencies outside of the audio band, using digital filter coefficients which provide a near-perfect passband frequency response and cutoff. Finally, the numerical samples are decimated, removing redundant information in order to reduce their number to the familiar 44,100 samples per second found on today's compact discs.

QUANTIZATION
Integral to the discussion of quantization is the number system employed to represent the samples. It is convenient to discuss digital audio in a binary number system similar to that used for CD's and other digital media. The CD standard specifies 16 bits for each sample; this is referred to as the "word length". This can be thought of as a "single-sided" signal from 0 to 65535 (= 2^16 - 1) with 32767 (= 2^15 - 1) as center (positive/negative encoded as a DC offset), or as one bit to represent the sign of the signal and fifteen to represent the integers 0 through 32767. Thus, the maximum signal would be represented as 1111 1111 1111 1111₂, and the minimum signal as 0000 0000 0000 0001₂. The "Most Significant Bit", or MSB, is the first bit on the left. This single bit represents half of the total amplitude of the signal (the DC offset). The last bit on the right is the "Least Significant Bit", or LSB, whose value is 1/32768 of the MSB. This allows a total of 65535 different numbers to represent different signal levels. The theoretical dynamic range of this system can be calculated from these numbers; in decibels, 20*log10(65535) ≈ 96.3 dB.
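The word-length arithmetic above can be verified directly (a small sketch; only the 16-bit word length and the decibel formula are taken from the text):

```python
import math

bits = 16
max_code = 2**bits - 1            # 65535: the maximum "single-sided" code
center = 2**(bits - 1) - 1        # 32767: the zero-signal (DC offset) level
msb_weight = 2**(bits - 1)        # 32768: the MSB carries half the amplitude
lsb_vs_msb = 1 / msb_weight       # the LSB is 1/32768 of the MSB

dynamic_range_db = 20 * math.log10(max_code)
print(f"codes = {max_code}, LSB/MSB = {lsb_vs_msb}, "
      f"dynamic range = {dynamic_range_db:.1f} dB")   # about 96.3 dB
```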

Inherent in any quantization is a loss of information related to the finite resolution of the fixed word length used. It is clear that the numerical value of a sample will have to be truncated to the LSB. This is most easily accomplished by rounding the value to the nearest binary number, causing the maximum error between the quantized sample and the original signal to be +/- 1/2 bit. The error introduces noise into the signal at any level and is called quantization error, or quantization noise. Thus, very small signals cannot be represented well, because the size of the errors approaches the size of the actual signal. One can imagine an input signal oscillating between +1/3 and -1/3 which will be quantized to a DC signal of 0. The error will then be +/- 1/3 and equal to the original signal but with inverted sign. The ratio between the level of the maximum possible signal and the quantization noise represents the theoretical limit of the signal-to-noise ratio, and is equal to the ratio of the MSB to half the LSB. This is 96 dB in a 16-bit system.

The spectrum of the quantization noise is related to the signal being quantized, and a totally random, uncorrelated input produces a random, uncorrelated white-noise spectrum. However, audio signals are highly correlated because of their oscillatory nature. They produce a correlated noise spectrum which is a function of the sampling rate and the frequency of the signal (Figure 5b). The quantization error of the above "test" signal oscillates between -1/3 and +1/3 as the signal oscillates between +1/3 and -1/3 and is thus highly correlated. The correlated noise is perceived mainly as a form of intermodulation distortion and is quite unpleasant. This correlation can be effectively removed by adding random noise of amplitude 1, known as dithering (Figures 6d, 6e, and 5c). The addition of +/- 1 dithering to a +/- 1 "test" signal results in a signal whose error is entirely dependent on the random dither and whose signal still remains in the shifting oscillatory levels of the resulting noise, thus producing white noise for even the smallest representable signal (note: small errors in the noise spectra are due to limited simulation data size). Dithering increases the overall noise but decorrelates it from the signal, resulting in a slightly noisier but undistorted signal.

[Figure 5: (a) raw signal spectrum; (b) undithered error spectrum; (c) dithered error spectrum. Level in dB versus frequency.]
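A small simulation of the ±1/3 "test" signal illustrates the decorrelation (a sketch; the uniform ±1/2 LSB dither and the 20,000-sample length are illustrative choices, not the exact dither used for the figures):

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(20000)
signal = (1.0 / 3.0) * np.sin(2 * np.pi * t / 50.0)   # sub-LSB "test" signal

undithered = np.round(signal)                          # quantize to whole LSBs
dithered = np.round(signal + rng.uniform(-0.5, 0.5, t.size))

err_undithered = undithered - signal   # tracks the signal: correlated distortion
err_dithered = dithered - signal       # looks like white noise: decorrelated

# Correlation of the error with the signal: -1 undithered, near 0 dithered.
print(np.corrcoef(signal, err_undithered)[0, 1])
print(np.corrcoef(signal, err_dithered)[0, 1])
```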

Further improvements in the perceived S/N ratio can be achieved through methodical manipulation of the truncation or rounding function. This is known as noise shaping.

[Figure 6: analog input, undithered & quantized, undithered quantization error, dithered & quantized, and dithered quantization error (panels a-e); signal versus time.]

Noise from lower frequencies can be shifted to higher frequencies, resulting in a higher total noise level, but the higher-frequency noise is less unpleasant and can be filtered out at a later stage of recording and/or during playback. One example of a modified truncation function is one in which odd-numbered samples are rounded up to the nearest binary level and even-numbered samples are rounded down. This produces an error signal which alternates between +/- 1/2 LSB at the sampling frequency. The quantization noise spectrum is consequently shifted upward toward a frequency equal to half the sampling rate, reducing the noise at lower frequencies. More complex truncation functions can be devised by basing the direction in which the sample is rounded on the error in previously quantized samples. For example, Sony's Super Bit Mapping manipulates the truncation procedure to shift noise away from 3 kHz and 12 kHz, where the ear is more sensitive, to the 15-20 kHz range, where the ear is less sensitive. This produces a signal which is perceptually quieter by approximately 6 dB even though the total noise is higher.
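One simple way to base the rounding direction on previous errors is a first-order error-feedback quantizer, sketched below (a generic illustration, not Sony's actual Super Bit Mapping algorithm; the 0.25-amplitude test tone and the "lowest eighth of the band" comparison are arbitrary):

```python
import numpy as np

def noise_shaped_quantize(x):
    """Round to whole LSBs, feeding each sample's rounding error back into
    the next sample so the error spectrum is pushed upward in frequency."""
    out = np.empty_like(x)
    err = 0.0
    for n, sample in enumerate(x):
        v = sample - err          # subtract the error made on the last sample
        q = np.round(v)
        err = q - v               # error made on this sample
        out[n] = q
    return out

t = np.arange(8192)
x = 0.25 * np.sin(2 * np.pi * t / 128.0)     # small test tone
shaped = noise_shaped_quantize(x)
plain = np.round(x)                          # ordinary rounding

def low_band_error_energy(y):
    e = np.fft.rfft(y - x)
    return np.sum(np.abs(e[: len(e) // 8]) ** 2)   # lowest eighth of the band

print(low_band_error_energy(plain) > low_band_error_energy(shaped))   # True
```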

Quantized and noise-shaped samples become sensitive to further manipulation because of the noise shaping. Calculations as simple as digital gain adjustment can not only cancel the benefits of noise shaping, but actually increase the total noise. Digital filtering after noise shaping can have similar effects. After such manipulations, the previously reduced noise floor returns to its original level, but the frequency-shifted noise is not decreased. This effect is caused by the additional quantization step inherent in any calculation. An increase in signal level is as simple as multiplying each sample by the gain factor, but each new value must be truncated again to a discrete binary level. Dithering and noise shaping can be applied again, but the total noise is increased with each application. Many of these effects can be avoided if the word size is increased with each manipulation. This has important implications for the resampling (and hence requantization) performed in transferring 20-bit master tapes to the 16-bit CD format.

PLAYBACK
To begin with, the data must be read off of the compact disc, either within an integrated CD player or a separate CD transport. The digital data are checked for integrity and then sent on to the digital-to-analog converter, often referred to as the DAC (before the advent of integrated single-bit and other modern converter chips, "DAC" referred specifically to the electronics that turned the digital values into analog voltages, usually a few resistors and an op-amp; I will call this chip the "D/A converter" and the whole unit the "DAC"). This can reside in the CD player or in a separate DAC unit. If separate transport and DAC units are used, the data must be encoded for transfer between the units and decoded upon reception, usually in the Sony/Philips Digital Interface Format, or S/PDIF. Once in the DAC, the data are manipulated computationally in a number of different ways, depending on the manufacturer's implementation of the conversion process. This can include oversampling, interpolation, digital filtering, and noise shaping. The data are then run through the D/A converter, and the resulting analog signal is filtered to remove high-frequency noise from the digital encoding/decoding process before being sent to the output jacks for amplification.

The first step, reading the data off of the CD, is now a fait accompli. CD-ROM's dominate the computer software distribution industry and have an infinitesimal error rate, something like 1 in 10^12. This is equivalent to one bit being read wrong after playing 100 CD's, i.e. not often. This is possible due to error-correction codes and data redundancy, which allow reconstruction of small amounts of missing data. Add to this the ability to go back and re-scan to replace error-ridden, irreparable data, and the error rate drops further. Thus, the impact of the CD reading mechanism on music should be nonexistent. Bob Harley, in his book Complete Guide to High-End Audio, says that "the transport or interconnect doesn't change the ones and zeros in the digital code" (p. 227). I take this to mean that he agrees.

Of course, I do not mean to suggest that all CD transports will sound alike. There is always the possibility that a transport may cause ground loops or inject RF/EMI noise into the DAC, unless the interconnect is optical. We also assume that our CD's are free of fingerprints and dust, which can confuse poorer transports. About the only other way a transport can affect the audio signal is by poorly embedding the clock signal in the S/PDIF signal sent to the DAC unit. This signal carries both the digital audio data bits and the clock, interleaved in a special format which encodes the clock in the up-down transitions between audio data bits. Poor timing of the signal bits from which the clock is reconstructed can cause jitter in the DAC output signal, leading to complex harmonics of the jitter frequency in the analog output. Note that this is not a likely problem with integrated CD players, since the clock is local and the signal does not need to be encoded in S/PDIF. We will see later how these errors can be avoided by logically decoupling the two processes of reception and clocking.

Once the signal has reached the DAC unit, it is almost always subject to oversampling and digital filtering. The 16-bit digital signal at this point looks like a series of steps, whose sharp corners and fast transitions between digital levels contain large amounts of high-frequency harmonics that must be removed.

Early DAC's accomplished this using very steep-cutoff analog filters just outside the audio band. These filters had to have nearly zero attenuation at 20 kHz and almost perfect attenuation at 22 kHz, reaching -96 dB in ~0.1 octave. This is difficult to accomplish, as described in Part I. As in recording, digital filtering gives us a better way to perform the filtering. The digital filtering is done as in recording, but in reverse, and possesses all of the advantages over analog filters mentioned previously. As you may remember, in recording the signal was sampled at a rate many times the minimum necessary (e.g. 8 times), then passed through a computational filter that performed a type of mathematical smoothing to remove the frequencies above 22 kHz. This made most of the digital samples redundant (e.g. 7 of every 8), and they were simply dropped to reduce storage requirements. In playback, oversampling digital filters replace the dropped samples using a similar mathematical algorithm, which amounts to interpolating points between the existing samples. In our example, this means interpolating 7 new points between every pair of original samples. Note that inherent in this method is an increase of the effective sampling rate, which shifts the step harmonics to inaudible frequencies. Along with the increased rate and computationally derived new samples comes the possibility of increasing the number of bits from 16 to 20, 24, or even 32 to get higher resolution worthy of the higher speed and increased information.

In fact, oversampling methods are so effective that they can be expanded to an almost absurd degree, known as "single-bit", "bitstream", or "MASH" methods. As the original signal data are oversampled more and more, the samples become increasingly redundant and the noise drops 3 dB for every doubling of the sampling rate. This means that bits can be dropped without loss of information or increase in noise, e.g. 1 bit for every 4x of oversampling. Thus, these methods oversample the signal by huge amounts, dropping "redundant" bits as they go. The resulting samples are then fed to a very simple D/A converter at an incredible rate; more on this later.
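A toy version of the "replace the dropped samples" step might look like this (a sketch; the short triangular kernel performs plain linear interpolation, whereas a real oversampling filter would use a much longer Fourier-designed kernel, as discussed below):

```python
import numpy as np

def oversample_4x(samples, kernel):
    """Insert 3 zeros between the original samples, then low-pass filter,
    which amounts to interpolating 3 new points between each pair."""
    stuffed = np.zeros(len(samples) * 4)
    stuffed[::4] = samples                    # originals kept, zeros in between
    return np.convolve(stuffed, kernel, mode="same")

# Triangular kernel = plain linear interpolation between original samples.
kernel = np.array([0.25, 0.5, 0.75, 1.0, 0.75, 0.5, 0.25])
cd_rate = np.sin(2 * np.pi * np.arange(32) / 16.0)   # samples at the base rate
hi_rate = oversample_4x(cd_rate, kernel)             # 4x as many samples
print(np.allclose(hi_rate[::4], cd_rate))            # originals pass through
```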

Various interpolation methods exist, linear being the simplest and those based on Fourier series being among the most complex. As in recording, the digital filter is used as a low-pass filter to remove the high frequencies based on an averaging function. Linear interpolation reduces the size of the steps between successive samples but is not very good at removing the high harmonics present in the original signal, since that signal is composed of sine waves, not triangle waves. Fortunately, digital filters similar to those used for recording can be used on the linearly interpolated samples to remove the high frequencies by the same method. These filters are most often built up from Fourier series and are thus inherently based on sine waves. They are very effective at removing the step harmonics. One can compare the mathematical interpolations in a digital filter with the electrical interpolation done by the earlier steep analog filters, whose capacitors essentially perform an exponential interpolation on the samples.

There are many differing views on the type of digital filtering that should be used in audio DAC's. One common signature of DAC's using digital filters is pre-ringing in the step response (see Part I). This is caused by the high-level signal of the step being "propagated back in time" by the interpolation algorithm to samples before the main part of the step. Some manufacturers, notably Wadia, use a different filtering algorithm optimized for its time-domain response, which prevents pre-ringing. Such removal of the pre-ringing is not necessarily natural, as it seriously affects the frequency response and phase delay of the signal. A step response that is still square without pre-ringing suggests that high frequencies are still getting through the filter, while a smooth step response without high frequencies OR ringing suggests that the filter is overdamped. As shown in Part I, simply subtracting the high frequencies from a step leads to pre-ringing. This effect can also be imagined by thinking of the step function as being composed of one big sine hump, the low-frequency component, overlaid with smaller high-frequency waves that sharpen the corners into a square step. It can then be seen that the center of the first low-frequency wave is NOT coincident with the center of the first high-frequency wave, and that forcing these into alignment can lead to phase and group delay problems. This is not to say that time-domain optimization has no advantages, just that it has serious consequences that can outweigh many of the presumed benefits.

When the original samples are oversampled to a higher rate, one effect is to spread the quantization noise over a larger frequency range, far above the audio range, where it can later be filtered out. It is at this point that dithering noise can be added to the samples to decouple the quantization error spectrum from the signal spectrum. During the re-quantization inherent in the oversampling and digital filtering, it is also possible to spread the noise unequally to different frequency bands by altering the truncation or rounding of the calculated sample to the number of bits being used to represent the newly generated samples, analogous to the process employed during recording. This is known as noise shaping and goes by such names as Multistage Noise Shaping (MASH) or Super Bit Mapping. Not only can much of the noise be moved out of the audible range, but the distribution of the noise remaining in the audible band can also be made nonuniform, redistributing it to regions in which the ear is less sensitive. Single-bit converters do this many times over, at each step of oversampling/bit-reduction.

At this point in playback, the original samples have been oversampled, digitally filtered, dithered, noise shaped, and bit-reduced, and are ready for conversion to analog voltages. This is accomplished at a basic level by the same means in all converters: with a resistor. The earliest D/A converters were simple summing amplifiers, consisting of a series of resistors whose values were related by powers of 2, each representing one of the many bits. For example, 1, 2, 4, and 8 k-ohm resistors could be used to represent the values 8, 4, 2, and 1, respectively (they use current, so the contribution is proportional to 1/R, not R). It is easy to imagine that getting 16 values exactly right is quite difficult. These converters were known to suffer greatly from errors in the resistor value of the highest bit, or Most Significant Bit (MSB), which had to be accurate to within 1 part in 2^16 (0.0015%) or it would make the smallest bit (LSB) irrelevant. Such inaccuracies can cause non-linearity and zero-crossing distortion, resulting in harmonic distortion. Converters with more bits make this problem worse. A later configuration, which is often used in today's multi-bit converters, is called an R/2R ladder. This uses only two values of resistors, or one value singly and in parallel or series, which reduces sensitivity to manufacturing spread because the converted value of each bit is dependent on the other resistors, instead of only one.
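The MSB-accuracy problem is easy to see in numbers; here is a sketch of an idealized binary-weighted (summing-amplifier) converter with a deliberately wrong MSB weight (the 0.1% error is an arbitrary illustration):

```python
def weighted_dac(code, bits=16, msb_error=0.0):
    """Sum bit weights 2^(bits-1) ... 1, with a fractional error on the MSB."""
    total = 0.0
    for b in range(bits):
        weight = 2.0 ** (bits - 1 - b)
        if b == 0:                       # the MSB
            weight *= (1.0 + msb_error)
        if code & (1 << (bits - 1 - b)):
            total += weight
    return total

# Codes 0x7FFF -> 0x8000 should differ by exactly one LSB (= 1.0), but a
# 0.1% MSB error (about 33 LSBs) completely swamps that step.
step_ideal = weighted_dac(0x8000) - weighted_dac(0x7FFF)
step_bad = weighted_dac(0x8000, msb_error=0.001) - weighted_dac(0x7FFF, msb_error=0.001)
print(step_ideal, step_bad)   # 1.0 versus roughly 33.8
```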

The MSB resistor in an R/2R converter can be trimmed to remove any residual zero-crossing errors. The single-bit converters take this trend to an extreme, using only one resistor (or a few) to represent the high-speed one-bit data stream. Thus, with single-bit converters, there are no MSB errors. As we shall see, the trade-off is greater sensitivity to timing error, known as jitter. The analog voltages from the D/A converter are passed through a gentle analog filter which removes the remaining high frequencies, "mechanically interpolating" them one final time. This is equivalent to the analog filter used to remove high frequencies above half the sampling rate at the input of the digital recorder. De-emphasis can be applied using analog circuitry at this point. It is also possible to perform de-emphasis in the digital domain as part of the digital filtering. However, care must be taken in this approach because it adds another level of re-quantization to the process, which can degrade any noise shaping that has been done to the signal. Increasing the number of bits can alleviate this problem. This is to be contrasted with the analog filter, which does not suffer from quantization errors but usually has worse component tolerances, which change over time. As in recording, the digital approach is better, but only if the right precautions are taken.

The accuracy of the time at which each digital sample is fed into the D/A converter and converted is of paramount importance to the quality of the final signal. If the sample is converted at the wrong time, new harmonics are introduced by the irregular timing. It is analogous to sampling at irregular intervals during recording. This is known as jitter, and is currently the subject of much debate, mysticism, and knee-jerk engineering. As mentioned earlier, in separate transport/DAC systems the clock used for D/A conversion timing is recovered from the S/PDIF signal. If this clock recovery is not perfect, it will feed an inaccurate clock signal to the D/A converter, generating jitter. Such imperfect clock recovery can be caused by numerous things, such as poor encoding, noise on the interconnect cable, poor interconnect bandwidth, and poor impedance matching between cable and receiver, to name a few. Some manufacturers have responded with a brute-force mentality by sending the clock signal separately from transport to DAC over a dedicated clock cable, independent of the sample data.

Thus, the consumer must buy both transport and DAC from the same company. However, noise generated by the digital circuitry within the DAC can pollute the clock signal as well, which cannot be solved by additional clock cables. Increased sample rates only compound the effect of jitter: 5 nanoseconds of jitter in samples spaced by 23 microseconds (44.1 kHz sampling) represents a very small percentage error (0.02%), but 5 ns of jitter with 89 ns spacing (256x oversampling) represents a much larger error (6%). Harley states that the worst transports have ~500 picoseconds of jitter, which produces harmonics at a level of about -97 dB. This is nearly inaudible, being close to the noise floor and nearly equal to the IMD products. He also states that the worst receiver chips are poorer than this, producing 3-5 ns of jitter and spurious signals up to -80 dB.

I propose that there is a much simpler solution to this and other problems, one that is already available on portable CD players costing only $129 (and that includes transport, jitter reduction, and DAC!). This is sometimes called asynchronous sample rate conversion, except that in this case the rate conversion is 1:1, making "asynchronous" the key. This is the "logical decoupling" mentioned earlier. Jitter does not change the value of the samples, it just changes the timing between them. Thus, if the samples are temporarily stored in a small memory buffer, they can be "clocked out" by a newly synthesized, highly accurate local clock. This clock would only have to be loosely synchronized to the main clock, since the memory buffer would allow time for the original clock errors to be averaged out. This is known as "electronic shock protection", available on a number of inexpensive portables. These have enough memory to store 10 seconds of music, which in the car is used to read ahead on the CD, giving the player time to go back and find its place after skipping caused by a mechanical shock. In the home setting, in addition to asynchronous D/A conversion, this time could also be used to go back and re-read erroneous data from the CD, or to skip smoothly to another track without any delay. Putting such a circuit in a DAC would make it completely immune to transport jitter. Such a DAC would sound exactly the same with every transport, from a modified Discman to the most esoteric of transports. This also suggests that if a DAC sounds different with different cables or transports, its engineering is not to be trusted, since the ideal DAC would be immune to such differences.
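A conceptual sketch of that buffer-and-reclock idea (purely illustrative; a real "shock protection" chip does this in hardware with a large RAM buffer and a synthesized clock):

```python
from collections import deque

class ReclockingBuffer:
    """Samples arrive with jittery timing but are read out on the ticks of a
    steady local clock; the FIFO absorbs the timing differences, so jitter in
    the incoming S/PDIF stream never reaches the D/A converter."""
    def __init__(self, depth=4096):
        self.fifo = deque(maxlen=depth)

    def receive(self, sample):          # called whenever the transport delivers data
        self.fifo.append(sample)

    def clock_out(self):                # called once per tick of the local clock
        return self.fifo.popleft() if self.fifo else 0   # underrun -> silence

buf = ReclockingBuffer()
for s in [0.1, 0.2, 0.3]:               # arrival times may wobble...
    buf.receive(s)
steady = [buf.clock_out() for _ in range(3)]   # ...but readout is regular
```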

Such a DAC would not need an extra clock cable, thus allowing it to interface with any brand of transport. It would sound the same with all digital interconnects (except for those causing ground loops), from a $5 Radio Shack video cable to a $500 esoteric digital conveyance. Thus, there is really no excuse for jitter in any of today's DAC's.

Many of the above topics have interesting implications for the testing of DAC's. The interaction of the many computational algorithms involved could possibly cause spurious results. For example, the use of dithering and noise shaping in making the digital test signal could add noise or even harmonics to the final signal if the DAC under test also employs dithering and noise shaping, since the second round of computational filtering could mostly undo the benefits of the first round. These interactions can affect the measurement of S/N ratio, linearity, IMD, and other DAC characteristics. The types of test pulses used can also make a large difference, since a perfect digital square wave is easy to generate but contains lots of inaudible high frequencies that we are not interested in testing. Similarly, the use of the step response to judge a DAC may not be as meaningful as using frequency-filtered step functions and comparing the input and output waveforms. The judging of CD transport quality can be affected by the DAC used if the DAC is anything but perfect.

In conclusion, we have seen that digital recording and playback involve a large number of processing steps, each of which can degrade the signal if not properly implemented. In addition, we've seen that the wrong combinations of processes, or applying them in the wrong order, can have the same effect. Despite these sensitivities, I hope that I've also shown the promise of properly implementing these methods and the possibility of reproducing the original analog signal with even better fidelity by using them. This can all be achieved at competitive costs with today's technology.

© Copyright 1996 David Hyre — www.fullswing.com