Improving audio digitization

There are a few simple yet important ways to improve the quality of the digital representation of an analog waveform.

In principle, if frequency response were the only issue, there would be no advantage in moving to formats with higher sampling rates. However, the evidence is otherwise. Direct psychoacoustic comparisons of the same source material, recorded and reproduced at 44.1 kS/s, 96 kS/s, and 192 kS/s, show that there is an advantage in going to the higher rates - it sounds better! The most common comment is that such recordings have "better spatial resolution". What mechanism can be at work? It seems unlikely that we have all suddenly developed ultrasonic hearing capabilities.

Energy dispersion and anti-alias filtering.

Sharp filtering inevitably causes a ringing transient response - the effect is referred to as the Gibbs phenomenon. The ringing contains energy, and although the energy in the input transient is concentrated at one time, the energy coming out of the anti-alias filter is spread over a much longer time - the audio picture is "defocused". We might argue that the ringing energy is ultrasonic, but this is certainly not the case at 44.1 or 48 kS/s - our bandwidth constraints mean that to get good anti-aliasing, we must filter as sharply as we can, and only pass the audio bandwidth. A high sample rate gives us the extra bandwidth to contain the ringing (energy defocusing).
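The link between filter sharpness and time spreading can be illustrated with a minimal sketch. It compares two digital low-pass filters with the same cutoff, one short (gentle slope) and one long (steep slope), and measures how many samples are needed to hold 99% of the energy of each impulse response. The Hamming-windowed sinc design, the tap counts, the cutoff, and the function names are all assumptions chosen for illustration; they do not describe any particular converter.

```python
import math

def lowpass_fir(taps, cutoff):
    """Hamming-windowed sinc low-pass (odd `taps`); cutoff in cycles per sample."""
    mid = (taps - 1) / 2
    h = []
    for n in range(taps):
        t = n - mid
        if t == 0:
            ideal = 2 * cutoff
        else:
            ideal = math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (taps - 1))
        h.append(ideal * window)
    return h

def energy_span(h, fraction=0.99):
    """Width (in samples) of the centred span holding `fraction` of the energy."""
    total = sum(c * c for c in h)
    mid = (len(h) - 1) // 2
    acc = h[mid] ** 2
    half_width = 0
    while acc < fraction * total:
        half_width += 1
        acc += h[mid - half_width] ** 2 + h[mid + half_width] ** 2
    return 2 * half_width + 1

CUTOFF = 0.25                       # assumed cutoff, half of the Nyquist frequency
gentle = lowpass_fir(31, CUTOFF)    # short filter: wide transition band
sharp = lowpass_fir(255, CUTOFF)    # long filter: narrow transition band

gentle_span = energy_span(gentle)
sharp_span = energy_span(sharp)
print("gentle filter span:", gentle_span, "samples")
print("sharp filter span: ", sharp_span, "samples")
```

The steeper filter needs the longer span, which is the "defocusing" described above: the same input transient is smeared over more samples.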

The audio DVD standard.

In addition to easing anti-alias filtering and reducing energy defocusing, the 96,000 Hz sample rate is part of the emerging digital audio standard used in present-day recording studios, consumer PCs (e.g., the Sound Blaster Audigy cards), and the audio DVD format.

For the sampling theorem to apply exactly, each sampled amplitude value must exactly equal the true signal amplitude at the sampling instant. Real ADCs do not achieve this level of perfection. Normally, a fixed number of bits (binary digits) is used to represent a sample value, so the infinite set of values possible in the analog signal is not available for the samples. In fact, if there are R bits in each sample, exactly 2^R sample values are possible. For high-fidelity applications, such as archival copies of analog recordings, 24 bits per sample (so-called 24-bit resolution) should be used.

The difference between the analog signal and the closest sample value is known as quantization error. Since it can be regarded as noise added to an otherwise perfect sample value, it is also often called quantization noise. The effect of quantization noise is to limit the precision with which a real sampled signal can represent the original analog signal. This inherent limitation of the ADC process is often expressed as a signal-to-noise ratio (SNR), the ratio of the average power in the analog signal to the average power in the quantization noise. On the dB scale, the quantization SNR for uniformly spaced sample levels increases by about 6 dB for each bit used in the sample. For ADCs using R bits per sample and uniformly spaced quantization levels, SNR = 6R - 5 dB (approximately). Thus, for 16-bit encoding about 91 dB is possible. This is 20 to 30 dB better than the 60 dB to 70 dB that can be achieved in analog audio cassette players using special noise reduction techniques. By the same formula, a 24-bit encoding yields a theoretical SNR of about 139 dB; in practice, the achievable figure is limited by the electronics of the hardware itself.
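The "about 6 dB per bit" behaviour can be checked numerically. The following is a minimal sketch, not a model of any real ADC: it quantizes a nearly full-scale sine wave to R bits with a uniform quantizer and measures the resulting SNR. The test frequency, amplitude, sample count, and function names are arbitrary assumptions. For a full-scale sine the usual textbook figure is roughly 6.02R + 1.76 dB, so the measured values should come out a few dB above the more conservative 6R - 5 rule quoted above, while the increase per bit stays close to 6 dB.

```python
import math

def quantize(x, bits):
    """Map x in [-1, 1) to the centre of one of 2**bits equal-width intervals."""
    step = 2.0 / (2 ** bits)
    return (math.floor(x / step) + 0.5) * step

def measured_snr_db(bits, n=20000):
    """SNR (dB) of a nearly full-scale sine after uniform quantization."""
    signal_power = 0.0
    noise_power = 0.0
    for i in range(n):
        # Assumed test tone: 0.1234567 cycles per sample, so that the
        # samples fall on many different phases of the sine.
        x = 0.999 * math.sin(2 * math.pi * 0.1234567 * i)
        e = quantize(x, bits) - x          # quantization error
        signal_power += x * x
        noise_power += e * e
    return 10 * math.log10(signal_power / noise_power)

def rule_of_thumb_snr_db(bits):
    """The approximation used in the text: SNR = 6R - 5 dB."""
    return 6 * bits - 5

for bits in (8, 12, 16):
    print(bits, "bits: measured",
          round(measured_snr_db(bits), 1), "dB, rule of thumb",
          rule_of_thumb_snr_db(bits), "dB")
```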

Simply put, aliasing is a kind of sampling confusion that can occur during the digitization process. It is a direct consequence of violating the sampling theorem: the highest frequency entering a sampling system must not be higher than the Nyquist frequency (half the sampling frequency). When the input contains frequencies above Nyquist, the sampler continues to produce samples at its fixed rate, but those samples carry false information in the form of alias frequencies. In practice, aliasing can and should be prevented, and the solution is rather straightforward: the input signal must be band-limited with a low-pass (anti-aliasing) filter that provides significant attenuation at the Nyquist frequency. The "archetypal" anti-aliasing filter has "brick-wall" characteristics, with an abrupt onset of attenuation and a very steep slope. Such a filter produces unwanted ringing-type effects and should be avoided. In practice, the system should use an oversampling (see below) A/D converter with a mild low-pass filter, a high initial sampling frequency, and decimation processing to obtain the desired output sampling frequency.
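To make the "false information" concrete, here is a small sketch that computes the alias frequency of a tone above Nyquist and confirms that its samples are identical to those of the alias. The 30 kHz test tone and the helper name `alias_frequency` are assumptions chosen for illustration; cosines are used so that the two sample sequences match without a sign change.

```python
import math

def alias_frequency(f_hz, fs_hz):
    """Frequency in [0, fs/2] that a sampled tone at f_hz is indistinguishable from."""
    return abs(f_hz - fs_hz * round(f_hz / fs_hz))

fs = 44100.0       # sampling frequency in Hz
f_in = 30000.0     # input tone above the Nyquist frequency of 22,050 Hz
f_alias = alias_frequency(f_in, fs)

# Sample both tones at the same instants and compare the results.
samples_in = [math.cos(2 * math.pi * f_in * n / fs) for n in range(64)]
samples_alias = [math.cos(2 * math.pi * f_alias * n / fs) for n in range(64)]
max_diff = max(abs(a - b) for a, b in zip(samples_in, samples_alias))

print("alias of", f_in, "Hz at", fs, "S/s:", f_alias, "Hz")
print("largest difference between the two sample sequences:", max_diff)
```

Once the samples have been taken, nothing distinguishes the 30 kHz tone from its 14.1 kHz alias, which is why the band-limiting has to happen before sampling.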

Dither is a small amount of noise added to the audio signal before sampling. This causes the audio signal to shift with respect to the quantization levels. The quantization error is thus decorrelated from the signal, and its audible effects become negligible. Dither does not remove the quantization error; instead, it allows the system to encode amplitudes smaller than the least significant bit.
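The claim that dither lets the system encode amplitudes below one least significant bit (LSB) can be sketched as follows. A constant input of 0.3 LSB is quantized many times, with and without dither, and the outputs are averaged. The triangular dither used here (the sum of two uniform random values, spanning plus or minus one LSB) is an assumption for illustration, since the text does not specify the kind of noise; the level, sample count, seed, and function names are likewise arbitrary.

```python
import math
import random

def quantize_lsb(x):
    """Round to the nearest integer; values are expressed in units of one LSB."""
    return math.floor(x + 0.5)

def mean_output(level, use_dither, n=50000, seed=0):
    """Average quantizer output for a constant input `level` (in LSBs)."""
    rng = random.Random(seed)       # fixed seed so the run is repeatable
    total = 0
    for _ in range(n):
        if use_dither:
            # Assumed triangular dither: sum of two uniform values in +/-0.5 LSB.
            dither = (rng.random() - 0.5) + (rng.random() - 0.5)
        else:
            dither = 0.0
        total += quantize_lsb(level + dither)
    return total / n

level = 0.3                         # input amplitude, below one LSB
plain = mean_output(level, use_dither=False)
dithered = mean_output(level, use_dither=True)

print("without dither, average output:", plain)
print("with dither, average output:   ", dithered)
```

Without dither every sample rounds to zero and the input is lost; with dither each individual sample is noisier, but the average follows the input level.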

Oversampling is another technique aimed at improving the results of the digitization process. As noted above, a brick-wall filter may produce unwanted acoustic effects. In oversampling A/D conversion, the input signal is first passed through a mild analog low-pass filter, which provides sufficient attenuation at high frequencies. To extend the Nyquist frequency, the signal is then sampled at a high frequency and quantized. Afterwards, a digital low-pass filter (e.g., a linear-phase FIR filter) band-limits the signal so that no aliasing occurs when its output is downsampled to the desired output sampling frequency (e.g., 44,100 Hz). In addition to eliminating the unwanted effects of a brick-wall analog filter, oversampling helps achieve increased resolution by spreading the spectrum of the quantization error far beyond the audio base-band, rendering the in-band noise relatively insignificant.
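The resolution gain from oversampling can be estimated with a rough sketch. The quantization error of an oversampled sine is passed through a digital low-pass filter and decimated, and the noise power before and after is compared. Because the filter is linear, filtering the error alone gives the same noise as filtering the quantized signal and subtracting the filtered ideal signal. The oversampling ratio of 4, the 12-bit quantizer, the 63-tap Hamming-windowed sinc filter, and all names are assumptions for illustration; the analog pre-filter is not modelled. If the error were perfectly white and the filter ideal, the reduction would be 10 log10(4), about 6 dB.

```python
import math

OSR = 4      # oversampling ratio (assumed)
BITS = 12    # quantizer word length (assumed)
TAPS = 63    # length of the digital decimation filter (assumed, odd)

def quantize(x, bits):
    """Uniform quantizer for x in [-1, 1) with 2**bits levels."""
    step = 2.0 / (2 ** bits)
    return (math.floor(x / step) + 0.5) * step

def lowpass_fir(taps, cutoff):
    """Linear-phase Hamming-windowed sinc with unit DC gain; cutoff in cycles/sample."""
    mid = (taps - 1) / 2
    h = []
    for n in range(taps):
        t = n - mid
        if t == 0:
            ideal = 2 * cutoff
        else:
            ideal = math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        h.append(ideal * (0.54 - 0.46 * math.cos(2 * math.pi * n / (taps - 1))))
    gain = sum(h)
    return [c / gain for c in h]

def filter_and_decimate(x, h, factor):
    """Convolve x with h (fully overlapping part only), keeping every factor-th output."""
    out = []
    for start in range(0, len(x) - len(h) + 1, factor):
        out.append(sum(h[k] * x[start + len(h) - 1 - k] for k in range(len(h))))
    return out

def power(x):
    return sum(v * v for v in x) / len(x)

# Sine sampled at the high (oversampled) rate, and its quantization error.
signal = [0.9 * math.sin(2 * math.pi * 0.0101234 * n) for n in range(4000)]
error = [quantize(v, BITS) - v for v in signal]

# Cutoff at the Nyquist frequency of the output (decimated) rate.
h = lowpass_fir(TAPS, 0.5 / OSR)
error_out = filter_and_decimate(error, h, OSR)

noise_reduction_db = 10 * math.log10(power(error) / power(error_out))
predicted_db = -10 * math.log10(sum(c * c for c in h))  # white-noise prediction

print("measured in-band noise reduction:", round(noise_reduction_db, 2), "dB")
print("prediction for white noise:      ", round(predicted_db, 2), "dB")
```

Each doubling of the oversampling ratio buys roughly another 3 dB under these assumptions, which corresponds to about half a bit of extra resolution.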