Bob O said,
“Obviously, it all ought to be 64 bit processing that has been properly dithered.”
I just want to be clear about something, at least about what I do (I’m not necessarily speaking for all DSP coders).
I now routinely use 64-bit double-precision arithmetic for sample processing if I believe that numerical type is native to the machine (like a 64-bit CPU such as Intel Xeon or AMD Opteron). Otherwise I’ll use 32-bit float (older CPUs, some current ARMs, and the SHARC DSP). When using floating point I never bother with dither because, especially with 64 bit, I know that it’s not needed. There are 53 bits of precision in the mantissa (54 if you count the sign bit), and even the crappy quantization done without dither is soooo far down there, essentially 200 dB below the noise floor of 16-bit CDs, that this crappy quantization noise will never make any friggin’ difference to the final output word (where we should add dither and maybe noise-shaping). With 32-bit floats (which have 24-bit mantissas, or 25 bits counting the sign), I might be tempted to make them do better, dithered quantization, but it’s too much of a friggin’ pain in the ass for floating point. It is much easier to do these bit-manipulation operations with fixed-point output (the dither and quantization noise level is constant – the rounding point does not float around).
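A back-of-the-envelope sketch in Python puts rough numbers on that. It assumes the usual rule of thumb of about 6.02 dB per bit and simply compares word widths, treating floating-point rounding error as if it sat at a fixed level relative to full scale, which is only roughly true. The function name is hypothetical, made up for this illustration.

```python
import math
import sys

DB_PER_BIT = 20.0 * math.log10(2.0)  # about 6.02 dB per bit

def rounding_floor_below_cd(precision_bits, cd_bits=16):
    """Rough distance in dB between the rounding error of a word with
    `precision_bits` of precision and the quantization floor of a
    `cd_bits` fixed-point word. Crude: it only compares word widths."""
    return (precision_bits - cd_bits) * DB_PER_BIT

double_bits = sys.float_info.mant_dig  # 53 for IEEE 754 doubles (implicit bit included)
single_bits = 24                       # IEEE 754 single precision (implicit bit included)

print(f"double: {rounding_floor_below_cd(double_bits):.1f} dB below the 16-bit floor")
print(f"single: {rounding_floor_below_cd(single_bits):.1f} dB below the 16-bit floor")
```

By this crude measure a double sits about 223 dB below the 16-bit floor and a single about 48 dB below it, which is why the 32-bit case is the only one where better quantization is even tempting.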
Ideally, the only efficacious place to bother to dither and noise shape is where the bit depth is getting reduced for output to a DAC, a stream, or a .wav or .flac file. If you’re mastering to a 16-bit Red Book CD, for goodness’ sake you must properly dither (and you should noise shape) at the point where the 16-bit fixed-point value of each sample is determined and written to the master. It would be inexcusable to neglect this.
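As a minimal sketch of that final step, assuming samples are floats in [-1.0, 1.0) and using plain TPDF dither (the sum of two uniform random values, one LSB wide each) with no noise shaping, the reduction to 16-bit could look something like the following. The function name and parameters are hypothetical, and a real mastering tool would add noise shaping and a better random source.

```python
import math
import random

def quantize_to_int16(x, rng=random, dither=True):
    """Reduce one float sample in [-1.0, 1.0) to a 16-bit integer.

    With dither=True, TPDF dither spanning +/-1 LSB is added before
    rounding. Noise shaping is deliberately left out of this sketch.
    """
    scaled = x * 32768.0  # work in units of one 16-bit LSB
    if dither:
        # Sum of two independent uniform(-0.5, 0.5) values: triangular PDF
        scaled += (rng.random() - 0.5) + (rng.random() - 0.5)
    q = math.floor(scaled + 0.5)       # round to nearest integer
    return max(-32768, min(32767, q))  # clip to the 16-bit range

# A signal a quarter of an LSB high vanishes without dither...
tiny = 0.25 / 32768.0
undithered = [quantize_to_int16(tiny, dither=False) for _ in range(5)]

# ...but survives on average with dither, at the cost of a constant noise floor.
rng = random.Random(0)
dithered = [quantize_to_int16(tiny, rng=rng) for _ in range(20000)]
mean_dithered = sum(dithered) / len(dithered)
```

Here `undithered` is all zeros, while `mean_dithered` comes out close to 0.25 LSB: the dither turns what would be signal-dependent distortion into benign noise.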
But with 64-bit floating-point quantities, don’t bother. If your DAW or mix board or plugin or whatever tools are doing all of their sound processing in 64-bit float, just do it and don’t give dithering or noise shaping a thought. The only things you have to worry about are NaNs and Denorms possibly fucking you up.
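For illustration only, here is a sketch of the kind of guard that keeps those two out of a signal path, written in Python for readability. In real DSP code this is usually handled in hardware instead (on x86, for instance, by setting the flush-to-zero and denormals-are-zero bits in MXCSR), so treat the hypothetical `sanitize` helper below as a picture of the idea, not as how any particular DAW does it.

```python
import math
import sys

SMALLEST_NORMAL = sys.float_info.min  # smallest positive normalized double

def sanitize(x):
    """Return a sample that is safe to feed back into a filter state."""
    if not math.isfinite(x):
        # NaN or +/-Inf: replace with silence so it cannot spread
        return 0.0
    if x != 0.0 and abs(x) < SMALLEST_NORMAL:
        # Denormal (subnormal): flush to zero
        return 0.0
    return x

# A NaN poisons everything downstream unless it is caught
state = float("nan")
state = sanitize(0.5 * state)  # back to 0.0 instead of staying NaN
```

The point of the NaN check is that a single NaN in a recursive filter state never goes away on its own, since every later output is computed from it.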
Bob Chidlaw, who was, before my time, the Chief Scientist at Kurzweil Music Systems, came up with a wonderful floating-point format that has no NaNs, so every bit pattern in that 32-bit or 64-bit word is a number and you never have to worry about some computational hiccup spitting out a NaN. There are Denorms, but they are built into the format. It’s sorta like IEEE floats, but the format is more like 2’s complement than sign-magnitude. The DEC PDP-10 used this format but didn’t have Denorms, and Denorms are useful to have if they don’t cause an interrupt (called an “exception”) to your CPU.
Anyway, I just wanna make sure that it’s clear that no one is dithering a 128-bit result of a multiplication back to 64 bits. No one is bothering with dithering at the 64-bit “bit depth” (that might be a misuse of the term; how is “bit depth” correctly applied to floating-point audio samples?). But when things get reduced to 16- or 24-bit fixed point, you should certainly “properly” dither that value and probably should noise shape around that quantization.