Okay, I’m gonna try to weigh in on this a little.
1. I don’t know of a single codec or ADC/DAC product that accepts data in floating-point format. It would just be a matter of putting in the logic to do the conversion, but I wouldn’t waste any real estate on the chip doing that.
2. Every hardware project that I worked on had some off-the-shelf codec that moved data over a 3-wire serial interface (data, bit clock, word or frame clock). That data would be DMA’d into or out of the DSP, and it was always fixed-point: essentially two’s-complement integers representing the sample values. In the olden days, ADCs and DACs used an offset-binary format, meaning 0x0000 was the most negative value, 0x8000 was zero, and 0xFFFF was the most positive. But, fortunately, that has changed.
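Just to illustrate, going from that old offset-binary format to two’s complement is a single flip of the MSB. A little C sketch (the 16-bit width is just for the example):

```c
#include <stdint.h>

// Offset binary: 0x0000 = most negative, 0x8000 = zero, 0xFFFF = most positive.
// Flipping the MSB maps it to ordinary two's complement.
int16_t offset_binary_to_twos_complement(uint16_t raw) {
    return (int16_t)(raw ^ 0x8000u);
}
```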
3. The issue here is about meaningful bits and about quantization noise (the roundoff error from quantizing the sample values), as well as errors from non-linearities. This whole issue became a horse of a totally different color when sigma-delta converters (ΣΔ, whether 1-bit, multi-bit, or MASH) came out. With a fixed-point codec, the quantization noise level is the same for audio of small (non-zero) amplitude as for large amplitude, as long as clipping doesn’t occur. That means the “N” in the S/N ratio is constant, and we get a better S/N ratio with louder signals than with quieter signals.
But we also need some headroom and must not clip (usually), so the tradeoff is between headroom and the S/N ratio. In fact, I believe the best, simplest, most concise, and most useful definition of “Dynamic Range” in dB is the sum of the S/N ratio in dB and the headroom in dB. That’s where the tradeoff is, directly. Now, there is a bunch of specsmanship going on with these ADCs and DACs, so I would define the number of meaningful bits in the word of data coming from or going out to the codec to be this Dynamic Range in dB divided by 6.02 dB/bit. An honest 24-bit converter would have about 144 dB of dynamic range. If the dynamic range is, say, 120 dB (and that’s a pretty damn good codec), then the most-significant 20 bits are meaningful and the 4 bits on the right of a 24-bit word will be noisy. I am not saying to just throw those 4 bits away; if the codec designers did their job, and if they were listening to me bitch ca. 1995, they were giving us those noisy bits as a sorta initial dither rather than hacking them off inside the ADC chip. We want those noisy bits; don’t truncate them.
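To make the arithmetic concrete, here’s a trivial sketch of that meaningful-bits calculation (the 120 dB number is just the example from above):

```c
#include <stdio.h>

int main(void) {
    double dynamic_range_db = 120.0;                  // example codec spec
    double meaningful_bits = dynamic_range_db / 6.02; // 6.02 dB per bit
    int top_bits = (int)(meaningful_bits + 0.5);      // ~20 for 120 dB
    printf("%.1f meaningful bits -> top %d bits of a 24-bit word, %d noisy\n",
           meaningful_bits, top_bits, 24 - top_bits);
    return 0;
}
```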
4. Now, the only real purpose of floating-point is so that the audio DSP and recording guys can say “fuck you” to the concern about headroom. In our internal processing or storage of audio samples, we have more headroom than we’ll ever need. We only have to worry about headroom when the data is going back out to the DAC (or to AES/EBU or S/PDIF) in some fixed-point format. Then we gotta worry about headroom, or the hard clipping will say “fuck you” to us (or our listeners). That last conversion step is sketched below.
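Here’s a minimal sketch of that output step, assuming float samples with nominal full scale at ±1.0 going to a 24-bit codec (the scaling constant is my assumption, not any particular chip’s spec):

```c
#include <stdint.h>
#include <math.h>

// Convert a float sample (nominal full scale +/-1.0) to 24-bit two's
// complement for the DAC, hard-clipping at the rails.
// 8388607 = 2^23 - 1 is an assumed full-scale code.
int32_t float_to_fixed24(float x) {
    if (x >  1.0f) x =  1.0f;   // this is the hard clip we worry about
    if (x < -1.0f) x = -1.0f;
    return (int32_t)lrintf(x * 8388607.0f);
}
```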
I still have a place in my heart for fixed-point processing (like in the Mot DSP56K), but it really is just easier to do nearly all of the audio DSP in floating point, as long as the hardware supports it. The other important thing floating point affords us is the same S/N ratio for quiet signals as for loud signals. So we sometimes don’t need to worry about scaling signals the way we do in a fixed-point environment.
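A quick way to see that constant-S/N property of floating point: the gap between adjacent single-precision values is a roughly constant fraction (about 2^-23) of the value itself, at any magnitude. A little sketch:

```c
#include <math.h>
#include <stdio.h>

int main(void) {
    // The gap between adjacent single-precision floats is a roughly
    // constant *fraction* of the value, so quantization S/N is about
    // the same for quiet and loud signals.
    float levels[] = { 1.0e-3f, 1.0f, 1.0e3f };
    for (int i = 0; i < 3; i++) {
        float x = levels[i];
        float ulp = nextafterf(x, 2.0f * x) - x;  // spacing at this magnitude
        printf("x = %g  ulp = %g  ulp/x = %g\n", x, ulp, ulp / x);
    }
    return 0;
}
```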
5. Now, the next thing to think about is the DAC/ADC technology that actually converts these numbers to or from a physical voltage. In that conversion there is a quantization error (the sample value doesn’t represent the corresponding voltage exactly, and the difference is the quantization error). If, somehow, they could make a codec where the S/N stays constant for quiet signals vs. loud signals, then floating-point might make some sense. But for conventional ADC/DAC technology (“conventional” being what came before ΣΔ), the N remains constant whether the signal is quiet or loud (assuming no clipping).
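And here’s a little numerical sketch of that constant-N behavior: quantize a sine to 16 bits at full scale and at -40 dBFS, and the RMS quantization error comes out about the same (the 16-bit width and the 997 Hz test tone are arbitrary choices on my part):

```c
#include <math.h>
#include <stdio.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

int main(void) {
    double amps[] = { 1.0, 0.01 };   // full scale and -40 dBFS
    for (int a = 0; a < 2; a++) {
        double err2 = 0.0;
        int n = 48000;
        for (int i = 0; i < n; i++) {
            double x = amps[a] * sin(2.0 * M_PI * 997.0 * i / 48000.0);
            double q = round(x * 32767.0) / 32767.0;  // ideal 16-bit quantizer
            err2 += (x - q) * (x - q);
        }
        // RMS error is ~the same for both amplitudes, so the quiet
        // signal's S/N is ~40 dB worse.
        printf("amplitude %g: RMS quantization error = %g\n",
               amps[a], sqrt(err2 / n));
    }
    return 0;
}
```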
There are goofy things we used to do before ΣΔ to try to get a consistent, roughly constant S/N for loud vs. quiet signals. One is simple companding, which is what the old telephone systems did with what was called μ-law in the US and A-law in Europe. There is a non-linear curve, looking like a bipolar logarithm, between the analog input signal and the input to the ADC. But we have to undo that curve exactly in the DSP before we can do stuff like filtering, and if our inverse curve (which is usually a lookup table) does not exactly match the analog curve, we introduce more error. Matching that is a problem.
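For reference, the ideal continuous μ-law curve and its exact inverse look like this (μ = 255 is the North American value; real telephone hardware used a piecewise-linear approximation of this):

```c
#include <math.h>

#define MU 255.0  // North American mu-law constant

// Continuous mu-law compression curve, x in [-1, 1].
double mulaw_compress(double x) {
    double s = (x < 0) ? -1.0 : 1.0;
    return s * log(1.0 + MU * fabs(x)) / log(1.0 + MU);
}

// Exact inverse; any mismatch with the analog curve adds error.
double mulaw_expand(double y) {
    double s = (y < 0) ? -1.0 : 1.0;
    return s * (pow(1.0 + MU, fabs(y)) - 1.0) / MU;
}
```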
The other goofy way to do this would be with adaptive ADC or DAC conversion, in which the scaled sample (which has roughly constant headroom and S/N) goes to a DAC that is followed by some kinda digitally-controlled amplifier. The amplifier (or the DAC Vref) would change gain by 6.02 dB (a factor of 2) each time the word (or the exponent in the floating-point format) gets shifted over by one bit.
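A crude sketch of that gain-ranging idea, using the float exponent directly as the gain-step control (this is entirely hypothetical; there’s no real hardware behind it):

```c
#include <math.h>
#include <stdio.h>

// Hypothetical gain-ranging output: send the mantissa to the DAC and use
// the exponent to step a post-DAC amplifier (or Vref) in 6.02 dB increments.
void gain_ranged_output(float sample) {
    int exponent;
    float mantissa = frexpf(sample, &exponent);  // sample = mantissa * 2^exponent
    // |mantissa| is in [0.5, 1): constant headroom, constant S/N at the DAC.
    printf("DAC word = %f, amp gain = %d steps of 6.02 dB\n",
           mantissa, exponent);
}
```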
6. Now ΣΔ codecs are a horse of a different color. The way that quantization noise happens in them motherfuckers is a bit convoluted. If, somehow, they could design a ΣΔ DAC that had quantization noise magnitude that is roughly proportional to the signal amplitude (so a roughly constant S/N), it might make sense for the DAC to receive a floating-point word and for the ΣΔ internal DSP math to be done with floating point. But I dunno how they might do that.
So, Bob, I think you’re right. Maybe not in principle, but just in reality.