Fixed versus Floating Point Notation

Robert Katz

Dear Bob,

Would you be so kind as to give me some info (or links) about the 32-bit audio format? I’m interested in the difference between the 32-bit audio format and the 32-bit floating point audio format used in most audio processing software for the PC. Why are the mix engines of software audio sequencers like Cubase VST and Cakewalk based on a 32-bit floating point format? I’ve heard this format can handle audio with levels exceeding 0 dBFS. Is that possible? What is the dynamic range of such a system?

I’ve spent a lot of time looking for answers on the WWW, but so far without success.

Thank you in advance,
Fyodorov Alexander, sound engineer, Russia

Dear Fyodorov:

I am not a mathematician, but I will explain in the simplest words what I know to be true. If you need a more mathematical explanation, you’ll have to crack a textbook!

Fixed point format is the language of the “outside world”. That’s because in the real world, full scale is full scale: it represents the highest analog value that can be encoded. 24-bit fixed point is the highest resolution allowed in the current AES/EBU transmission standard, and its encodable dynamic range is about 144 dB.
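For anyone who wants to check the 144 dB figure, here is a minimal Python sketch. It assumes the usual textbook rule that dynamic range is the ratio of full scale to one LSB step, which works out to roughly 6.02 dB per bit; the function name is mine, purely for illustration.

```python
import math

def fixed_point_dynamic_range_db(bits: int) -> float:
    """Ratio of full scale to one LSB step, expressed in decibels."""
    return 20.0 * math.log10(2.0 ** bits)

# Compare the word lengths that come up in this discussion.
for bits in (16, 24, 48):
    print(f"{bits}-bit fixed point: {fixed_point_dynamic_range_db(bits):.1f} dB")
```

For 24 bits this comes out at about 144.5 dB, which is where the rounded 144 dB figure comes from.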

Some designers choose to use floating point chips in their internal calculations. This is very popular with native applications like Cubase because the computer CPUs that they are using like to talk floating point. I hear that the PowerPC chip can work in either fixed or floating point, but for some esoteric reason, designers like to use its floating point capabilities. That is probably because you can take an existing library of floating point code and compile it for the PowerPC very easily if you stay in floating point.

Once a number has been encoded into floating point, yes, it is true that the numbers can now represent overflow (above full scale) without overload, as well as values smaller than the 24th bit (the LSB) of a fixed point number. So you end up with more internal dynamic range than 24-bit fixed point. This allows easy calculations for the “mathematically impaired”. You don’t have to worry about overload when increasing gain, boosting a filter, summing channels, etc. Many authorities also claim that this improves the internal dynamic range of the calculations (particularly filtering and compression algorithms) inside the processor. My distortion measurements comparing some devices using floating point calculations against others doing fixed point show that with some kinds of filtering work, the floating point processors show less distortion. However, other designers working in fixed point produce just as low or lower distortion. It depends on the designer and his/her DSP talent.
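To make the overload point concrete, here is a small Python sketch. It is only an illustration under my own assumptions: the fixed point path saturates (clips) at 24-bit full scale, the floating point path is rounded to IEEE 754 single precision with the standard struct module, and 1.0 stands for full scale (0 dBFS) in the float path. The helper names are hypothetical and not taken from any particular product.

```python
import struct

FULL_SCALE = 2**23 - 1  # largest positive 24-bit two's-complement value

def to_float32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def fixed24_gain(sample: int, gain: float) -> int:
    """Apply gain to a 24-bit integer sample, saturating at full scale."""
    result = round(sample * gain)
    return max(-FULL_SCALE - 1, min(FULL_SCALE, result))

def float32_gain(sample: float, gain: float) -> float:
    """Apply gain to a 32-bit float sample; 1.0 represents 0 dBFS."""
    return to_float32(sample * gain)

boost = 10 ** (12 / 20)  # +12 dB
cut = 1 / boost          # -12 dB

# Fixed point: the boosted value hits the ceiling and the peak is lost.
fixed_in = round(0.8 * FULL_SCALE)
fixed_out = fixed24_gain(fixed24_gain(fixed_in, boost), cut)

# Floating point: the intermediate value simply exceeds 1.0 (above 0 dBFS).
float_in = to_float32(0.8)
float_mid = float32_gain(float_in, boost)
float_out = float32_gain(float_mid, cut)

# A value far below the 24-bit LSB survives in float but rounds to zero in fixed.
tiny = 2.0 ** -30
tiny_fixed = round(tiny * FULL_SCALE)
tiny_float = to_float32(tiny)

print("fixed  in/out:", fixed_in / FULL_SCALE, fixed_out / FULL_SCALE)
print("float  in/mid/out:", float_in, float_mid, float_out)
print("tiny   fixed/float:", tiny_fixed, tiny_float)
```

In this sketch the fixed point sample comes back at about a quarter of full scale because the boost clipped it, while the floating point sample returns to roughly 0.8 after passing through a value above full scale.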

For example, in the most expensive and advanced processors, modern designers using fixed point processing have progressed to internal calculations using “double precision” (48 bit in most cases), which doubles the internal dynamic range, and many authorities feel this produces better sonic results than 32-bit floating point. This comes at the cost of cycles and power, but with more chips and more speed, the cost is not a big deal at this time.

But the whole “race” changes again when the floating point designers start using 40-bit floating point. At that point, using equal types of algorithms, the two types of calculations likely produce equal sonic results, without quibbling. When working with double precision, it is very easy for a designer to build in 24 dB (or more) of internal headroom without losing meaningful dynamic range, so double-precision fixed point becomes as powerful as (some say more powerful than) floating point.
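The headroom arithmetic can be sketched in a few lines of Python. This is a back-of-the-envelope illustration, assuming roughly 6.02 dB per bit and assuming the headroom is reserved as whole bits at the top of the word; the function names are mine.

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # about 6.02 dB per bit

def headroom_bits(headroom_db: float) -> int:
    """Whole bits that must be reserved to provide the requested headroom."""
    return math.ceil(headroom_db / DB_PER_BIT)

def remaining_range_db(word_bits: int, headroom_db: float) -> float:
    """Dynamic range left below nominal full scale after reserving headroom."""
    return (word_bits - headroom_bits(headroom_db)) * DB_PER_BIT

for word_bits in (24, 48):
    print(f"{word_bits}-bit word with 24 dB headroom: "
          f"{remaining_range_db(word_bits, 24):.1f} dB remaining")
```

Under these assumptions, 24 dB of headroom costs 4 bits. A 24-bit word is left with about 120 dB, while a 48-bit word still has about 265 dB, far more than the 144 dB of the 24-bit outside world.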

However, a certain really talented designer working “only” in 32-bit floating point produces excellent, low-distortion results. My take on the matter is that designers concede that it takes a lot more effort (design talent) to keep floating point work from giving you trouble than fixed point, perhaps because of rounding errors from calculation to calculation. But one programming mistake, or a few cost-saving shortcuts, can ruin either fixed or floating point work, especially if shortcuts are taken at the most critical time, when the final output number is converted to 24-bit fixed point at the end. If those numbers are not properly converted (and dithered to 24 bits) at that time, then the sound of the entire system can be compromised.
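Here is one way that final conversion step might look, as a hedged Python sketch. I am assuming TPDF (triangular) dither of plus or minus one LSB, which is a common choice but only one of several; real products may use other dither types or noise shaping. The function names and the 1.0 = full scale convention are assumptions for illustration.

```python
import math
import random

FULL_SCALE = 2**23 - 1  # largest positive 24-bit two's-complement value

def tpdf_dither(rng: random.Random) -> float:
    """Triangular dither in LSB units: the sum of two uniform values."""
    return rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)

def float_to_fixed24(sample: float, rng: random.Random, dither: bool = True) -> int:
    """Convert a float sample (1.0 = full scale) to a 24-bit integer."""
    scaled = sample * FULL_SCALE
    if dither:
        scaled += tpdf_dither(rng)
    quantized = math.floor(scaled + 0.5)  # round to the nearest integer step
    return max(-FULL_SCALE - 1, min(FULL_SCALE, quantized))  # clip overs

rng = random.Random(1234)  # fixed seed so the demo is repeatable

# A signal only one quarter of an LSB in size.
quiet = 0.25 / FULL_SCALE
undithered = float_to_fixed24(quiet, rng, dither=False)
dithered = [float_to_fixed24(quiet, rng) for _ in range(20000)]
average = sum(dithered) / len(dithered)

print("undithered result:", undithered)
print("average of dithered results (in LSBs):", average)
```

Without dither the quarter-LSB signal is simply rounded away to zero. With dither, the individual samples are noisy, but their average sits close to 0.25 LSB, so the low-level information is preserved as signal plus noise instead of being truncated.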

Bottom line: Don’t be confused by the specs or the numbers or the claims. Distortion measurements may give us some clue as to why some systems sound better than others, but even distortion measurements don’t tell the whole story, because they don’t always reveal all the shortcuts that a designer is taking under all circumstances. All other things being equal (and they never are!), it doesn’t matter whether you’re working in fixed or floating point. Only the results count. And there are real sonic differences between platforms. “Cheap digital” still costs: it does not sound as good as cheap analog.

Best wishes,

Bob Katz
