Noise shaped dither – Does it cause issues with DSP-based volume controls?

    • March 21, 2023 at 7:06 am #5530
      Bob Katz

        Recently someone brought up this topic on social media and as usual it turned into a useless free-for-all over there. But over here at the Fora we can handle this question with grace, civility and maybe even reach a good conclusion. The question:

        Is noise-shaped dither ever a problem, anywhere?

        For example, someone brought up the concern about nearly ubiquitous DSP-based volume controls and equalizers, and how they might interact with material that was made with noise-shaped dither.

        My response is that this is a non-problem problem. If the DSP volume control is properly dithered to 24 bits if the DSP is 32 bit float, or even to 32 bit fixed if the DSP is 32 bit fixed, then the noise floor of the dither in the volume control is nominally -141 dBFS or even lower, while the noise power of 16 bit noise-shaped dither is anywhere from -91 through perhaps -71 dBFS. A single-number sum of two differently-shaped uncorrelated noise floors is not very meaningful, but if they are shaped the same:

        When you RMS-sum -71 dBFS with -141 dBFS, what do you get? Answer: You get -71, to any reasonable decimal place. In other words, the noise floor of the digital volume control has no effect on the noise floor of the dithered material being auditioned. The psychoacoustic value of the 1644 noise floor has not been disturbed. These psychoacoustic dithers have been designed not to disturb the ear within a wide range of acoustic gains, so even if you turn up the monitor to play things (reasonably) loudly, there will be no issues. This is true whether you use an analog or a digital volume control.
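As a quick check of that arithmetic, the power sum of two uncorrelated noise floors can be computed directly. This is only a minimal Python sketch: the helper name `db_power_sum` is made up for illustration, and the two levels are the nominal figures quoted in the post.

```python
import math

def db_power_sum(*levels_db):
    """Sum uncorrelated noise powers given in dB and return the total in dB."""
    total_power = sum(10.0 ** (level / 10.0) for level in levels_db)
    return 10.0 * math.log10(total_power)

shaped_16bit = -71.0    # upper end of the noise-shaped 16-bit range quoted above
dither_24bit = -141.0   # nominal 24-bit dithered floor quoted above

total = db_power_sum(shaped_16bit, dither_24bit)
increase = total - shaped_16bit
print(f"combined floor: {total:.7f} dBFS (rise of {increase:.2e} dB)")
```

The rise is a tiny fraction of a millidecibel, which is why the result reads as -71 to any reasonable decimal place.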

        Now, what if the DSP volume control and equalizer is NOT dithered? If it’s a 32 bit undithered floating point volume control, the distortion products of truncation to 24 bits within the DAC are usually pretty mild, and in my opinion the distortion products would not add any more interference with a 1644 noise shaped wav than they would with any other source. It would still be truncation distortion, fuzzing up the presentation of any source, dithered or not.


      • March 21, 2023 at 7:33 am #5531
        Bob Katz

          Here is a comparison of the spectrum of a 1 kHz sine wave at -20 dBFS, processed with 16 bit noise shaped dither, as displayed in the top  image spectrum. The top image represents what happens if you play the noise shaped 1644 source through 24 bit TPDF dither as would be the case with a DSP volume control. As you can see, the 24 bit dither raises the noise floor a bit at low frequencies as the 1644 noise shaped dither is quieter than 24 bit dither at low frequencies. But other than a slight measured increase in noise, there seems to be no measured or audible effect on the material, as the added noise is well below that of any DAC.

        • March 21, 2023 at 10:01 am #5533
          Alexey Lukin

            I’ve done a simple test that shows that a noise-shaped record survives digital gain controls with less distortion. See the animation below.

          • March 21, 2023 at 12:49 pm #5537
            Bob Olhsson

              Is this true for any additional processing?

              • March 21, 2023 at 2:24 pm #5541
                Alexey Lukin

                  Is this true for any additional processing?

                  Yes, I think that it should be also true for other types of processing, as long as they don’t kill the HF band of noise. If the HF band of noise is removed (e.g., during the SRC), then the noise-shaped signal becomes underdithered and more prone to quantization distortion than the TPDF variant.

              • March 21, 2023 at 1:12 pm #5538
                Bob Katz

                  Bob Olhsson: Good question….  What Alex discovered (and which I’m working on demonstrating in a power point using his research) is that the original random noise in the original noise shaped dithered source helps to modulate and disguise the distortion due to truncation in the DSP processor. It’s not “self-dither” but it does help. So, in other words, noise shaped dither helps deal with defective (undithered) digital monitor controls. It’s not a solution, but it is a kind of bandaid.

                  Therefore, equalization, compression, crossovers, I think any kind of processing common in consumer A/V receivers, will be helped by noise shaped sources.

                  I’m sure Alex will elaborate here on this discovery and his thoughts on the matter.

                • March 21, 2023 at 1:44 pm #5540

                  Bob, I recall this being discussed at AES paper or workshop sessions in the Lipshitzian era.  The question was specifically, if your fixed-point DSP (in those days it was the Mot 56K) did no dithering internally but the input signal to the DSP processing had some kinda dither added, just to keep all those internal states sorta hopping around a little all the time.  Then, hopefully, the effects of quantization of those states will sorta get averaged out a little.

                  Now I dunno of an actual study, either theoretical or experimental, that explores how well noising up the input rids us of the buzzing and other nasties of quantization later down the signal path.

                  Now, a DSP volume control is just a multiplication.  Multiply by 1 and that’s 0 dB.  Multiply by 1/4 and that’s -12 dB gain.  Or by 2 and it’s +6 dB.  Multiply by 0 and that’s -∞ dB.  Now, the result of the multiplication is that your 24-bit word has become a 47-bit word and you will eventually need to lose the bottom 23 bits so that the output word is back in the same form as the input.  Losing those bottom bits is quantization and, if the signal gets really quiet, that quantization will sound like buzzing.  After all, it takes your tiny sine wave and turns it into a tiny square wave.  Adding dither before that quantization adds a little noise, but it takes out the buzz completely.  So, in my opinion, the DSP coders doing that volume control are awful goddamn lazy to not dither the quantization that occurs after the scaling that the volume control does.
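The multiply-then-drop-bits step described above can be sketched in a few lines of Python with NumPy. Everything specific here is an assumption chosen for illustration, not taken from any real product: a 1 kHz tone at 2048 LSBs, a gain of 2**-10 (about -60 dB) stored as a Q23 coefficient, and the third harmonic at 3 kHz as a rough stand-in for the "buzz".

```python
import numpy as np

rng = np.random.default_rng(0)
fs = n = 48000                      # one second of audio, so FFT bin k is k Hz
FRAC_BITS = 23                      # Q23 coefficient: 23 bits must be discarded
lsb = 1 << FRAC_BITS                # one output LSB, measured in the wide word

# 24-bit input: a 1 kHz sine peaking at 2048 LSBs (about -72 dBFS).
x = np.round(2048 * np.sin(2 * np.pi * 1000 * np.arange(n) / fs)).astype(np.int64)

gain_q23 = lsb >> 10                # 2**-10, about -60 dB, as a Q23 coefficient
product = x * gain_q23              # wide result of the multiply

truncated = product >> FRAC_BITS    # (a) just drop the bottom 23 bits

# (b) add TPDF dither spanning about +/-1 output LSB, then round to nearest
tpdf = rng.integers(0, lsb, n) + rng.integers(0, lsb, n) - lsb
dithered = (product + tpdf + (lsb >> 1)) >> FRAC_BITS

def tone_amplitude(y, freq_hz):
    """Amplitude, in output LSBs, of the spectral line at freq_hz."""
    return abs(np.fft.rfft(y)[freq_hz]) * 2.0 / len(y)

h3_truncated = tone_amplitude(truncated, 3000)   # third harmonic of the tone
h3_dithered = tone_amplitude(dithered, 3000)
print(f"3 kHz line: truncated {h3_truncated:.4f} LSB, dithered {h3_dithered:.4f} LSB")
```

With truncation the 2-LSB output is a staircase with a clear 3 kHz component; with dither that line should sink into the noise, at the cost of a slightly higher broadband floor.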

                  Now, *I’m* a little lazy, and I did not (and do not) dither every quantization I had in my DSP processes.  The simplest way to avoid the need to dither is to maintain a wide word (like 64-bit or 80-bit double precision) internally in your algorithm until you get to your final output word that is getting cast back into a 24-bit integer and sent out to AES/EBU or S/PDIF or to a DAC (or written to a .wav file).  That’s when you want to take the time and effort to properly dither the 64-bit float as you toss out bits to get your 24-bit fixed output.  (You will also have to check for overshoot and saturate the value to the rails at that point.)
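A minimal sketch of that final step (wide float in, dithered and saturated 24-bit integer out) might look like the following. The function name `to_int24` and the convention that full scale is ±1.0 are assumptions for illustration; noise shaping is left out.

```python
import numpy as np

FULL_SCALE = 1 << 23                 # 24-bit two's complement: -2**23 .. 2**23 - 1

def to_int24(x, rng):
    """Dither, round and saturate float64 samples (nominally -1.0..1.0) to 24 bits."""
    tpdf = rng.random(x.shape) - rng.random(x.shape)   # triangular PDF, 2 LSBs wide
    q = np.floor(x * FULL_SCALE + tpdf + 0.5)          # add dither, round to nearest
    q = np.clip(q, -FULL_SCALE, FULL_SCALE - 1)        # saturate to the rails
    return q.astype(np.int32)

rng = np.random.default_rng(0)
rails = to_int24(np.array([1.5, -1.5]), rng)                 # overshoot on both sides
quiet = to_int24(np.full(200_000, 0.3 / FULL_SCALE), rng)    # DC of 0.3 LSB
print(rails, quiet.mean())
```

The overshooting samples land on the rails instead of wrapping around, and the 0.3 LSB input survives as the average of the dithered output, where plain rounding would have returned all zeros.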

                  • March 24, 2023 at 4:04 pm #5565
                    Bob Katz

                      Dear RBJ: What Alex’s research showed and what the Powerpoint that I’m producing illustrates is that ultra high noise shaping, with considerable high frequency noise in the source material, self-dithers the next quantization. Even if the next quantization truncates to 16 bits! Resolution down to about -107 dBFS within the truncated, self-dithered, 16-bit result. Which is pretty amazing. The shallower the shape of the noise-shaping of original dithered source, the worse the self-dithering, until if the source is 16-bit flat TPDF dithered, it has no effect at all on subsequent dsp and you end up with distortion and noise shaping. So this is a form of self-dithering.

                  • March 21, 2023 at 5:48 pm #5542
                    James Johnston

                      At 24 bits I would not expect much problem.

                      At 16 bits, with a low-level signal turned WAY WAY UP, you might hear the dither a bit. But that’s just like analog except even quieter.


                      Noise shaping is really secondary here, unless it’s a really dumb version of noise shaping.

                      • March 21, 2023 at 6:12 pm #5543
                        Bob Katz

                          Replying to both JJ and RBJ:

                          In addition to Lipshitz, no less of an authority than Robert Stuart made it clear years ago that noise shaped dither is not a problem when a noise shaped source is played through any digital monitor control or processor.

                          The amazing thing that Alex has proved is that noise shaped dither actually is BETTER for use if the monitor control is not dithered and truncated to 16 bits. Such a situation may not even exist anywhere in current products, but Alex has shown that  what is likely the worst case scenario is a pretty decent scenario. Look at the spectrogram of -17 dB attenuation to see that the noise shaped dithered source is largely intact, which is quite miraculous for a non-dithered 16-bit DSP at 17 dB of attenuation!

                          Yes, for 24-bit volume controls and similar DSP, whether the source is 16-bit 44kHz made with noise shaping dither or TPDF is academic; dithering the volume control is highly recommended, though leaving it undithered is not the world’s worst tragedy. But what Alex has proved to my satisfaction is that not only is noise shaped 16-bit 44kHz dither acceptable and harmless for 24-bit monitor controls, it actually is a BETTER choice when you’re dealing with something as crazy as a 16-bit non-dithered monitor control. This refutes any claims by certain wayward mastering engineers who think that noise shaped dither is the devil incarnate.

                          In the near future, Alex and I will have a Powerpoint that demonstrates the measurements and discoveries that Alex has made.

                          • March 22, 2023 at 4:07 pm #5552

                            Okay, just a couple of dumb things so that I can get a better idea of what is going on.

                            1. I edited your last comment just a little, Bob.  Just a little spelling and usage.  Undo it if you want.  Yer the boss.

                            2. I presume “Alex” is Alexey.  But I just wanna make sure.

                            3. Can you be a little more descriptive of what MBIT+ is?  How it is defined?

                            4. Now the original tone is at -80 dB fullscale, right?  Does it stay at -80 dB?  What is the dB gain indicated in the upper left corner?  Does it reflect the signal level?  To the dither level?  Or the quantization step size?  What exactly is changing by “+1 dB” or “+4 dB” or “-4 dB” or “-10 dB”?  (Oh, duh, it’s the gain control, so that does affect the level of dither.)

                            5.  I’m thinking it might be the dither level that is changing because I cannot understand why the screen goes black if the dither level is constant.  If that is the case, exactly what is meant by “0 dB”?  I think I know what it means for TPDF, but I have no idea what it means for MBIT+.

                            6.  Now noise spectrum is *not* the same as the probability density function (PDF).  You can have various PDFs attached to various noise power spectrums.  Noise shaping is about the power spectrum.  But decoupling the “moments” of the quantization error signal is about the PDF.

                            Rectangular PDF that’s exactly as big as the quantization step size will decouple the first moment (that is the mean) or DC of the quantization error from the input signal.  This means, if the input is a very slowly changing DC ramp, that the mean (the first moment) or DC of the error from rounding will always be zero, no matter if you are mid-tread or just on the edge of the quantizer step.

                            But that does not decouple the variance (the second moment) or AC power of the error signal from the input level.  If the input is a slowly changing DC ramp, when the ramp is halfway between two steps, rectangular PDF dither will be rounding up and down equally likely and the AC power of the quantization error signal will be maximum.  When the ramp is right on a step, it will always round to that step value and the AC power is zero.  So there is noise modulation based on the pre-quantized signal value.

                            But triangular PDF (TPDF) dither that is as wide as two quantizer steps will completely decouple both the DC and AC power of the error from the input value.  And I don’t think anyone yet has shown that we can hear higher statistical moments than the first and second (and since we can’t hear DC, we can really only hear the second moment).  With a slowly changing ramp, you can only hear a constant quiet level of dither noise, no matter what the signal DC level is.  No noise modulation.
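The first- and second-moment behaviour described above can be checked with a small Monte Carlo experiment. This is a sketch in Python with NumPy; the quantizer is plain rounding to the nearest integer (one LSB = 1.0) and the three DC values are arbitrary test points.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
rpdf = rng.random(n) - 0.5               # rectangular PDF, 1 LSB wide
tpdf = rng.random(n) - rng.random(n)     # triangular PDF, 2 LSBs wide

def error_moments(dc, dither):
    """Mean and variance, in LSBs, of the total error for a constant input dc."""
    err = np.round(dc + dither) - dc
    return err.mean(), err.var()

moments = {}
for dc in (0.0, 0.25, 0.5):              # on a step, a quarter step up, halfway
    moments[dc] = (error_moments(dc, rpdf), error_moments(dc, tpdf))
    (r_mean, r_var), (t_mean, t_var) = moments[dc]
    print(f"dc={dc:4.2f}  RPDF mean {r_mean:+.3f} var {r_var:.3f}   "
          f"TPDF mean {t_mean:+.3f} var {t_var:.3f}")
```

Both dithers keep the mean error near zero, but only the TPDF rows should show a variance that stays put (close to 1/4 LSB²) as the DC value moves; the RPDF variance swings between zero and 1/4, which is the noise modulation.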


                            Now “noise-shaped dither” can mean two different things.  It could mean coloring the spectrum of the dither before it’s added to the signal just before quantization.  It’s hard to control both the dither spectrum and dither PDF independently.  What comes out of a good uniform pseudo-random number generator (PRNG) is rectangular PDF and white spectrum (up to Nyquist).  Adding two independent rectangular PDF and white random numbers together gets you a triangular PDF and is also white.  This is the dither that Stanley generally talked about because it was easier to model and predict.

                            But if you added two dithers one generated now and the other from the previous sample, that is adding d[n]+d[n-1], what you get is TPDF that is sorta low-pass filtered.  Zero amplitude at Nyquist.

                            However if you subtracted the previous dither sample from the current dither, d[n]-d[n-1], what you get is TPDF that is high-pass filtered.  Zero amplitude at DC and maximum at Nyquist.  That’s a little more useful perceptually and it’s still TPDF so all of these Lipshitzian properties remain: It’s the smallest amplitude dither that will completely decouple both the DC and AC amplitudes of the quantization error from the signal getting quantized.
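The two constructions, d[n]+d[n-1] and d[n]-d[n-1], are easy to try out. The sketch below (Python with NumPy) builds both from one rectangular generator and compares the average power near DC with the average power near Nyquist; the band edges and the helper `band_power` are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1 << 16
d = rng.random(n + 1) - 0.5            # white rectangular dither, 1 LSB wide

lowpass_tpdf = d[1:] + d[:-1]          # d[n] + d[n-1]: null at Nyquist
highpass_tpdf = d[1:] - d[:-1]         # d[n] - d[n-1]: null at DC

def band_power(x, lo, hi):
    """Average power of x in the normalized band [lo, hi), where 1.0 is Nyquist."""
    spec = np.abs(np.fft.rfft(x)) ** 2
    f = np.linspace(0.0, 1.0, len(spec))
    return spec[(f >= lo) & (f < hi)].mean()

for name, sig in (("lowpass", lowpass_tpdf), ("highpass", highpass_tpdf)):
    print(name, "variance", round(sig.var(), 4),
          "low/high band power ratio", band_power(sig, 0.0, 0.1) / band_power(sig, 0.9, 1.0))
```

Both signals stay within ±1 LSB with a variance near 1/6 LSB², as expected of a triangular PDF; only the spectral tilt differs. Note that successive samples are no longer independent, since each shares one rectangular sample with its neighbour.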

                            I have used that kind of high-pass TPDF dither when I wasn’t doing noise shaping in the manner that Stanley normally meant by “noise shaping”.  This kind of noise shaping that is commonly done (like it’s what is done in ΣΔ converters) is where white TPDF dither is added to the pre-quantized signal in addition to an error feedback (from after the quantizer).  This is what is depicted in Fig. 2 here:

                            dither and noise shaping

                            This is what we normally mean by “noise shaping” and it doesn’t even have to be dithered.  With the Mot DSP56K and in other fixed-point contexts, I have done simple noise shaping without dithering.  The simple noise shaping was done with “fraction saving”.  Whatever M bits at the right of the N+M bit word that I lopped off, I would save those bits in a state and in the next sample I would zero-extend those bits before adding them back into the signal just before quantizing.  It was pretty cheap, normally sorta self-dithered, and took care of a nasty limit-cycle problem I otherwise had with IIR filters having a pretty high Q.  Even when dead silence goes in (as zeros), the output of the filter would decay until it got to be about -70 dB and then it got stuck at -70 dB DC, even though it was dead silence going in.  Digital meters would get stuck.  This was the easiest way to fix it, but full-tilt dithering would have also fixed it.  But then you wouldn’t have dead silence coming out, just the dither.
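Fraction saving as described above can be sketched with plain Python integers. The function name `requantize`, the choice of M = 8 and the constant test input are assumptions for illustration, not the original DSP56K code.

```python
def requantize(samples, m_bits, fraction_saving):
    """Drop the bottom m_bits of each wide sample, optionally with fraction saving."""
    mask = (1 << m_bits) - 1
    saved = 0                                  # the M bits lopped off the previous sample
    out = []
    for s in samples:
        acc = s + (saved if fraction_saving else 0)   # add back the saved fraction
        out.append(acc >> m_bits)                      # truncate (floor) to the short word
        saved = acc & mask                             # zero-extended leftover bits
    return out

wide = [77] * 1024                 # constant input of 77/256, about 0.30 LSB (M = 8)
plain = requantize(wide, 8, fraction_saving=False)
with_saving = requantize(wide, 8, fraction_saving=True)
print(sum(plain) / len(plain), sum(with_saving) / len(with_saving))
```

Plain truncation loses the sub-LSB value entirely, while the fraction-saving output averages back to 77/256. Because the error on each sample is the difference between the previous and current saved fractions, this behaves like first-order error feedback with a spectral zero at DC.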


                        • March 24, 2023 at 4:09 pm #5566
                          Bob Katz

                            Responding to JJ: Regarding the question of whether the original noise shaped 16-bit dither might cause a problem with subsequent processing (e.g. a digital volume control), I totally agree that it’s not a problem. You would have to turn the gain up very far to MAYBE notice the original high frequency dither, so that is also not a problem.

                            Plus, as I tried to illustrate in my initial post, if the subsequent DSP is a monitor controller outputting a 24 bit word, then the original HF noise shaping is even less of a problem if the monitor controller is attenuating rather than boosting (highly likely). Even if it’s boosting, we assume that the listener is boosting because their analog monitor does not have enough acoustic gain to reproduce the source, so everything would still be proportional, and the original HF dither would still come out at about the same level for the same resultant SPL. We assume a normal listener listening at a normal SPL.

                        • March 22, 2023 at 4:24 pm #5553
                          Bob Katz

                            Good questions, RBJ. Alex and Alexey are the same person. He answers to both. I was so excited by Alex’s discovery that I’m currently preparing an annotated and soon-to-be-narrated presentation as a movie that demonstrates and explains what Alex discovered. Maybe I’ll finish it by this weekend. It answers about 60% of your above questions. Alex will hopefully answer the rest of them here.

                          • March 22, 2023 at 4:26 pm #5554
                            Bob Katz

                              Well, noise shaping without dither is the way that Sony was doing “super bit mapping” many many years ago, and you don’t hear about that anymore. Because noise shaping without dithering is basically defective or at least very ineffective. It has guaranteed noise modulation, almost by definition because as you say, with no signal there’s silence.

                              The whole idea of “auto black” is much ado about nothing (pardon the pun). I never use it. It’s not needed and only people with eyes and meters are concerned about dither noise, which you can’t hear at 16 through 24 bit levels anyway.

                            • March 22, 2023 at 4:56 pm #5556
                              Alexey Lukin

                                RBJ, MBIT+ is a dithered noise shaping, like in your Fig. 2.

                                The original tones peaked at -80 dBFS and did not change in level throughout the file.

                                The decibels on Bob’s slides indicate the additional gain applied prior to 16-bit requantization.

                              • March 23, 2023 at 12:16 am #5558
                                Phil Koenig

                                  My (possibly uninformed) take on this topic:

                                  I can’t imagine a volume control, digital or analog, that would affect dither or its effect on what is heard.

                                  You turn it up.  For digital, each sample is multiplied by a constant (giving a constant dB of gain or loss).

                                  If the gain is negative, the dither level is reduced, possibly resulting (in the worst case) in the dither bits being truncated before they get to the D/A.  That situation causes note or reverb fadeouts to be undithered.  But the fadeout is also quieter, so the quantization noise that dither is meant to cover is also quieter by the same dB amount.

                                  If the gain is positive, the dither bits are boosted by the same amount as the program material, leaving the dither in place, albeit possibly no longer in the least significant couple of bits.  But since the program material is also louder, all that has happened is that everything got louder, both dither and program material.

                                  Feel free to correct me if I got lost in the woods as regards my logic; I’m always open to learning something new.

                                  • March 24, 2023 at 4:31 pm #5567
                                    Bob Katz

                                      Dear Phil: Let me confirm your assumptions, trying to put some numbers on it.

                                      You wrote:

                                      “I can’t imagine a volume control, digital or analog, that would affect dither or its effect on what is heard.”

                                      To elaborate: Let’s say the listener wants to turn up his monitor control. Say the original was noise shaped 16 bit dithered, with very strong HF dither at an extreme of say -71 dBFS around 20 kHz, and audio at an integrated loudness of -20 LUFS, an extreme case… maybe representing a piece of very dynamic classical music. While dBFS and LUFS are different measures, for purposes of discussion, let’s assume they are close, which they likely are, within a few dB. These are nominal calculations, to get us in the ballpark.

                                      Now, regardless of whether they have an analog or a digital monitor control, let’s say they want to adjust their monitor to reach an SPL of nominally 80 dB average SPL for that integrated loudness. OK, using simple arithmetic: if -20 LUFS == 80 dB SPL, then nominally the -71 dBFS dither noise would end up at +29 dB SPL at, say, 20 kHz. I doubt that would be audible, it’s way too low for my ears!

                                      OK, now are you worried about cumulative dither? Let’s squelch that question: If it’s a 24-bit dithered digital monitor controller, its dither would be at -141 dBFS. So, loosely and nominally, if -20 LUFS is 80 dB SPL, then the dither noise of the monitor controller would come to -41 dB SPL! That’s MINUS 41 dB SPL! Even if you add the noise of the DAC, the noise floor of the digital monitor controller would still be in the negative SPL!!!!

                                      If it’s a pure analog monitor controller, let’s say its noise floor is -80 dBu, so if 0 dBu == 80 dB SPL, then its output noise floor is 0 dB SPL or thereabouts, still quite inaudible. No matter how  you compute or fudge the meaning of “nominally“.

                                      So, I agree with you, Phil, looking at the numbers everything comes up roses.
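The nominal numbers in this post all come from one fixed offset: if -20 LUFS plays at 80 dB SPL, every digital level shifts by 100 dB. A tiny Python sketch makes that easy to replay with other assumptions; the helper name `dbfs_to_spl` is made up, and treating LUFS and dBFS as interchangeable is the post's own simplification.

```python
def dbfs_to_spl(level_dbfs, ref_dbfs=-20.0, ref_spl=80.0):
    """Map a digital level to nominal SPL, assuming ref_dbfs plays at ref_spl."""
    return level_dbfs + (ref_spl - ref_dbfs)

shaped_hf_noise = dbfs_to_spl(-71.0)      # strong 16-bit noise-shaped HF dither
controller_dither = dbfs_to_spl(-141.0)   # 24-bit dithered monitor controller
analog_floor = -80.0 + 80.0               # -80 dBu floor, with 0 dBu == 80 dB SPL

print(f"noise-shaped HF dither:   {shaped_hf_noise:+.0f} dB SPL")
print(f"24-bit controller dither: {controller_dither:+.0f} dB SPL")
print(f"analog controller floor:  {analog_floor:+.0f} dB SPL")
```

Changing `ref_spl` to a louder listening level moves all three results up by the same amount, which is the "everything stays proportional" point made in the thread.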

                                  • March 23, 2023 at 4:17 pm #5559

                                    I think, Phil, you hit it pretty square.

                                    Nonetheless, I still think that on the channel strip, all of this volume and EQ and, perhaps, limiting or compression or added internal effects (if it’s effects send, you have to boil it down to a stream and send it out and that requires quantization) can be done with a very wide word for each sample.  (I forgot, you guys call it “bit depth”, so when an EE says “word width”, the audio guys say “bit depth”.)  In the olden daze I would say 32-bit float, but now I see no reason a digital board or a DAW or some other device manufactured in the 21st century can’t be doing all of the internal processing with really wide words, like 64-bit doubles.  Do your gain control, your EQ, your compression/limiting/gate, your added reverb, pitch correction, whatever, even the buss, all of that should be done with 64-bit doubles until you have to export it out to AES/EBU or S/PDIF or a D/A or write it to a sound file that’s fixed-point format.  Or a codec like MP3 or AAC or FLAC or whatever.

                                    Then and only then will quantization be necessary.  Then, at the quantization point, is the correct place to add TPDF dither and to employ noise shaping around the quantization operation.

                                    Even if it’s just a volume control, it’s sorta kinda inexcusable to not dither it after the gain scaling, and if everyone in the signal chain were keeping their noses clean, then we shouldn’t have to predither the input for any reason.

                                    But some gear is old.  And not everybody keeps their nose clean.  Some gear, some DAWs, some plugins fall short of appropriately dithering their final quantization operation.  Then, if the audio it’s working on already has a little dither to make it dance around the quantization level a little, fine.  Let’s do it.  But it’s not really an appropriate substitute for proper quantization in the first place.

                                  • March 23, 2023 at 5:01 pm #5560
                                    Bob Olhsson

                                      My concern is the effect of noise shaped dither after it hits undithered signal processing and truncation in streaming applications. Obviously, it all ought to be 64 bit processing that has been properly dithered. My understanding is that in the real world, it almost never is. My clients’ careers depend on the ordinary listener having the most engaging experience possible.

                                      • March 24, 2023 at 2:02 pm #5564
                                        Phil Koenig

                                          Thanks for your reply. Very insightful.

                                        • March 26, 2023 at 3:25 pm #5580

                                          Bob O said,

                                          “Obviously, it all ought to be 64 bit processing that has been properly dithered.”

                                          I just want to be clear about something.  At least what I do (not necessarily speaking for all DSP coders).

                                          I will routinely use 64-bit double precision arithmetic now with the sample-processing, if I believe that the numerical type is native to the machine (like a 64-bit CPU such as Intel Xeon or AMD Opteron).  Otherwise I’ll use 32-bit float (older CPUs and some current ARMs and the SHArC DSP).  When using floating point I never bother with dither because, especially with 64 bit, I know that it’s not needed.  There are 54 bits in the mantissa (counting the hidden leading 1 and the sign bit) and even the crappy quantization done without dither is soooo far down there, essentially 200 dB below the noise floor of 16-bit CDs, that this crappy quantization noise will never make any friggin’ difference to the final output word (where we should add dither and maybe noise-shaping).  With 32-bit floats (that have 25-bit mantissas, counted the same way), I might be tempted to make them do better, dithered, quantization, but it’s too much of a friggin’ pain in the ass for floating point.  Much easier to do these bit manipulation operations with fixed-point output (the dither and quantization noise level is constant; the rounding point does not float around).
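The bit counts and the "essentially 200 dB" figure can be sanity-checked with NumPy's `finfo`. NumPy reports only the stored fraction bits (52 and 23); adding the hidden leading 1 gives 53 and 24, and counting the sign bit as well gives the 54 and 25 quoted in the post. The 16-bit reference level below uses the textbook 6.02·N + 1.76 dB formula, which is an assumption about what "noise floor of 16-bit CDs" means here.

```python
import numpy as np

def precision_db(dtype):
    """Significand width (with hidden bit) and relative rounding step in dB."""
    info = np.finfo(dtype)
    bits = info.nmant + 1                              # stored fraction bits + hidden 1
    return bits, float(20.0 * np.log10(info.eps))      # eps = 2**-nmant

cd_floor_db = -(6.02 * 16 + 1.76)   # textbook undithered 16-bit level, about -98 dB

for dtype in (np.float32, np.float64):
    bits, step_db = precision_db(dtype)
    print(f"{dtype.__name__}: {bits}-bit significand, rounding step {step_db:.1f} dB "
          f"relative to the sample value, {step_db - cd_floor_db:.1f} dB vs. 16-bit floor")
```

For float64 the rounding step sits more than 200 dB under the 16-bit floor, which supports not dithering intermediate 64-bit results; float32 has far less margin.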

                                          Ideally, the only efficacious place to bother to dither and noise shape is where the bit depth is getting reduced to be output to a DAC or a stream or a .wav or .flac file.  If you’re mastering to a 16-bit red book CD, for goodness sake you must properly dither (and you should noise shape) at that point where the 16-bit fixed-point value of each sample is determined and written to the master.  It would be inexcusable to neglect this.

                                          But with 64-bit floating-point quantities, don’t bother.  If your DAW or mix board or plugin or whatever tools are doing all of their sound processing in 64-bit float, just do it and don’t give dithering or noise shaping a thought.  The only things you have to worry about are NaNs and Denorms possibly fucking you up.

                                          Bob Chidlaw, who was before my time the Chief Scientist at Kurzweil Music Systems, came up with a wonderful floating-point format that has no NaNs, so every bit pattern in that 32-bit or 64-bit word is a number and you never have to worry about some computational hiccup spitting out a NaN.  There are Denorms but they are built into the format.  It’s sorta like IEEE floats but the format is more like 2’s complement rather than sign-magnitude.  The DEC PDP-10 used this format but didn’t have Denorms and Denorms are useful to have, if they don’t cause an interrupt (called an “exception”) to your CPU.

                                          Anyway, I just wanna make sure that it’s clear that no one is dithering a 128-bit result of a multiplication back to 64 bits.  No one is bothering with dithering at the 64-bit “bit depth” (that might be a misuse of that term, how is the term “bit depth” correctly applied to floating-point audio samples?).  But when things become reduced to 16 or 24 bit fixed, you should certainly “properly” dither that value and probably should noise shape around that quantization.

                                      • March 23, 2023 at 6:34 pm #5561
                                        Bob Katz

                                          Dear Bob O: First off, all the streamers are now accepting the high res (e.g. 2496) master. So it’s the one and only master that has to be shipped out. Apple will use the 2496 (converted from wav to ALAC for streaming) and stream the ALAC in their lossless service. Apple will downsample the 2496 to 3248 for their lossy service and convert from 3248 to AAC. So far, no dithering was involved nor necessary! Nor was noise shaped dither since we sent the high res to Apple!

                                          Spotify will take the 2496 and downsample it to 3248, then codec it to their lossy format. So again, only floating point was involved, no additional dithering, just encoding.

                                          On the other hand, if your client uses CD Baby, they will force you to send them 1644 for all the regular services. CD Baby is the devil there, and we mastering engineers are telling our clients to use another service until CD Baby gets their high res act together. Regardless, let’s see what happens with CD Baby if the 1644 master was produced with noise shaping. In the case of Apple, it will go into their lossy stream and be encoded DIRECTLY to AAC from the 1644 master. The noise shaping will have no adverse effect on Apple’s 256 kbps codec. Same with Spotify.

                                          So I think you’re worried about a non-problem problem.

                                        • March 29, 2023 at 10:00 am #5605
                                          Bob Katz

                                            Dear All: If you are unconvinced that noise shaped dither is harmless to succeeding processes, Alex Lukin and Bob Katz have produced a YouTube Video revealing Alex’s recent research on the subject. I think it will turn a few heads:

                                          • March 29, 2023 at 4:44 pm #5608

                                            Okay.  First question, how loud is the MBIT+ dither?

                                            Quantization noise using standard TPDF dither is 4.7712 dB louder than quantization noise without any dither.  (But it’s better noise.)

                                            Is MBIT+ quantization and noise louder yet?

                                            What is the PDF used?  Triangular? Gaussian?

                                            Finally, care to divulge what the noise-shaping transfer function is?  We can see you’re dumping a lotta energy up there in the top octave just below Nyquist.
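The 4.7712 dB figure quoted above is 10·log10(3): plain rounding leaves an error power of Δ²/12, and TPDF dither made of two independent 1-LSB rectangular sources adds 2·Δ²/12, for a total of Δ²/4. The Python sketch below computes that and cross-checks it by simulation; the random test signal is an arbitrary stand-in for a busy input.

```python
import math
import numpy as np

step_var = 1.0 / 12.0              # undithered quantization error power (LSB^2)
tpdf_var = 2.0 * step_var          # two independent 1-LSB rectangular dithers
penalty_db = 10.0 * math.log10((step_var + tpdf_var) / step_var)   # 10*log10(3)

# Cross-check by simulation on a busy (here: uniformly random) signal.
rng = np.random.default_rng(3)
x = rng.uniform(-100.0, 100.0, 500_000)
tpdf = rng.random(x.size) - rng.random(x.size)
plain_err = np.round(x) - x
dither_err = np.round(x + tpdf) - x
measured_db = 10.0 * np.log10(dither_err.var() / plain_err.var())

print(f"theory {penalty_db:.4f} dB, simulated {measured_db:.2f} dB")
```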

                                            • March 29, 2023 at 5:09 pm #5611
                                              Bob Katz

                                                RBJ, I can answer some of it: The noise power from 20-20k of noise shaped dither can be significantly higher than that of flat TPDF. It’s just the way you compute the area under the curve. There are several different shapes. MBIT+ is definitely triangular probability random.

                                                But the term “LOUDER” has an ambiguous meaning. The noise power does not translate to loudness, EVER. Perceived Loudness is a complex combination of frequency distribution, sound pressure level, masking, unmasking, etc. So using “loudness” in your question would require significant parsing. Suffice it to say that the high, mid and low frequency content of 16-bit noise shaped dither is set at a level which is perceptually well below where any normal human being would perceive it as noise, even with monitor gain above normal. Bob Stuart has some articles on the subject and I cannot quote the numbers but he could.

                                                You don’t hear the noise directly, you hear the result of different components of the noise masking and unmasking the ambience in the musical material.

                                                Alex Lukin can answer more directly the actual quantities involved, but they are based on being below all human minimum audible thresholds and such.

                                              • March 29, 2023 at 10:50 pm #5614
                                                Alexey Lukin

                                                  RBJ, MBIT+ is using TPDF for its dither. With its strongest “Ultra” noise shaping curve, it produces noise that is 28.6 dB higher than non-dithered quantization. Here is a pic of the filter:

                                                  • March 30, 2023 at 1:31 am #5616

                                                    motherfucker.  whatta curve!  What order of filter is that?

                                                    So this is TPDF of two quantization steps in width?

                                                    • March 30, 2023 at 7:21 am #5619
                                                      Alexey Lukin

                                                        What order of filter is that?


                                                        So this is TPDF of two quantization steps in width?

                                                        Yes, the standard one. We also have options to under-dither with TPDF of somewhat lower amplitude.

                                                        • March 30, 2023 at 1:59 pm #5621

                                                          Yes, the standard one.

                                                          that’s good.

                                                          We also have options to under-dither with TPDF of somewhat lower amplitude.

                                                          Oh, you shouldn’t do that.  If you reduce the amplitude of the TPDF from 2Δ width, then it’s like changing it to another PDF and you lose the statistical property of fully decoupling the mean and variance of the quantization error from the signal value.
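The decoupling property can be illustrated numerically. The sketch below integrates over a triangular dither PDF for a constant input and reports the mean and variance of the total error. The function name `error_moments` and the reduced width of 1 LSB are hypothetical choices for illustration only; they are not the amplitudes MBIT+ actually offers.

```python
import math

def error_moments(x, width=2.0, steps=20000):
    """Mean and variance of the total error (quantized output minus input x)
    for triangular-PDF dither of the given peak-to-peak width, in LSBs.
    Uses midpoint-rule integration over the dither PDF."""
    half = width / 2.0
    h = width / steps
    mean = 0.0
    second = 0.0
    for i in range(steps):
        d = -half + (i + 0.5) * h            # dither value at cell midpoint
        p = (1.0 - abs(d) / half) / half     # triangular PDF, integrates to 1
        e = math.floor(x + d + 0.5) - x      # rounding quantizer, step = 1 LSB
        mean += e * p * h
        second += e * e * p * h
    return mean, second - mean * mean

for x in (0.0, 0.25, 0.5):
    full = error_moments(x, width=2.0)       # standard 2-LSB-wide TPDF
    under = error_moments(x, width=1.0)      # hypothetical under-dither
    print(x, [round(v, 4) for v in full], [round(v, 4) for v in under])
```

With the 2 LSB width the mean stays at zero and the variance stays at 0.25 LSB² for every input value tried. With the narrower dither both move with the input, which is the noise modulation being discussed.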

                                                          But noise-shaping all of that up into the stratosphere, where I can’t hear it, is very smart.  And then when it’s 28 dB louder, it’s still there after the undithered gain reduction, even down by 10 or 15 dB.  Very smart.


                                                          You do have an ~4 dB boost at lower frequencies.  Like below about 500 Hz.  So, if I do my accounting right, white TPDF unshaped dither has the noise floor of 96.33-4.77=91.56 dB below full scale. (I am assuming the signal is the same uniform pdf.  If it’s a sine wave brought up to the rails, add another 1.76 dB, but I don’t do that.)  Now, your 4 dB of noise over the white dither puts the noise floor at about -88 or -87 dBFS, right?  So there would be a little rumble down there below 500 Hz, right?  (But it would be a friendly rumble with its mean and variance completely decoupled from anything else.)

                                                          Am I correct so far?

                                                          What’s cool is that from 2 kHz up to the “noise wall”, you’re better than 10 dB below the standard white TPDF noise floor.  You gotta CD with 16-bit words and you have better than a -102 dBFS noise floor at the frequencies that, when we’re young, are the most sensitive (according to the Fletcher-Munson curve).
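The accounting above can be reproduced in a few lines. This follows the post's own convention (noise relative to a full-scale signal with uniform PDF), and the 4 dB and 10 dB offsets are the approximate values read off the plot in the discussion, not measured figures.

```python
import math

bits = 16
full_range_db = 20 * math.log10(2 ** bits)   # 96.33 dB for 16-bit words
tpdf_penalty_db = 10 * math.log10(3)         # 4.77 dB added by TPDF dither
white_tpdf_floor = -(full_range_db - tpdf_penalty_db)

low_freq_floor = white_tpdf_floor + 4.0      # ~4 dB boost below about 500 Hz
midband_floor = white_tpdf_floor - 10.0      # ~10 dB reduction from 2 kHz up

print(round(white_tpdf_floor, 2), round(low_freq_floor, 2), round(midband_floor, 2))
```

The three values land at roughly -91.6, -87.6 and -101.6 dBFS, in line with the "-88 or -87" and "-102" estimates in the post.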

                                                          Last thing is, even if the feedback filter is FIR, it’s in a loop.  You had to be pretty creative making a filter of order 60 that’s in a loop (so the whole thang is an IIR and them mothers can go unstable especially at high order).  I would ask you more details, but I don’t wanna put you on the spot where you have to decline exposing any recipe to the secret sauce.


                                                        • March 30, 2023 at 2:28 pm #5622
                                                          Alexey Lukin

                                                            Yes, under-dithering does not fully decouple the noise variance, but works well enough when you’re looking to lower the noise at the expense of small noise modulation.

                                                            The boost at the lower frequencies helps MBIT+ to achieve more reduction in the midrange.

                                                            If I remember correctly, the filters used in the feedback loop have been designed as min-phase FIRs. I don’t think we’ve hit any problems with their stability, once they are properly normalized.
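For readers who have not seen the structure, here is a minimal sketch of an error-feedback quantizer with TPDF dither and an FIR filter in the loop. The taps `(2.0, -1.0)` are a hypothetical textbook choice giving the noise transfer function (1 - z^-1)^2; they are not the MBIT+ filter, whose coefficients are not disclosed in this thread.

```python
import math
import random

def noise_shaped_quantize(samples, fb_coeffs=(2.0, -1.0), seed=0):
    """Quantize samples (in LSB units) to integers using TPDF dither and
    error feedback. fb_coeffs are hypothetical FIR taps applied to past
    total errors; the resulting NTF is 1 - sum(c_k * z^-k)."""
    rng = random.Random(seed)
    errors = [0.0] * len(fb_coeffs)            # most recent error first
    out = []
    for x in samples:
        # Subtract the filtered past errors from the input.
        v = x - sum(c * e for c, e in zip(fb_coeffs, errors))
        dither = rng.random() - rng.random()   # TPDF, 2 LSB peak-to-peak
        y = math.floor(v + dither + 0.5)       # rounding quantizer
        e = y - v                              # total error, dither included
        errors = [e] + errors[:-1]
        out.append(y)
    return out

dc = [0.3] * 20000
out = noise_shaped_quantize(dc)
print(sum(out) / len(out))   # long-run average stays very close to 0.3
```

Because the shaped error has zeros at DC in this example, the output average tracks a constant input closely even though every output sample is an integer.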

                                                  • March 30, 2023 at 4:02 pm #5625

                                                    So I gotta more general and “market”-based question.  In the olden days, there was this Apogee UV22 “Super CD Encoder” that mastering engineers were s’pose to use in the final step of quantizing down to 16-bit red book CD.  Now, are they still doing that?   I would presume that with modern DAWs, there is something built in or there is a standard plugin or maybe competing plugins to do that.

                                                    Will this MBIT+ become such a product?

                                                    Did the previous dither and quantize products just do white TPDF dithering or did some other products noise shape?  Seems to me that, post 1995, it would be inexcusable not to noise shape and then the issue is the tradeoffs.  Alexey and iZ seem quite happy to trade 4 dB below 500 Hz and whatever-the-fuck (28 dB) above circa 15 kHz to get superior performance between 500 Hz and 15 kHz.

                                                    • April 1, 2023 at 4:27 pm #5664
                                                      Bob Katz

                                                        I haven’t heard much from UV22 in recent years. POWR-1 is what we call “Near-Nyquist Dither” and it’s very similar in philosophy to UV22. Psychoacoustically, to my ears, for most sources, it’s just a hair more resolved (more inner detail) than flat TPDF. It’s all based on masking, by the way. Take the same piece of music and make two versions, each with a properly engineered dither, one highly shaped and one without any shaping. Then do a null between the two dithered pieces. The result should just be noise, the difference between the two noise floors. Yet the two dithered pieces usually sound quite different. Why? Because the shape of the dither masks or unmasks ambience and low level information in different frequency ranges. There is absolutely no technical, psychoacoustic or even preference reason why flat TPDF should sound better (or worse) than shaped dither. The sonic results really depend on the frequency distribution within the music. When you find the right shape, the 1644 result should sound as close as you can make it to the 2444 original. In my case I try to compare the 1644 with the 2496 original made before SRC and before wordlength reduction. Yes, the choice is highly subjective, but the subjective criteria I use are: transient response, depth, soundstage width, and tonality.

                                                        If you find that noise shaped dither makes the sound brighter, this has nothing to do with you hearing the high frequency boost in the noise. Brighter dither noise does not necessarily translate to brighter sound. Sometimes shaped dither makes the sound warmer and fuller. It’s all because of masking between elements of the source and the noise shaped dither.

                                                        • April 1, 2023 at 8:53 pm #5669

                                                          There is absolutely no technical, psychoacoustic or even preference reason why flat TPDF should sound better (or worse) than shaped dither.

                                                          Bob, I agree with that.  Our ears are not flat and it makes perfect sense to steer the quantization noise from frequencies where we’ll notice to frequencies we’ll not notice.

                                                          And, again, PDF is not the same as spectrum.  Looking at that dither schematic I posted, inside the loop it’s white TPDF dither just before the quantizer.  But putting the loop around it then changes the frequency response from that white piece-wise quadratic pdf source (this is the dither + the actual quantization error, which is 4.77 dB more than just the quantization error by itself) to the output.  And there’s a kinda cool theorem by Michael Gerzon and Peter Craven that shows that that curve that Alexey posted must have some area below the flat white noise and some area above it.  That sorta says what the best we can do is, from an information theory perspective.
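The equal-area idea can be checked numerically on simple filters. As I understand the result, for a minimum-phase noise transfer function with a leading coefficient of 1, the log-magnitude response averaged from 0 Hz to Nyquist is 0 dB, so any dip below the white level has to be paid for above it. The function name and the three example filters below are hypothetical illustrations, not Alexey's curve.

```python
import cmath
import math

def mean_log_gain_db(ntf, n=20000):
    """Average of 20*log10|H(e^jw)| over 0..Nyquist for FIR noise transfer
    function coefficients ntf = [1, h1, h2, ...], using the midpoint rule."""
    total = 0.0
    for k in range(n):
        w = math.pi * (k + 0.5) / n
        h = sum(c * cmath.exp(-1j * w * m) for m, c in enumerate(ntf))
        total += 20 * math.log10(abs(h))
    return total / n

for ntf in ([1, -1], [1, -2, 1], [1, -1.5, 0.7]):
    print(ntf, round(mean_log_gain_db(ntf), 3))
```

All three averages come out at essentially 0 dB, even though each filter pushes noise down at low frequencies and up near Nyquist by a different amount.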

                                                          Brighter dither noise does not necessarily translate to brighter sound. Sometimes shaped dither makes the sound warmer and fuller. It’s all because of masking between elements of the source and the noise shaped dither.

                                                          In my opinion, I would expect well-shaped dither to just make it sound transparent.  The 1644 should sound like the 2444.  As much as possible.  And, just to make sure, I would expect the quantization to be done at the final mastering sample rate.  SRC should happen when the wordsize (or “bit depth” – which term should I use?) is larger.  If it’s 24-bit fixed point, then all that SRC math can be done with higher precision words and finally written with good-dithered quantization at the final sample rate.  But dithering the SRC output is not needed if the intermediate words were 64-bit floats.  Then quantize to 16-bit, all at 44.1, with this MBIT+ thingie or the product of your delight.

                                                          I dunno shit about what you guys do in the trenches (and my hearing ain’t so good, especially being an old fart who listened to too much loud vinyl rock-n-roll in the 70s and whose dad had hearing aids), but mathematically it can only make sense to do the SRC with larger words and get it to 44.1 kHz before quantizing to 16-bit samples.

                                                    • March 30, 2023 at 4:10 pm #5626
                                                      Bob Olhsson

                                                        DAWs have dither options, limiters often offer dithering options, sample rate converters have dither options and there are also dedicated dither plug-ins.
