James Johnston

Forum Replies Created

  • In reply to: Lossless streaming?

    March 27, 2023 at 2:31 am #5581
    James Johnston
    Participant

      Well, you know, it’s interesting. Back when I started working on coding, it was voice coding, and 16 kb/s was “way too much” for some applications. Things changed, transmission methods improved, and then we needed 9.6 kb/s for speech. Then 7.2, then 4.8 kb/s, even as the transmission rate improved and improved more.

      At the same time, the argument was “coding will cease being necessary”.  I kind of think that myself, but it’s been since 1976 and CODING IS STILL NECESSARY and some of the rates are LOWER than they were then. The price of transmission has dropped so much, from 2000 for a 300 bits/second modem plus long distance charges, to what?  What does it even cost to send 300 bits/second? Is there anything within 5 orders of magnitude even still in existence?

      The same goes for memory. The first processor I built required 32 “1×1024” memory chips.   They were blisteringly fast, 190 nanoseconds. A bit over 5MHz.  Yeah. And they were, for the time, blisteringly fast.

      Now? Could anyone provide a 1k memory chip? 🙂

      But coded speech and music seem to be expanding. But I do wish it was all lossless coding now.

      In reply to: Studio Calibration

      March 24, 2023 at 5:21 pm #5572
      James Johnston
      Participant

        Capture of a room response at one point involves 4 variables. This might be the only use I would personally wish to put a basic ambisonic microphone to.   You have 4 variables, dx dy dz and pressure.

        The ear detects pressure via the eardrum, but volume velocity interacts with the head at frequencies where the head is wider than maybe 1/8 wave or so, so it can create pressure in a spot where there would otherwise be no pressure component.

         

        This is part of what Bob O has pointed out, but there’s more to the issue of high frequencies in rooms with microphones, speakers, etc. Typically speakers do not radiate the same pattern at all frequencies. This affects the timbre of the room reflections substantially. If you measure frequency response with a long window at high frequencies, the high frequencies are going to be “off”, usually in the “too little” direction, even if the direct sound is flat.   So, measurement must be made at a variety of frequencies.   This can lead to several mistakes, including turning up the treble until your earlobes start to bleed, but also causing ‘dark’ masters because the studio was too “hot”.  Even if your speakers are both direct flat and power-wise flat the room response can fool you here, and most speakers “aren’t even close” to being both direct flat and power flat.

        There are a lot of interactions to cope with, and it’s easy to oversimplify.

        In reply to: Lossless streaming?

        March 24, 2023 at 5:06 pm #5571
        James Johnston
        Participant

          Bob K, I have to disagree.  The things “lost” in an AAC stream (or MP3, I won’t address other methods) are not too likely to mitigate noise in an original. If anything, it’s more likely the coding will push the noise floor UP and make it more audible.  Remember that “throwing stuff out” is actually adding noise to the original.

          I have heard a misbegotten codec remove some background noise, but let me tell you, the piano fade into that missing noise was “quaint”, and that’s a euphemism.

           

          This has to be the original provided to the FLAC (or other lossless) codec. There are many, they all have very close to exactly the same rate, and that’s because the mathematics is pretty obvious to signal processing types, and the entropy can be estimated to start with.

          In reply to: Lossless streaming?

          March 24, 2023 at 5:01 pm #5570
          James Johnston
          Participant

            Semi-lossless makes some sense if you’ve got a rate maximum, it’s better than “breaking down”, and should be able to strictly reduce the bit depth of the signal when the rate max is hit.  Of course lossless is better, unless you simply can’t get it from here to there.

             

            James Johnston
            Participant

              At 24 bits I would not expect much problem.

              At 16 bits, with a low-level signal turned WAY WAY UP, you might hear the dither a bit. But that’s just like analog except even quieter.

               

              Noise shaping is really secondary here, unless it’s a really dumb version of noise shaping.

              In reply to: Lossless streaming?

              March 19, 2023 at 10:05 pm #5526
              James Johnston
              Participant

                FLAC uses, last I saw, an adaptive LPC system, which is not at all a bad choice, with the error signal integerized and then compressed via an information-theoretic approach to make it lossless.

                The trick in FLAC (and any other lossless encoder) is to make the code give invariant results across a wide variety of platforms.
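
A minimal sketch of the predict, integerize, and reconstruct idea described above. It swaps in a fixed second-order predictor (`2*x[n-1] - x[n-2]`) for the adaptive LPC, and it leaves out the entropy-coding stage entirely, so it is an illustration and not FLAC's actual encoder. The function names and the sample data are made up for the example. Because everything is integer arithmetic, the same input gives bit-identical output on any platform, which is the invariance point made above.

```python
def encode_residual(samples):
    """Integer residual of a fixed second-order predictor (illustrative only)."""
    residual = []
    for n, x in enumerate(samples):
        if n < 2:
            residual.append(x)  # warm-up samples are stored verbatim
        else:
            prediction = 2 * samples[n - 1] - samples[n - 2]
            residual.append(x - prediction)
    return residual


def decode_residual(residual):
    """Rebuild the samples by running the same predictor on decoded output."""
    samples = []
    for n, r in enumerate(residual):
        if n < 2:
            samples.append(r)
        else:
            prediction = 2 * samples[n - 1] - samples[n - 2]
            samples.append(r + prediction)
    return samples


# Hypothetical block of integer PCM samples (a slowly curving waveform).
pcm = [0, 90, 178, 262, 340, 410, 471, 521, 559, 585, 598, 598]
res = encode_residual(pcm)

print("residual:", res)
print("round trip exact:", decode_residual(res) == pcm)
```

The residual values after the two warm-up samples are much smaller than the samples themselves, which is what gives the entropy coder something to work with.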

                I think FLAC uses Arithmetic coding, but I could be wrong, it may be LZW or a derivative. I knew the answer to this once.

                As an aside, it is not hard to determine the entropy of a given signal when it is being predicted in given length blocks. The Spectral Flatness Measure provides this information. It is ‘1’ for a flat (ergo unpredictable, i.e. full entropy) signal, and is always in practice considerably smaller than ‘1’.

                The SFM is calculated by taking the ratio of the geometric mean and arithmetic mean of the power spectrum.  The number you get is for absolutely PERFECT prediction, matched perfectly to that block of signal. Needless to say, that’s a bound you’ll never reach.
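
A rough sketch of that ratio, using only the standard library and a brute-force DFT so it stays self-contained. The function names, the choice to skip the DC and Nyquist bins, and the small `floor` that guards against `log(0)` are assumptions of this example, not part of any standard definition.

```python
import cmath
import math


def power_spectrum(block):
    """Brute-force DFT power for bins 1 .. N/2 - 1 (DC and Nyquist skipped)."""
    n = len(block)
    powers = []
    for k in range(1, n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(block))
        powers.append(abs(acc) ** 2)
    return powers


def spectral_flatness(block, floor=1e-20):
    """Geometric mean over arithmetic mean of the power spectrum."""
    p = [max(v, floor) for v in power_spectrum(block)]
    geometric = math.exp(sum(math.log(v) for v in p) / len(p))
    arithmetic = sum(p) / len(p)
    return geometric / arithmetic


N = 64
impulse = [1.0] + [0.0] * (N - 1)                            # perfectly flat spectrum
sine = [math.sin(2 * math.pi * 5 * i / N) for i in range(N)]  # one spectral line
decay = [0.9 ** i for i in range(N)]                          # smooth, colored spectrum

sfm_impulse = spectral_flatness(impulse)
sfm_sine = spectral_flatness(sine)
sfm_decay = spectral_flatness(decay)
print(sfm_impulse, sfm_sine, sfm_decay)
```

The impulse comes out at 1, the pure tone at essentially 0, and the colored block somewhere in between.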

                In reply to: Lossless streaming?

                March 18, 2023 at 8:32 pm #5523
                James Johnston
                Participant

                  FLAC should always sound better than AAC, bearing in mind I’m the guy who invented the biggest part of AAC.

                  AAC is better than any of the other codecs out there, as far as I know, still IF YOU HAVE A GOOD ENCODER.

                  But, if you can get lossless, get lossless.  Geeeze, louise.  I’ve been saying this about codecs (many of which I invented) since 1989 if not before that.

                  If the lossless sounds worse, something’s wrong with the input. EOF.

                  James Johnston
                  Participant

                    There’s a number of issues here.  The results may be different for standard full-wave rectifiers vs. switching supplies, for instance, but in both cases, more storage is almost always a win, the question is “how much more matters”.

                    For standard rectifier supplies, you can get very annoying 120 cycle-shaped limiting with a weak power supply.

                    For switchers, the same problem can occur, but not at the “power supply output” rather at the first stage after line rectification happens, if there isn’t enough capacitance to hold up the regulator control range.

                    So where the capacitor needs to go, and what size and voltage it requires, can vary.

                    However, adding one millifarad of the right voltage across each pole of the supply to center should never hurt (for low voltage supplies, that is!).

                    My thought, though, if you’re going to do this (old analog research hardware designer in me is coming out here) go large. Capacitors are cheap.

                    In reply to: Clipped signals and the D/A stage

                    March 6, 2023 at 5:19 pm #5108
                    James Johnston
                    Participant

                      As to transients, they have widespread spectrum, and the error spectrum aliases back down, but the overall spectrum is wide vs. wide, and you don’t necessarily get any tonal components. This is also shown in Bob’s book.

                      In reply to: Clipped signals and the D/A stage

                      March 6, 2023 at 5:18 pm #5107
                      James Johnston
                      Participant

                        Oh boy. You asked an interesting question. Ripples due to bandwidth limiting come along with the digital realm. If I take a square wave, and filter it to a bandwidth where any of the appreciable harmonics are removed (or changed in phase) you will see “ripples”. These are a direct consequence of modifying the spectrum (by removing or changing phase) of the signal in question.  This is, however, not the issue with digital clipping of periodic signals.

                        What’s more, “Gibbs Ears” is a term that gets used for two things. One is a theoretical issue that can only happen with theoretically perfect square waves (which do not exist in the real world!). The other looks somewhat similar and is the result of bandwidth limiting, but it is NOT the same. “Gibbs Ears”, the zero power amplitude excursion in a theoretic Fourier transform, has a finite amplitude and zero width, yes, ZERO width, and thence has zero energy.  This does not happen in the real world, because the square wave must have infinite bandwidth, which literally can never exist in the real world.  That’s “Gibbs Ears”.

                         

                        The effects of bandwidth limiting (the ripples that have finite width as well as finite amplitude) are simple results of Fourier mathematics, and are not an ‘error’. They are what you SHOULD see when a wide-band signal is filtered to narrower bandwidth.

                        NEITHER of these is clipping.

                        The issue with clipping (a periodic signal is worst in this case) in the digital realm is that you will generate harmonics of the periodic signal that are OVER Fs/2 (Fs is sampling rate) and as such will promptly (instantly, no other option applies) alias back down into the baseband.

                        For instance, let’s propose (for a really ugly example) a sine wave (using 44.1kHz sampling) whose frequency is set to (44100-1000)/3 Hz.  Yes, that means that the third harmonic of that sine wave, when you clip that waveform symmetrically, lands at 1000Hz. That is anharmonic, extremely audible, and, well, a lot of other things, mostly “bad”.  If you clip asymmetrically, you get a 15.xxx kHz tone, too. Note, also NOT harmonic.  And yes, this continues up the harmonic number, splattering <redacted> all over your audio spectrum.

                        This is why digital clipping is bad.
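
The folding arithmetic in the 44.1 kHz example can be checked with a few lines. This is only a sketch of where each harmonic lands after aliasing; it says nothing about how strong each one is, and the helper name `alias_frequency` is invented for the illustration.

```python
def alias_frequency(f, fs):
    """Fold a frequency f (Hz) into the baseband 0 .. fs/2 for sample rate fs."""
    f = f % fs
    return fs - f if f > fs / 2 else f


FS = 44100.0
fundamental = (FS - 1000.0) / 3.0   # about 14366.7 Hz

second = alias_frequency(2 * fundamental, FS)   # even harmonic (asymmetric clipping)
third = alias_frequency(3 * fundamental, FS)    # odd harmonic (symmetric clipping)

for harmonic in range(2, 10):
    landed = alias_frequency(harmonic * fundamental, FS)
    print(f"harmonic {harmonic}: lands at {landed:9.1f} Hz")
```

None of the folded frequencies is an integer multiple of the fundamental, which is why the result sounds like added tones and not like ordinary harmonic distortion.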

                        There is a short graph of this in Bob’s book somewhere, wherein he says (at my urging, not that he had to be urged much) “don’t do that!”

                        In reply to: Mixing for different formats, how to go about it?

                        March 4, 2023 at 1:57 am #5084
                        James Johnston
                        Participant

                          BS.1116 and MUSHRA are both flawed in some sense, because of the difficulty of comparison. Yes, one can use a “binaural head” and such, if there ever WAS a “real soundfield”.

                          In the modern day, that’s a very limited thing, and often not what one wants in an immersive “larger than life” setting.

                          You can capture speaker rendering in binaural and compare it to virtualized, but that’s only comparing the virtualizer.

                          The problem is quite difficult.

                          In reply to: Mixing for different formats, how to go about it?

                          March 4, 2023 at 1:54 am #5083
                          James Johnston
                          Participant

                            Indeed. Perceptual evaluation is ***ALL*** in this regard.

                            I’ve heard arguments about ‘accuracy’ and ‘measurement’ but there are only two rules I think are important.

                            The first is “never, EVER, break the illusion”
                            The second is “don’t require separate mixing and production for each format”.

                            I suppose a third rule of the inquisition, too, is “You must be able to mix house mikes, spot mikes, directs, etc, all convincingly” but that does actually come into Rule #1.

                             

                            As to “description”, that is totally inside each and every engine. R theta phi (distance angle elevation) and XYZ (position in relative space) both work.  When you’re doing anything involving motion (either listener head motion or source motion), smoothing and an absolute lack of palpable filtering and/or glitching are required.  But that goes right back again to rule 1, which is DO NOT BREAK THE ILLUSION.

                            In reply to: This is impossible!

                            March 4, 2023 at 1:44 am #5082
                            James Johnston
                            Participant

                              Indeed, and 32 bit fixed point, let’s consider that, please. (This really is acoustics but we’ll do it here.)

                              From noise floor to peak level is 6.02*32= 192.64 dB dynamic range.

                              The noise floor of the atmosphere is somewhere in the 6dB SPL to 8.5dB SPL range at your ear drum. You won’t be rid of that until you have no atmosphere on both sides of the ear drum, which presents a few issues.

                               

                              So we’ll take the 6dB limit.  That means your peak level is 198.64 dB SPL.

                              194dB SPL is a waveform that goes between 0 pressure and 2 atmospheres, yes, from perfect vacuum to 2 atmospheres.  So, right away, there’s not going to be any kind of linearity involved.  But let’s assume that it’s only positive peaks for a minute, forget that trivial little detail.

                              198.64-194=4.64 dB above 1 atmosphere.  Divide by 20, exponentiate, that’s 1.7 atmospheres ABOVE 1 atmosphere, or about 25 PSI.  Now, 25PSI overpressure is rather a lot, it tends to be used only in “military” applications, to say the least.   It’s not lethal, but “serious hearing damage” is quite possible, also buildings may collapse, and the like.  In reality, it will take out windows, flatten weaker buildings, etc.
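
The chain of numbers above, written out as a quick check. The 6.02 dB per bit rule, the 6 dB SPL floor, and the 194 dB SPL figure for one atmosphere are taken straight from the post; the psi conversion factor is the standard atmosphere, and the variable names are just for this sketch.

```python
BITS = 32
DB_PER_BIT = 6.02           # rule of thumb used in the post
NOISE_FLOOR_DB_SPL = 6.0    # lower atmospheric-noise figure quoted above
ONE_ATM_DB_SPL = 194.0      # level whose pressure swing is about 1 atmosphere
PSI_PER_ATM = 14.696        # one standard atmosphere in pounds per square inch

dynamic_range_db = DB_PER_BIT * BITS                   # noise floor to peak
peak_db_spl = NOISE_FLOOR_DB_SPL + dynamic_range_db    # peak level at the ear drum
excess_db = peak_db_spl - ONE_ATM_DB_SPL               # dB above the 1 atm swing
overpressure_atm = 10 ** (excess_db / 20)              # "divide by 20, exponentiate"
overpressure_psi = overpressure_atm * PSI_PER_ATM

print(f"dynamic range: {dynamic_range_db:.2f} dB")
print(f"peak level:    {peak_db_spl:.2f} dB SPL")
print(f"overpressure:  {overpressure_atm:.2f} atm, about {overpressure_psi:.0f} psi")
```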

                               

                              So, yeah, 32 bit fixed point is a touch of overkill for capture.

                              But, how do we get 192.64 dB of dynamic range in an electrical circuit given the charge on an electron?  Well, shot noise alone is 1/sqrt(number_of_electrons) in the circuit. So you need about 2^64 electrons per second, which is about 1.8E19, or roughly speaking a peak current of a bit over +- an amp into the mike preamp, for a dynamic mike, and similar preposterous things for other kinds of mike.
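
A back-of-the-envelope version of the shot-noise estimate, assuming the relative noise is 1/sqrt(N) for N electrons counted per second, so the signal-to-noise ratio in dB is 10*log10(N). The target is the 32 bit range of 6.02*32 = 192.64 dB, which is where the 2^64 figure comes from. The function name is hypothetical, and the one-second counting interval is the same simplification the post makes.

```python
ELECTRON_CHARGE = 1.602176634e-19   # coulombs per electron


def electrons_for_snr(snr_db):
    """Electrons needed if amplitude SNR = sqrt(N), i.e. snr_db = 10*log10(N)."""
    return 10 ** (snr_db / 10)


target_db = 6.02 * 32
electrons_per_second = electrons_for_snr(target_db)
current_amps = electrons_per_second * ELECTRON_CHARGE

print(f"electrons per second: {electrons_per_second:.3e}")
print(f"for comparison, 2**64 = {float(2 ** 64):.3e}")
print(f"equivalent current: {current_amps:.2f} A")
```

The result is a current on the order of amps, which is the preposterous part for a microphone input.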

                              So, 32 bits for capture is ridiculous.

                              For computation, 32 bit fixed OR float may not be enough. Doing those 10Hz 3rd order Butterworth highpass filters at 96kHz is “interesting” indeed, as RBJ and I have both pointed out a few times.
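
One likely reason such a filter is touchy, sketched under stated assumptions: the poles are derived with a textbook bilinear transform, the filter is taken in direct form, and the coefficients are rounded to float32. The three poles sit so close to z = 1 that the denominator evaluated at z = 1 is around 3e-10, while the coefficients themselves are near 3 and float32 can only resolve steps of about 2.4e-7 at that size. All function names here are invented for the illustration, and this is not a claim about any particular implementation.

```python
import cmath
import math
import struct


def butterworth_poles_z(order, fc, fs):
    """Digital Butterworth poles via the bilinear transform with prewarping."""
    warped = 2.0 * fs * math.tan(math.pi * fc / fs)      # prewarped cutoff, rad/s
    poles = []
    for k in range(order):
        angle = math.pi * (2 * k + order + 1) / (2 * order)
        s = warped * cmath.exp(1j * angle)               # analog pole, left half plane
        poles.append((2.0 * fs + s) / (2.0 * fs - s))    # map s to z
    return poles


def denominator(poles):
    """Expand prod(1 - p * z^-1) into real coefficients [1, a1, a2, ...]."""
    coeffs = [1.0 + 0j]
    for p in poles:
        coeffs = [c - p * prev for c, prev in zip(coeffs + [0], [0] + coeffs)]
    return [c.real for c in coeffs]


def to_float32(x):
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def float32_spacing(x):
    """Gap between adjacent float32 values near x (normalized range)."""
    return 2.0 ** (math.floor(math.log2(abs(x))) - 23)


FS, FC, ORDER = 96000.0, 10.0, 3
a = denominator(butterworth_poles_z(ORDER, FC, FS))
a32 = [to_float32(c) for c in a]

dc_exact = sum(a)        # A(z) at z = 1, roughly (2*pi*FC/FS)**3
dc_float32 = sum(a32)    # same sum after rounding each coefficient to float32

print("coefficients:", a)
print("A(1) in double:", dc_exact)
print("A(1) with float32 coefficients:", dc_float32)
print("float32 spacing near a1:", float32_spacing(a[1]))
```

The rounding step on each coefficient is hundreds of times larger than the quantity the coefficients are supposed to add up to, so the low-frequency behavior of a direct-form version is at the mercy of the rounding.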

                              In reply to: Mixing for different formats, how to go about it?

                              February 22, 2023 at 3:20 pm #4980
                              James Johnston
                              Participant

                                I must say I prefer to use standards that avoid unnecessary assumptions and do not force a perceptually less-than-optimum approach on users.

                                 

                                In a world where processors can be programmed in (literal) milliseconds, the standard needs to be “load your algorithm here”.

                                 

                                In reply to: Mastering Without A Limiter?

                                February 16, 2023 at 10:25 pm #4933
                                James Johnston
                                Participant

                                  You’re also evaluating for intersample overs? If so, you should be safe, except in the most unusual situations. To be almost 100% absolutely safe you need your highest peak (including intersample) to be approximately at -4dBFS.  Such situations have to be very nearly contrived, however.
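
A small sketch of what an intersample over looks like, using a brute-force, unwindowed sinc interpolation (slow, but short). The test signal is the textbook case: a sine at a quarter of the sampling rate, sampled 45 degrees away from its crests and then normalized so the samples just touch full scale. The function names and the block length are choices made for this example; a real true-peak meter would use an oversampling filter instead.

```python
import math


def sinc(x):
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def intersample_peak(samples, center, span=1, steps=16):
    """Largest |x(t)| for t in [center, center + span] via sinc interpolation."""
    peak = 0.0
    for i in range(span * steps + 1):
        t = center + i / steps
        value = sum(x * sinc(t - n) for n, x in enumerate(samples))
        peak = max(peak, abs(value))
    return peak


N = 4001
# fs/4 sine, sampled 45 degrees off the crests, so every sample has the same magnitude
raw = [math.sin(math.pi / 2 * n + math.pi / 4) for n in range(N)]
scale = 1.0 / max(abs(v) for v in raw)
samples = [v * scale for v in raw]            # sample peak is now 0 dBFS

true_peak = intersample_peak(samples, center=N // 2, span=4)
over_db = 20 * math.log10(true_peak)
print(f"sample peak: 0.00 dBFS, reconstructed peak: {over_db:+.2f} dBFS")
```

This case reconstructs to about 3 dB over full scale even though no sample exceeds it, which fits inside the roughly 4 dB margin mentioned above.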