Mixing for different formats, how to go about it?

    • February 6, 2023 at 5:48 pm #4749
      Robin Reumers
      Participant

        I’d like to open a discussion on mixing for different formats. Sometimes labels will request several deliverables, like Dolby Atmos and 360RA, and I’m very curious to see how you go about that. Do you first mix in one format and then figure out a way to translate that mix to the other format(s), or do you start the mix over again for each format?

        Personally, my preferred way has been to mix them independently. For example, with Dolby Atmos, you’ll have X, Y, and Z coordinates but then in 360RA you have azimuth and elevation, and it’s very hard to translate from one into the other, and I just feel it’s faster to start over. Obviously, it takes more time, so curious if you have better solutions.
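To make the translation problem concrete: none of this is from the thread, just a minimal sketch of the coordinate math involved, assuming a unit-sphere model with azimuth measured from front (positive to the left) and elevation from the horizontal plane. Axis conventions differ between renderers (and Atmos actually uses a room-relative shoebox space, not a sphere), which is part of why a clean one-to-one translation is hard.

```python
import math

def sphere_to_xyz(azimuth_deg, elevation_deg, radius=1.0):
    """Azimuth/elevation (degrees) to Cartesian X (left-right),
    Y (front-back), Z (height). Conventions here are assumptions;
    real renderers each define their own axes."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = radius * math.cos(el) * math.sin(az)   # left-right
    y = radius * math.cos(el) * math.cos(az)   # front-back
    z = radius * math.sin(el)                  # height
    return x, y, z

def xyz_to_sphere(x, y, z):
    """Inverse mapping back to azimuth, elevation, radius."""
    radius = math.sqrt(x * x + y * y + z * z)
    elevation = math.degrees(math.asin(z / radius)) if radius else 0.0
    azimuth = math.degrees(math.atan2(x, y))
    return azimuth, elevation, radius
```

The round trip is lossless on a sphere, but mapping a spherical position into a 0..1 shoebox (or back) forces a choice about what to preserve, distance, wall proximity, or angle, which is where the two authoring models genuinely diverge.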

      • February 7, 2023 at 8:37 am #4754
        Bob Katz
        Keymaster

          I’m a total novice at immersive but, as fools rush in :-). If it’s a matter of positional translation, e.g. Atmos A to Atmos B, I would hope there are tools for repositioning the final mix without having to remix. Anything from Flux that can do that?

        • February 16, 2023 at 10:23 pm #4932
          James Johnston
          Participant

            The right way to go about it is for the rendering part of your mix system to do the best possible job given the output format.

            For multichannel this isn’t that hard, really, but it does require some thought. Since I don’t want to spam commercial stuff, I will not mention our stuff at the present.

          • February 18, 2023 at 7:29 pm #4959
            Bob Olhsson
            Moderator

              JJ’s “stuff” is pretty amazing!

              I haven’t had enough time to get as far as I’d like into immersive, but mixcubed is worth a very serious discussion. I’m amazed more people don’t seem to know about it.

              https://mixcubed.com/

            • February 22, 2023 at 5:21 am #4979
              Thomas Lund
              Participant

                One thing to keep in mind is to follow standards, in this case ITU-R BS.2051, so recordings may still be fully enjoyed many years from now. For instance, it would have been a shame had Bruce Swedien not recorded Oscar Peterson or Count Basie in perfect stereo back in 1959, but in some format that didn’t play as intended today.

                • February 25, 2023 at 11:13 am #4999
                  Robin Reumers
                  Participant

                    Hi Thomas, I think that’s a great point. I went through BS.2051-3. I honestly wish there were a clearer standard when it comes to immersive audio. There are quite a few different formats out there and they all work slightly differently.

                    When it comes down to it, you have two ways of describing immersive panning: either in a shoebox format (X, Y, Z coordinates) or as a sphere (azimuth, elevation). VBAP works quite well, but it would be great if there was a standard that could translate as accurately as possible between the formats, so you can decide on your intent and it would automatically translate into the different formats, without having to author them separately.

                    This would be similar to what AES67 did for Ravenna and Dante. Because right now, I feel you either mix for the different formats separately, you use JJ’s approach, or you deal with generic up-mixers. And I’m not sure any of them are 100% ideal. Also, if there was a standard way of describing immersive audio, it would fulfil the BS.2051 goal, which is to make sure it will be possible to read a “master” many years from now. We can only do that with an open format, don’t you think?
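Since VBAP comes up here: as a sketch of what it actually does (Pulkki’s method, shown only for the pairwise 2-D case, full 3-D VBAP uses speaker triplets), the idea is to solve for the two speaker gains whose weighted directions reconstruct the source direction, then power-normalize. Nothing here is from the thread; it is a simplified illustration.

```python
import math

def vbap_pair_gains(source_az_deg, spk1_az_deg, spk2_az_deg):
    """Pairwise 2-D VBAP: solve g1*s1 + g2*s2 = p for the source
    direction p (all unit vectors), then power-normalize the gains."""
    def unit(az_deg):
        a = math.radians(az_deg)
        return (math.cos(a), math.sin(a))

    p = unit(source_az_deg)
    s1, s2 = unit(spk1_az_deg), unit(spk2_az_deg)
    # invert the 2x2 matrix whose columns are s1 and s2
    det = s1[0] * s2[1] - s2[0] * s1[1]
    g1 = (s2[1] * p[0] - s2[0] * p[1]) / det
    g2 = (s1[0] * p[1] - s1[1] * p[0]) / det
    # constant-power normalization
    norm = math.sqrt(g1 * g1 + g2 * g2)
    return g1 / norm, g2 / norm
```

A source dead center between speakers at ±30° gets equal gains of 1/√2; a source aimed straight at one speaker gets gain 1 on that speaker and 0 on the other. The point of Robin’s complaint survives the sketch: VBAP needs to know the speaker layout, so the same authored intent renders differently on every target format.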

                    • February 26, 2023 at 12:54 pm #5003
                      Thomas Lund
                      Participant

                        I seem to be hitting the wrong reply key, see below 🙂

                  • February 22, 2023 at 3:20 pm #4980
                    James Johnston
                    Participant

                      I must say I prefer to use standards that avoid unnecessary assumptions or force a perceptually less-than-optimum approach on users.


                      In a world where processors can be programmed in (literal) milliseconds, the standard needs to be “load your algorithm here”.


                      • February 26, 2023 at 12:53 pm #5002
                        Thomas Lund
                        Participant

                          Hi Robin, I agree BS.2051-3 is too wide, e.g. allowing for too steep top layer elevations, but at least it’s something to compare arbitrary, virtual processors against. From an “immersive” quality/monitoring perspective, I think only an ideal setup is relevant, i.e. equidistant with limited reproduction room contribution.

                          I’m with you on open format as well, ideally backed by AES, Japanese, Chinese, EBU and other pro institutions.

                      • February 25, 2023 at 2:30 am #4988
                        Thomas Lund
                        Participant

                          The most important thing is perceptual evaluation over time, not what theory predicts a human will hear. In this case, BS.1116 is also relevant for defining basic listening conditions, though it is time for an update. Considering “immersive”, there are good reasons for reducing the influence of the reproduction room, beyond requirements of the current standard.

                          • February 25, 2023 at 10:56 am #4996
                            Bob Katz
                            Keymaster

                              Is there room for BS.1116 discussion at the AES in May in Finland?

                              • March 4, 2023 at 1:54 am #5083
                                James Johnston
                                Participant

                                  Indeed. Perceptual evaluation is ***ALL*** in this regard.

                                  I’ve heard arguments about ‘accuracy’ and ‘measurement’ but there are only two rules I think are important.

                                  The first is “never, EVER, break the illusion”
                                  The second is “don’t require separate mixing and production for each format”.

                                  I suppose a third rule, among our inquisition’s two, is “You must be able to mix house mikes, spot mikes, directs, etc., all convincingly”, but that actually comes under Rule #1.


                                  As to “description”, that is totally inside each and every engine. R, theta, phi (distance, angle, elevation) and XYZ (position in relative space) both work. When you’re doing anything involving motion (either listener head motion or source motion), smoothing and an absolute lack of palpable filtering and/or glitching are required. But that goes right back again to rule 1, which is DO NOT BREAK THE ILLUSION.
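The smoothing JJ mentions is usually something like a per-block low-pass on the pan coordinate so position updates never step audibly. This is a hypothetical sketch of that idea, a simple one-pole smoother, not any particular renderer’s code, and the coefficient choice is an assumption.

```python
def one_pole_smoother(alpha):
    """Return a per-update one-pole smoother for a pan coordinate.
    alpha in (0, 1]: smaller values track the target more slowly.
    Each coordinate (X, Y, Z or azimuth/elevation) gets its own smoother."""
    state = {"y": None}

    def step(target):
        if state["y"] is None:
            state["y"] = target          # jump to the first value, no fade-in
        else:
            state["y"] += alpha * (target - state["y"])
        return state["y"]

    return step
```

Run once per control block, a jump in the target becomes an exponential glide, which is the cheapest way to honor “no palpable glitching” when a source or the listener’s head moves.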

                            • February 25, 2023 at 11:37 am #5000
                              Bob Olhsson
                              Moderator

                                JJ’s approach is the only thing that makes any sense to me. Everything else feels over-thunk and meaningless.

                                In the real world, almost nobody ever sits in a “sweet spot” at home or in a theater. Every immersive system tries to create fixed sound source locations. It’s uncalibrated and meaningless in the home unless it’s only used for reverberation. In music, musical translation is king because that allows emotional communication. To this day, when I switch popular music to mono without saying what I did, most people and virtually all musicians will tell me that I improved the mix.

                                • February 26, 2023 at 9:25 am #5001
                                  Thomas Lund
                                  Participant

                                    Hi Bob K, the convention program isn’t public yet, but we could at least discuss BS.1116 in a mutual session. I agree AES is a good place for that, and EBU will be represented as well.

                                • February 26, 2023 at 8:47 pm #5006
                                  Len Moskowitz
                                  Participant

                                    (I’ll continue my theme of higher-order ambisonics for immersive audio that I started over in the microphone forum.)

                                    Once you have a recording in B-format, it can be decoded for an essentially unlimited number of playback configurations.

                                    So a single higher-order B-format recording can be decoded to mono, stereo, 5.1, 7.1, 7.1.4, 22.x.y, any number of concentric rings of speakers to cover the complete immersive sphere, plus headtracked and fixed-head binaural for playback using VR visors and headphones.

                                    The B-format recording can be the result of a live recording, or of a studio process of creating audio tracks from sources and then placing them in the spherical space. Once they’re in B-format, the decoding for playback is a completely independent process that can be done as many times as you want. All of them come from a single B-format recording.

                                    If you want a sweet spot that serves all listener locations and orientations, use a VR visor that provides headtracked binaural playback. Or more simply, headtracked headphones.
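For readers unfamiliar with B-format, here is a minimal first-order sketch of the encode/decode split Len describes, a plane-wave source encoded into W, X, Y, Z, then projected onto any speaker direction. It deliberately ignores the real channel-ordering and normalization conventions (FuMa vs. SN3D/ACN) and layout-specific decoder weighting; it only illustrates that the decode is independent of the recording.

```python
import math

def encode_foa(azimuth_deg, elevation_deg):
    """Encode a plane wave from (azimuth, elevation) into first-order
    B-format (W, X, Y, Z). Normalization conventions are simplified."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    w = 1.0                              # omnidirectional component
    x = math.cos(el) * math.cos(az)      # front-back figure-8
    y = math.cos(el) * math.sin(az)      # left-right figure-8
    z = math.sin(el)                     # up-down figure-8
    return w, x, y, z

def decode_basic(bfmt, spk_az_deg, spk_el_deg):
    """'Basic' decode: project the B-format signal onto one speaker
    direction. Real decoders add layout-dependent weighting."""
    w, x, y, z = bfmt
    az, el = math.radians(spk_az_deg), math.radians(spk_el_deg)
    sx = math.cos(el) * math.cos(az)
    sy = math.cos(el) * math.sin(az)
    sz = math.sin(el)
    return 0.5 * (w + x * sx + y * sy + z * sz)
```

A source at 90° decodes at full gain to a speaker at 90° and at zero to one at −90°, and the same four channels can be re-decoded for any other layout, which is Len’s point about a single master serving many playback configurations.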


                                    • March 4, 2023 at 1:57 am #5084
                                      James Johnston
                                      Participant

                                        BS.1116 or MUSHRA are both flawed in some sense, because of the difficulty of comparison. Yes, one can use a “binaural head” and such, if there ever WAS a “real soundfield”.

                                        In the modern day, that’s a very limited thing, and often not what one wants in an immersive “larger than life” setting.

                                        You can capture speaker rendering in binaural and compare it to virtualized, but that’s only comparing the virtualizer.

                                        The problem is quite difficult.
