BS.1116 and MUSHRA are both flawed in some sense, because it is hard to establish a valid reference to compare against. Yes, one can use a “binaural head” and such, if there ever WAS a “real soundfield” to capture.
In the modern day, that’s rarely the case, and a real soundfield is often not what one wants in an immersive, “larger than life” setting anyway.
You can capture a loudspeaker rendering binaurally and compare it to the virtualized (headphone) rendering, but that only evaluates the virtualizer.
The problem is quite difficult.