In terms of formal comparisons between CGMs, MARD (mean absolute relative difference) is the only numerical measure we have. But people will often say something along the lines of “well, that is great in the laboratory, but unfortunately I do not live in a lab.” Just wanted to note that formal research on MARD absolutely backs up your instinct on this. MARD is a very poor predictor of real-world performance. It is an average, so it does not distinguish between different types of errors or how much each one matters. In addition, different (valid) methods of conducting MARD studies will produce very different results, which means it can be gamed. Think of EPA MPG ratings on cars compared to actual mileage in real-world driving. MARD is nowhere near that good a measure. Here is a small sample of the articles out there on this.
Link: https://journals.sagepub.com/doi/pdf/10.1177/1932296816662047
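
To make the "it is an average" point concrete, here is a rough sketch in Python. The readings are made up for illustration, not taken from any study or real sensor, and the function names `mard` and `worst_error` are my own. It assumes the usual definition of MARD: the mean of |CGM reading - reference| / reference, expressed as a percentage.

```python
def relative_errors(cgm, reference):
    """Absolute relative difference for each paired reading, as a fraction."""
    return [abs(c - r) / r for c, r in zip(cgm, reference)]


def mard(cgm, reference):
    """Mean absolute relative difference, in percent."""
    errors = relative_errors(cgm, reference)
    return 100 * sum(errors) / len(errors)


def worst_error(cgm, reference):
    """Largest single absolute relative difference, in percent."""
    return 100 * max(relative_errors(cgm, reference))


# Hypothetical reference (lab) glucose values in mg/dL.
reference = [60, 100, 150, 200]

# Hypothetical sensor A: consistently reads 10% high everywhere.
sensor_a = [66, 110, 165, 220]

# Hypothetical sensor B: perfect except it reads 84 when the true value is 60,
# i.e. it misses a low.
sensor_b = [84, 100, 150, 200]

for name, readings in [("A", sensor_a), ("B", sensor_b)]:
    print(
        f"Sensor {name}: MARD = {mard(readings, reference):.1f}%, "
        f"worst single error = {worst_error(readings, reference):.1f}%"
    )
```

Both made-up sensors come out at a MARD of about 10%, but sensor B's entire error is one 40% miss on a low reading, which is exactly the kind of error you care most about. The single averaged number cannot tell the two apart.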