When you look at any review online, the first piece of information you’re introduced to is usually the reviewer’s score. But does that tell you anything? Well, it does if you’re on SoundGuys! We take an unusual approach to assessing and quantifying just how good or bad a product is.
We want readers to be able to compare scores from product to product and discern as much information as possible from them. What are they good at? What do they suck at? Is this battery better than that one? What if I’m looking at another product type?
While it’s tempting to score each product category by its own metrics, we opted to score every product with the same equations for the same scores. That way, no matter what you look at, you’ll always be able to tell which product has the better battery, sound quality, and so on by comparing the scores. It may not be everyone’s first choice, but we want our scores to matter, and not be the result of one of our reviewers picking a number without any process behind it. Our company has a draconian ethics policy, and we want that to bleed through to how we present our reviews.
Because we need to quantify each scoring metric, we collect objective data from our products wherever possible. We strive to remove humans from the equation as much as possible where it’s appropriate so that there’s less possibility for error, and no room for biases. We also want to score based on the limits of human perception (or typical use) rather than what’s “best” at any given moment. Because of this, our scores are generally lower than most outlets’ assessments. We lay everything bare, and even the best products typically have their flaws. You deserve to know about them.
How we quantify battery life
Battery life is a strange thing to have to quantify, as what people call “good” varies from product to product. Because we don’t want to unfairly punish something for meeting most people’s needs, while also not unfairly rewarding something with a battery life that’s only slightly better than others, we came up with an exponential equation to quantify our scores.
In order to figure out what scores we should give to tested products, we first needed a little information. When we test, we play back sound for our battery test at the same volume across all models so we’re comparing apples to apples. While our test level is higher than most people listen at, it keeps the comparison consistent and gives most people the information they’re looking for.
We need to know how long most people listen to their headphones on a day-to-day basis. We posted a poll to Twitter to gather results, and 5,120 people responded:
Music lovers: how long do you listen to your headphones each day?
— Android Authority (@AndroidAuth) August 7, 2018
Okay, so around 75% of respondents listen to music for three hours or less every day. Because we want to cover more people without setting an unrealistic goal, we set our midpoint for scoring at 4 hours. That means if something lasts 4 hours in our battery test, we give it a score of 5/10.
We already knew most people sleep at night (and can therefore charge their headphones while they sleep) so if a product exceeds 20 hours, its increase in score starts to exponentially diminish as it approaches the maximum possible score. The difference between a battery life of 24 hours and 23 hours isn’t as big as the difference between a battery life of 2 and 3 hours, for example. Any products that can last over 20 hours will score in between 9/10 and 10/10.
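The exact equation isn’t published, but the anchor points above (5/10 at 4 hours, between 9/10 and 10/10 past 20 hours) are enough to sketch a plausible curve. The half-life form `2 ** (-hours / 4)` below is purely an illustrative assumption chosen to hit those anchors:

```python
def battery_score(hours: float) -> float:
    """Exponential-saturation battery score on a 0-10 scale.

    Illustrative assumption only: the 2 ** (-hours / 4) form is chosen
    so that 4 hours scores exactly 5/10 and anything past 20 hours
    lands between 9/10 and 10/10, matching the article's anchors; the
    real SoundGuys scoring equation isn't published.
    """
    return 10.0 * (1.0 - 2.0 ** (-hours / 4.0))

print(battery_score(4))             # 5.0 -- the stated midpoint
print(round(battery_score(20), 1))  # 9.7 -- 20+ hours lands between 9 and 10

# An extra hour matters much more for a short-lived battery than for a
# long-lived one, which is the "exponentially diminishing" behavior:
print(round(battery_score(3) - battery_score(2), 2))
print(round(battery_score(24) - battery_score(23), 2))
```

Any saturating curve with those anchor points would behave similarly; the key property is that gains shrink as battery life grows.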
Obviously, some product categories will score better or worse here, and we’re fine with that. By keeping all of our battery scores on the same metric, you can directly compare the scores across any product category.
How we quantify sound quality
By testing several metrics (some that we don’t post in reviews), we’re able to contextualize audio quality with math. While the resulting charts may not always be the easiest to read, we only weigh results against each product’s most likely tuning standard. Not every set of headphones is trying to sound the same as all the others, so it’s unfair to penalize them for something they’re not trying to do.
There is no one “ideal” response, so we try to give each product the best shot possible at being evaluated against its intended standard. Some of the standards we use include the Olive-Welti curves (in-ear, over-ear, loudspeaker), ISO 226:2003, and a simple studio response. From there, we punish deviations from the target standard (the dotted red line in our charts) on an exponential scale, weighted by where in the frequency range each deviation happens. Some companies have their own signature sound features, like preventing ringing in harmonics, and we don’t want to punish headphones for trying to help their users. However, we do have to draw the line somewhere, and we do punish detrimental swings in emphasis.
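The exact math isn’t published, so the sketch below only illustrates the idea described above: deviations from a target curve are penalized on an exponential scale. The spot frequencies, the `expm1(deviation / 2)` penalty, and the equal per-frequency weighting are all assumptions for illustration:

```python
import math

# Hypothetical target and measured responses in dB at a few spot
# frequencies (Hz); real measurements use far finer resolution.
target   = {100: 3.0, 1000: 0.0, 3000: 2.0, 10000: -2.0}
measured = {100: 5.5, 1000: 0.5, 3000: 1.0, 10000: -6.0}

def frequency_response_score(target, measured, max_score=10.0):
    """Penalize deviations from the target curve on an exponential scale.

    The expm1(deviation / 2) penalty and the equal per-frequency
    weighting are illustrative assumptions; the article also weights
    deviations by where they occur, which this sketch omits.
    """
    penalty = 0.0
    for freq, target_db in target.items():
        deviation = abs(measured[freq] - target_db)
        # Small misses cost little; large swings cost exponentially more.
        penalty += math.expm1(deviation / 2.0)
    return max(0.0, max_score - penalty / len(target))

print(round(frequency_response_score(target, measured), 2))
```

A product that matches its target exactly would score a full 10/10 under this sketch, and a single large swing hurts far more than several small ones.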
How we quantify connection quality
Remember when we measured how well each Bluetooth codec handled audio quality and performance? Well, we were able to take that data and ballpark some scores for it. We obviously reward having more codecs available, and the scores reflect that. However, most headphones only support a couple of codecs (usually SBC plus one or two others). We also give the score a boost if wireless headphones offer a wired alternative, and so on.
How we quantify mic quality
Much like we do with sound quality, we measure the performance of the microphones in each product. From that data, we can automate certain scores about the sound quality of the microphone.
By narrowing our parameters to the normal voice band (50-4,000Hz), we can point out where each microphone falls short or excels. In order to demonstrate what each microphone sounds like, we also upload a short clip of us talking into it as proof of our scoring.
How we score isolation and noise canceling
Much like we score the sound of the products we review, we also score how well they destroy (or simply block out) outside noise. We contend this is one of the most important metrics for all people, so a little more work is involved in this score.
Because the lower notes are the most important ones, we weight isolation/attenuation in three tiers: 0-256Hz is weighted most heavily, followed by 257Hz to 2.04kHz, and then everything up to 20kHz. We do this because noise in that first band can mask out a good portion of your music, while the highest notes are usually tougher to hear anyway.
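As a rough sketch of that tiered weighting, here is one way it could work, assuming hypothetical attenuation figures and tier weights (the article says only that the lowest band counts most, not by how much):

```python
# Hypothetical per-tier average attenuation in dB for one product; the
# tier weights are assumptions, constrained only by the rule that the
# 0-256Hz band counts the most.
bands = {
    "0-256Hz":       {"attenuation_db": 8.0,  "weight": 0.5},
    "257Hz-2.04kHz": {"attenuation_db": 15.0, "weight": 0.3},
    "2.04k-20kHz":   {"attenuation_db": 30.0, "weight": 0.2},
}

def isolation_score(bands: dict, full_marks_db: float = 40.0) -> float:
    """Weighted average attenuation, scaled to a 0-10 score.

    full_marks_db (the attenuation treated as a perfect 10/10) is
    another illustrative assumption.
    """
    weighted = sum(b["weight"] * b["attenuation_db"] for b in bands.values())
    return min(10.0, 10.0 * weighted / full_marks_db)

# Strong treble isolation can't rescue weak bass isolation, because the
# lowest tier carries the largest weight:
print(round(isolation_score(bands), 2))
```

The point of the weighting is visible here: this hypothetical product blocks treble very well, but its score stays low because bass attenuation dominates the result.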
After we feed our test data through our scoring program, we’re given a score and a chart to share with you, our readers. Because we don’t alter scores based on product category, sometimes standout products won’t score that high in metrics like isolation, even if they’re the king of their tiny mountain.
Of course, not all scores can be objectively reduced to simple numbers and equations. Things like design features, remote utility, and the like are getting more complicated by the day—so we have a staff of experts to rate these metrics. It’s really no more complicated than that. While we keep this type of score to an absolute minimum, it’s unwise to let it go unaddressed.
After all is said and done, the site’s backend averages all the scores together to produce the overall score. I encourage everyone to dig into the individual scores that matter to them rather than relying on one overarching number, but we do attempt to contextualize a product in more ways than one in each review.
You might find that you personally don’t care so much about certain scores that we track. That’s okay! Just be sure to keep tabs on the ones you care about, because we try to make very granular assessments to help people along their personal audio journey.