If our job is to lead people in singing, then the voice is the most important source in the mix. We’ll suffer through mediocre floor tom tuning or a few missed beats in a clarinet solo, but if the vocals don’t sound right it’s game over.
Our ears and our brains are engineered for understanding spoken words, particularly consonants, which are spoken at half the volume of vowels. In the consonant range we’re able to detect sounds 100 times quieter than other sounds, and our sensitivity to pitch is 10 times more precise. This is why we have a strong negative reaction to a flat vocal but can barely tell if a bass guitar has been tuned in the last year.
We also know how voices should sound because we listen to them for 12 hours a day! Most of your friends can’t tell if the acoustic guitar was a little too bright but they absolutely know if they can’t understand the words. And they’re right! In this instance, as a sound person, you really are mixing in a room full of experts! And, because you need to be a chief expert among experts, I’m going to share with you some of what I’ve learned about the voice.
Voices have a lot of bass frequency energy. The fundamental frequencies of the voice – the frequencies of the pitches themselves – can be as low as 100 Hz for men and as high as 1000 Hz (1 kHz) for that back row soprano. Lower frequencies carry more energy than higher frequencies, and when they are too loud they can overpower the intelligibility range. This kind of interference is called “masking” and it has the potential to render a voice completely unintelligible.
We need to pay special attention to these low frequencies in live sound, where we use cardioid microphones. A cardioid microphone is designed to reject sound from behind and beside the capsule. This narrow pattern is indispensable on a loud stage, but it comes with an unwanted side effect. As a cardioid mic gets closer to its source, it disproportionately boosts low frequencies. This is called the “proximity effect”. The Shure Beta 87A, a popular supercardioid vocal mic, has a proximity effect boost of up to 8 dB at 125 Hz! That much bass can be a challenge, especially on deep male voices.
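To put that 8 dB figure in perspective, here’s a quick back-of-the-envelope conversion in Python. It uses the standard decibel formulas; the function names are my own and only there for illustration.

```python
def db_to_amplitude_ratio(db):
    """Convert a level change in dB to a voltage (amplitude) ratio."""
    return 10 ** (db / 20)


def db_to_power_ratio(db):
    """Convert a level change in dB to a power ratio."""
    return 10 ** (db / 10)


boost_db = 8  # the proximity boost quoted above for the Beta 87A at 125 Hz
print(f"{boost_db} dB is about {db_to_amplitude_ratio(boost_db):.2f}x the amplitude")
print(f"{boost_db} dB is about {db_to_power_ratio(boost_db):.2f}x the power")
```

In other words, an 8 dB boost means roughly two and a half times the signal level and more than six times the power in that part of the spectrum.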
Another vocal villain is what’s called a “plosive”. A plosive is a “popping” sound produced by the burst of air let loose by a “p” or “b” consonant. Plosives and other unwanted low-frequency sounds mostly sit below 80 Hz. This range is usually rolled off steeply with a high-pass filter (HPF).
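If you’re curious what “steeply” looks like in numbers, here’s a rough sketch. It assumes an idealized 4th-order (24 dB per octave) Butterworth high-pass filter set to 80 Hz. That filter type and slope are my assumptions; the HPF on your console may use a different slope or shape, so treat the results as ballpark figures.

```python
import math


def butterworth_hpf_gain_db(freq_hz, cutoff_hz=80.0, order=4):
    """Gain in dB of an ideal Butterworth high-pass filter at freq_hz.

    Assumes the textbook magnitude response 1 / sqrt(1 + (fc / f)^(2n)).
    Real console filters may differ.
    """
    ratio = cutoff_hz / freq_hz
    magnitude = 1.0 / math.sqrt(1.0 + ratio ** (2 * order))
    return 20.0 * math.log10(magnitude)


for freq in (40, 80, 100, 160):
    print(f"{freq:>4} Hz: {butterworth_hpf_gain_db(freq):6.1f} dB")
```

Under these assumptions the filter takes about 24 dB off at 40 Hz, 3 dB at the 80 Hz cutoff, and less than 1 dB at 100 Hz, so the plosive region gets knocked down hard while the lowest vocal fundamentals are barely touched.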
The 125-500 Hz range is responsible for “warmth”. “Warmth” is a pleasing sound that makes you feel like you’re close to the singer, snuggled up together singing Christmas carols by the fireplace. But “warmth” has an evil twin called “mud” that lives in the same part of town. A “muddy” vocal has too much low midrange and becomes difficult to understand. This is a result of the masking effect we mentioned earlier.
The midrange of a voice is nuanced and complex. This is where we find the consonants, each at a different frequency and unique for each voice. If a voice sounds harsh or “piercing”, it may benefit from a cut around 2-3 kHz, which I lovingly refer to as the annoyance range. “Presence” is found from 4-8 kHz and is usually what makes a voice stand out or cut through a mix. You may find a lead vocal has more “presence” than a backing vocal. Mastering the midrange of a lead vocal takes time. Careful sweeping through this frequency range is well worth the effort!
The treble frequencies include sounds like “air” and “sibilance”. “Air” is an appropriate name for the sound of the singer’s breath across the microphone. You can hear it on modern pop recordings, especially female voices, on breathy consonants like “f” and “h” and on John Mayer’s voice literally all the time. This sound may be found at 8 kHz and higher, and can add clarity and make a voice sound like it’s up close.
“Sibilance” is the harsh burst of sound that can accompany an “s” or “t” consonant. It’s not a microphone artifact; it comes from the restriction of air at the mouth. Sibilance originates in the upper midrange around 5-8 kHz but often continues up much higher. It’s common to cut treble to reduce sibilance, but I prefer to get at it with a high-mid control if I can, leaving the high end more intact.
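Here’s a small cheat sheet that pulls together the frequency ranges covered so far. The ranges come straight from this article; the table layout and the `labels_for` helper are just an illustrative sketch, and every voice will land a little differently.

```python
# Approximate vocal frequency ranges from this article, in Hz.
# Band edges are treated as inclusive, and bands are allowed to overlap.
VOCAL_BANDS = [
    ("plosives and rumble", 0, 80),
    ("warmth / mud", 125, 500),
    ("harshness", 2000, 3000),
    ("presence", 4000, 8000),
    ("sibilance", 5000, 8000),
    ("air", 8000, 20000),  # "8 kHz and higher"; 20 kHz is an assumed upper limit
]


def labels_for(freq_hz):
    """Return the names of every band that contains freq_hz."""
    return [name for name, low, high in VOCAL_BANDS if low <= freq_hz <= high]


print(labels_for(250))   # ['warmth / mud']
print(labels_for(6000))  # ['presence', 'sibilance']
```

Notice that 6 kHz shows up under both “presence” and “sibilance”. That overlap is why a boost meant to help a voice cut through can also bring out harsh “s” sounds.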
Sibilance is often made louder by the use of audio compression. Compressors work by reducing gain above a set threshold by a certain ratio, like 4:1, then boosting the overall gain to compensate. The end result is a more even volume level. But sibilance is a faster transient than a compressor is normally set for, so it sneaks through the gain-reduction stage unaffected. Then, when the compressor adds makeup gain at the end, the sibilance is boosted too, so it ends up at a higher volume than the rest of the signal! For more info on audio compression, check out this Compressor Primer.
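The arithmetic behind this is easy to sketch. The example below is a deliberately simplified model: a hard-knee 4:1 compressor, and a sibilant burst that passes completely before the attack engages. The threshold, levels, and makeup gain are made-up numbers chosen for illustration, not measurements.

```python
def compress_db(level_db, threshold_db=-20.0, ratio=4.0):
    """Static hard-knee compression: levels above the threshold are
    scaled down by the ratio; levels below it pass unchanged."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio


MAKEUP_DB = 9.0      # hypothetical makeup gain applied after compression

vowel_in = -8.0      # hypothetical sustained vowel level, in dBFS
sibilant_in = -14.0  # hypothetical "s" burst, 6 dB quieter than the vowel

# The vowel lasts long enough to be compressed, then gets makeup gain.
vowel_out = compress_db(vowel_in) + MAKEUP_DB

# Assumption: the burst is over before the attack reacts, so it skips
# gain reduction but still receives the full makeup gain.
sibilant_out = sibilant_in + MAKEUP_DB

print(f"vowel:    {vowel_in} dB in, {vowel_out} dB out")
print(f"sibilant: {sibilant_in} dB in, {sibilant_out} dB out")
```

With these numbers the vowel comes out where it started, at -8 dB, while the sibilant burst goes from 6 dB below the vowel to 3 dB above it. A real compressor’s attack and release behavior is more gradual than this, but the direction of the effect is the same.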
It can be a challenge to balance all these sounds and still have a natural-sounding voice in the mix. Because there’s so much nuance to a voice, it helps to know how the person sounds naturally, up close, without a mic. Aim to preserve that character in the mix and you’ll be off to a good start!