How to edit your voice
So you’re looking to podcast, or become the next YouTube star—or maybe you’re just looking to record yourself for your own private use. You’ve got all your equipment, but your voice recording doesn’t seem quite right.
Here’s a quick, broad-strokes primer on how to edit your own voice for public consumption. This guide assumes that you have the proper equipment set up, and the proper type of microphone for your purposes.
Editor’s note: this article was updated on December 14, 2022 to address recent advances in denoising software.
A good recording environment is the best way to get good audio
It may sound obvious, but making sure your environment is noise (and echo) free is the best way to set up a good recording. Not only will you avoid having to edit out interstitial noise and sounds, but you’ll be able to avoid impossible-to-kill echoes and anything else you find yourself getting annoyed at.
If you find your audio isn’t all that great, here’s what you should do in descending order of importance:
- Be sure you have a pop filter in place to prevent near-mic issues
- Ensure you have the correct equipment, and the correct level of power in your equipment
- Treat the room for echoes, or get a mic shield
- Pick the right Digital Audio Workstation (DAW)
- Be sure you understand how to place your mic
It’s that last one that will be a little difficult, but we can help you get through it easily.
Pick the right DAW
A DAW is a program that will handle all your audio editing for you. There are plenty on the market now, but you will likely want to pay for one. I say this because the best DAWs are the ones that layer effects on top of sound, rather than alter the original file.
For example, Audacity is a great free tool if all you’re doing is capturing a basic recording, but the effects it uses will change the original file. Adobe’s Audition, on the other hand, will only layer effects on top of the original file, allowing you to change your recording’s effects to suit your needs later.
There are plenty of DAW applications out there for different uses, so test out some free trials and see which ones you like. Reaper, Pro Tools, and Audition will all set you back some cash, but what you get in return is outstanding. Free apps like Audacity have very limited uses, but can help in a pinch. You may even get the software bundled with your equipment. For example, the Scarlett 2i2 comes bundled with Ableton Live and Pro Tools.
Be sure to leave long periods of silence after starting a voice recording
While this may sound weird, you should always leave a bunch of dead air at the beginning and end of your audio. This way, you can improve your audio in several ways that you wouldn’t otherwise be able to do.
Did you leave a lot of dead air time at the beginning of your recording? Good!
Leaving this dead air will also allow you to figure out important things, like how much noise you need to remove, and how to set the limit on what gets removed from your voice recording. In general, you should never apply any more noise reduction than you need, and recording the dead air will allow your DAW to find that magic number quickly.
See where there are noise peaks on the voice recording, and then turn down the level until there’s no more (or very little) noise.
Many DAWs will have a de-noise option where you can select sections of nothing but noise to train the program to remove the right stuff. The cleaner the sample of noise is, the better it’s removed from the final recording. Aim to give the DAW 5 to 10 seconds of noise, though that’s probably more than it strictly needs. The DAW will then remove the noise based on the patterns you’ve given it, removing crackles and hisses from your file even when there’s other sound present.
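If you’re curious how loud that dead air actually is, here’s a minimal Python sketch that measures it. This isn’t how any particular DAW does it: it assumes your samples are floats between -1.0 and 1.0, uses the plain RMS definition (where a full-scale square wave reads 0 dBFS), and the `rms_dbfs` name and the fake `dead_air` data are made up for illustration.

```python
import math

def rms_dbfs(samples):
    """Return the RMS level of float samples (range -1.0..1.0) in dBFS."""
    if not samples:
        raise ValueError("need at least one sample")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # pure digital silence
    return 10 * math.log10(mean_square)

# Hypothetical stand-in for the dead air at the start of a recording:
# a very quiet hiss wobbling around zero.
dead_air = [0.001, -0.001] * 500
noise_floor = rms_dbfs(dead_air)
print(f"noise floor: {noise_floor:.1f} dBFS")
```

A number like this gives you a starting point for how much noise reduction to dial in, but your ears should make the final call.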
Newer de-noising techniques that use machine learning and GPU hardware have also come to prominence recently. These include technologies like NVIDIA RTX Voice and Krisp (which is already in use in Discord). While these aren’t available as VSTs for use in DAWs, the technology is gaining ground. There are also VSTs like iZotope’s RX 10 Voice De-noise, which is currently the industry standard for de-noising outside of built-in DAW techniques.
Use the compressor
Unlike your brain, your recording equipment will likely not know how to interpret wild swings in volume. Consequently, you’re going to want to use what’s called a “compressor” to help even these out. A compressor can be used to prevent clipping in samples, as well as bring up the levels of quieter sounds in your voice recording. This is accomplished by reducing the dynamic range of a track in a way that’s not going to damage the quality (much). As always, record with the highest bit depth you can. If you intend to use a compressor, you should also consider using a gate. This ensures that you’re not bringing up the noise floor of your recording along with your voice.
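To show what a gate does in principle, here’s a crude Python sketch. It’s an assumption-heavy toy, not what your DAW’s gate does: real gates smooth their opening and closing, while this one simply mutes any short window whose peak sits below a threshold. The `gate` function, the 4-sample window, and the sample values are all made up for illustration.

```python
def gate(samples, threshold_db, window=4):
    """Mute every window of samples whose peak is below threshold_db (dBFS)."""
    threshold = 10 ** (threshold_db / 20)  # convert dBFS to linear amplitude
    out = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        peak = max(abs(s) for s in chunk)
        # Keep the window if it is loud enough, otherwise replace it with silence.
        out.extend(chunk if peak >= threshold else [0.0] * len(chunk))
    return out

# Hypothetical data: four samples of faint hiss, then four samples of speech.
recording = [0.001, -0.002, 0.001, 0.0, 0.5, -0.4, 0.3, 0.2]
print(gate(recording, threshold_db=-40))
```

With the hiss muted before the compressor sees it, there’s nothing quiet left for the compressor to drag up.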
See those really high peaks in your recording? If they get too high, they will exceed the allotted values for loudness in your track. When that happens, the peaks get clipped and the playback will get harsh and distorted: it’ll sound like crap.
To avoid this, use the compressor (sometimes called a “speech volume leveler”) to bring the peaks of your voice below the clipping point, and the quietest parts up to an audible level.
The threshold is the point where you want the compressor to start bringing your peaks under control. Only sounds above that level will be compressed. You may want to play around with this a bit, but in general I use -6dB so I have more freedom to laugh on mic. There’s no right or wrong answer here, but you’re going to have to listen to your voice over and over to see what makes sense for you.
The attack and release settings tell the DAW how fast to apply the compressor effect and how long it should take to let go. This can get a bit sticky, but ultimately we’re talking about milliseconds here. Play around with them, but be aware that you may want to cut up your speech track a bit if you want different settings for different sections. A slow attack time runs the risk of clipping, but it gives instruments like the snare drum more punch. A fast attack time might seem jarring if you were expecting a louder signal, and so on. Usually the system defaults will be fine.
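If attack and release feel abstract, this Python sketch shows one common way to model them: a level detector that rises quickly when the signal gets louder and falls slowly when it gets quieter. It’s a simplified illustration rather than any specific DAW’s algorithm, and the function names, the 5ms and 100ms defaults, and the test burst are assumptions.

```python
import math

def smoothing_coeff(time_ms, sample_rate):
    """One-pole smoothing coefficient for a time constant in milliseconds."""
    return math.exp(-1.0 / (time_ms / 1000.0 * sample_rate))

def follow_envelope(samples, sample_rate, attack_ms=5.0, release_ms=100.0):
    """Track the signal level, rising at the attack rate and falling at the release rate."""
    attack = smoothing_coeff(attack_ms, sample_rate)
    release = smoothing_coeff(release_ms, sample_rate)
    envelope = 0.0
    out = []
    for s in samples:
        level = abs(s)
        # Use the fast coefficient while the level is climbing, the slow one otherwise.
        coeff = attack if level > envelope else release
        envelope = coeff * envelope + (1.0 - coeff) * level
        out.append(envelope)
    return out

# Hypothetical burst: 100 loud samples followed by 100 silent ones at 1 kHz.
burst = [1.0] * 100 + [0.0] * 100
env = follow_envelope(burst, sample_rate=1000)
print(round(env[4], 3), round(env[199], 3))
```

A compressor would use an envelope like this to decide how hard to turn the signal down at each moment, which is why a short attack clamps peaks quickly and a long release lets go gently.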
Compressing your audio does not mean it’s a hard limit. Instead, the compressor applies a reduction ratio to the signals in order to maintain some dynamics. The ratio you select will be applied only to signals above the threshold discussed earlier. The greater the ratio you select, the more aggressively the compressor will reduce the peak levels. The lower the ratio, the less it will affect your audio. If you have very strong peaks, you’re going to want more aggressive ratios to even out your levels. If you’re having trouble visualizing that, think of it in numbers: at a 4:1 ratio, a peak that’s 8dB over the threshold comes out only 2dB over it, while at 2:1 it would come out 4dB over.
Hopefully that clears it up.
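For anyone who’d rather see threshold and ratio as arithmetic, here’s a small Python sketch of a basic “hard-knee” compressor curve. The -6dB threshold is the one I mentioned above, while the 4:1 ratio and the `compress_db` name are just assumptions for the example. Your DAW’s compressor may use a softer knee or add makeup gain.

```python
def compress_db(level_db, threshold_db=-6.0, ratio=4.0):
    """Return the output level in dB for a hard-knee downward compressor."""
    if level_db <= threshold_db:
        return level_db  # below the threshold nothing changes
    # Above the threshold, every `ratio` dB of input yields 1 dB of output.
    return threshold_db + (level_db - threshold_db) / ratio

for peak in (-20.0, -6.0, -2.0, 0.0):
    print(peak, "->", compress_db(peak))
# -20.0 -> -20.0
# -6.0 -> -6.0
# -2.0 -> -5.0
# 0.0 -> -4.5
```

Notice that quiet material passes through untouched, and the loudest peak ends up only 1.5dB above the threshold instead of 6dB.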
Use the de-esser
If you’re like me, proximity to your microphone means very sibilant vocals. If you find your s, sh, f, and other high-frequency sounds are really grating in your voice recording, use the de-esser. This is a tool that takes the range of frequencies that make up these sounds and reduces their loudness. You can also achieve a similar result using an equalizer, but in that case you’ll have to select the ranges manually.
High-pass filters are your friend
While it may seem counter-intuitive, bassier individuals will probably want to use what’s called a “high-pass” filter. You set a cutoff frequency, and the filter dampens or mutes everything lower-pitched than that cutoff while letting the higher frequencies pass through.
You may like the rich sounds of your voice, but listeners in a car or using bassy headphones will just tire of it really quickly. Use the high-pass filter to deaden everything under 50Hz, maybe a little higher.
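Here’s a rough Python sketch of the simplest kind of high-pass filter, a first-order design with a gentle slope. It uses the 50Hz cutoff from above, but the `high_pass` name and the 8kHz sample rate are assumptions, and the filters in a DAW are usually steeper than this.

```python
import math

def high_pass(samples, sample_rate, cutoff_hz=50.0):
    """First-order high-pass filter: attenuates content below cutoff_hz."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)  # time constant for the cutoff
    dt = 1.0 / sample_rate                  # time between samples
    alpha = rc / (rc + dt)
    out = []
    prev_in = 0.0
    prev_out = 0.0
    for x in samples:
        # Output follows changes in the input and slowly forgets steady levels.
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out

# Hypothetical one-second, 10Hz rumble at an 8kHz sample rate.
sample_rate = 8000
rumble = [math.sin(2 * math.pi * 10 * n / sample_rate) for n in range(sample_rate)]
filtered = high_pass(rumble, sample_rate)
print(round(max(abs(v) for v in filtered[sample_rate // 2:]), 2))
```

The rumble comes out at a fraction of its original level, while the frequencies that carry speech sit far above the cutoff and pass nearly unchanged.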
Don’t just cut, crossfade
Instead of just cutting up a track with the razor tool, consider following each cut with a crossfade. This fades out the end of the first segment while fading in the start of the second, blending them together so there are no jarring transitions in your voice recording from one second to the next.
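Under the hood, a crossfade is just a weighted mix of the two clips over a short overlap. This Python sketch shows a linear version. The `crossfade` name and the sample values are assumptions for illustration, and many DAWs also offer other fade shapes, such as equal-power curves.

```python
def crossfade(first, second, overlap):
    """Join two clips, blending the last `overlap` samples of `first`
    into the first `overlap` samples of `second` with a linear fade."""
    if overlap <= 0 or overlap > min(len(first), len(second)):
        raise ValueError("overlap must fit inside both clips")
    head = first[:-overlap]   # untouched part of the first clip
    tail = second[overlap:]   # untouched part of the second clip
    blended = []
    for i in range(overlap):
        fade_in = (i + 1) / (overlap + 1)  # weight of the second clip, rising
        fade_out = 1.0 - fade_in           # weight of the first clip, falling
        a = first[len(first) - overlap + i]
        b = second[i]
        blended.append(a * fade_out + b * fade_in)
    return head + blended + tail

# Hypothetical clips: a loud segment followed by silence.
print(crossfade([1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0], overlap=2))
```

Instead of dropping straight from full level to nothing, the joined clip steps down across the overlap, which is what keeps the edit from clicking.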
Use a light hand
Finally, keep a light hand when you’re applying edits. You’re going to want to make everything perfect right off the bat, but it’s very common to make your voice sound crappier than it did to begin with during your pursuit of perfection. Sometimes a little bit of noise is preferable to a distorted and compressed voice in your final product.
Going too crazy with the effects can really torpedo your mix, so just take a light hand, eh? You’ll thank yourself later, especially if you use a DAW that alters the original file.
How to tell when you’re done
You know you’re done editing your voice recording when it sounds good on its own. Turn it up loud, make it quiet, play it over and over. If you still don’t know if it’s ready, it’s ready. Really, you’re just trying to make it sound as free of errors as possible.
It’ll be obvious if your voice recording isn’t ready for primetime. Luckily, you’re not recording an orchestra: just one voice.