Teaching a New Dog Some Old Sound Tricks
By Kevin A. McGrail
Google today announced an update that lets Google Voice administrators upload pre-recorded, customized audio prompts and greetings to the Google Voice automated attendant instead of relying on the text-to-speech system (1).
I’m not an audiophile, but this update reminded me of two sound tricks I wanted to share.
Trick #1 – Clipping & Whispering
I’ve spoken about this trick before, but it’s pretty simple. When you are talking to your Google Home or using the speech-to-text function, a pro tip is to lower your voice, sometimes even to a whisper. Why? Clipping.
Clipping is the distortion caused when you overload a sound amplifier. Instead of getting nice sine waves, your sound is “clipped”, and becomes closer to a square wave (2).
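To make the sine-wave-to-square-wave idea concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the function names, the 440Hz tone, and the limit of 1.0 are made up for the example and have nothing to do with how Google processes audio.

```python
import math

def sine_wave(freq_hz, sample_rate, n_samples, amplitude=1.0):
    """Return n_samples of a sine wave as a list of floats."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]

def hard_clip(samples, limit=1.0):
    """Force every sample into [-limit, +limit], like an overloaded input stage."""
    return [max(-limit, min(limit, s)) for s in samples]

def clipped_fraction(samples, limit=1.0):
    """Fraction of samples that sit flat against the limit."""
    flat = sum(1 for s in samples if abs(s) >= limit)
    return flat / len(samples)

# A quiet voice stays inside the limit; a loud one is flattened at the peaks.
quiet = hard_clip(sine_wave(440, 8000, 8000, amplitude=0.5))
loud = hard_clip(sine_wave(440, 8000, 8000, amplitude=4.0))

print("quiet clipped fraction:", clipped_fraction(quiet))
print("loud clipped fraction:", clipped_fraction(loud))
```

The louder the input, the more of each cycle is pinned at the limit, and the closer the shape gets to a square wave.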
Google’s AI magic is pretty amazing, but it can’t transcribe garbled sound. Since people tend to get louder in an attempt to make the system understand them, the clipping only gets worse. Try this simple whispering trick and I think you’ll be amazed.
Trick #2 – Downsampling for Better Voicemail & Attendant Messages
When I learned this trick, I learned a lot about telephony, and I thought others might enjoy the information. Just be warned that it remains to be seen whether this will apply to Google Voice.
Google will soon let administrators upload files for the prompts and greetings in Google Voice and it looks like Google Voice will support MP3 and WAV format files. Both of these are pretty good quality. Maybe not the best quality if you are a purist, but definitely better than the sound quality we typically experience on our telephone.
So if you use a recorded sound file for your greetings or prompts, you might find it’s too good for the telephone. Why? The traditional telephone network carries only a narrow voice band, roughly 300Hz to 3.4kHz, and digitizes it at an 8kHz sample rate. That limit was chosen as a good balance between sound quality and performance for normal talking.
If you record your prompts and greetings on your computer, you will likely get an audio file at 44.1kHz or 48kHz. For simplicity, I will describe this as possibly too high a sample rate for the intended use. Plus, the file is likely to be in stereo and possibly in MP3. Ideally, you want to keep it in an uncompressed format like WAV as long as you can and you only need one channel of sound.
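Going from stereo to one channel is simple enough to sketch in a few lines of Python. This assumes the samples are already decoded into a list of interleaved integers (left, right, left, right, and so on); the function name is just for illustration, and a tool like Audacity will do this for you.

```python
def stereo_to_mono(interleaved):
    """Average each left/right pair from [L0, R0, L1, R1, ...] into one sample."""
    if len(interleaved) % 2 != 0:
        raise ValueError("expected an even number of interleaved samples")
    return [(interleaved[i] + interleaved[i + 1]) // 2
            for i in range(0, len(interleaved), 2)]

# Two stereo frames become two mono samples.
print(stereo_to_mono([100, 300, -200, 200]))
```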
Compression, like that used in MP3 files, creates artifacts. Worse yet, multiple rounds of compression will cause those artifacts to grow. You can also expect that some phone systems will recompress files again when they store them. So if you can, record at 48kHz, mono, 16-bit, downsample it as needed and then upload it uncompressed. Wait, what is downsampling, you ask?
Good question! Again, for simplicity, downsampling is taking that higher quality, 48kHz sound file and lowering the sample rate. The goal here is to downsample to the limits of analog lines prior to uploading the prompts and greetings. You can use free software like Audacity to downsample your file to 8kHz, 8-bit, mono sound. When the system goes live, I hope to experiment and see how it works in the real world. I’m anticipating that something like 11kHz, 16-bit, mono audio files will be best. The goal is to find a good, balanced downsampling technique that sounds good for everyone, from people using Google Voice with high definition sound all the way to those calling in from a pay phone (3).
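If you are curious what downsampling actually does to the numbers, here is a deliberately crude Python sketch. It assumes the audio is already a list of mono samples and that the source rate is a whole multiple of the target rate (48kHz to 8kHz is a factor of 6). Averaging each group of samples stands in for the low-pass filtering that a real resampler does far better, so treat this as an illustration and not as a replacement for Audacity.

```python
def downsample(samples, src_rate=48000, dst_rate=8000):
    """Lower the sample rate by an integer factor.

    Each output sample is the average of `factor` consecutive input samples.
    The averaging is a crude low-pass filter that reduces aliasing; proper
    resamplers use much better filters.
    """
    if src_rate % dst_rate != 0:
        raise ValueError("this sketch only handles whole-number ratios")
    factor = src_rate // dst_rate
    usable = len(samples) - len(samples) % factor  # drop any incomplete tail group
    return [sum(samples[i:i + factor]) / factor
            for i in range(0, usable, factor)]

# One second of 48kHz silence becomes one second of 8kHz silence.
one_second = [0] * 48000
print(len(downsample(one_second)))
```

A target like 11kHz is not a whole-number fraction of 48kHz, so this sketch refuses it; that kind of conversion is exactly where dedicated audio software earns its keep.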
I was taught this downsampling trick as a key secret that professional voiceover people would use to up the “quality” of the overall experience for their customers. So while you might think downsampling is bad, in the real world environment, it can be amazing. I first used this trick on my old Nortel phone system some 20 years ago. After downsampling the messages before uploading them, people started asking us who we hired to do our recordings. The difference was that dramatic.
If you are curious, there is plenty of information on the internet about why Bell chose 8kHz and it actually makes a lot of sense. Another story I love is about Comfort Noise. Did you know that phones were too quiet so they invented a noise to fill the silence? You can read more about that in the Wikipedia article on comfort noise.
Now some of you might be asking, what about high definition callers? For them, I’m really hoping Google will accept uncompressed, 48kHz, 16-bit, stereo greetings and prompts, store them with minimal compression and present them as needed, on the fly, for each caller. Then you’ll be wondering why I wasted your time with this trick. Until then, downsampling is about getting the best quality for everyone, since you never know which callers are going to be limited by that 8kHz. And I don’t think Google will always be able to know the quality level available to the caller, so it might be best to play it safe.
Only time will tell if this old dog can teach the new dog a trick or two.
(2) Clipping image from: https://www.mtx.com/library-clipping
(3) Do pay phones still exist?