MP3 / AAC Audio CODEC Fundamentals

Question:

If a rock and a pin are dropped and land on a table at the same time, do they both make a sound?

Answer:

Yes they do.

Can you hear them both?

Probably not…

Why?

Logic would dictate that the sound of the rock is so loud that it “covers” the sound of the tiny pin. This concept is known as masking. The really loud sound of the rock masks the tiny sound that the pin makes.

This is a nice introduction to how perceptual audio coding works. In order to make things like podcasting, streaming audio, and internet video possible, we need some technology to make normally huge digital audio files smaller.

How is this done?

It’s pretty complicated, but really interesting once you get the hang of it. Don’t worry. I’ll just explain it with some simple concepts. For those wanting to know exactly what’s happening, you’ll have to do some searching, or wait for me to get around to writing about it! <evil grin>

A linear audio file (say a .WAV or AIFF file, or the raw data from a music CD) is huge. A standard 44.1 kHz, 16-bit stereo WAV file holds a little over 10 megabytes of data for each minute of audio.

This WAV file contains a LOT of audio information, and the average person would never notice that a large portion of it is there.
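For the curious, the "little over 10 megs" figure falls out of simple arithmetic. Here is a minimal sketch, assuming plain uncompressed PCM and ignoring the small file header (the function name is just something I made up for illustration):

```python
def pcm_bytes_per_minute(sample_rate_hz, bit_depth, channels):
    """Size of one minute of uncompressed PCM audio, ignoring file headers."""
    bytes_per_sample = bit_depth // 8
    return sample_rate_hz * bytes_per_sample * channels * 60

# CD-style audio: 44.1 kHz, 16 bit, stereo
cd_minute = pcm_bytes_per_minute(44_100, 16, 2)
print(cd_minute)               # 10584000 bytes
print(cd_minute / 1_000_000)   # 10.584 (decimal megabytes)
```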

This is due to the fact that we spend most of our listening time focused on what’s going on in the foreground (the loudest sounds in the recording).

Taking advantage of 'perception'

MP3, AAC, and other CODECs (enCOder / DECoder) see these “unnoticeable” sounds as something that could potentially be removed. The “hole” left behind by this removal is partially covered over by the loud sounds in the recording. CODECs are "smart" enough to "smooth over" much of what you might otherwise consciously hear as missing. This covering up of what is missing by what is loud is the "masking" described earlier.

Keep in mind that what I described in the above paragraph is actually an EXTREMELY complicated process, and it would take volumes of documentation to describe the exact details of what is going on. I present the above as an "everyperson's" overview to get started with.

Since these CODECS "play" with how we perceive what is missing, and what is not, they have the generic name of "Perceptual CODECs".
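To make the rock-and-pin idea a little more concrete, here is a toy sketch of masking. Please treat it as a cartoon: real perceptual CODECs use detailed psychoacoustic models that work across frequency and time, and the 40 dB margin, the sound levels, and the function name below are all numbers and names I invented for the illustration.

```python
# Toy illustration only: real codecs use detailed psychoacoustic models.
# MASKING_MARGIN_DB is a made-up number, not taken from any standard.
MASKING_MARGIN_DB = 40.0

def audible_sounds(sounds, margin_db=MASKING_MARGIN_DB):
    """Keep sounds within margin_db of the loudest one; treat the rest as masked."""
    loudest = max(level for _, level in sounds)
    return [(name, level) for name, level in sounds if level >= loudest - margin_db]

sounds = [("rock", 80.0), ("pin", 20.0)]   # hypothetical levels in dB
print(audible_sounds(sounds))              # [('rock', 80.0)]
```

The pin is 60 dB quieter than the rock, so this toy rule treats it as masked and drops it. A real encoder makes a similar kind of decision, only with far more care.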

Impressive, yet destructive!

The techniques used by MP3 and AAC are quite effective, and it takes a trained ear to spot the flaws in coded audio. Most MP3 files, for example, contain only 10-20% of the data that was originally in the linear digital audio file!
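You can sanity-check that percentage yourself by comparing bitrates. This is a rough sketch, assuming a CD-quality source and a few commonly used MP3 bitrates (128, 192, and 256 kbps are my picks for the example; the function names are hypothetical):

```python
def pcm_bitrate_kbps(sample_rate_hz=44_100, bit_depth=16, channels=2):
    """Bitrate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * bit_depth * channels / 1000

def fraction_kept(encoded_kbps, source_kbps):
    """Share of the original data rate that survives encoding."""
    return encoded_kbps / source_kbps

source = pcm_bitrate_kbps()   # 1411.2 kbps for CD-quality audio
for kbps in (128, 192, 256):
    print(f"{kbps} kbps keeps about {fraction_kept(kbps, source):.0%} of the original data rate")
```

With these assumed bitrates the answer comes out at roughly 9%, 14%, and 18%, which lines up with the 10-20% ballpark.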

Unlike Zip files (the kind WinZip creates), the audio encoding process when using AAC and MP3 is destructive, and permanent! This means that there is no way to get back to the original full quality using the MP3 / AAC encoded copy.
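Here is a small sketch of that difference. It uses Python's zlib module (the same family of compression used in Zip files) for the lossless side, and a deliberately crude stand-in for the lossy side: throwing away the low 8 bits of each sample. That is NOT how MP3 or AAC work; it only shows that once data is discarded, it cannot be brought back. The sample values are made up.

```python
import struct
import zlib

# Hypothetical 16-bit audio samples, just for the demonstration.
samples = [0, 1234, -5678, 32767, -32768, 42, 42, 42]
raw = struct.pack(f"<{len(samples)}h", *samples)

# Lossless (Zip-style): decompressing returns exactly the original bytes.
restored = zlib.decompress(zlib.compress(raw))
print(restored == raw)      # True

# Crude lossy stand-in: throw away the low 8 bits of every sample.
# This is NOT what MP3 or AAC do; it only shows that discarded data is gone.
lossy = [s >> 8 for s in samples]
rebuilt = [s << 8 for s in lossy]
print(rebuilt == samples)   # False
```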

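There are also lossless encoding methods out there (such as FLAC) that get the file size down without throwing anything away, so the decoder can rebuild the original audio data exactly at the playback end. A few hybrid CODECs (WavPack's hybrid mode, for example) take a different route: they create a lossy file plus an additional correction file that describes what was removed, and use that correction file to reconstruct the missing audio data on playback.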

While perceptual CODECs can do their job fairly well, they aren’t perfect by any means. The process of destructive perceptual coding creates unintended side effects (distortion anomalies) in the recovered audio. These side effects are commonly called “coding artifacts”.

For audible examples of these artifacts, check out the “Listening for coding artifacts” article.

Thanks for reading this article!

Regards,

(Photo by Barry Mishkind)