
Multimedia - Streaming Audio


MPEG-2 AAC  |  MP3  |  MPEG-4 Audio


    MPEG-2 Advanced Audio Coding (AAC)


Also known as MPEG-2 NBC (non-backward-compatible), AAC represents the current state of the art in audio coding. It supports up to 48 audio channels, 15 low frequency enhancement channels and 15 embedded data streams, and has multi-language capability. It also offers a better compression ratio than Layer-3 (MP3). Formal MPEG listening tests have shown that it provides slightly better audio quality at 96 kb/s than Layer-3 at 128 kb/s or Layer-2 at 192 kb/s. AAC offers a data reduction by a factor of 16 while maintaining CD quality.

The combination of high coding gain and great flexibility opens up a wide field of applications. With sampling frequencies between 8 kHz and 96 kHz and any number of channels between 1 and 48, the method is well prepared for future developments in the audio sector. Compared with well-known coding methods such as MPEG-2 Layer-2, it achieves the same subjective quality at half the bit rate.

The driving force behind AAC was the quest for an efficient coding method for surround signals, such as the 5-channel signals (left, right, center, left-surround, right-surround) used in cinemas today. MPEG-2 had included algorithms for these signals for quite a while, but for technical and historical reasons they did not reach optimum efficiency. The goal was therefore a considerable decrease in the necessary bit rate.

Features of MPEG-2 AAC

  1. High compression performance is achieved.
  2. Flexible encoding and decoding complexity (e.g., different spatial resolution, temporal resolution, and quality) enables a very flexible trade-off between quality, performance and cost.
  3. Object-based coding functionalities allow for interaction with audio-visual objects and enable new interactive applications in a mobile environment.
  4. It uses the following compression tools:
    • Huffman coding
    • quantization and scaling
    • M/S matrixing
    • intensity stereo
    • coupling channel
    • backward adaptive prediction
    • temporal noise shaping (TNS)
    • modified discrete cosine transform (MDCT; inverse MDCT, or IMDCT, in the decoder)
    • gain control and hybrid filter bank (inverse polyphase quadrature filter (IPQF) + IMDCT)
  5. The most important new tool is backward adaptive prediction, which accounts for about 45% of the decoding time. There is also a Low Complexity profile:
    • no prediction
    • TNS limited to 12 coefficients, but still over an 18 kHz bandwidth.
  6. There is also a Scalable Sampling Rate profile:
    • no prediction
    • no coupling channel
    • gain control
    • Hybrid Filter Bank (IPQF + divided IMDCT)
    • TNS limited to 12 coefficients and to a 6 kHz bandwidth
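M/S matrixing, one of the tools listed above, replaces a left/right pair with a sum ("mid") and a difference ("side") signal; when the two channels are highly correlated, the side signal is small and cheap to code. The sketch below is a minimal illustration that assumes the convention M = (L + R)/2, S = (L - R)/2, with the decoder reconstructing L = M + S and R = M - S. It works on plain lists of spectral values and omits the per-band on/off decisions a real encoder makes; the function names `ms_encode` and `ms_decode` are hypothetical.

```python
def ms_encode(left, right):
    """Convert left/right spectral values to mid/side.

    Assumption: M = (L + R) / 2 and S = (L - R) / 2.
    """
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Invert the matrixing: L = M + S, R = M - S."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


if __name__ == "__main__":
    # Two nearly identical channels: the side signal comes out close to zero.
    left = [0.50, -0.25, 0.80, 0.10]
    right = [0.48, -0.25, 0.82, 0.10]
    mid, side = ms_encode(left, right)
    print("mid: ", mid)
    print("side:", side)
    new_left, new_right = ms_decode(mid, side)
    ok = all(abs(a - b) < 1e-12 for a, b in zip(new_left + new_right, left + right))
    print("round trip ok:", ok)
```

The matrixing is lossless on its own; the coding gain comes afterwards, when the near-zero side values quantize to very few bits.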

Configuration of MPEG-2 AAC

  • Scalable coding of speech and music for different transmission methods, such as Internet and Digital Broadcasting.
  • Multimedia stream control: Control of audio and other streams (synchronization, switching, etc.)
  • Presentation control: Control the display, audio and other presentable output
  • User interface control: Interaction with the user
  • Graphics composition and control: Control object placement, transparency effects, and user interaction
  • Return channel management: Control return channel (method, data rate, protocols etc.)
  • Conditional access management: Entitlement management and control
  • EPG display and control: Electronic program guide management and display
  • Profile management: Profile of the client for selective adaptation
  • Resource management: Resources available at the client (e.g. storage, digital interfaces, and other peripherals)
  • Diversity of mobile devices (e.g. PDA, sub-notebooks, notebooks, or portable workstations) in regard to available resources.
  • Diversity of wireless networks (e.g. HIPERLAN, GSM, UMTS, or satellite) in regard to network topology, protocols, bandwidth, reliability etc.

Applications for MPEG-2 AAC

  • Broadcast
  • Content based Storage and Retrieval
  • Digital AM Broadcasting
  • Digital Television Set-Top Box and DVD
  • Infotainment
  • Mobile Multimedia
  • Real Time Communications
  • Streaming Audio-Video on the Internet / Intranet
  • Studio and Television Post-production
  • Surveillance and Virtual Meeting
  • Delivery of audio for wireless distribution - via 3G or Bluetooth.

Due to its high coding efficiency, AAC is a prime candidate for any digital broadcasting system. AAC has been selected for use within the DRM system: Digital Radio Mondiale (DRM) is a worldwide consortium dedicated to forming a single world standard for digital broadcasting in the AM radio bands below 30 MHz. Thanks to its superior performance, AAC is also expected to play a major role in the delivery of high-quality music via the Internet.


    MPEG Audio Layer-3


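In 1987, Fraunhofer IIS began work on perceptual audio coding within the EUREKA EU147 Digital Audio Broadcasting (DAB) project; this work led to a very powerful algorithm that is standardized as ISO-MPEG Audio Layer-3 (ISO/IEC 11172-3 and ISO/IEC 13818-3).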

By using MPEG audio coding, one can shrink the original sound data from a CD by a factor of 12 without losing sound quality. Factors of 24 and more still maintain a sound quality that is significantly better than what you get by just reducing the sampling rate and the resolution of your samples. This is achieved by perceptual coding techniques that exploit how the human ear perceives sound waves.

By exploiting stereo effects and by limiting the audio bandwidth, the coding schemes may achieve an acceptable sound quality at even lower bitrates. Layer-3 is the most powerful of the three MPEG audio layers: for a given sound quality level it requires the lowest bitrate, and for a given bitrate it achieves the highest sound quality.

Using MPEG audio, one may achieve a typical data reduction of

1:4 by Layer 1 (corresponds to 384 kbps for a stereo signal)
1:6...1:8 by Layer 2 (corresponds to 256...192 kbps for a stereo signal)
1:10...1:12 by Layer 3 (corresponds to 128...112 kbps for a stereo signal)

while still maintaining the original CD sound quality. For low bit-rate audio coding in broadcast applications at 60 kbit/s per audio channel, the ITU-R recommends MPEG Layer-3 (ITU-R doc. BS.1115).
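The data-reduction figures above compare each coded bitrate with uncompressed 16-bit stereo PCM. The sketch below, with hypothetical helper names, reproduces that arithmetic for two possible references: CD audio at 44.1 kHz (1411.2 kbit/s) and 48 kHz audio (1536 kbit/s). The text does not state which reference it used; against 48 kHz the Layer 1 and Layer 2 figures come out exactly as quoted, so the ratios are best read as approximate.

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample=16, channels=2):
    """Uncompressed PCM bitrate in kbit/s (defaults: 16-bit stereo)."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def reduction_ratio(coded_kbps, reference_kbps):
    """How many times smaller a coded stream is than the PCM reference."""
    return reference_kbps / coded_kbps


if __name__ == "__main__":
    # Assumption: both candidate references are 16-bit stereo PCM.
    references = {
        "44.1 kHz CD": pcm_bitrate_kbps(44_100),  # 1411.2 kbit/s
        "48 kHz": pcm_bitrate_kbps(48_000),       # 1536.0 kbit/s
    }
    layers = [("Layer 1", 384), ("Layer 2", 256), ("Layer 2", 192),
              ("Layer 3", 128), ("Layer 3", 112)]
    for name, ref in references.items():
        print(f"Reference: 16-bit stereo PCM at {name} ({ref} kbit/s)")
        for layer, kbps in layers:
            print(f"  {layer} at {kbps} kbit/s -> about 1:{reduction_ratio(kbps, ref):.1f}")
```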

sound quality            bandwidth   mode     bitrate          reduction ratio
telephone sound          2.5 kHz     mono     8 kbps           96:1
better than short-wave   4.5 kHz     mono     16 kbps          48:1
better than AM radio     7.5 kHz     mono     32 kbps          24:1
similar to FM radio      11 kHz      stereo   56...64 kbps     26...24:1
near-CD                  15 kHz      stereo   96 kbps          16:1
CD                       >15 kHz     stereo   112...128 kbps   14...12:1

Features of MP3:

Major enhancements over the Layer I and Layer II algorithms include:

  • Alias reduction - Layer III specifies a method of processing the MDCT values to remove some redundancy caused by the overlapping bands of the Layer I and Layer II filter bank.
  • Nonuniform quantization - The Layer III quantizer raises its input to the 3/4 power before quantization to provide a more consistent signal-to-noise ratio over the range of quantizer values. The requantizer in the MPEG/audio decoder re-linearizes the values by raising its output to the 4/3 power.
  • Entropy coding of data values - Layer III uses Huffman codes to encode the quantized samples for better data compression.
  • Use of a "bit reservoir" - The design of the Layer III bit stream better fits the variable length nature of the compressed data. As with Layer II, Layer III processes the audio data in frames of 1,152 samples. Unlike Layer II, the coded data representing these samples does not necessarily fit into a fixed-length frame in the code bit stream. The encoder can donate bits to or borrow bits from the reservoir when appropriate.
  • Noise allocation instead of bit allocation - The bit allocation process used by Layers I and II only approximates the amount of noise caused by quantization to a given number of bits. The Layer III encoder uses a noise allocation iteration loop. In this loop, the quantizers are varied in an orderly way, and the resulting quantization noise is actually calculated and specifically allocated to each subband.
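The nonuniform quantizer described above can be illustrated in a few lines. This is a simplified sketch, not the normative Layer III procedure: it assumes a single step size for all values and ignores the scalefactors, the rounding offset and the noise-allocation loop that a real encoder wraps around the quantizer. The function names `quantize` and `dequantize` are hypothetical.

```python
import math


def quantize(values, step):
    """Raise magnitudes to the 3/4 power, then round to integer indices.

    Simplification: one global step size, no scalefactors or rounding offset.
    """
    return [int(math.copysign(round((abs(v) / step) ** 0.75), v)) for v in values]


def dequantize(indices, step):
    """Re-linearize by raising the magnitudes to the 4/3 power."""
    return [math.copysign(abs(q) ** (4 / 3) * step, q) for q in indices]


if __name__ == "__main__":
    step = 1.0
    samples = [0.4, 3.0, -10.0, 50.0, 200.0]
    indices = quantize(samples, step)
    restored = dequantize(indices, step)
    for x, q, y in zip(samples, indices, restored):
        # Larger values get coarser absolute steps, which evens out the
        # signal-to-noise ratio compared with a uniform quantizer.
        print(f"{x:8.2f} -> {q:4d} -> {y:8.2f}")
```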

Configuration of MP3:

  • Designed as an adaptive representation scheme that also accommodates very low-bitrate applications, it is well suited to mobile multimedia applications.
  • Delivery of a complement of services over a fixed, unidirectional communication channel.
  • The system can be configured with a single (logical) origination point, a real-time unidirectional communication channel and a large number of end-user receiver/decoder terminals. It is a one-to-many, or possibly a few-to-many, system.
  • The ability to trade off between quality, performance and cost.

Applications of MP3:

  • Digital Audio Broadcasting (EUREKA DAB, WorldSpace, ARIB, DRM)
  • ISDN transmission for broadcast contribution and distribution purposes
  • Archival storage for broadcasting
  • Accompanying audio for digital TV (DVB, Video CD, ARIB)
  • Internet streaming (Microsoft Netshow, Apple Quicktime)
  • Portable audio (mpman, mplayer3, Rio, Lyra, YEPP and others)
  • Storage and exchange of music files on computers

Performance metrics:   MP3 Decoder

Analog Devices' MPEG-1 Layer 3 (MP3) multi-channel audio decoder reference design performs the digital audio decode in real time on the ADSP-2185M 16-bit fixed-point Digital Signal Processor (DSP). The chipset decodes primary and extended streams for Layer 3 of the MPEG-1 standard. The decoder fully complies with the ISO/IEC 11172-3 audio standard, runs completely within the internal RAM of the DSP, and requires 36 MIPS.

Processor:   ADSP-2185M (75 MHz)

 MIPS  36
 PM (program memory)  25K
 DM (data memory)  23K

Processor:   Proprietary SIMD DSP core at 150 MHz

 MIPS  22
 PM (program memory)  24K
 DM (data memory)  22K


    MPEG-4 Audio

MPEG-4 (ISO/IEC 14496) is a standard developed by MPEG (Moving Picture Experts Group) that builds on the proven success of three fields:

  • Digital television
  • Interactive graphics applications (synthetic content)
  • Interactive multimedia (World Wide Web, distribution of and access to content)

MPEG-4 provides the standardized technological elements enabling the integration of the production, distribution and content access paradigms of the three fields.
AAC (Advanced Audio Coding) is possibly the strongest contender to displace MP3. AAC combines the best features of MPEG-2 audio coding and AT&T's Perceptual Audio Coder (PAC). Independent labs have tested AAC and rate its quality highly. It requires a lower bitrate than MP3 (64 kbps per channel), but a typical implementation requires 30 to 40% more MIPS than MP3.


MPEG-4 Audio facilitates a wide variety of applications, ranging from intelligible speech to high-quality multichannel audio and from natural to synthesized sounds. In particular, it supports the highly efficient representation of audio objects consisting of:

  • Speech signals: Speech coding can be done using bitrates from 2 kbit/s up to 24 kbit/s using the speech coding tools.
  • Synthesized Speech: Scalable TTS coders operate at bitrates from 200 bit/s to 1.2 kbit/s; they take text, or text with prosodic parameters (pitch contour, phoneme duration, and so on), as input and generate intelligible synthetic speech.
  • General audio signals: Transform coding techniques support general audio from very low bitrates up to high quality, covering a wide range of bitrates and bandwidths. The range starts at 6 kbit/s with a bandwidth below 4 kHz and extends to broadcast-quality audio from mono up to multichannel. AAC (with some modifications) is the only high-quality audio coding scheme used within the MPEG-4 general audio standard, the future "global multimedia language".
  • Synthesized Audio: Synthetic Audio support is provided by a Structured Audio Decoder implementation that allows the application of score-based control information to musical instruments described in a special language.
  • Bounded-complexity Synthetic Audio: This is provided by a Structured Audio Decoder implementation that allows the processing of a standardized wavetable format.

Features of MPEG-4 Audio

  • It supports high performance data compression.
  • A trade-off between quality and performance can be made by scaling encoder and decoder complexity, spatial resolution, temporal resolution, and quality.
  • Content-based coding enables interactivity with objects.
  • Additional functionality like speed control and pitch change for speech signals
  • Additional functionality like scalability in terms of bitrate, bandwidth, error robustness, complexity, etc.
  • Composition interactivity, object synchronization, and improved coding efficiency
  • Improved temporal random access, content-based scalability, and auxiliary data capability
  • Compatibility with the MPEG-2 standard
  • Copy protection and user interaction
  • Downloading of audio-visual objects and other information data
  • Multipoint operation and robustness to information errors and loss
  • Coding of multiple concurrent data streams

Configuration of MPEG-4 Audio

MPEG-4 Audio provides several "profiles" to allow the optimal use of MPEG-4 in different applications. At the same time the number of profiles is kept as low as possible in order to maintain maximum interoperability. MPEG-4 offers the following profiles:

  • The Speech Audio Profile provides a parametric speech coder, a CELP speech coder and a Text-To-Speech interface.
  • The Synthesis Audio Profile provides the capability to generate sound and speech at very low bitrates.
  • The Scalable Audio Profile, a superset of the Speech Audio Profile, is suitable for scalable coding of speech and music and for different transmission methods, such as Internet and Digital Broadcasting.
  • The Main Audio Profile is a rich superset of the three previous profiles (scalable, speech, synthesis) containing tools for both natural and synthetic audio.
  • The High Quality Audio Profile contains the CELP speech coder and the Low Complexity AAC coder including Long Term Prediction. Scalable coding can be performed by the AAC Scalable coder. Optionally, the error resilient bitstream syntax may be used.
  • The Low Delay Audio Profile contains the parametric and CELP speech coders (optionally using the error resilient bitstream syntax), the Low Delay AAC coder and the Text-to-Speech interface.
  • The Natural Audio Profile contains all natural audio coding tools available in MPEG-4, but not the synthetic ones.
  • The Mobile Audio Internetworking Profile contains the low delay and scalable AAC object types including TwinVQ and BSAC. This profile is intended to extend communication applications using non-MPEG speech coding algorithms with high quality audio coding capabilities.

Applications of MPEG-4 Audio

  • Broadcast
  • Content based Storage and Retrieval
  • Digital AM Broadcasting
  • Digital Television Set-Top Box and DVD
  • Infotainment
  • Mobile Multimedia
  • Real Time Communications
  • Streaming Audio-Video on the Internet / Intranet
  • Studio and Television Post-production
  • Surveillance and Virtual Meeting
  • Delivery of audio for wireless distribution - via 3G or Bluetooth.

Performance metrics:   MPEG-4 AAC decoder

Processor:   ADSP-2189 at 75 MHz

 MIPS  48
 PM (program memory)  30 KW
 DM (data memory)  29 KW

Processor:   Proprietary SIMD DSP core

 MIPS  38
 PM (program memory)  27 KW
 DM (data memory)  28 KW
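The MIPS figures can be turned into a rough processor-load estimate. The sketch below assumes that the ADSP-2185M and ADSP-2189 execute one instruction per cycle, so a 75 MHz part offers about 75 MIPS; the proprietary SIMD cores are left out because their instruction rate per clock is not given. It also checks the earlier claim that AAC needs 30 to 40% more MIPS than MP3. The helper name `dsp_load` is hypothetical.

```python
def dsp_load(required_mips, available_mips):
    """Fraction of the processor's instruction budget a decoder consumes."""
    return required_mips / available_mips


if __name__ == "__main__":
    available = 75.0  # assumption: 75 MHz, one instruction per cycle
    mp3_mips = 36.0   # MP3 decoder on the ADSP-2185M
    aac_mips = 48.0   # MPEG-4 AAC decoder on the ADSP-2189
    print(f"MP3 decoder load: {dsp_load(mp3_mips, available):.0%}")
    print(f"AAC decoder load: {dsp_load(aac_mips, available):.0%}")
    print(f"AAC needs {aac_mips / mp3_mips - 1:.0%} more MIPS than MP3")
```

Under that assumption the decoders occupy 48% and 64% of the processor, and the AAC decoder needs about 33% more MIPS than the MP3 decoder, which is consistent with the 30 to 40% range quoted above.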
