MPEG-4, or Why Efficiency is Much More Than Just a Compression Ratio

By R. H. Koenen

MPEG-4 provides a set of tools for the efficient representation of multimedia content, and it is efficient in several respects. The MPEG-4 standard is not just a set of audio and video codecs but an entire system that supports this efficient representation. The tools include an efficient, binary, realtime language for scene description: the Binary Format for Scenes (BIFS). BIFS is designed to allow mixing of media, be they video, graphics, text, or computer-generated content. This means that not all media types need to be treated as moving video pixels, which allows a more efficient representation, with higher quality as a bonus. Similar concepts apply to audio. This paper also explains how scalability options can allow more efficient data representation in many environments. There are also good business reasons that make MPEG-4 an efficient choice. Using proprietary formats may seem cost-effective at first glance, but it comes with hidden expenses. Open standards, responsibly upgraded, allow significant cost savings while fostering efficiency-improving innovations. MPEG-2 gives a clear example.

The MPEG-4 standard [1-7] has been available since the end of the last century, and is starting to be deployed. Much technical debate has focused on the compression efficiency of the MPEG-4 video coder and, to a lesser extent, on the efficiency of the audio coder. There is a great deal to be said and written about the efficiency of these coders. The efficiency embodied in the MPEG-4 standard, however, is much more than just the performance of two of the coders in the framework. This paper explains how MPEG-4 provides the industry with efficiencies in a number of ways:

• The object-based scene representation allows a more efficient and higher-quality representation of multimedia content (and gives much more flexibility as well).
• There are various scalabilities in MPEG-4 that can be exploited.
• Open decoder-only standards such as MPEG-4 and MPEG-2 see major advances in compression efficiency for a long time after the standard is frozen, without the need for upgrading decoders.
• An open standard means business efficiencies for parties that invest in the MPEG-4 ecosystem.

Efficiency through Object-based Representation

The MPEG-4 architecture signifies a major paradigm shift from earlier multimedia content representation standards. The heart of this architecture is the object-based representation of the so-called audiovisual “scene.” The best way to illustrate how this works is by looking at MPEG-2 first. When creating MPEG-2 video content, all the assets (moving video, graphics, text, etc.) are collected, arranged, and then mixed together, that is, collapsed into a single pixel raster. The raster is then encoded using MPEG-2, which essentially treats all the pixels as moving video. This is what MPEG-2 was designed to do. It is, however, not optimal to treat still images and text as moving video pixels; this degrades the quality that could be achieved at the decoder for these data types, and it also affects the quality of the underlying moving video, e.g., in the case of a text overlay that interferes with the film on which it is superimposed.

It is much more efficient to keep the objects separate in the bitstream, each with its own optimal representation (video represented as video, graphics as graphics, and text as text), and to compose the objects after they have been decoded. MPEG-4 does just that: the composition does not need to take place before encoding but can happen after decoding.

To make it possible to move the composition phase, MPEG-4 includes a binary language for scene description called the Binary Format for Scenes (BIFS). This is a very efficient, binary, and realtime language that allows scenes to be described and built on the fly. BIFS commands instruct the compositor where to put the decoded objects and what to do with them. The BIFS format [7] was built on the Virtual Reality Modelling Language (VRML) [8,9]. To create BIFS, MPEG added a binary equivalent and realtime characteristics to the textual VRML download-and-play format, as well as some other extensions.

The explicit scene description not only allows “screen objects” to be kept in an efficient, native representation, but also allows manipulation of the objects on the screen (e.g., a translation or a fade-out) by simple commands instead of expensive pixel-based video coding operations. The BIFS language further allows adding behavior to objects. A graphic could be set to spin with a simple command, and no video information needs to be sent to keep the spinning going.
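The idea of composing after decoding can be illustrated with a minimal Python sketch. This is not BIFS syntax and not an MPEG-4 API: the `SceneObject` and `Scene` classes and the command names (`insert`, `translate`, `fade`, `spin`) are hypothetical, chosen only to show how a few small commands can move, fade, or spin already-decoded objects without any new pixel data being sent.

```python
from dataclasses import dataclass, field


@dataclass
class SceneObject:
    """A decoded media object kept in its native form (hypothetical model)."""
    name: str
    kind: str                # e.g. "video", "graphic", "text"
    x: float = 0.0
    y: float = 0.0
    opacity: float = 1.0
    angle: float = 0.0       # degrees
    spin_rate: float = 0.0   # degrees per second


@dataclass
class Scene:
    objects: dict = field(default_factory=dict)

    def apply(self, command: tuple) -> None:
        """Apply one scene command; each carries a few values, not pixels."""
        op, name, *args = command
        if op == "insert":
            self.objects[name] = SceneObject(name, args[0])
        elif op == "delete":
            del self.objects[name]
        elif op == "translate":
            obj = self.objects[name]
            obj.x += args[0]
            obj.y += args[1]
        elif op == "fade":
            self.objects[name].opacity = args[0]
        elif op == "spin":
            self.objects[name].spin_rate = args[0]
        else:
            raise ValueError(f"unknown command: {op}")

    def tick(self, seconds: float) -> None:
        """Advance behaviors locally; nothing is received for this step."""
        for obj in self.objects.values():
            obj.angle = (obj.angle + obj.spin_rate * seconds) % 360.0


scene = Scene()
for cmd in [
    ("insert", "film", "video"),
    ("insert", "logo", "graphic"),
    ("translate", "logo", 20.0, 10.0),
    ("spin", "logo", 90.0),
    ("fade", "film", 0.5),
]:
    scene.apply(cmd)

scene.tick(2.0)  # two seconds pass with no further commands
logo = scene.objects["logo"]
print(logo.x, logo.y, logo.angle)
```

In this toy model the logo keeps rotating inside `tick` after a single `spin` command, which mirrors the point made above: behavior is described once and then carried out by the compositor.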
Moreover, the language allows conditional behavior: things that happen to the content objects in response to events. Obvious event types are those that are triggered by user input. This makes the MPEG-4 scene amenable to sophisticated interactivity. (The MPEG-J [Java] APIs and application engine, also part of MPEG-4, go even further in allowing interactivity.)

While the scene can be built up over considerable time, adding and deleting objects from the scene on the fly, there are provisions for “carouseling” the BIFS data, so that it is always possible to tune into an MPEG-4 program at random times without having to wait too long before the scene starts making sense. To this end, there are Intra-BIFS, the equivalent of I-frames in video coding.

While the preceding talks about visual data types, exactly the same applies to audio information, although it is always a bit harder to “visualize.” Audio objects can be sent separately, and there are different codecs in MPEG-4 for general audio and speech. Audio signals can be given a place in 3-D space using Audio BIFS. The Audio BIFS specification (part of MPEG-4 Systems) even has syntax for defining the space in which the audio is supposed to be reproduced. So the same audio signal sent to a decoder can be made to sound as though it were reproduced in a small closet, on an open field, or in a large conference hall, without having to change the coded audio signal itself. All that is needed are a few simple BIFS instructions.
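The random-access role of Intra-BIFS described in this section can be sketched in a few lines of Python. The message format here is an assumption made for illustration only: a stream is a list of `("intra", [...])` snapshots and `("add", name)` or `("remove", name)` updates, none of which corresponds to actual BIFS syntax. A receiver that tunes in mid-stream cannot interpret updates until it has seen a full snapshot, so the spacing of snapshots bounds the wait.

```python
def tune_in(stream, start_index):
    """Rebuild the scene for a receiver that joins at start_index.

    Updates seen before the first intra snapshot cannot be interpreted,
    so they are skipped; the number skipped stands in for the wait.
    """
    scene = None
    skipped = 0
    for kind, payload in stream[start_index:]:
        if kind == "intra":
            scene = set(payload)      # a full snapshot replaces everything
        elif scene is None:
            skipped += 1              # still waiting for a snapshot
        elif kind == "add":
            scene.add(payload)
        elif kind == "remove":
            scene.discard(payload)
    return scene, skipped


stream = [
    ("intra", ["film"]),
    ("add", "logo"),
    ("add", "ticker"),
    ("intra", ["film", "logo", "ticker"]),   # carousel repeat of the full scene
    ("remove", "ticker"),
    ("add", "subtitle"),
]

from_start, _ = tune_in(stream, 0)
late, waited = tune_in(stream, 1)
print(sorted(late), waited)
```

Both receivers end up with the same scene; the late one only had to sit through the updates that preceded the next snapshot.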

Efficiency through Scalability

There are a few types of scalability in MPEG-4. In general, the goals of scalability are:

• To allow a scene to be encoded once, and then be played out at different (and possibly varying) bitrates and configurations, according to specific circumstances.
• To allow a decoder that receives a full signal to decode and display only a subset of that signal, e.g., because of compute resource or display constraints.

MPEG-4 has provisions for the usual types of scalability in the audio and video bitstreams, as well as flexibility in how the scene is represented, i.e., choices as to which object streams make up the scene.

Stream Scalability

Let’s first address the traditional scalability. This resides within the audio or video bitstream. The definition of this type of scalability is that the full signal contains one or more meaningful subsets of bits that can reproduce the signal at a lower quality. MPEG-4 Visual contains tools for temporal, spatial, and SNR (quality) scalability. A special case is the so-called Fine-Granularity Scalability (FGS), which can encode the signal with a base layer and up to 11 enhancement layers, allowing a very smooth transition between different quality levels by adding or deleting layers. FGS has been shown, e.g., by WebCast Technologies [10], to give a much smoother impression than stream switching. Verification tests done by MPEG have shown FGS to work well under varying bandwidth conditions [11].

MPEG-4 Audio contains similar scalability options. There is large-step and small-step scalability, and the possibility to change codecs in different layers, e.g., to add a generic audio enhancement layer on top of a speech base layer. The small-step scalability allows increments of 1 kbit/sec for extremely fine-granular scalability. The “penalty” for scalable coding in audio is also lower than in video.

The notion of scalability has been around for quite some time, including in the MPEG-2 standard, but its use has been fairly limited to date. Rather, service providers have relied on (and technology providers have implemented) intelligent stream switching, notably for internet streaming. Intelligent stream switching entails monitoring the link and switching over to a separately stored lower-bitrate signal when the bandwidth drops below a certain threshold, and back up when there is enough room. The problem with scalability, especially in video coding, has been that making an encoded signal scalable implies some visible quality loss for the full-quality signal, because coding in the enhancement layers is less efficient than in the base layer.

It looks like scalability is now going to be used in mobile services, though, and MPEG-4 fine-granularity scalability may well start being efficient enough to be adopted. The Simple Scalable profile also looks set for implementation; it has been chosen by the 3GPP [12] project for streaming to mobile devices. There are currently studies within MPEG-4 that look into improving the performance of fine-granularity scalability. This may or may not result in standardization, depending on how much progress is made.
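The difference between layered scalability and stream switching can be made concrete with a small Python sketch. All bitrates below are invented for illustration, and the function names are hypothetical. The sketch also ignores the coding “penalty” of the enhancement layers discussed above, so it shows only the granularity of adaptation, not the quality trade-off.

```python
def layers_for_bandwidth(base_kbps, enhancement_kbps, available_kbps):
    """Return how many enhancement layers fit on top of the base layer.

    Layers are taken in order: layer n is only useful if the layers
    below it are present. Returns None if even the base does not fit.
    """
    if available_kbps < base_kbps:
        return None
    used = base_kbps
    count = 0
    for rate in enhancement_kbps:
        if used + rate > available_kbps:
            break
        used += rate
        count += 1
    return count


def switched_stream(stored_kbps, available_kbps):
    """Stream switching: pick the highest separately stored bitrate that fits."""
    fitting = [r for r in stored_kbps if r <= available_kbps]
    return max(fitting) if fitting else None


BASE = 64                   # illustrative numbers only
ENHANCEMENTS = [32] * 11    # a base layer plus up to 11 enhancement layers
STORED = [64, 192, 416]     # three separately stored versions for switching

for bandwidth in (50, 100, 200, 500):
    n = layers_for_bandwidth(BASE, ENHANCEMENTS, bandwidth)
    rate = None if n is None else BASE + sum(ENHANCEMENTS[:n])
    print(bandwidth, n, rate, switched_stream(STORED, bandwidth))
```

With these made-up numbers, a 100 kbit/sec link carries the base plus one enhancement layer, whereas stream switching has to fall back to the lowest stored version; the layered stream tracks the available bandwidth in smaller steps.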

Scene Scalability

Scene scalability is a different concept from the traditional ways of scalable coding. The MPEG-4 Object Descriptor Framework (OD Framework) [1,6,7] allows an author to create different versions of the same scene for use under different circumstances. Object Descriptors tie abstract objects in the scene to real-world data streams. These streams, in turn, are described by Elementary Stream Descriptors. The OD Framework can be used to offer alternative representations of an object stream (e.g., a low-bitrate and a high-bitrate version) and let the decoder decide which is the best it can handle. The decoder can also decide to drop certain objects altogether (or the server can cancel sending them) based on information about the relative importance of the objects, which is signalled by a 5-bit stream priority field in the Elementary Stream Descriptor. Explicit descriptors for the complexity of the scene allow a decoder to make trade-off decisions and to scale back the complexity of the scene to a manageable level.
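A rough Python sketch of the selection logic this enables is given below. It is not the Object Descriptor syntax: the `ObjectDescriptor` class, its fields, the greedy strategy, and all bitrates are hypothetical. The only detail taken from the text is that the priority fits in 5 bits (values 0 to 31); that a higher value means a more important stream is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class ObjectDescriptor:
    """Hypothetical stand-in for an object with alternative streams."""
    name: str
    priority: int              # 5-bit field: 0..31; higher = more important (assumed)
    alternatives_kbps: list    # bitrates of the alternative representations

    def __post_init__(self):
        if not 0 <= self.priority <= 31:
            raise ValueError("priority must fit in 5 bits")


def select_streams(objects, budget_kbps):
    """Greedy choice: serve important objects first, drop what does not fit."""
    chosen = {}
    remaining = budget_kbps
    for obj in sorted(objects, key=lambda o: o.priority, reverse=True):
        fitting = [r for r in obj.alternatives_kbps if r <= remaining]
        if fitting:
            chosen[obj.name] = max(fitting)   # best version that still fits
            remaining -= chosen[obj.name]
        # otherwise the object is dropped from the scene
    return chosen


scene = [
    ObjectDescriptor("presenter_video", 31, [384, 128]),
    ObjectDescriptor("speech", 30, [24, 8]),
    ObjectDescriptor("background_video", 5, [256, 96]),
    ObjectDescriptor("logo_animation", 2, [16]),
]

print(select_streams(scene, 800))
print(select_streams(scene, 200))
```

With an 800 kbit/sec budget every object is delivered in its best version; at 200 kbit/sec the presenter falls back to the low-bitrate alternative and the background video is dropped, while the speech survives intact.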

Efficiency through Continuing Technological Advances

Some providers of proprietary coding systems like to state that standards like MPEG-4 (and standards bodies like MPEG) are “too slow for today’s market.” These statements are often self-serving; MPEG sticks to tight schedules and delivers on its promises. It is true that MPEG doesn’t release new standards as often as its proprietary competitors in the streaming media space, and this is for a good reason: to build an interoperable ecosystem of hardware and software tools that are used across markets and verticals, stability is a must. Releasing a new downloadable decoder that can be plugged into a PC framework every six months is one thing, but MPEG cannot do this without destabilizing the value of the standard and alienating significant parts of its constituency. MPEG offers a responsible upgrade path for its standards, with new standards and new profiles offering major advances in efficiency, and several years between one major “release” and the next. The next section will address the benefits of an open ecosystem and serve to dispel the myth that media quality is frozen when a standard is published.

The same people that call standards slow like to compare “Standard MPEG-4 quality” to their own products. It needs to be clear that Standard MPEG-4 quality (or MPEG-2 quality for that matter) does not exist. MPEG standards only prescribe how to decode and how to organize the bits in the bitstream. The guiding principle is to standardize only the minimum needed for interoperability. MPEG standards do not include normative encoding, and anyone can build their own encoder as long as the encoded bits form a syntactically valid stream. There is a lot of room for choices and innovations by individual encoder manufacturers. This spurs competition between encoder manufacturers, who all want to sell their equipment with claims regarding quality.

In MPEG-2, the fierce competition between encoder makers has resulted in an enormous drop in the bitrate required to code broadcast-quality video. Tandberg Television reports that between the launch of their first commercial MPEG-2 encoder in 1995 and now, they have gone from 8 Mbits/sec to 2 Mbits/sec while keeping the same quality. This is an amazing gain in efficiency, achieved entirely without changing the standard and without a need for new decoders. While the gains for MPEG-4 will not be a factor of 4 for the same quality (because many of MPEG-2’s coding concepts apply to MPEG-4 as well, and they have been well researched), there will still be a significant drop in the bitrate required to achieve a certain level of quality. Competition brings out the best in MPEG-4 codec builders, and the market stands to benefit. Similar arguments apply to makers of other elements in the ecosystem, such as authoring tools.

It must also be noted that one size does not fit all. There is room for low-cost software encoders, low-latency encoders for specific purposes, high-quality multipass coders for packaged media distribution, fast hardware-based encoders for use in broadcast systems, etc. These markets will be served by different companies, which will use their resources and ingenuity to come up with an optimal solution for a specific field of use. Each field of use will have its dominant players.
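The size of the MPEG-2 gain quoted above is easy to work out. The short Python calculation below uses the two bitrates from the text; the end year of 2002 is an assumption (the text says only “now,” and 2002 is the year the paper was first published), so the per-year figure should be read as a rough indication.

```python
start_mbps, end_mbps = 8.0, 2.0
start_year, end_year = 1995, 2002   # end year is an assumption (publication year)

factor = start_mbps / end_mbps                     # how many times lower the bitrate is
saving = 1 - end_mbps / start_mbps                 # fraction of the bitrate saved
years = end_year - start_year
annual = (end_mbps / start_mbps) ** (1 / years)    # average yearly bitrate multiplier

print(f"{factor:.0f}x lower bitrate, {saving:.0%} saved")
print(f"about {1 - annual:.0%} less bitrate per year over {years} years")
```

Under that assumption, the factor-of-4 improvement corresponds to an average reduction of a little under one fifth of the bitrate each year, all achieved on the encoder side.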

Business Efficiencies of an Open Standard

There are many advantages of using an open, interoperable standard. First, let’s look at the streaming media field again. When two or three proprietary systems dominate the field, most content providers feel forced to offer content encoded in two or three formats. They don’t want to make the (political) choice for one of them, excluding the others, and risk losing users as a result. This is of course very inefficient. With an open standard like MPEG-4 supported by at least two of the three major players, the choice is no longer difficult: the open international standard is the way to go. (Real and QuickTime are committed to supporting MPEG-4 at the time of writing. Real already operates through a certified plug-in from Envivio and is moving towards native support, and QT 6, also with native MPEG-4 support, will have been released by September 2002.)

Risks of the Closed

There are many factors that make a bet on a proprietary format a risky one. First, there is the dependence on a single company’s product roadmap. Second, there may be channel conflicts when the vendor of proprietary technologies is a competitor of the buyer, or decides to change its business model to become one. Third, buyers are hostage to pricing plans that are unilaterally determined and changed, and for which no alternative exists. Fourth, by investing in a proprietary format, one may be buying unnecessary and unwanted features, which still come at a price.

Benefits of the Open

The benefits of an open standard, on the other hand, are clear. An open standard like MPEG-4 creates an ecosystem in which many companies can play a role. Competition drives the quality of the tools and the efficiency of the compression. Different manufacturers will occupy different parts of the interoperable landscape, all doing what they are best at. Buyers have a choice between different vendors and can purchase different pieces of equipment from different vendors if they like. System integrators can assemble complete, optimized, multivendor solutions. There is always a second source of technology to be tapped when the first vendor changes its business model, price, or features. The format is not controlled by a single party but through an open process in which any party with an interest can take part. The format is upgraded at larger intervals with major functionality steps. Pricing is controlled through market mechanisms and competition; the same applies to features, and to the “price-per-feature” ratio. There are no monopolies that can pose a risk.

Both MPEG-2 and the MP3 format are good examples of how this works in practice. MPEG-2 equipment can be bought from many vendors. After an initial stabilization period, interoperability is guaranteed. (The MPEG-4 Industry Forum is facilitating that process for MPEG-4, with its interop program that has over 30 participants at the moment, including companies such as Apple, RealNetworks, DivXNetworks, Philips, Samsung, IBM, Fraunhofer, PacketVideo, Emblaze, Sorenson, Thomson Multimedia, etc.)

The Advantage of a Cross-platform Standard

MPEG-4 has been explicitly designed as a platform-independent toolbox. The kind of convergence that has been predicted for ten years is, and will remain, hype. There is not going to be a merger between the TV and the PC, and there is not going to be a single network to carry all multimedia information into the home. Rather, there is a proliferation of devices and networks: there are ever more handhelds, mobile devices, and entertainment boxes. There are more rather than fewer networks, and they will all coexist for a long time to come. PSTN lives next to ISDN; 2G, 2.5G, and 3G mobile networks coexist; ADSL and high-speed cable compete; and there will be fiber to the home one day. In such an environment, there are huge gains to be obtained by adopting a common format for multimedia information across these devices and across the diversity in networks. This is more efficient for the provider (who needs to author and encode in fewer formats), as well as for the consumer (who can more easily transfer content between devices).

The MP3 format provides a great illustration of this point. MP3 (MPEG-1 Layer III Audio) is by no means the most efficient audio codec anymore. There are more efficient proprietary solutions available, and certainly MPEG-2 and MPEG-4 AAC are significantly more efficient. But still, MP3 technology is being built into an increasing number of devices these days: CD players, DVD players, car stereos, etc. This all happens ten years after the format was standardized. The advantages of a well-established cross-platform format by far outweigh the fact that it isn’t the latest and greatest. Standards like MP3 and MPEG-2 stand the test of time very well. Investments in these technologies have proven to keep their value over long periods of time, even in a technology landscape that is supposed to change significantly every few years. And this is perhaps the most important kind of efficiency that MPEG-4 has to offer to its users.

References

1. ISO/IEC 14496-1:2001, Coding of Audio-Visual Objects, Part 1: Systems, 2nd Ed., 2001.
2. ISO/IEC 14496-2:2001, Coding of Audio-Visual Objects, Part 2: Visual, 2nd Ed., 2001.
3. ISO/IEC 14496-3:2001, Coding of Audio-Visual Objects, Part 3: Audio, 2nd Ed., 2001.
4. R. H. Koenen, “MPEG-4: Multimedia for our Time,” IEEE Spectrum, Vol. 36, No. 2, 1999.
5. R. H. Koenen, T. Pereira, and L. Chiariglione, “MPEG-4: Context and Objectives,” Image Comm. J., Vol. 9, No. 4, 1997.
6. ISO/IEC MPEG (R. Koenen ed.), MPEG-4 Overview, 2002, http://mpeg.tilab.com/standards/mpeg-4/mpeg-4.htm.
7. F. Pereira and T. Ebrahimi (eds.), The MPEG-4 Book, IMSC Press Multimedia Series, 2002.
8. ISO/IEC 14772-1:1997, Virtual Reality Modelling Language, 1997, http://www.web3d.org/technicalinfo/specifications/vrml97/index.htm.
9. Web3D Consortium, www.web3d.org.
10. WebCast Technologies, www.webcasttech.com.
11. ISO/IEC MPEG, Report on MPEG-4 Visual Fine Granularity Scalability Tools Verification Test, 2002, http://mpeg.telecomitalialab.com/working_documents/mpeg-04/visual/video_fgs_verification_tests.zip.
12. 3GPP2, 3rd Generation Partnership Project 2, www.3gpp2.org.

Rob Koenen joined InterTrust in 2000 and serves as the company’s vice-president of Technology Initiatives. He is responsible for maintaining relationships with standardization bodies as well as strategic technological partnerships. Koenen chaired MPEG’s Requirements Group from 1996 to 2002 and has played a key role in the development of the MPEG-4 standard since its inception in 1993, in defining the MPEG-7 standard since its start in 1995, and in leading the work on MPEG-21. He is co-editor of the MPEG-4 Systems Standard. Koenen is the initiator of the MPEG-4 Industry Forum (www.m4if.org), a growing organization that represents more than 100 companies with an interest in seeing MPEG-4 universally adopted. He has served as the president of M4IF since it was established. Koenen received his MSEE (ingenieur) degree in 1989 from Delft University of Technology in The Netherlands, where he studied electrical engineering, specializing in information theory. He holds two patents on automated video quality assessment.

First published in the Conference Proceedings of IBC2002, Amsterdam, The Netherlands, Sept. 13-17, 2002. Copyright © International Broadcasting Convention.

SMPTE

January 2003 •