Coding time reduction in image vector quantization by linear transforms and partial distortion evaluation

Christophe Foucher, Daniel Le Guennec, Pierre Leray, Gilles Vaucher, Jacques Weiss
Supélec, Equipe Electronique, Traitement du Signal et Neuromimétisme (ETSN)
Avenue de la Boulaie, BP 28, 35511 Cesson-Sévigné Cédex, France
[email protected], [email protected]

François Durbin, André Tissot
CEA - DIF, Service Equipement Instrumentation Métrologie
BP 12, 91680 Bruyères-le-Châtel, France
[email protected]

Abstract

Scalar quantization is often preferred to vector quantization (VQ) for lossy data compression because of VQ's significant encoding complexity. Specifically, the most time consuming task in VQ is the nearest neighbor search, which involves numerous distortion calculations. Two main approaches can be considered for VQ complexity reduction. In the first one, complexity reduction is obtained by applying constraints to the codebook. In the second one, the nearest neighbor search is sped up. Some methods from this class evaluate an approximate distortion measure while others use geometrical inequalities to reduce the search space without any approximation. A complete vector quantizer can use either approach separately, or both combined. Within the second approach, we propose a new variant of Partial Distortion Evaluation (PDE) based on a linear transform. This linear transform concentrates the variance on a reduced number of vector components, which are sorted by decreasing variance, resulting in a larger early rejection probability. We tested this algorithm for image compression with Mean-Removed Vector Quantization (MRVQ), which belongs to the first algorithmic approach. A coding time reduction by a factor of up to 4.75 was obtained, compared to classical MRVQ with partial distortion evaluation, for a given image quality (the transform computing cost is included in this overall coding time reduction).

Keywords: Vector quantization, image compression, complexity.

1 Introduction

Digital images, used in multimedia applications and telecommunications, need compression for storage or transmission. Lossy compression algorithms generally include three steps: preprocessing, quantization and lossless compression. A linear unitary transform like the Discrete Cosine Transform (DCT) or the Walsh-Hadamard Transform (WHT) is sometimes applied during the first step. Such transforms are invertible and preserve vector norms. Wavelet transforms may also be used in that stage, as in the JPEG 2000 standard. For lossy signal compression they are often used to concentrate the signal variance over a reduced number of vector components. For instance, the JPEG standard uses the DCT. In this frequency representation, information is concentrated into the low frequency components. Transform components are then quantized independently and those with low variance are set to zero. JPEG also reorders transform components by a zigzag scan through the blocks to maximize the number of successive null quantized components. This reordering allows coding zero groups with fewer bits.

Vector Quantization (VQ) allows lower rates than scalar quantization [Gersho and Gray, 1992]. A vector quantizer is a mapping from $\mathbb{R}^k$ to $[1, \ldots, N]$ which assigns to each $k$-dimensional vector $x$ an index $c(x)$. From now on, the word coding will refer to this operation; image coding means generating the sequence of quantization indices associated with that image. To each quantization index $i$ is associated a reconstruction value $y_i$ called a code vector. The set $\{y_i \mid i \in [1, N]\}$ is called a codebook. The coding function is defined by the usual nearest neighbor rule: $x$ is coded by the index $i$ of its nearest neighbor among the code vectors. However, the use of those algorithms is hindered by the complexity of the codebook design and especially of the coding. For example, [Foucher et al., 2000] shows that the codebook design may take from 10 to 100 seconds on a personal computer. The most time consuming step is the Nearest Neighbor Search (NNS). For a given codebook $C = \{y_i \mid i \in [1, N]\}$, the code vector $y_i$ is the nearest neighbor of $x$ if it minimizes the distortion: $\forall j \in [1, N],\ d(x, y_i) \le d(x, y_j)$. Therefore, NNS needs computing $N$ distortion measures to code each vector. The squared error is frequently used for $d$: $d(x, y) = \sum_{n=1}^{k} (x(n) - y(n))^2$, where $x(n)$ is the $n$th component of $x$. Thus, NNS requires $kN$ products, which is expensive because $N = 2^{rk}$ ($r$ being the resolution and $k$ the vector dimension).

Many VQ complexity reduction algorithms can be divided into two classes. In the first one, called "constrained vector quantization" [Gersho and Gray, 1992], different constraints are applied to the codebook. Mean-residual vector quantization (MRVQ) and gain-shape vector quantization are examples of product codes. The codebook is constrained to be the cartesian product of several sub-codebooks associated with different features extracted from the vectors. In MRVQ, the features are the mean component and the variations of the component values around the mean. We note $m(x) = \frac{1}{k} \sum_{n=1}^{k} x(n)$ the mean and $r(x) = x - m(x) \cdot 1_k$ the residual vector, where $1_k^t = (1, \ldots, 1)$. With MRVQ, a vector $x$ is coded by two indices: a scalar quantization index for the mean and a vector quantization index for the residual. A better reconstruction quality is obtained if the residual is computed by $r(x) = x - \hat{m}(x) \cdot 1_k$, where $\hat{m}(x)$ is the quantized value of $m(x)$. In that case, $r(x)$ does not have a zero mean.
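To make the notation concrete, the following minimal NumPy sketch codes one vector with MRVQ and an exhaustive nearest neighbor search over the residual codebook; the function and variable names are our own illustrations, not taken from the paper.

import numpy as np

def mrvq_encode(x, mean_codebook, residual_codebook):
    # Mean-removed VQ of a k-dimensional vector x.
    # mean_codebook: 1-D array of quantized mean values u_i
    # residual_codebook: N x k array of residual code vectors y_i
    # scalar quantization of the mean m(x) = (1/k) * sum_n x(n)
    m = x.mean()
    i_mean = int(np.argmin((mean_codebook - m) ** 2))
    m_hat = mean_codebook[i_mean]
    # residual r(x) = x - m_hat(x) * 1_k (its mean is not exactly zero)
    r = x - m_hat
    # exhaustive NNS with the squared-error distortion: k*N products
    distortions = np.sum((residual_codebook - r) ** 2, axis=1)
    i_res = int(np.argmin(distortions))
    return i_mean, i_res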
In classified vector quantization (CVQ), the codebook is constrained to be the union of different sub-codebooks. Tree-structured vector quantization may be viewed as a CVQ. The constraint is maximum in lattice vector quantization, where the codebook is a set of regularly spaced vectors. In lattice VQ, NNS involves much less computation and no further coding time reduction is needed. On the other hand, it is only applicable in particular cases [Gersho and Gray, 1992]: large resolution and variable length entropy coding.

Algorithms of the second class are described in the next section. These techniques are often called "fast vector quantization". They are partly independent of any type of codebook constraint and consequently techniques from the two classes can be combined into a complete vector quantizer. Partial Distortion Evaluation (PDE) belongs to the second class. In this paper, we propose an improvement of PDE based on linear transforms, which we tested in combination with MRVQ. Section 2 describes some existing fast VQ algorithms, and we present our contribution in section 3. Section 4 presents our tests and the corresponding results.

2 Existing fast VQ algorithms

Some techniques from the second class use an approximation of the distortion measure which needs less calculation. They may lead to a loss of quality. Po and Chan accelerated the vector part of MRVQ by projecting the vectors on a space of reduced dimension [Po and Chan, 1990]. Distortion is computed in this subspace and needs less computation than in the original space. Thus, an approximation of the original distortion is realized, and coding with this distortion can produce a code vector different from the actual nearest neighbor. The choice of projection vectors is important regarding the accuracy of this approximation. Po and Chan suggest using DCT or WHT basis functions [Chan and Po, 1992], since the first transform components contain most of the image information. In [Cheung and Po, 2000], the authors propose a variation of PDE: the minimum distortion is normalized by the number of terms in the partial distortion. Thus, more code vectors are discarded from the search space, resulting in another approximation of the distortion measure.

In other techniques of the second class, geometrical inequalities are used to discard candidate code vectors without computing the distortion. Bei and Gray proposed in [Bei and Gray, 1985] a NNS acceleration method called Partial Distortion Evaluation (PDE). During NNS, for each code vector, distortion is computed as a sum of squared terms. With PDE, the summation is interrupted as soon as it exceeds the minimum distortion already encountered. The advantages of PDE are its simplicity and its straightforward applicability to every VQ algorithm. Other algorithms [Huang et al., 1992, Baek et al., 1997, Cardinal, 1999] use explicit geometrical inequalities which generally rely on vector-to-scalar mappings. For example, in [Baek et al., 1997], vectors are projected on a particular direction. The scalar mapping is the projection norm and the search space is reduced to vectors whose mapping value is included in an interval. The choice of the mapping may depend on the nature of the vectors.

[Skarbek and Ignasiak, 1996] proposed another means of NNS acceleration. It consists in increasing the efficiency of the partial distortion search through a Principal Component Analysis (PCA), or Karhunen-Loève Transform (KLT), of the set of vectors to be coded. The algorithm computes the eigenvectors of the correlation matrix of this set. Hence, when partial distortions are computed in order of decreasing eigenvalue magnitude, a minimum number of additive terms is needed to exceed the last minimum distortion reached. [McNames, 2000] developed the same idea, but PCA is computed on the codebook instead of the set of vectors to be coded. Consequently, the PCA computing cost is lower since it is incurred only during codebook design. On the other hand, the transform is no longer an exact KLT but only an approximation. As long as the code vectors are representative of the vectors to be coded, the first PCA basis vectors match the directions of greatest data variance.
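To make Bei and Gray's technique concrete, a minimal Python sketch of partial distortion evaluation is given below (the names are our own illustrations); the early exit in the inner loop is the only difference from an exhaustive search, so the returned index is identical to that of a full search.

import numpy as np

def nns_pde(x, codebook):
    # Nearest neighbor search with Partial Distortion Evaluation.
    # The running sum of squared terms is abandoned as soon as it
    # exceeds the smallest distortion found so far.
    best_index, best_dist = 0, np.inf
    for i, y in enumerate(codebook):
        partial = 0.0
        for n in range(len(x)):
            partial += (x[n] - y[n]) ** 2
            if partial >= best_dist:   # early rejection of code vector i
                break
        else:                          # all k terms summed: new best match
            best_index, best_dist = i, partial
    return best_index, best_dist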

3 Our algorithm proposal

As mentioned above, Po and Chan showed that the distortion computation cost can be reduced by computing an approximate distortion in a reduced dimension space. However, an arbitrary number of projection vectors must be selected. This choice results from a trade-off between coding time reduction and quality loss. We tested the algorithm described by Po and Chan, but their choice of projection vectors caused a severe quality loss. Moreover, the number of components needed for discrimination may change for different code vectors. A code vector very different from x, the vector to be coded, may be rejected with its first projection component only, whereas other code vectors close to x will need a more accurate distortion measure, and an erroneous selection may occur with such an approximate distortion measure.

McNames' algorithm [McNames, 2000] does not have this drawback since it uses partial distortion search, but it requires a prior codebook PCA computation that may be fairly costly in the case of a variable codebook (adaptive vector quantization). Moreover, projection on PCA basis vectors cannot be optimized in the manner of the Fast Fourier Transform, for example. In fact, PCA basis vectors do not exhibit any usable symmetry.

Our algorithm is inspired by both previous approaches. Just as Po and Chan do, variance is concentrated with a DCT (Discrete Cosine Transform) or a WHT (Walsh-Hadamard Transform).

These transforms are data and codebook independent and their computation can be optimized (FFT-like transforms [Gonzalez and Woods, 1993]). Instead of neglecting low variance components, we propose to apply PDE to the transformed vectors. Variance compaction is exploited while the search remains equivalent to an exhaustive search, with no approximation. This method has the advantage that no arbitrary choice of the number of projection vectors is required. In other words, the number of projection vectors is adaptive. Of course, such a transform would prove inefficient if vectors had no correlated components; but in natural image vector quantization, vector components are correlated.

Like other fast algorithms, this one is quite independent of the type of constraint imposed on the codebook. Just like Po and Chan, we worked on mean-residual vector quantization, which already brings some complexity reduction compared to plain VQ. Mean computation is integrated into the transform computation: with WHT or DCT, the first transformed component is the block mean. Unlike in classical MRVQ, only the first component is modified after mean quantization to obtain the residual.

Before coding images, two codebooks must be designed, one for the mean, $C_m = \{u_i\}$, and one for the residual, $C_r = \{y_i\}$. A means training set and a residuals training set are extracted from a training image set. The codebooks are obtained through a common design algorithm, such as the splitting algorithm [Gersho and Gray, 1992] (often known as LBG or the generalized Lloyd algorithm).

The coding procedure of a vector $x$ includes the following steps (a code sketch is given after the decoding steps below):

1. transform $x$ into $z$ with $f$ (DCT or WHT)
2. scalar quantize the first component $z(0)$, yielding $\hat{z}(0)$
3. replace $z(0)$ by the residual value $z(0) - \hat{z}(0)$
4. reorder the components of $z$
5. vector quantize the residual with Partial Distortion Evaluation

$x$ is coded by the index $c_m(x)$ of the mean quantization value ($\hat{z}(0) = u_{c_m(x)}$) and by the index $c_r(x)$ of the residual quantization vector.

Components are reordered by decreasing variance. This order is determined once and for all during the codebook design procedure. Training vectors are transformed, and the mean quantizer is designed. Then their mean is quantized and their residuals are extracted. The variance of each component of the residuals is then computed. The variance order is determined and it will be used during the coding. It is important to note that the variance of the residual mean, $z(0) - \hat{z}(0)$, depends on the scalar quantization: an accurate mean quantization yields a reduced residual mean variance.

There is no need for the decoder to compute the inverse transform. It is done once and for all with the code vectors of $C_r = \{y_i\}$. In other words, the decoder's residual code vectors are defined by $y'_i = f^{-1}(y_i)$. Therefore, the decoding steps are:

1. decode the mean, reconstructed by $u_{c_m(x)}$
2. decode the residual, reconstructed by $y'_{c_r(x)}$
3. compute $\hat{x} = y'_{c_r(x)} + u_{c_m(x)} \cdot 1_k$
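The sketch below illustrates the coding steps with an orthonormal DCT built from an explicit transform matrix. It reuses the nns_pde function sketched in section 2 and assumes that the residual code vectors are stored in the transformed, reordered domain and that the variance order was computed during codebook design. All names (dct_matrix, encode_block, order, ...) are our own, and the normalization details may differ from the implementation used in the experiments.

import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II matrix; C @ b @ C.T is the 2-D DCT of an n x n block b.
    k = np.arange(n)[:, None]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * np.arange(n) + 1) / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)
    return C

def encode_block(block, mean_codebook, residual_codebook, order, C):
    # Steps 1-5 of the coding procedure for one n x n image block.
    # 1. transform the block (with this normalization the first
    #    transformed component is proportional to the block mean)
    z = (C @ block @ C.T).ravel()
    # 2. scalar quantize the first component z(0)
    i_mean = int(np.argmin((mean_codebook - z[0]) ** 2))
    # 3. replace z(0) by the residual value z(0) - z_hat(0)
    z[0] -= mean_codebook[i_mean]
    # 4. reorder the components by decreasing training variance
    z = z[order]
    # 5. vector quantize the residual with PDE (the residual codebook
    #    must be stored in the same reordered basis)
    i_res, _ = nns_pde(z, residual_codebook)
    return i_mean, i_res

With a WHT, only the transform matrix changes (a normalized Hadamard matrix); the rest of the procedure is identical.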

4 Experiments

PDE does not depend on any specific vector feature, so we can apply it to the vector part of MRVQ. A first experiment proved that our algorithm can reduce plain MRVQ coding time. The following tests were carried out to check that a prior linear transform can improve the overall PDE process.

4.1 Experimental protocol

To estimate the coding time gained over plain MRVQ, we ran our algorithm with three different classical resolutions $r$: 0.25, 0.5 and 1 bit per pixel (bpp), and two block sizes: $4 \times 4$ and $8 \times 8$. The PDE algorithm is associated with DCT or WHT, and compared to classical MRVQ accelerated by PDE only, without transform. For each test run, different combinations (mean resolution $r_m$, residual resolution $r_r$) are set with the relation $(r_m + r_r)/k = r$. Thus the mean and residual codebooks contain $2^{r_m}$ values and $2^{r_r}$ code vectors, respectively.
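For instance, with $4 \times 4$ blocks ($k = 16$) and a target resolution of $r = 0.5$ bpp, the relation gives $r_m + r_r = rk = 8$ bits per block; the combination $(r_m, r_r) = (3, 5)$, for example, uses a mean codebook of $2^3 = 8$ values and a residual codebook of $2^5 = 32$ code vectors.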

The codebooks are designed with 18 grayscale training images. The well-known image Lena ($512 \times 512$ pixels), not included in the training set, is the test image. The pictures are initially coded with 8 bits per pixel. The reconstructed image quality is measured by the Peak Signal-to-Noise Ratio:

$$PSNR = 10 \log_{10} \left( \frac{255^2}{MSE} \right)$$

where MSE stands for Mean Square Error. The coding time is measured on a personal computer equipped with a Pentium II 300 MHz processor.
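For reference, a minimal helper computing this measure for 8-bit grayscale images stored as NumPy arrays (our own illustration, not code from the paper) could be:

import numpy as np

def psnr(original, reconstructed):
    # Peak Signal-to-Noise Ratio in dB for 8-bit images.
    diff = original.astype(np.float64) - reconstructed.astype(np.float64)
    mse = np.mean(diff ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)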

Figure 1: $4 \times 4$ blocks MRVQ with resolution 0.5 bpp. PSNR (dB) versus coding time (s), logarithmic time scale, for MRVQ + DCT + PDE, MRVQ + WHT + PDE and MRVQ + PDE; the plotted combinations range from (mean resolution 7, residual resolution 1) to (mean resolution 1, residual resolution 7).

4.2 Results

In the first test, the block size is $4 \times 4$ pixels and $r = 0.5$ bpp. The algorithm is tested with integer values of $r_m$ ranging from 1 to 7. Figure 1 shows the PSNR values and coding times obtained for each algorithm and each combination $(r_m, r_r)$. The time scale is logarithmic. For a given resolution couple $(r_m, r_r)$, the reconstruction PSNR is similar for all three algorithms. With fine mean quantization and coarse residual quantization, MRVQ+PDE requires less computation time. In fact, there are few code vectors for the residual and the vector part of the coding is not very time consuming. On the contrary, when more bits are allocated to the residuals, using a transform together with PDE becomes worthwhile. In all cases, WHT is more efficient than DCT. WHT only uses additions while DCT also uses a few products; this difference may explain the observed computing time difference.

For comparison, we implemented the algorithm described in [Baek et al., 1997]. As in that paper, we applied plain VQ instead of MRVQ. With the same block size, same training set and same test image, and a resolution of 8 bits per vector (0.5 bpp), we obtained a coding time of 0.33 seconds and a PSNR of 28.90 dB. Figure 1 shows a slightly lower PSNR for this coding time. Thus, Baek's algorithm is better if the application does not require a shorter coding time. On the contrary, if a reduced coding time is of prime importance, our algorithm is preferable. Moreover, the coding time/quality tradeoff cannot be modulated with plain VQ. To extend the comparison, one could try to speed up MRVQ with Baek's method, but its application to MRVQ is not quite straightforward because it makes use of the variance of the vectors' mean component. Another direction of projection could be more appropriate, depending on the number of bits allocated to the mean quantization.

For the second test, the block size is $4 \times 4$ and the total resolution $r$ is 1 bpp. The mean is quantized with 4 to 9 bits. Unlike the original pixel values, the mean is a real number and it is not unreasonable to code it with more bits than the pixel values. $r_r$ ranges from 12 down to 7. Figure 2 shows the obtained results.

Figure 2: $4 \times 4$ blocks MRVQ with resolution 1 bpp. PSNR (dB) versus coding time (s) for MRVQ + DCT + PDE, MRVQ + WHT + PDE and MRVQ + PDE; the plotted combinations range from (mean resolution 4, residual resolution 12) to (mean resolution 9, residual resolution 7).

At this resolution level, our acceleration algorithm significantly reduces coding time compared to MRVQ+PDE for all $(r_m, r_r)$ couples. Moreover, DCT, while requiring more computations than WHT, allows lower coding times than WHT because of its better variance compaction property. In fact, in this test, even with PDE, the computing time for the transforms is negligible compared to that of the NNS.

Figure 3: $8 \times 8$ blocks MRVQ with resolution 0.25 bpp. PSNR (dB) versus coding time (s) for MRVQ + DCT + PDE, MRVQ + WHT + PDE and MRVQ + PDE; the plotted combinations range from (mean resolution 4, residual resolution 12) to (mean resolution 9, residual resolution 7).

Finally, a third test applies MRVQ to $8 \times 8$ blocks, yielding a low resolution of 0.25 bit per pixel, with the same mean and residual resolutions as in the preceding test. The results are presented in figure 3. Again, the combination (transform + Partial Distortion Evaluation) yields a substantial coding time gain. This gain is greater than in the preceding tests. For example, with $r_m = 5$ and $r_r = 11$, MRVQ+DCT+PDE gave a coding time 4.75 times lower than MRVQ+PDE; in the preceding test, the ratio was only 1.60. Fig. 4 shows reconstruction examples from the three tests with CPU time, compression rate and distortion measures.

5 Conclusion

In this paper, we proposed an acceleration algorithm for the nearest neighbor search in image vector quantization. This algorithm combines a linear unitary transform, WHT or DCT, and Partial Distortion Evaluation. The linear transform concentrates the information contained in the vectors over a reduced number of components. These components are sorted in order of decreasing variance. The efficiency of Partial Distortion Evaluation is thus improved. In most cases, we observed that our algorithm reduced the coding time for a given image quality. At a low residual resolution, due to its small number of code vectors and the absence of a transform, plain MRVQ has a smaller coding time (cf. Fig. 1). Starting from a residual resolution of about 4 bits per vector, the transform computing cost becomes negligible compared to the improvement in residual vector coding time.

This algorithm might be improved by sorting the code vectors according to the magnitude of their first component, as done by Baek [Baek et al., 1997]. It can be used with different types of constrained codebook VQ. We tried it with MRVQ because it allows lower coding rates than plain VQ. To complete this study, it might be interesting to apply this approach to other VQ algorithms, bearing in mind that the gain depends on the VQ algorithm complexity. In binary tree search VQ, for example, each stage involves only two codewords and a further gain in coding time is less probable. The tests were carried out with natural images. The linear transform may be less efficient in representing image information for other types of images, like error images in video compression. It would be interesting in the future to make sure that all transform components are really necessary. If they are not, we could eliminate some of these unnecessary components, just as Po and Chan did, while still using PDE with the remaining components.

References

[Baek et al., 1997] Baek, S., Jeon, B., and Sung, K.-M. (1997). A fast encoding algorithm for vector quantization. IEEE Signal Processing Letters, 4(12):325-327.

[Bei and Gray, 1985] Bei, C.-D. and Gray, R. M. (1985). An improvement of the minimum distortion encoding algorithm for vector quantization. IEEE Transactions on Communications, COM-33(10):1132-1133.

[Cardinal, 1999] Cardinal, J. (1999). A fast full search equivalent for mean-shape-gain vector quantizers. In 20th Symposium on Information Theory in the Benelux, pages 39-46.

[Chan and Po, 1992] Chan, C.-K. and Po, L.-M. (1992). A complexity reduction technique for image vector quantization. IEEE Transactions on Image Processing, 1(3):312-321.

[Cheung and Po, 2000] Cheung, C.-K. and Po, L.-M. (2000). Normalized partial distortion search algorithm for block motion estimation. IEEE Transactions on Circuits and Systems for Video Technology, 10(2):417-422.

[Foucher et al., 2000] Foucher, C., Le Guennec, D., Leray, P., and Vaucher, G. (2000). Algorithmes neuronaux et non neuronaux de construction de dictionnaire pour la quantification vectorielle en traitement d'images. In Journées Neurosciences et Sciences de l'Ingénieur (NSI), pages 165-168.

[Gersho and Gray, 1992] Gersho, A. and Gray, R. M. (1992). Vector Quantization and Signal Compression. Kluwer Academic.

[Gonzalez and Woods, 1993] Gonzalez, R. C. and Woods, R. E. (1993). Digital Image Processing. Addison-Wesley.

[Huang et al., 1992] Huang, C.-M., Bi, Q., Stiles, G. S., and Harris, R. W. (1992). Fast full search equivalent encoding algorithms for image compression using vector quantization. IEEE Transactions on Image Processing, 1(3):413-416.

[McNames, 2000] McNames, J. (2000). Rotated partial distance search for faster vector quantization encoding. IEEE Signal Processing Letters, 7(9).

[Po and Chan, 1990] Po, L.-M. and Chan, C.-K. (1990). Novel subspace distortion measurement for efficient implementation of image vector quantiser. Electronics Letters, 26(7):480-482.

[Skarbek and Ignasiak, 1996] Skarbek, W. and Ignasiak, K. (1996). Fast VQ codebook search in KLT space. Neural Network World, 6(3):383-386.

Figure 4: reconstructed pictures coded with MRVQ & DCT.
a: original.
b: $4 \times 4$ blocks, $r_m = 4$, $r_r = 4$; T = 0.12, Q = 28.5, E = 6.76, CR1 = 16, CR2 = 18.9.
c: $4 \times 4$ blocks, $r_m = 7$, $r_r = 9$; T = 2.26, Q = 32.2, E = 12.7, CR1 = 8, CR2 = 10.1.
d: $8 \times 8$ blocks, $r_m = 7$, $r_r = 9$; T = 0.74, Q = 27.7, E = 12.9, CR1 = 32, CR2 = 39.7.
T is the coding time in seconds, Q is the PSNR in dB, E is the entropy in bits per vector, CR1 is the compression ratio without entropy coding, and CR2 is the compression ratio with an ideal entropy coder.