Time-Invariant Context for Sample Rate Conversion systems


Stéphan Tassart

Abstract

Digital systems dedicated to audio and speech processing usually require sample rate conversion units in order to adapt the sample rate between different signal flows: for instance 8 and 16 kHz for speech, 32 kHz for the broadcast rate, 44.1 kHz for CDs and 48 kHz for studio work. The designer chooses the sample rate conversion (SRC) technology based on objective criteria, such as complexity figures, development or integration cycle and, of course, performance characterization. The performance of a fractional SRC system includes its in-band and aliasing characterization plus its distortion behaviour due to internal rounding errors. The paper shows the existence of a new compound time-invariant system made of multiple instances of the same SRC system. The characterization of the original SRC is obtained from the linear and distortion characteristics of this time-invariant system. Regular methods for characterizing time-invariant systems apply. The SRC system can be analyzed in black-box conditions, either in batch processing or in real-time processing. Examples illustrate the capability of the method to fully recover the characteristics and the rounding noise behaviour of actual SRC implementations.

Manuscript received May 12, 2011; revised September 2, 2011. Copyright (c) 2011 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to [email protected]. The author is with ST-Ericsson, 29 bd Romain Rolland, 75669 Paris, France, e-mail: [email protected].

November 8, 2011

DRAFT

I. INTRODUCTION

The impact of a sample rate conversion (SRC) system on a digital signal can be described (cf. [1]) in terms of the modification of the spectrum of the resampled signal, the attenuation of the aliased and mirrored spectral images, and the deviations from the ideal linear model due to the finite word-length representation. A characterization method aims at separating the contributions of these effects, in particular those resulting from aliasing. A fractional SRC system is an example of a linear multirate system [2] and more specifically an example of a linear periodically time-varying (LPTV) [3] or block time-invariant system [4]. Many identification methods have been proposed for LPTV systems, for instance an optimal least-squares solution with periodic inputs in a measurement context [1], [5], [6] or in a tracking context [7], [8]. An LPTV system can also be raised into a multiple-input multiple-output (MIMO) linear time-invariant (LTI) system [3] and identified with dedicated MIMO methods [9]. Those identification methods can be adapted to the case of the fractional SRC when they support differing input and output rates.

We present in section III a compound system that embeds multiple instances of an SRC system (see Fig. 4) and is LTI. We prove in section III-D that the spectrum of the resulting LTI system and the bispectrum of the LPTV system [1], [5], [6], [10], [11] are equivalent. The advantages of the proposed method are discussed in III-C: the method provides a common framework for comparing the performance of different SRC algorithms; it makes it possible to assess, in a black-box approach, the performance of proprietary SRC algorithms whose internals are not available; and it makes it possible to assess the global performance of complicated multi-stage SRC algorithms. Finally, linear time-invariance makes it possible to use a wide range of familiar tools.

The separation of rounding errors from the linear characteristics of the SRC is demonstrated in IV-A for an upsampling case. Further examples are presented in IV-B and IV-C to show the performance achieved by the presented framework when a method for measuring the performance of a weakly nonlinear time-invariant single-input single-output (SISO) system [12] is used. Comparisons with related works [1], [5] in terms of computation costs are presented in IV-D.

II. IDEAL MODELS AND NOTATIONS

A. Plan

Ideal models for decimation, expansion, SRC and LPTV systems are given in sections II-D and II-E.
As recognized in the literature [2], the alias component (AC) representation is useful for describing such multirate systems. The AC-notation and related polyphase properties are introduced in II-B and II-C.

B. Notations

For any integer $N$, label $\omega_N$ an $N$th root of unity. For any vector $\mathbf{x}$ (resp. matrix), we denote by $\mathbf{x}^{\top}$ (resp. $\mathbf{x}^{*}$) the transposed (resp. conjugate-transposed) vector (resp. matrix). For any scalar $z$-transform $x(z)$ and for any integers $P$ and $R$, we introduce the AC-vector $\mathbf{x}_{P:}(z)$ and the AC-matrix $\mathbf{X}_{R:P}(z)$ respectively¹ as in [10]:

$$\mathbf{x}_{P:}(z) = \frac{1}{\sqrt{P}} \begin{bmatrix} x(\omega_P^{0} z) \\ x(\omega_P^{1} z) \\ \vdots \\ x(\omega_P^{P-1} z) \end{bmatrix} \tag{1}$$

$$\mathbf{X}_{R:P}(z) = \frac{1}{\sqrt{R}} \begin{bmatrix} \mathbf{x}_{P:}^{\top}(\omega_R^{0} z) \\ \mathbf{x}_{P:}^{\top}(\omega_R^{1} z) \\ \vdots \\ \mathbf{x}_{P:}^{\top}(\omega_R^{R-1} z) \end{bmatrix} \tag{2}$$

¹ The notation $\mathbf{x}_{P:}$ recalls the size $P \times 1$ of the AC-component, whereas the notation $\mathbf{X}_{R:P}$ recalls the size $R \times P$.

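The original $z$-transform $x(z)$ can be retrieved from the AC-representation by application of the basis vector $\mathbf{e}_P$, which is the $P \times 1$ column vector whose entries are all equal to 0 except for the first entry, which is equal to 1:

$$\frac{1}{\sqrt{P}}\, x(z) = \mathbf{e}_P^{\top}\, \mathbf{x}_{P:}(z), \qquad \frac{1}{\sqrt{PR}}\, x(z) = \mathbf{e}_R^{\top}\, \mathbf{X}_{R:P}(z)\, \mathbf{e}_P \tag{3}$$

We define $\mathbf{1}_{P:}$ as the AC-vector of the constant $x(z) = 1$. It is a $P \times 1$ column vector whose entries are all equal to $1/\sqrt{P}$. We introduce the $P \times P$ diagonal matrix $\mathbf{D}_P(z)$ describing an array of delays [10]:

$$\mathbf{D}_P(z) = \operatorname{diag}\left(1, z^{-1}, \ldots, z^{-(P-1)}\right) \tag{4}$$

and the $P \times 1$ delay vector $\mathbf{d}_{P:}(z)$:

$$\mathbf{d}_{P:}(z) = \mathbf{D}_P(z)\, \mathbf{1}_{P:} \tag{5}$$

Note the additional properties:

$$\mathbf{d}_{P:}(az) = \mathbf{D}_P(z)\, \mathbf{d}_{P:}(a) \tag{6}$$

$$\mathbf{d}_{P:}^{\top}(z)\, \mathbf{d}_{P:}(z^{-1}) = 1 \tag{7}$$

$$\mathbf{d}_{P:}(1) = \mathbf{1}_{P:} \tag{8}$$

The set of vectors $\left(\mathbf{d}_{R:}(\omega_R^{k})\right)_{k \in [0,R)}$ is orthonormal:

$$\mathbf{d}_{R:}^{*}(\omega_R^{j})\, \mathbf{d}_{R:}(\omega_R^{i}) = \delta_{i-j} \tag{9}$$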

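and forms the $R \times R$ unitary matrix $\mathbf{W}_R^{*}$ known as the DFT matrix:

$$\mathbf{W}_R^{*} = \left[\, \mathbf{d}_{R:}(\omega_R^{0}),\ \mathbf{d}_{R:}(\omega_R^{1}),\ \cdots,\ \mathbf{d}_{R:}(\omega_R^{R-1}) \,\right] \tag{10}$$

The notation from Eq. (2) is used in order to extend the DFT matrix notation for any integer $N$:

$$\mathbf{W}_{NR:R} = \mathbf{1}_{N:} \otimes \mathbf{W}_R \tag{11}$$

with $\otimes$ representing the Kronecker product [13]. For conforming matrices, the Kronecker product obeys the following laws:

$$(\mathbf{A} \otimes \mathbf{B})^{*} = \mathbf{A}^{*} \otimes \mathbf{B}^{*} \tag{12}$$

$$(\mathbf{A} \otimes \mathbf{B})(\mathbf{C} \otimes \mathbf{D}) = (\mathbf{A}\mathbf{C}) \otimes (\mathbf{B}\mathbf{D}) \tag{13}$$

By application of (12) and (13), we obtain the orthogonality of $\mathbf{W}_{NR:R}$:

$$\mathbf{W}_{NR:R}^{*}\, \mathbf{W}_{NR:R} = \left(\mathbf{1}_{N:}^{\top} \otimes \mathbf{W}_R^{*}\right)\left(\mathbf{1}_{N:} \otimes \mathbf{W}_R\right) = \left(\mathbf{1}_{N:}^{\top} \mathbf{1}_{N:}\right) \otimes \left(\mathbf{W}_R^{*} \mathbf{W}_R\right) = \mathbf{I}_R \tag{14}$$

Note the two equivalent notations for $\mathbf{W}_{NR:R}^{*}$:

$$\mathbf{W}_{NR:R}^{*} = \frac{1}{\sqrt{N}} \left[\, \mathbf{d}_{R:}(\omega_R^{0}),\ \mathbf{d}_{R:}(\omega_R^{1}),\ \cdots,\ \mathbf{d}_{R:}(\omega_R^{NR-1}) \,\right]$$

$$\mathbf{W}_{NR:R}^{*} = \begin{bmatrix} \mathbf{d}_{NR:}^{\top}(\omega_R^{0}) \\ \mathbf{d}_{NR:}^{\top}(\omega_R^{1}) \\ \vdots \\ \mathbf{d}_{NR:}^{\top}(\omega_R^{R-1}) \end{bmatrix}$$

This leads to the additional properties:

$$\mathbf{W}_{PR:P}\, \mathbf{e}_P = \mathbf{1}_{PR:} \tag{15}$$

$$\mathbf{W}_{PR:P}^{*}\, \mathbf{1}_{PR:} = \mathbf{e}_P \tag{16}$$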

Lemma 1: If $P$ and $R$ are coprime then the extended DFT matrices $\mathbf{W}_{PQR:QR}$ and $\mathbf{W}_{PQR:PQ}$ have the following property:

$$\mathbf{W}_{PQR:PQ}^{*}\, \mathbf{W}_{PQR:QR} = \mathbf{I}_Q \otimes \mathbf{e}_P \mathbf{e}_R^{\top}$$

Proof: The proof relies on the orthogonality of the continuous set of vectors $\mathbf{d}_{PR:}(\exp(2j\pi\nu))$. The only possibility for two vectors, respectively $\mathbf{d}_{PR:}(\omega_R^{i})$ and $\mathbf{d}_{PR:}(\omega_P^{j})$ with $i \in [0, R)$ and $j \in [0, P)$, not to be orthogonal is that $iP + jR = 0 \bmod (PR)$. This happens only for $i = 0$ and $j = 0$ when $P$ and $R$ are coprime. Therefore:

$$\mathbf{W}_{PR:P}^{*}\, \mathbf{W}_{PR:R} = \mathbf{e}_P \mathbf{e}_R^{\top}$$

The rest of the proof is straightforward.

For the rest of the text, we shall adopt the notation $P \wedge R$ for the greatest common divisor of $P$ and $R$.

Finally, thanks to the equality $\omega_R^{-k} = \omega_R^{R-k}$, we obtain the following relationship between $\mathbf{W}_{PR:P}^{*}$ and $\mathbf{W}_{PR:P}^{\top}$:

$$\mathbf{W}_{PR:P}^{\top} = \mathbf{J}_P\, \mathbf{W}_{PR:P}^{*} = \mathbf{W}_{PR:P}^{*}\, \mathbf{J}_{PR} \tag{17}$$

where $\mathbf{J}_N$ is the Hankel permutation $N \times N$ matrix whose entries verify:

$$\forall (n, m) \in [1, N]^2, \quad \mathbf{J}_N(n, m) = \delta_{n+m-2} + \delta_{n+m-(2+N)}$$
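Lemma 1 (in its $Q = 1$ form used in the proof) can be checked numerically for small values of $P$ and $R$. The following sketch is only an illustration and is not part of the original derivation: it assumes the convention $\omega_N = \exp(2j\pi/N)$ (the text only requires a root of unity), and the helper names `omega`, `delay_vector`, `ext_dft_conj` and `lemma1_product` are hypothetical.

```python
import cmath
import math

def omega(N, k):
    # Assumed convention (not fixed by the text): omega_N = exp(2j*pi/N).
    return cmath.exp(2j * cmath.pi * k / N)

def delay_vector(N, z):
    """d_{N:}(z): entries z^{-m} / sqrt(N) for m = 0..N-1, cf. Eqs. (4)-(5)."""
    return [z ** (-m) / math.sqrt(N) for m in range(N)]

def ext_dft_conj(N, R):
    """W*_{NR:R} as an R x NR nested list whose rows are d^T_{NR:}(omega_R^k)."""
    return [delay_vector(N * R, omega(R, k)) for k in range(R)]

def lemma1_product(P, R):
    """W*_{PR:P} W_{PR:R} as a P x R nested list."""
    A = ext_dft_conj(R, P)   # W*_{PR:P}, size P x PR
    B = ext_dft_conj(P, R)   # W*_{PR:R}, size R x PR
    # W_{PR:R} is the conjugate transpose of B: entry (m, i) = conj(B[i][m]).
    return [[sum(A[l][m] * B[i][m].conjugate() for m in range(P * R))
             for i in range(R)]
            for l in range(P)]
```

For a coprime pair such as `lemma1_product(2, 3)`, only the top-left entry is expected to be non-zero, in agreement with $\mathbf{e}_P \mathbf{e}_R^{\top}$; for a non-coprime pair such as `(2, 4)`, additional unit entries appear.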

C. Polyphase decomposition

The $R$-polyphase type-1 components², $x_{\frac{k}{R}}$, are associated with a signal $x$ in such a way that interleaving those components regenerates the original signal $x$. The unusual notation $\frac{k}{R}$ recalls the fractional sample delay that separates the polyphase components. In the $z$-domain, it yields:

$$x(z) = \sum_{l=0}^{R-1} z^{-l}\, x_{\frac{l}{R}}(z^R) \tag{18}$$

This relationship can be inverted [2] for any $k$:

$$\forall k \in \mathbb{Z}, \quad x_{\frac{k}{R}}(z^R) = \frac{z^{k}}{R} \sum_{n=0}^{R-1} \omega_R^{nk}\, x(\omega_R^{n} z) \tag{19}$$

For any integer $N$, Eq. (19) is expressed in the extended AC-representation:

$$\mathbf{x}(z^R) = \mathbf{D}_{NR}^{-1}(z)\, \mathbf{W}_{NR:R}\, \mathbf{x}_{R:}(z) \tag{20}$$

where $\mathbf{x}(z^R)$ represents the $NR \times 1$ vector whose entries are $x_{\frac{k}{R}}(z^R)$ for $k \in [0, NR)$:

$$\mathbf{x}(z^R) = \frac{1}{\sqrt{N}} \begin{bmatrix} x_{\frac{0}{R}}(z^R) \\ x_{\frac{1}{R}}(z^R) \\ \vdots \\ x_{\frac{NR-1}{R}}(z^R) \end{bmatrix} \tag{21}$$

² Polyphase type-2, type-3 and type-4 are alternate definitions for the polyphase decomposition, cf. [10].
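In the time domain, Eqs. (18) and (19) amount to $x_{\frac{k}{R}}(n) = x(nR + k)$, so that components with $k \geq R$ are advanced copies of the first $R$ components. The sketch below illustrates this under stated assumptions: signals are stored as dictionaries `{index: value}` (zero elsewhere) so that negative indices are allowed, and the function names `component` and `interleave` are hypothetical. With `N > 1`, `interleave` sums $NR$ extended components and scales the result by $1/N$, which anticipates Eq. (23).

```python
def component(x, R, k):
    """Type-1 polyphase component x_{k/R}: n -> x(nR + k).

    x is a dict {index: value}; k may be larger than R - 1 (extended index).
    """
    return {(t - k) // R: v for t, v in x.items() if (t - k) % R == 0}

def interleave(x, R, N=1):
    """Rebuild x from its N*R (extended) components, scaled by 1/N."""
    out = {}
    for l in range(N * R):
        for n, v in component(x, R, l).items():
            t = n * R + l                      # term z^{-l} x_{l/R}(z^R)
            out[t] = out.get(t, 0.0) + v / N
    return out
```

For instance, with `R = 3`, `component(x, 3, 4)` is `component(x, 3, 1)` advanced by one sample, and `interleave(x, 3, 2)` returns the original signal.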

Fig. 1. R-fold expansion operator model (block diagram: x → ↑R → u)

Fig. 2. P-fold decimation operator model (block diagram: v → ↓P → y)

The extended AC-representation from Eq. (20) can be found as:

$$\mathbf{W}_{NR:R}^{*}\, \mathbf{D}_{NR}(z)\, \mathbf{x}(z^R) = \mathbf{x}_{R:}(z) \tag{22}$$

and in particular, the original representation from Eq. (18) can be extended as:

$$x(z) = \frac{1}{N} \sum_{l=0}^{NR-1} z^{-l}\, x_{\frac{l}{R}}(z^R) \tag{23}$$

Note that the result is scaled by $1/N$.

D. Decimation, Expansion and SRC

The building blocks for multirate systems are the delay operator, the decimation operator and the expansion operator [2]. Consider the following real (or complex) scalar signals: $x$, $y$, $u$ and $v$. The expansion operator $\uparrow R$ associates $u$ with $x$ (see Fig. 1) as follows:

$$\forall n \in \mathbb{Z}, \quad u(n) = \begin{cases} x\!\left(\dfrac{n}{R}\right) & \text{if } n = kR, \\[4pt] 0 & \text{else.} \end{cases} \tag{24}$$

The decimation operator $\downarrow P$ associates $y$ with $v$ (see Fig. 2) as follows:

$$\forall n \in \mathbb{Z}, \quad y(n) = v(nP) \tag{25}$$

Provided that the $z$-transforms exist, the result of the $R$-fold expansion (see Fig. 1 and equation (24)) verifies in the $z$-domain:

$$u(z) = x(z^R) \tag{26}$$

The signal $y$ obtained as the $P$-fold decimation of the signal $v$ (see Fig. 2 and equation (25)) verifies in the $z$-domain:

$$y(z^P) = \frac{1}{P} \sum_{m=0}^{P-1} v(\omega_P^{m} z) \tag{27}$$

Fig. 3. R/P sample rate conversion model (block diagram: x → ↑R → u → H → v → ↓P → y)

The multirate system shown in Fig. 3 can be used as an ideal model for a sample rate converter with a fractional ratio $R/P$ as in [7], with $R$ and $P$ chosen as coprime. This multirate system consists of an $R$-fold expansion, an LTI system $H$ referred to as the kernel of the multirate system, and a $P$-fold decimation. The signal $v$ is the result of the filtering of $u$, i.e. $v(z) = H(z) \cdot u(z)$. Thus, from (26) and (27), one obtains the modulation equation for the ideal $R/P$ sample rate conversion system:

$$y(z^P) = \frac{1}{P} \sum_{m=0}^{P-1} H(\omega_P^{m} z) \cdot x(\omega_P^{Rm} z^R) \tag{28}$$

This equation can be written (cf. [10], [14]) in matrix form with the AC-representation (cf. section II-B):

$$\mathbf{y}_{R:}(z^P) = \mathbf{H}_{R:P}(z) \cdot \mathbf{x}_{P:}(z^R) \tag{29}$$
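The ideal model of Fig. 3 translates directly into the time domain. The sketch below is a minimal illustration under stated assumptions: signals are dictionaries `{index: value}` (zero elsewhere), the kernel is a causal FIR given as a list `h`, and the function names `expand`, `fir`, `decimate` and `src` are hypothetical.

```python
from collections import defaultdict

def expand(x, R):
    """R-fold expansion, Eq. (24): u(nR) = x(n), zero elsewhere."""
    return {n * R: v for n, v in x.items()}

def fir(h, u):
    """Filtering by the kernel: v = h * u (h is a causal FIR list)."""
    v = defaultdict(float)
    for n, un in u.items():
        for j, hj in enumerate(h):
            v[n + j] += hj * un
    return dict(v)

def decimate(v, P):
    """P-fold decimation, Eq. (25): y(n) = v(nP)."""
    return {n // P: val for n, val in v.items() if n % P == 0}

def src(h, x, R, P):
    """Ideal R/P sample rate conversion: expansion, kernel, decimation."""
    return decimate(fir(h, expand(x, R)), P)
```

With this model, delaying the input by $P$ samples delays the output by $R$ samples, while a delay that is not a multiple of $P$ generally changes the shape of the output.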

E. LPTV systems

A discrete-time linear system is defined as $(L_2, L_1)$-LPTV if a shift of the input by $L_1$ samples results in a shift of the output by $L_2$ samples, for any input signal [10]. Given this definition, the ideal $R/P$ sample rate conversion system appears as an $(R, P)$-LPTV system. As noticed in [4], reciprocally, any $(R, P)$-LPTV system with $R$ and $P$ chosen as two relatively prime integers verifies equation (28) and can be characterized by a kernel $H$.

III. LTI CONTEXT FOR SRC SYSTEM

A. Principles

The intuition for the LTI methodology originates from a method used for retrieving the interpolation function $h(t)$ from a fractional delay interpolator (cf. [15]). Feeding an impulse into the delay system results in the response $h(n/T_s + \tau)$. Conceptually, the interpolation function $h(\tau)$ can be retrieved by repeating this experiment for every possible delay $\tau \in [0, 1)$ and interleaving those responses. Following this idea, feeding an infinite set of ideal fractional delayed impulses, $(x_\tau)_{\tau \in [0, P)}$, into a fractional $R/P$ SRC system shall generate, after interleaving, a time-continuous impulse response which is related to the kernel $H$ of the SRC system.

First, we prove in this section that using an infinite set of stimuli is not necessary: a set of $PR$ stimuli is enough, with $\tau = k/R$, for $k \in [0, PR)$. We shall label $x_{\frac{k}{R}}$ each input stimulus with reference to the applied delay $\tau = k/R$, and $y_{\frac{k}{P}}$ the corresponding response with reference to the expected delay, $k/P$, once the rate conversion $R/P$ is applied.

Second, the stimuli are not constrained to be ideal fractional delayed impulses (and therefore infinitely long and non-causal): if the stimuli are band-limited, then, intuitively, the resulting response is correspondingly band-limited. More generally, arbitrary stimuli can be used provided that they are related by some polyphase relationships, as in Eq. (23), and the resulting response will be weighted by $x(e^{j\omega})$, the discrete-time Fourier transform of $x$.

Fig. 4. Principles for an LTI context for a fractional SRC system (block diagram: the input x is split by advances z and ↓R decimators into PR channels x_{k/R}; each channel goes through an R/P SRC; the responses y_{k/P} are expanded by ↑P, delayed by z^{-1}, summed and scaled by 1/R to form y)

B. LTI contexts

Consider the compound system shown in Fig. 4. The input signal $x$ is considered as the combination of $PR$ interleaved and shifted channels $x_{\frac{k}{R}}$. The SRC operation is applied separately on every channel. The resulting $y_{\frac{k}{P}}$ are interleaved and summed in order to form a new signal $y$. The left part of the diagram corresponds to the extended $R$-polyphase analysis network implementing Eq. (19) as in [5]. It consists in deinterleaving the input signal $x$ in order to get the first $R$ channels and shifting those first $R$ channels in order to obtain the remaining $(P - 1)R$ channels. Therefore, only the first $R$ channels are independent signals. The right part of the diagram corresponds to the extended $P$-polyphase synthesis network implementing Eq. (23). Note in the flow diagram of Fig. 4 that the result is scaled by $1/R$ as in Eq. (23). The following result applies:

Theorem 1 (LTI context for an SRC system): The compound system shown in Fig. 4 where the ideal $R/P$ sample rate conversion system defined by $H(z)$, the $z$-transform of its kernel, is applied on $PR$ channels, and whose input $x$ (resp. output $y$) is obtained by interleaving and shifting (resp. summing) every channel $x_{\frac{k}{R}}$ (resp. $y_{\frac{k}{P}}$), is LTI.

When $P$ and $R$ are coprime, $y$ is obtained by filtering $x$ through the kernel of the sample rate conversion system:

$$y(z) = \frac{1}{R}\, H(z) \cdot x(z)$$

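Proof: Form the $PR \times 1$ vector $\mathbf{x}(z^R)$ (resp. $\mathbf{y}(z^P)$) corresponding to the output (resp. input) of the analysis (resp. synthesis) network as defined in Eq. (21):

$$\mathbf{x}(z^R) = \frac{1}{\sqrt{P}} \begin{bmatrix} x_{\frac{0}{R}}(z^R) \\ x_{\frac{1}{R}}(z^R) \\ \vdots \\ x_{\frac{PR-1}{R}}(z^R) \end{bmatrix}, \qquad \mathbf{y}(z^P) = \frac{1}{\sqrt{R}} \begin{bmatrix} y_{\frac{0}{P}}(z^P) \\ y_{\frac{1}{P}}(z^P) \\ \vdots \\ y_{\frac{PR-1}{P}}(z^P) \end{bmatrix}$$

Apply Eq. (20) and obtain for $\mathbf{x}(z^R)$ (resp. for $\mathbf{y}(z^P)$):

$$\mathbf{x}(z^R) = \mathbf{D}_{PR}^{-1}(z)\, \mathbf{W}_{PR:R}\, \mathbf{x}_{R:}(z) \tag{30}$$

$$\mathbf{W}_{PR:P}^{*}\, \mathbf{D}_{PR}(z)\, \mathbf{y}(z^P) = \mathbf{y}_{P:}(z) \tag{31}$$

The sample rate conversion from Eq. (29) is applied to each channel:

$$\forall k \in [0, PR), \quad \mathbf{y}_{R:\frac{k}{P}}(z^P) = \mathbf{H}_{R:P}(z)\, \mathbf{x}_{P:\frac{k}{R}}(z^R) \tag{32}$$

where $\mathbf{x}_{P:\frac{k}{R}}(z)$ (resp. $\mathbf{y}_{R:\frac{k}{P}}(z)$) is the AC-vector of $x_{\frac{k}{R}}$ (resp. $y_{\frac{k}{P}}$). Introduce the $R \times PR$ matrix $\mathbf{Y}^{\top}(z^P)$ (resp. the $P \times PR$ matrix $\mathbf{X}^{\top}(z^R)$) so that the former equations are written in matrix form:

$$\mathbf{Y}^{\top}(z^P) = \mathbf{H}_{R:P}(z)\, \mathbf{X}^{\top}(z^R)$$

Examination of the matrices $\mathbf{Y}^{\top}(z^P)$ and $\mathbf{X}^{\top}(z^R)$ shows that:

$$\mathbf{X}(z^R) = \left[\, \mathbf{x}(\omega_P^{kR} z^R) \,\right]_{k \in [0, P)}, \qquad \mathbf{Y}(z^P) = \left[\, \mathbf{y}(\omega_R^{kP} z^P) \,\right]_{k \in [0, R)}$$

Note from the previous relationship:

$$\mathbf{y}(z^P) = \mathbf{Y}(z^P)\, \mathbf{e}_R$$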

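The evaluation of $y(z)$ can now begin:

$$\begin{aligned} \frac{1}{\sqrt{P}}\, y(z) &= \mathbf{e}_P^{\top}\, \mathbf{y}_{P:}(z) \\ &= \mathbf{e}_P^{\top}\, \mathbf{W}_{PR:P}^{*}\, \mathbf{D}_{PR}(z)\, \mathbf{y}(z^P) \\ &= \mathbf{1}_{PR:}^{\top}\, \mathbf{D}_{PR}(z)\, \mathbf{y}(z^P) \\ &= \mathbf{d}_{PR:}^{\top}(z)\, \mathbf{y}(z^P) \\ &= \mathbf{d}_{PR:}^{\top}(z)\, \mathbf{Y}(z^P)\, \mathbf{e}_R \\ &= \mathbf{d}_{PR:}^{\top}(z)\, \mathbf{X}(z^R)\, \mathbf{H}_{R:P}^{\top}(z)\, \mathbf{e}_R \end{aligned}$$

Using Eq. (30), evaluate the scalar entries $c_k$ of the row vector $\mathbf{d}_{PR:}^{\top}(z)\, \mathbf{X}(z^R)$:

$$\begin{aligned} c_k &= \mathbf{d}_{PR:}^{\top}(z)\, \mathbf{x}(\omega_P^{kR} z^R) \\ &= \mathbf{d}_{PR:}^{\top}(z)\, \mathbf{D}_{PR}^{-1}(z)\, \mathbf{D}_{PR}^{-1}(\omega_P^{k})\, \mathbf{W}_{PR:R}\, \mathbf{x}_{R:}(\omega_P^{k} z) \\ &= \mathbf{d}_{PR:}^{\top}(\omega_P^{-k})\, \mathbf{W}_{PR:R}\, \mathbf{x}_{R:}(\omega_P^{k} z) \end{aligned}$$

This expression $c_k$ represents the entry $(k, k)$ of the $P \times P$ matrix $\mathbf{C}$ defined as:

$$\frac{1}{\sqrt{P}}\, \mathbf{C} = \mathbf{W}_{PR:P}^{\top}\, \mathbf{W}_{PR:R}\, \mathbf{X}_{R:P}(z)$$

Label $\operatorname{diag}(\mathbf{C})$ the row vector whose entries $c_k$ are the main diagonal entries $C_{k,k}$ of $\mathbf{C}$. The evaluation of $y(z)$ continues:

$$\frac{1}{\sqrt{P}}\, \mathbf{d}_{PR:}^{\top}(z)\, \mathbf{X}(z^R) = \operatorname{diag}\left( \mathbf{W}_{PR:P}^{\top}\, \mathbf{W}_{PR:R}\, \mathbf{X}_{R:P}(z) \right)$$

$$\frac{y(z)}{P} = \operatorname{diag}\left( \mathbf{W}_{PR:P}^{\top}\, \mathbf{W}_{PR:R}\, \mathbf{X}_{R:P}(z) \right) \mathbf{H}_{R:P}^{\top}(z)\, \mathbf{e}_R \tag{33}$$

Eq. (33) constitutes the partial AC-representation of the block diagram in Fig. 4 for an arbitrary $(R, P)$-LPTV system whose AC-matrix is given by $\mathbf{H}_{R:P}(z)$. When $P$ and $R$ are coprime, Lemma 1 allows the simplification of Eq. (33):

$$\begin{aligned} P \wedge R = 1 \;&\Rightarrow\; \mathbf{W}_{PR:P}^{\top}\, \mathbf{W}_{PR:R} = \mathbf{J}_P\, \mathbf{e}_P \mathbf{e}_R^{\top} = \mathbf{e}_P \mathbf{e}_R^{\top} \\ &\Rightarrow\; \operatorname{diag}\left( \mathbf{W}_{PR:P}^{\top}\, \mathbf{W}_{PR:R}\, \mathbf{X}_{R:P}(z) \right) = \frac{\mathbf{e}_P^{\top}\, x(z)}{\sqrt{PR}} \\ &\Rightarrow\; \mathbf{d}_{PR:}^{\top}(z)\, \mathbf{X}(z^R) = \frac{1}{\sqrt{R}}\, \mathbf{e}_P^{\top}\, x(z) \end{aligned}$$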

Fig. 5. Extended LTI context for a fractional SRC system (block diagram: same structure as Fig. 4 with ↓r and ↑p in place of ↓R and ↑P, and PQR = rP = pR channels x_{k/r} and y_{k/p}, each processed by an R/P SRC)

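This completes the first part of the proof:

$$y(z) = \sqrt{\frac{P}{R}}\; x(z)\, \mathbf{e}_P^{\top}\, \mathbf{H}_{R:P}^{\top}(z)\, \mathbf{e}_R, \qquad y(z) = \frac{1}{R}\, x(z)\, H(z) \tag{34}$$

The second equality follows from Eq. (3), since $\mathbf{e}_R^{\top}\, \mathbf{H}_{R:P}(z)\, \mathbf{e}_P = H(z)/\sqrt{PR}$.

When $P$ and $R$ are not coprime, $P \wedge R = q > 1$, Lemma 1 still allows a simplification, although weaker than Eq. (34):

$$\mathbf{W}_{PR:P}^{\top}\, \mathbf{W}_{PR:R} = \mathbf{J}_P \left( \mathbf{I}_q \otimes \mathbf{e}_{P/q}\, \mathbf{e}_{R/q}^{\top} \right)$$

The compound system of Fig. 4 is still LTI but the obtained transfer function now combines modulation terms $H(\omega_q^{k} z)$:

$$y(z) = \frac{1}{R} \sum_{k=0}^{q-1} H\!\left(\omega_R^{k\frac{R}{q}} z\right) x\!\left(\omega_P^{-k\frac{P}{q}}\, \omega_R^{k\frac{R}{q}} z\right) = \frac{1}{R} \sum_{k=0}^{q-1} H(\omega_q^{k} z)\, x(z) \tag{35}$$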
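Theorem 1 can be checked numerically on the ideal model. The sketch below is only an illustration of the statement, under stated assumptions: signals are dictionaries `{index: value}` (zero elsewhere, negative indices allowed, which keeps the non-causal analysis network simple), the kernel is a causal FIR list `h`, and the function names `src`, `compound` and `convolve` are hypothetical.

```python
from collections import defaultdict

def src(h, x, R, P):
    """Ideal R/P SRC of Fig. 3 applied to a dict signal x."""
    y = defaultdict(float)
    for i, xi in x.items():
        for j, hj in enumerate(h):
            m = i * R + j            # index at the intermediate (expanded) rate
            if m % P == 0:           # sample kept by the P-fold decimation
                y[m // P] += hj * xi
    return dict(y)

def compound(h, x, R, P):
    """Compound system of Fig. 4: PR channels, output scaled by 1/R."""
    y = defaultdict(float)
    for k in range(P * R):
        # Analysis network: channel x_{k/R}(n) = x(nR + k).
        xk = {(t - k) // R: v for t, v in x.items() if (t - k) % R == 0}
        # Synthesis network: y_{k/P}(n) is placed at index nP + k.
        for n, v in src(h, xk, R, P).items():
            y[n * P + k] += v / R
    return dict(y)

def convolve(h, x):
    """Reference LTI filtering h * x."""
    y = defaultdict(float)
    for t, xt in x.items():
        for j, hj in enumerate(h):
            y[t + j] += hj * xt
    return dict(y)
```

For coprime $R$ and $P$, `compound(h, x, R, P)` is expected to match `convolve(h, x)` scaled by $1/R$, whatever the input `x`. For a non-coprime pair such as $R = P = 2$, the same code returns the response of the kernel combined with its modulated copy, as in Eq. (35).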

Note that Theorem 1, Eq. (34) and the flow diagram in Fig. 4 are all scaled by $1/R$. This can be simplified as in the flow diagram of Fig. 5. Consider the extended compound system shown in Fig. 5 where $Q$ is a positive integer. Consider $r$ and $p$ such that:

$$r = RQ, \qquad p = PQ, \qquad R/P = r/p, \qquad P \wedge R = 1 \tag{36}$$

The input and output signals $x$ and $y$ are formed in a similar way as in Theorem 1, with the difference that we form $PQR$ channels and that the extended analysis network (resp. extended synthesis network) is based on a $QR$-polyphase decomposition (resp. $QP$-polyphase decomposition) instead. Note that the extended compound system uses $r = RQ$ independent input channels; the remaining $(P - 1)r$ channels are derived from the first $r$ channels by shifting. The following result applies:

Theorem 2 (extended LTI context for an SRC system): The compound system shown in Fig. 5 where the ideal $R/P$ sample rate conversion system defined by $H(z)$, the $z$-transform of its kernel, is applied on $PQR$ channels and whose input $x$ (resp. output $y$) is obtained by interleaving and shifting (resp. summing) every channel $x_{\frac{k}{r}}$ (resp. $y_{\frac{k}{p}}$), is LTI.

When $P$ and $R$ are coprime, $y$ is obtained by filtering $x$ through the expanded kernel of the sample rate conversion system:

$$y(z) = H(z^Q) \cdot x(z)$$

Proof: The proof follows a reasoning similar to the proof of Theorem 1 with the following substitutions:

$$P \rightarrow p, \qquad R \rightarrow r$$

The sample rate conversion from Eq. (29) is applied to each polyphase component, but instead of resulting in Eq. (32), we obtain the following equation:

$$\forall k \in [0, PQR), \quad \mathbf{y}_{r:\frac{k}{p}}(z^P) = \mathbf{H}_{R:P}(z)\, \mathbf{x}_{p:\frac{k}{r}}(z^R) \tag{37}$$

Substitute $z$ by $z^Q$ in the previous equation and obtain Eq. (38), which replaces Eq. (32):

$$\forall k \in [0, PQR), \quad \mathbf{y}_{r:\frac{k}{p}}(z^p) = \mathbf{H}_{R:P}(z^Q)\, \mathbf{x}_{p:\frac{k}{r}}(z^r) \tag{38}$$

Assume first that $P \wedge R = 1$. Since $p \wedge r = Q > 1$, the result from Eq. (35) applies instead (note that the factor $1/R$ was removed from the flow diagram of Fig. 5):

$$\frac{1}{R}\, y(z) = \frac{1}{r} \sum_{k=0}^{Q-1} H(\omega_Q^{kQ} z^Q)\, x(z) = \frac{1}{R}\, H(z^Q)\, x(z)$$

When $P \wedge R = q > 1$, we obtain instead the following equation that extends Eq. (35):

$$y(z) = \sum_{k=0}^{q-1} H(\omega_q^{k} z^Q)\, x(z) \tag{39}$$

C. Discussion

Theorem 1 confirms the intuitive principles set forth in section III-A: feeding a finite set of fractional delayed stimuli into a fractional SRC system, with $\tau = k/R$, reveals the kernel function $H$. Theorem 2 indicates that splitting the reference stimulus with a smaller fractional interval, $1/r$, a sub-multiple of $1/R$, does not improve the spectral resolution of the kernel function $H$. Therefore, in most applications, setting $Q = 1$ is sufficient.

The compound system of Fig. 5 provides a simple methodology for assessing the performance of an SRC system. The kernel $H$ of the SRC system is supposed to have a finite impulse response (FIR) of length $T$. The typical filter bandwidth is slightly below $\pi/R$ radians (for speech and audio applications, the bandwidth varies from $0.8\pi/R$ to $0.95\pi/R$) and the typical filter length $T$ is several times $R$ (depending on the steepness of the lowpass filter).

• Choose one test vector $x$ from a set of one or several test vectors.

• De-interleave the test vector $x$ and form the $PR$ channel test vectors $x_{\frac{k}{R}}$. Zero padding can prove useful in order to force the SRC system to process the signal until it has returned to rest.

• Apply the SRC system to each channel vector and obtain the response channel vector $y_{\frac{k}{P}}$. The system under study is assumed to be FIR, so the response channel vectors return to 0 after a maximum of $\lceil T/R \rceil$ input zero-padding samples.

• Shift, sum and interleave the $PR$ response channel vectors $y_{\frac{k}{P}}$ in order to form the response vector $y$.

• Store the response vector $y$ and repeat the process for every available test vector $x$.

The analysis of the SRC system proceeds as if the test vectors $x$ were processed by a regular LTI system. There are different subcases of interest for the test vectors.

Periodic signals are interesting test vectors [5]. Once steady state is achieved (i.e., after $T$ samples of the test vector), a window of $M = KPR$ samples corresponding to one period is extracted, stored and analyzed. The period of the channel input vectors is $KP$ (resp. $KR$ for the channel response vectors). Yin and Mehr in [8] use $KP$-periodic channel input vectors in order to excite and to identify the $(P, P)$-LPTV system. Transposed to the context of characterizing an SRC system, this identification method implicitly turns into a least-squares FIR identification method knowing one period of $x$ and observing one period of $y$. Compared to the identification methods of [5], [8], the LTI approach relaxes the constraints on $M$, which is not necessarily a multiple of $PR$.

Impulse responses are other interesting test vectors. In such a case, the impulse response $h(n)$ of the kernel $H$ is directly available in $y$. When $x(z) = 1$, channel contributions are all zero except $x_{\frac{mR}{R}}(z) = z^{m}$ for $m \in [0, P)$:

$$H(z) = y(z) = \sum_{m=0}^{P-1} z^{-mR}\, y_{\frac{mR}{P}}(z^P)$$

$$\forall m \in [0, P), \quad h(nP + mR) = y_{\frac{mR}{P}}(n)$$

where $n$ varies in the support of $y_{\frac{mR}{P}}$. The method is indeed valuable and inexpensive ($P$ channel vectors to process instead of $PR$) for characterizing an ideal SRC system but it misses the effects of internal rounding errors. In order to cover those effects, we assume that with orthogonal input signals (obtained for instance by random phase shifting as in [5]), rounding errors average to zero. Distortion observed on the compound system is exclusively due to rounding errors: this can be observed for instance by feeding a full-scale sine into the compound LTI system [16]. Alternatively, the method developed in [12] for measuring the performance of a weakly nonlinear system can be used.

Fig. 6. Causal LTI context for a fractional SRC system (block diagram: PQR = pR = rP channels built from ↓r decimators, R/P SRC blocks and ↑p expanders, connected through z^{-1} delays only)

The LTI compound system shown in Fig. 5 is obviously not causal. In the previous discussion, non-causality was not an issue because we assumed that the analysis proceeded in batch processing. Causality may be required for real-time applications. In such a case, we can use the type-4 polyphase decomposition instead and obtain a causal LTI context for the analysis of an SRC, shown in Fig. 6. The transfer function of the causal LTI compound system becomes:

$$x(z) = z^{-PQR+1}\, H(z^Q) \cdot y(z)$$

Note that the method requires exact knowledge of the resampling ratio $R/P$. A separate ad hoc method may be necessary in order to estimate this ratio if it is not explicitly given. Note also that for complex resampling ratios, such as those encountered when resampling 44.1 kHz audio streams at 48 kHz, the number of channels becomes large: for $R = 160$ and $P = 147$, we need to process $PR = 23520$ channel test vectors. A script automating the processing and the storage of each of those $PR$ audio files is needed.
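The impulse-response subcase discussed above can be sketched as follows: only the $P$ channels $k = mR$ carry a non-zero stimulus, and the kernel is reassembled from $h(nP + mR) = y_{\frac{mR}{P}}(n)$. This is a minimal illustration under stated assumptions: the system under test is passed as a callable `src_fn` working on dictionaries `{index: value}` and accepting advanced (non-causal) impulses, which a real black box would not; the names `src` and `recover_kernel` are hypothetical, and `src` is the ideal model used as a stand-in for the system under test.

```python
from collections import defaultdict

def src(h, x, R, P):
    """Ideal R/P SRC of Fig. 3, used here as a stand-in for the black box."""
    y = defaultdict(float)
    for i, xi in x.items():
        for j, hj in enumerate(h):
            m = i * R + j
            if m % P == 0:
                y[m // P] += hj * xi
    return dict(y)

def recover_kernel(src_fn, R, P, length):
    """Rebuild h(0), ..., h(length - 1) from P impulse experiments."""
    h = [0.0] * length
    for m in range(P):
        # Channel k = mR: x_{mR/R}(z) = z^m, an impulse advanced by m samples.
        y = src_fn({-m: 1.0})
        for n, v in y.items():
            t = n * P + m * R        # h(nP + mR) = y_{mR/P}(n)
            if 0 <= t < length:
                h[t] = v
    return h
```

For example, `recover_kernel(lambda x: src(h, x, 3, 2), 3, 2, len(h))` is expected to return the taps of `h` when `h` is the kernel of the ideal model. With an actual implementation, the advanced impulses would have to be replaced by suitably delayed ones, and rounding effects would not be captured, as noted in the text.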

D. Bispectrum analysis

The bispectrum (or bifrequency system function) of an LPTV system is a bivariate function $H(e^{j\omega_1}, e^{j\omega_2})$ that associates the spectrum $y(e^{j\omega_2})$ of the output signal with the spectrum $x(e^{j\omega_1})$ of the input signal (cf. [5]):

$$\forall \omega_2 \in \mathbb{R}, \quad y(e^{j\omega_2}) = \int_{-\pi}^{+\pi} H(e^{j\omega_1}, e^{j\omega_2})\, x(e^{j\omega_1})\, d\omega_1 \tag{40}$$

Theorem 3: The bispectrum $H(e^{j\omega_1}, e^{j\omega_2})$ of the ideal $R/P$ sample rate conversion system consists of Dirac lines deriving from $H(e^{j\omega})$, the transfer function of its kernel:

$$H(e^{j\omega_1}, e^{j\omega_2}) = \frac{1}{P} \sum_{m=0}^{P-1} H\!\left(e^{j\frac{\omega_1}{R}}\right) \delta\left(P\omega_1 - R(\omega_2 + 2m\pi)\right)$$

where $\delta$ corresponds to the Dirac distribution.

Proof: The proof is obtained by replacing the expression of the bispectrum in equation (40). The Dirac lines, located at $\omega_1 = \frac{R}{P}(\omega_2 + 2m\pi)$, simplify the integral expression:

$$y(e^{j\omega_2}) = \frac{1}{P} \sum_{m=0}^{P-1} H\!\left(e^{j\frac{\omega_2 + 2m\pi}{P}}\right) x\!\left(e^{j\frac{R}{P}(\omega_2 + 2m\pi)}\right)$$

In order to simplify the notation of the previous expression, introduce $\omega$ such that $\omega_2 = P\omega$:

$$y(e^{jP\omega}) = \frac{1}{P} \sum_{m=0}^{P-1} H\!\left(e^{j\left(\omega + \frac{2m\pi}{P}\right)}\right) x\!\left(e^{jR\left(\omega + \frac{2m\pi}{P}\right)}\right)$$

The modulation equation (28) that characterizes the ideal $R/P$ SRC system can be recognized here, where $z$ is replaced by $e^{j\omega}$. This concludes the proof by identification of the bifrequency function.

Because $R$ and $P$ are relatively prime, the different Dirac lines exhibited by the bispectrum reduce to one single Dirac line shaped by the transfer function of the SRC kernel and continued along the quadrants. Therefore the spectrum analysis of the LTI compound system compares to the bispectrum analysis of the LPTV system as in [1]. Kernel spectrum and bispectrum can both be retrieved in black-box conditions.
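As a worked instance, Theorem 3 and the modulation equation (28) can be written out for the ratio $R/P = 3/2$ used in the examples of section IV. This is a direct substitution of $R = 3$ and $P = 2$ (so that $\omega_2 = -1$), given only as an illustration:

```latex
% Theorem 3 with R = 3, P = 2: two Dirac lines (m = 0 and m = 1)
H(e^{j\omega_1}, e^{j\omega_2})
  = \frac{1}{2}\, H\!\left(e^{j\frac{\omega_1}{3}}\right)
    \left[ \delta(2\omega_1 - 3\omega_2) + \delta(2\omega_1 - 3\omega_2 - 6\pi) \right]

% Modulation equation (28) with R = 3, P = 2, using \omega_2^{m} = (-1)^{m}
y(z^{2}) = \frac{1}{2} \left[ H(z)\, x(z^{3}) + H(-z)\, x(-z^{3}) \right]
```

The two Dirac lines are located at $\omega_1 = \frac{3}{2}\omega_2$ and $\omega_1 = \frac{3}{2}\omega_2 + 3\pi$; the second term of the modulation equation is the aliasing contribution.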

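Fig. 7. Rounding errors model in an integer upsampler, on channel k, with R = 2 and P = 1. (block diagram: x_{k/2} feeds the polyphase components H_0 and H_1; the errors e_k^{(0)} and e_k^{(1)} are added to their outputs, which are expanded by ↑2 and interleaved through a delay z^{-1} to form y_{k/1})

Fig. 8. Integer upsampler embedded in the LTI context with R = 2, P = 1 and Q = 1. (block diagram: the channels x_{0/2} and x_{1/2} each go through the ↑↓2 upsampler; the responses y_{0/1} and y_{1/1} are combined through a delay z^{-1} to form y)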

IV. EXAMPLES

A. Rounding Errors Model

This section illustrates by an example the behaviour of a simple upsampler, with $P = 1$, in the context of rounding errors. For the purpose of this example, we shall use the polyphase type-1 decomposition of the kernel $H$ of the SRC system:

$$H(z) = \sum_{l=0}^{R-1} z^{-l}\, H_l(z^R) \tag{41}$$

A simple implementation of the upsampler relies on this polyphase decomposition and is implemented as in Fig. 7. In this example, the results of filtering the signal by the respective polyphase components $(H_l)_{l \in [0, R)}$ are interleaved in order to generate the upsampled signal. This model is a reorganization of the SRC model of Fig. 3 and corresponds to the equivalent MIMO LTI system from [5] with $L_1 = P = 1$ and $L_2 = R = 2$. Rounding errors are modeled as an additional stochastic process $(e_k^{(l)})_{l \in [0, R)}$, independent of the input [17]. Their introduction in Fig. 7 results for each channel $k$ in:

$$\forall k \in [0, R), \quad y_{\frac{k}{1}}(z) = H(z)\, x_{\frac{k}{R}}(z^R) + e_k(z) \tag{42}$$

where $(e_k^{(l)})_{l \in [0, R)}$ are the polyphase components of $e_k(z)$. This equation describes the behaviour of the rounding errors as an additive component.

Fig. 8 shows the flow diagram that implements the LTI methodology applied to an upsampler with $Q = 1$. When the model of Fig. 7 applies, each channel verifies Eq. (42) and the output $y$ of the flow diagram of Fig. 8 verifies:

$$y(z) = H(z)\, x(z) + \sum_{k=0}^{R-1} z^{-k}\, e_k(z) = H(z)\, x(z) + e(z) \tag{43}$$
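Eq. (43) can be illustrated numerically. The sketch below assumes a deliberately crude rounding model that is not taken from the paper: each channel output sample is rounded to the nearest integer. Signals are dictionaries `{index: value}` and the function name `upsampler_context` is hypothetical. The function returns the output $y$ of the flow diagram of Fig. 8 (with $P = 1$, $Q = 1$) together with the shifted sum $e$ of the channel errors $e_k$.

```python
from collections import defaultdict

def upsampler_context(h, x, R):
    """LTI context around an R-fold upsampler whose outputs are rounded.

    Returns (y, e): the compound output and the shifted sum of channel errors.
    """
    y, e = defaultdict(float), defaultdict(float)
    for k in range(R):
        # Channel x_{k/R}(n) = x(nR + k).
        xk = {(t - k) // R: v for t, v in x.items() if (t - k) % R == 0}
        # Ideal upsampler response on this channel: h * (R-fold expansion of xk).
        ideal = defaultdict(float)
        for i, xi in xk.items():
            for j, hj in enumerate(h):
                ideal[i * R + j] += hj * xi
        for n, v in ideal.items():
            q = float(round(v))      # assumed rounding model: nearest integer
            y[n + k] += q            # term z^{-k} y_{k/1}(z)
            e[n + k] += q - v        # term z^{-k} e_k(z)
    return dict(y), dict(e)
```

Under this assumption, `y` differs from the ideal response `h * x` exactly by `e`, and each sample of `e` is a sum of at most $R$ rounding errors, hence bounded by $R/2$ in magnitude.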

This example based on a simple upsampler shows that the residual error, $e$, observed at the output of the LTI context is a shifted sum of the channel errors $e_k$. When the channel errors $e_k$ are independent stochastic processes, the residual error $e$ simply averages with the same power density.

B. Characterizing in-house 3/2-resamplers

Figures 9 and 10 illustrate how the kernel transfer function of two different 3/2 in-house resampler algorithms can be revealed with the LTI methodology. The first algorithm under study is a single-stage SRC system while the second algorithm includes a polynomial interpolation stage. The algorithms are provided with a reference version in floating-point arithmetic and several versions in fixed-point arithmetic. The following fixed-point arithmetics³ are compared: 16-bit, 24-bit and 32-bit. The algorithms are provided as binary software processing sound files in a single-precision floating-point format⁴.

The method from [12] is used in order to characterize the weakly nonlinear behaviour of those algorithms. The measurements are obtained with $L$ different periodic test signals. Those test signals result from the inverse discrete Fourier transform (IDFT) of a vector describing a flat spectrum and a random phase. The test signals are deinterleaved, processed separately by the software and reinterleaved according to the flow diagram in Fig. 5. When the period, $M = KPR$, is a multiple of $PR$, the rounding errors also become periodic⁵. From each processed test signal, one single period is extracted once steady state is achieved. The kernel function $H(e^{2j\pi\nu})$ is estimated as an average over every available vector:

$$\hat{H}(e^{2j\pi\nu}) = \frac{1}{L} \sum_{i=1}^{L} \frac{y^{(i)}(e^{2j\pi\nu})}{x^{(i)}(e^{2j\pi\nu})} \tag{44}$$

³ The bit-width of a fixed-point arithmetic corresponds to the bit-width of the single-precision fixed-point register. The double-precision register uses twice as many bits as a single-precision register.

⁴ Storage in a single-precision floating-point format is lossless for 16-bit and 24-bit fixed-point streams.

⁵ The period of the rounding errors component is $(M \vee (PR)) \times (PR)$, with $M \vee (PR)$ denoting the lowest common multiple of $M$ and $PR$. This period turns out to be $M$ only if $M$ is a multiple of $PR$.

Fig. 9. Magnitude of the transfer function of a 3/2 resampler implemented with different types of arithmetic (plot titled "Magnitude of the transfer function and residual noise for the 3/2 resampler #1"; curves: FLT, FIX24 and FIX16, each with its noise floor; horizontal axis: normalized frequency (cycle/sample), from 0 to 1/2; vertical axis: gain (dB), from 0 to -180)

The value for x(i) e2jπν (resp. y (i) e2jπν ) is obtained as the discrete Fourier transform (DFT) of the 



extracted period. A spectral weight can also be introduced in Eq. (44) as in [16] if the test vectors are not designed with a flat spectrum. The power density function of the residual noise, S e2jπν , is estimated 

as another averaging process: 



Sˆ e2jπν =

1 × LM

L         X (i) 2jπν 2 ˆ 2jπν 2 (i) 2jπν 2 e e · x − H e y i=1

When the test signals are generated from a flat spectrum, the value for $|x^{(i)}(e^{2j\pi\nu})|^2$ simplifies into one constant.

The quality of the averaging process used to estimate the kernel function $\hat{H}(e^{2j\pi\nu})$ and the power density function $\hat{S}(e^{2j\pi\nu})$ of the residual noise depends on the number of test signals. This residual noise is a disturbance resulting from the rounding errors and is reported in [12] to accurately match theoretical results for simple filter structures.

The characterization from Figures 9 and 10 is obtained with L = 512 different test signals and with a period equal to M = 256 × 6 samples. The amplitude of the test signals is calibrated in order to prevent clipping (either during the software processing or in the file storage). The estimation of the kernel transfer function $\hat{H}(e^{2j\pi\nu})$ and power density function $\hat{S}(e^{2j\pi\nu})$ requires 512 × 6 different invocations of the algorithm under study. In Fig. 9 and 10, the spectrum of the input signal spans the range ν = [0, 1/6] and the spectrum of the output signal spans the range ν = [0, 1/4].

Fig. 9 first reveals the impact of the quantization on the filter coefficients. Note that this impact can be


Fig. 10. Magnitude of the transfer function of a 3/2 resampler implemented with different types of arithmetic. [Plot: magnitude of the transfer function and residual noise for the 3/2 resampler #2; curves FLT, FIX32 and FIX24, each with its noise floor; gain (dB) versus normalized frequency (cycle/sample), from 0 to 1/2.]

precisely predicted from the actual quantized filter coefficients. In this example the 24-bit quantization scheme achieves the same level of accuracy as the reference floating-point kernel transfer function whereas the 16-bit quantization scheme increases the alias rejection level by 3 dB. As expected from the theory [17], the power of the residual noise is 48 dB lower with the 24-bit arithmetic than with the 16-bit arithmetic. For a simple convolution operation (i.e. a direct-form FIR filter structure), the residual noise has a flat spectrum. In this example the only source of residual noise is the rounding operation that occurs after the convolution operation applied to each polyphase component. However, Fig. 9 shows that the spectrum of the residual noise of this system exhibits two flat areas, located respectively in the passband and in the stopband, with an amplitude ratio of 6 dB.

The second algorithm involves an integer oversampling stage followed by a polynomial interpolation stage. The impact of the coefficient quantization could be predicted from the actual quantized filter coefficients and polynomial coefficients, but the interaction between the oversampling stage and the polynomial interpolation stage makes this evaluation difficult. The measurements displayed in Fig. 10 show that both quantization schemes (resp. 24-bit and 32-bit) achieve a level of accuracy comparable to the reference floating-point algorithm. Here, the type of arithmetic shapes the spectral envelope of the residual noise. The power of the residual noise generated by the 24-bit version of this algorithm is equivalent to that of the 16-bit version of the single-stage 3/2 SRC illustrated previously. The power of the residual noise from the 32-bit version is approximately 42 dB lower than that from the 24-bit version: this is less than the theoretical 48 dB.
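As a reminder of where the theoretical 48 dB comes from, the usual uniform-quantization model can be written out. This is the textbook model, assumed here rather than quoted from [17]: a rounding step q produces a noise power of q²/12, and a signed fractional format with B bits is taken to have q = 2^{-(B-1)}. The ratio between two word lengths does not depend on that scaling convention.

```latex
\sigma_e^2 = \frac{q^2}{12}, \qquad q_B = 2^{-(B-1)}
\quad\Longrightarrow\quad
10\log_{10}\frac{\sigma_{e,B}^2}{\sigma_{e,B+8}^2}
  = 20\log_{10}\left(2^{8}\right) \approx 48.2\ \text{dB}
```

Each additional bit therefore lowers the rounding noise by about 6.02 dB, so the 8 extra bits between 16-bit and 24-bit (or between 24-bit and 32-bit) words account for the 48 dB figure.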
The 32-bit fixed-point version of the algorithm performs better than the reference floating-point version. This is due to the fixed-point arithmetic that


TABLE I
RESAMPLING OPTIONS FOR SOX V14.2.0

Option   Quality     Phase response   Bandwidth   Rej
-l       low         linear           80%         100 dB
-m       medium      intermediate     95%         100 dB
-h       high        intermediate     95%         125 dB
-v       very high   intermediate     95%         175 dB

Fig. 11. Magnitude of the estimated kernel function for a 3/4 resampler using SoX. [Plot: resampler 3/4, magnitude kernel and rounding errors; curves for the rounding errors and for the resampling kernels -l, -m, -h and -v; gain (dB) versus normalized frequency (cycles/sample), from 0 to 1/2.]

achieves greater accuracy (see footnote 6) in 32-bit than the floating-point arithmetic.
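The mantissa limit behind this comparison can be checked numerically. The sketch below is an illustration only, with a hypothetical helper name: it converts every signed 16-bit and 24-bit integer to single precision and back, which is the sense in which float storage is lossless for such streams, and then shows that a 32-bit value just above 2^24 no longer survives the round trip.

```python
import numpy as np

def roundtrip_is_lossless(bits, chunk=1 << 20):
    """Return True if every signed `bits`-bit integer survives int -> float32 -> int."""
    lo, hi = -(1 << (bits - 1)), 1 << (bits - 1)
    for start in range(lo, hi, chunk):
        n = np.arange(start, min(start + chunk, hi), dtype=np.int64)
        if not np.array_equal(n.astype(np.float32).astype(np.int64), n):
            return False
    return True

lossless_16 = roundtrip_is_lossless(16)   # exhaustive over 2**16 values
lossless_24 = roundtrip_is_lossless(24)   # exhaustive over 2**24 values

# A 32-bit sample just above the 24-bit mantissa limit is rounded on conversion.
n = (1 << 24) + 1
survives_32 = int(np.float32(n)) == n

print(lossless_16, lossless_24, survives_32)
```

The check only covers the representation of samples. It says nothing about the rounding performed inside a floating-point multiplier, which is the second effect that favours the 32-bit fixed-point arithmetic.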

6 The accuracy of a signal formatted in floating-point is limited approximately by the size of the mantissa, i.e. 24 bits for the single-precision floating-point format. Furthermore, in fixed-point arithmetic, the multiplier takes two single-precision operands and returns a double-precision result without rounding, whereas the floating-point multiplier returns a single-precision value and implicitly rounds the result of the multiplication.

C. Characterizing SoX resampling option

Figures 11 and 12 illustrate how the resampling feature of the freeware audio utility SoX (v14.2.0) [18] can be characterized with the LTI methodology with R = 3 and P = 4. The tested options from SoX are listed in Table I. The sound files are stored in single-precision floating-point format (i.e. 24-bit mantissa). As in case IV-B, the method from [12] is used in order to characterize the behaviour of SoX. In our


Fig. 12. Magnitude of the estimated kernel function for a 3/4 resampler using SoX, passband details. [Plot: resampler 3/4, magnitude kernel and rounding errors; curves for the resampling kernels -l, -m, -h and -v; gain (dB), from -0.00015 to 0.0001, versus normalized frequency (cycles/sample), from 0 to 1/8, with marks at 71.2%, 80% and 92.2%.]

TABLE II
PERFORMANCE MEASUREMENTS OF SOX

Option   Ripples in the passband   Rej        Bandwidth (-1 dB)   Bandwidth (-0.01 mB)
-l       [-0.01, 0.01] mB          105.5 dB   80%                 72%
-m       [-0.01, 0.01] mB          115.1 dB   95%                 80%
-h       [0, 0.01] mB              128.3 dB   95%                 92%
-v       [0, 0.01] mB              166.6 dB   95%                 92%

tests, the period of the test signal is set to M = 1024 × 12 samples and the number of different test signals is set to L = 64. The amplitude of the test signals is calibrated in order to prevent clipping. The spectrum of the input signal spans the range ν = [0, 1/6] and the spectrum of the output signal spans the range ν = [0, 1/8].

Fig. 11 confirms the aliasing rejection performance reported in Table I (the software almost always performs better than reported in the manual). A detailed analysis shows that the bandwidth percentages reported in Table I correspond to a −1 dB cutoff frequency. Fig. 12 and Table II show that the amplitude of the ripples has been configured within SoX to be less than ±0.01 mB in the passband.

The spectrum of the residual noise displayed in Fig. 11 exhibits two flat areas, as in case IV-B, located respectively in the passband and in the stopband. The mean power of the residual noise for both areas is detailed in Table III. The measured differences are not significant and the amplitude ratio


TABLE III
MEAN POWER OF THE RESIDUAL NOISE OF SOX

Option   Passband     Stopband
-l       -166.4 dB    -174.1 dB
-m       -166.4 dB    -173.6 dB
-h       -166.4 dB    -173.6 dB
-v       -166.5 dB    -173.5 dB

between the respective areas is about 7 dB. The estimation of the residual noise is itself prone to numerical difficulties because the estimated residual noise includes the rounding errors from the algorithm under study, from the estimation methodology (see footnote 7) and from the storage format.
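The ratio of about 7 dB can be re-derived directly from Table III. The short Python sketch below only restates the table values and subtracts them; the variable names are illustrative.

```python
# Mean residual-noise power in dB, copied from Table III: (passband, stopband).
table_iii = {
    "-l": (-166.4, -174.1),
    "-m": (-166.4, -173.6),
    "-h": (-166.4, -173.6),
    "-v": (-166.5, -173.5),
}

# Passband-to-stopband ratio per option, rounded to the precision of the table.
ratios_db = {opt: round(pb - sb, 1) for opt, (pb, sb) in table_iii.items()}
mean_ratio_db = round(sum(ratios_db.values()) / len(ratios_db), 1)

for opt, ratio in ratios_db.items():
    print(f"{opt}: {ratio:.1f} dB")
print(f"mean: {mean_ratio_db:.1f} dB")
```

The per-option ratios range from 7.0 dB to 7.7 dB, which is consistent with the statement that the differences between quality options are not significant.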

D. Complexity Issues

This section compares the respective complexity of known methods for characterizing LPTV systems [1], [5] with that of the characterization method from [12] applied to the LTI context, as used in sections IV-B and IV-C. Those characterization methods require periodic test vectors, with an input period of KP samples and an output period of KR samples. Each algorithm uses L experiments, examines KLR output samples and produces M = KPR spectral bins of the kernel function H.

In [1], the algorithm requires at least L ≥ P experiments. It generates bi-frequency spectral density matrices obtained from L DFTs of size KP and L DFTs of size KR and solves one linear system of size KP × KR in the complex domain. Considerations about the structure of this system simplify the problem: K independent linear systems of size P × R remain.

When the characterization method from [12] is applied to the LTI context, the algorithm requires L = SPR different experiments and the spectral bins are obtained as a weighted sum of L DFTs of size KPR. The additional number of experiments to run originates from the specific structure of R(P − 1) vectors that are shifted versions of the R first references (see Fig. 4). This redundant structure allows the simplification of all linear systems of size P × R. The number of experiments can be reduced to L = S(P + R) when the first references are chosen in order to cover the set of R(P − 1) shifted vectors.

7 The estimation algorithm is implemented in double-precision floating-point arithmetic.
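The counts quoted in this section can be tabulated with a small Python sketch. The class and property names are hypothetical, and S is interpreted here as the number of distinct test signals that are averaged, an assumption that matches the "512 × 6 invocations" quoted in section IV-B but is not spelled out in this passage.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CharacterizationPlan:
    K: int  # period multiplier: input period K*P, output period K*R
    P: int
    R: int
    S: int  # ASSUMPTION: number of distinct test signals averaged

    @property
    def input_period(self) -> int:
        return self.K * self.P

    @property
    def output_period(self) -> int:
        return self.K * self.R

    @property
    def spectral_bins(self) -> int:
        return self.K * self.P * self.R          # M = KPR

    @property
    def experiments_lti(self) -> int:
        return self.S * self.P * self.R          # L = SPR

    @property
    def experiments_lti_reduced(self) -> int:
        return self.S * (self.P + self.R)        # L = S(P + R)

    @property
    def min_experiments_lptv(self) -> int:
        return self.P                            # L >= P for the method of [1]

# 3/2 resampler of section IV-B (taking R = 3, P = 2) and SoX 3/4 case of section IV-C.
src_3_2 = CharacterizationPlan(K=256, P=2, R=3, S=512)
sox_3_4 = CharacterizationPlan(K=1024, P=4, R=3, S=64)

for name, plan in (("3/2", src_3_2), ("3/4", sox_3_4)):
    print(name, plan.spectral_bins, plan.experiments_lti, plan.experiments_lti_reduced)
```

For the 3/2 case this reproduces M = 256 × 6 = 1536 bins and 512 × 6 = 3072 invocations. The figures printed for the SoX case are computed from the same formulas and are not stated in the text.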


The method from [5] is an optimization of the original method in [1] with orthogonal test vectors. The algorithm needs the storage of K intermediary P × R matrices that are processed with different DFTs: first K × R DFTs of size L (but only P output bins are necessary), then P × R DFTs of size K. The final operation of the algorithm is dominated by P × R DFTs of size KP.

V. CONCLUSION

In this paper, we have discussed the merits of a compound system made of several instances of a sample rate conversion system. This compound system is LTI and its transfer function is directly related to the kernel of the SRC system. Regular identification and characterization methods designed for LTI systems can be applied to this system in order to reveal the linear characteristics responsible for both the in-band and the aliasing behaviours. The weakly nonlinear characteristics due to the propagation of rounding errors are also available from the LTI methodology as a residual noise. The generalization for arbitrary P, R and Q of the summation rule for the residual noise expression, as in Eq. (43), is left for further study.

The analysis of the power density of the residual noise of a single-stage SRC system, the first example from section IV-B, exhibits a spectrum with two flat areas whereas the error model described in section IV-A would predict a channel error with a simple white spectrum. The independence of the channel error signals is therefore questionable.

The preliminary study of section IV-D tends to indicate that the LTI context simplifies the analysis method proposed in [1] when the periods of the LPTV systems are relatively prime: it replaces the linear systems to solve with additional experiments to run. However, the actual algorithm complexity necessary to achieve a given variance for the estimation of the kernel function and the residual noise is left for further study. The complexity saving achieved by [5] suggests that additional orthogonality constraints may reduce the complexity of the characterization methods based on the LTI methodology, but this may increase the power of the rounding errors originating from the estimation algorithm and therefore impair the estimation of the rounding errors from the SRC system.

ACKNOWLEDGMENT

The author would like to thank the reviewers for the time spent and for their valuable comments. This work was supported by ST-Ericsson.


REFERENCES

[1] R. Reng and H. Schüßler, “Measurement of aliasing distortions and quantization noise in multirate systems,” in Circuits and Systems, 1992. ISCAS ’92. Proceedings., 1992 IEEE International Symposium on, vol. 5, May 1992, pp. 2328–2331.
[2] P. P. Vaidyanathan, Multirate Systems and Filter Banks. Prentice Hall, 1993.
[3] R. Meyer and C. Burrus, “A unified analysis of multirate and periodically time-varying digital filters,” Circuits and Systems, IEEE Transactions on, vol. 22, no. 3, pp. 162–168, Mar. 1975.
[4] R. Shenoy, “Multirate specifications via alias-component matrices,” Circuits and Systems II: Analog and Digital Signal Processing, IEEE Transactions on, vol. 45, no. 3, pp. 314–320, Mar. 1998.
[5] F. Heinle and H. Schüßler, “An enhanced method for measuring the performance of multirate systems,” in Proc. Int. Conf. on Digital Signal Processing, Limassol, 1995, pp. 182–187.
[6] ——, “Measuring the performance of implemented multirate systems,” in Acoustics, Speech, and Signal Processing, 1996. ICASSP-96. Conference Proceedings., 1996 IEEE International Conference on, vol. 5, May 1996, pp. 2754–2757.
[7] A. Mehr and T. Chen, “Representations of linear periodically time-varying and multirate systems,” Signal Processing, IEEE Transactions on, vol. 50, no. 9, pp. 2221–2229, Sep. 2002.
[8] W. Yin and A. Mehr, “Identification of LPTV systems in the frequency domain,” Digital Signal Processing, vol. 21, no. 1, pp. 25–35, 2011.
[9] Y. Dorfan, A. Feuer, and B. Porat, “Modeling and identification of LPTV systems by wavelets,” Signal Process., vol. 84, pp. 1285–1297, Aug. 2004.
[10] R. Reng, “Polyphase and modulation descriptions of multirate systems - A systematic approach,” in Proc. Int. Conf. DSP, Limassol, Cyprus, Jun. 1995, pp. 212–217.
[11] F. Heinle, R. Reng, and G. Runze, “Symbolic analysis of multirate systems,” Maple Technical Newsletter, vol. 3, no. 1, 1996, special issue featuring Engineering Applications.
[12] H. Schüßler, Y. Dong, and R. Reng, “A new method for measuring the performance of weakly nonlinear systems,” in Acoustics, Speech, and Signal Processing, 1989. ICASSP-89., 1989 International Conference on, vol. 4, May 1989, pp. 2089–2092.
[13] C. van Loan, “The ubiquitous Kronecker product,” J. Comput. Appl. Math., vol. 123, pp. 85–100, Nov. 2000.
[14] R. Shenoy, “Analysis of multirate components and application to multirate filter design,” in Acoustics, Speech, and Signal Processing, 1994. ICASSP-94., 1994 IEEE International Conference on, vol. iii, Apr. 1994, pp. III/121–III/124.
[15] R. Schafer and L. Rabiner, “A digital signal processing approach to interpolation,” Proceedings of the IEEE, vol. 61, no. 6, pp. 692–702, Jun. 1973.
[16] S. Tassart, “Black box methodology for the characterization of sample rate conversion systems,” in 14th International Conference on Digital Audio Effects, DAFx-11, Sep. 2011.
[17] S. P. Lipshitz, R. A. Wannamaker, and J. Vanderkooy, “Quantization and dither: A theoretical survey,” J. Audio Eng. Soc., vol. 40, no. 5, pp. 355–375, 1992.
[18] “SoX, Sound eXchange,” http://sox.sourceforge.net/.


Stéphan Tassart was born in France in 1970. He received the Eng. degree in electrical engineering from École Nationale Supérieure des Télécommunications (ENST), Paris, in signal processing in 1993, an M.Sc. degree with honors in acoustics, computer science and signal processing applied to music (ATIAM, Paris) in 1994 and a Ph.D. degree in signal processing from Université Pierre et Marie Curie (UPMC Paris-6) in 2000. Since 1998, he has been working in the audio industry (studies, experimentation, expertise) for different companies: Applied Acoustics Systems, Alcatel Mobile Phone, STMicroelectronics and ST-Ericsson. His main research interests are in the field of musical sound synthesis (physical modeling, analog virtual instruments), sample rate conversion techniques and multimodal systems for mobile multimedia applications. Dr. Tassart is a member of the ST-Ericsson Technical Staff.
