Three-dimensional modelling of the speech ... - Antoine Serrurier

Friday 8 December 2006

Three-dimensional modelling of the speech organs from MRI images for nasals production
Articulatory-acoustic characterization of the velum movements
Antoine Serrurier

Jury:
• Gang Feng, President
• Philip Hoole, Reviewer
• Yves Laprie, Reviewer
• Pierre Badin, Thesis supervisor
• Shinji Maeda, Examiner
• Gérard Bailly, Examiner

Outline

• Nasality: definition, objectives, literature
• Articulatory data
• Three-dimensional articulatory model
• Articulatory-acoustic modelling
• Conclusion and perspectives

Nasality: definition

"The term nasality refers to sounds produced with the velum lowered enough to allow an audible airflow through the nasal tract." (Crystal, 1997)

Opening / closing of the velopharyngeal port: [ma] / [pa]

Complex structure of the nasal cavities and sinuses

Objective: understand nasality

Characterization and modelling of the articulatory gesture corresponding to the lowering of the velum
• Global complex geometry
• Nature and dimensionality of the movements

Characterization and modelling of the acoustic consequences

Articulatory synthesis

Articulatory-acoustic studies: literature

Nasal cavities: House and Stevens, 1956; Bjuggren and Fant, 1964; Dang and Honda, 1994

Velopharyngeal port

Demolin et al., 2003

2D geometric model

Mermelstein, 1973

Geometry

Movements

Articulatory-acoustic modelling

Acoustic modelling from area function: House and Stevens, 1956; Fant, 1960, 1985; Mrayati, 1976; Maeda, 1982, 1993; Feng, 1986; Castelli, 1989

Articulatory model

3D model: accurate description of the complex geometry
• Nasal cavities
• Velum

Data-based model

Organ shape model
• Velum
• Nasopharyngeal walls
• Nasal cavities

Single subject

Modelling process

3D images for 46 representative French articulations
• 10 oral vowels
• 4 nasal vowels
• m, n, f, s, ʃ, p, t, k, l in the vowel contexts [a, i, u]
• prephonation
• rest

⇒ Determination of the 46 3D shapes of each organ
⇒ Linear analysis
⇒ Model for each organ

MRI images

46 sets of sagittal 3D images to align
• Resolution 1 mm/px
• Soft tissues visible
• Bony structures not visible

⇒ Need for a 3D reconstruction of the rigid structures

Tomodensitometric (CT) images

• 512 sagittal images
• 149 axial images
• 349 coronal images
• Resolution 0.5 mm/px
• Bony structures visible

[Figure: CT volume shown along the X, Y and Z axes]

Rigid structures

• Jaw
• Hard palate
• Sphenoid sinus
• Maxillary sinuses

[Figure: 3D meshes of the jaw and the hard palate]

⇒ Fixed reference
⇒ Aid to MRI interpretation

Pre-processing of MRI images

1. Perpendicular re-slicing of images

2. Superimposition of the rigid structures on the MRI
⇒ Alignment in a common reference coordinate system
⇒ More accurate MRI interpretation
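The alignment step can be illustrated with a minimal sketch. The slides do not say how the rigid structures were registered to each MRI stack, so the code below assumes that matched landmark points are available and uses the Kabsch least-squares method as a stand-in. The function `rigid_align` and the synthetic landmarks are hypothetical.

```python
import numpy as np

def rigid_align(source, target):
    """Return rotation R and translation t minimising ||R @ p + t - q|| over matched points.

    Assumption: `source` and `target` are (n, 3) arrays of corresponding landmarks,
    for example points on the hard palate located in two image stacks.
    """
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    src_centroid = source.mean(axis=0)
    tgt_centroid = target.mean(axis=0)
    # Cross-covariance of the centred point sets
    H = (source - src_centroid).T @ (target - tgt_centroid)
    U, _, Vt = np.linalg.svd(H)
    # Force a proper rotation (determinant +1), not a reflection
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = tgt_centroid - R @ src_centroid
    return R, t

# Synthetic check: rotate and translate hypothetical landmarks, then recover the transform
rng = np.random.default_rng(0)
landmarks = rng.uniform(0.0, 15.0, size=(20, 3))
angle = np.radians(10.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.5, -1.0, 2.0])
moved = landmarks @ R_true.T + t_true

R_est, t_est = rigid_align(landmarks, moved)
aligned = landmarks @ R_est.T + t_est
```

Once every stack is expressed in the same coordinate system this way, shapes traced on different stacks become directly comparable, which is what the later statistical analysis requires.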

Soft structures (1)

3. Manual outlines: velum, pharyngeal wall

[Figure legend: superimposed rigid structures; manually edited soft structures]

4. 46 3D shapes of velum and nasopharyngeal wall

[Figure: example shapes for [ta] and [i]]

Soft structures (2)

5. Generic mesh definition
• Velum: 5239 vertices
• Nasopharyngeal wall: 2110 vertices

6. Fitting of the generic mesh to the 46 articulatory targets (× 46 articulations of the corpus)

[Figure: generic meshes of the velum and the nasopharyngeal wall fitted to [pa], axes X, Y, Z]

Is this process coherent with the real deformations?

EMA recording

Electromagnetic articulography (EMA): recording of a flesh-point position in the midsagittal plane

Dynamic speech corpus: VCV (V = 14, C = 16)

[Figure: recorded EMA positions in the midsagittal plane]

Coherence meshes - EMA

Determination of a pseudo-EMA vertex on the generic mesh

Position of this point through the mesh deformation process

[Figure: positions in the midsagittal plane, X (cm) versus Y (cm). Legend: EMA recording with its PCA axes; the 46 points of the fitted mesh with their PCA axes]

⇒ Validation of the approach
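The comparison between the EMA cloud and the 46 pseudo-EMA positions can be sketched as a comparison of principal axes. This is only an illustration on synthetic points. The slides do not give the actual coherence measure, and the helper names, directions and noise level below are assumptions.

```python
import numpy as np

def principal_axis(points):
    """First principal direction (unit vector) of a 2D point cloud."""
    points = np.asarray(points, dtype=float)
    centered = points - points.mean(axis=0)
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return Vt[0]

def angle_between_axes_deg(a, b):
    """Angle between two unit axes, ignoring their sign, in degrees."""
    cos_angle = abs(np.dot(a, b))
    return float(np.degrees(np.arccos(np.clip(cos_angle, 0.0, 1.0))))

def synthetic_cloud(rng, n, direction_deg, center):
    """Hypothetical points spread along one direction with small isotropic noise."""
    theta = np.radians(direction_deg)
    direction = np.array([np.cos(theta), np.sin(theta)])
    along = rng.uniform(-0.6, 0.6, size=n)
    noise = rng.normal(scale=0.01, size=(n, 2))
    return center + along[:, None] * direction + noise

rng = np.random.default_rng(1)
center = np.array([12.0, 10.3])                          # cm, arbitrary
ema_points = synthetic_cloud(rng, 500, 50.0, center)     # stands in for the EMA recording
mesh_points = synthetic_cloud(rng, 46, 52.0, center)     # stands in for the 46 mesh positions

angle = angle_between_axes_deg(principal_axis(ema_points),
                               principal_axis(mesh_points))
```

A small angle between the two axes is one simple way to express that both data sets describe the same dominant movement direction.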

Nasal cavities (1)

Coronal MRI
• Resolution 1 mm/px
• Mucosa visible

Nasal cavities (2)

[Figure: coronal cross-sections of the nasal cavities on successive grid planes, from -1.2 cm to 8.95 cm]

[Figure: cross-sectional areas (cm²) from CT and from MRI versus length (cm), between the cavum and the nostrils, for the left, right and middle passages; annotated values ≈ 1 cm² and ≈ 2 cm²]

• Coherent with the values of Dang et al., 1994
• Lower areas than Bjuggren and Fant, 1964
• Higher nostril constriction than House and Stevens, 1956

Articulatory data: conclusion

Alignment of data of different natures

3D meshes of the rigid structures

3D meshes of the soft tissues fitted to the 46 articulatory targets
• Velum and nasopharyngeal wall
• Accuracy < 1 mm
• Coherent for a statistical analysis

Coherence between 3D static data and dynamic EMA data

Nostril constriction ≈ 1 cm²

Linear modelling

46 observations representative of speech tasks:
• Velum: 5239 × 3 variables
• Nasopharyngeal wall: 2110 × 3 variables

[Figure: velum and nasopharyngeal wall meshes for [pa], axes X, Y, Z]

Principal Component Analysis

Principal components ⇒ control parameters of the model

Evaluation
• Data variance explained (%) for each component
• RMS reconstruction error (cm) using the model
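The two evaluation criteria can be made concrete with a short sketch on synthetic data. It assumes that each observation is a flattened vector (x1, y1, z1, x2, ...) and that the RMS error is taken over per-vertex 3D distances. Neither convention is stated on the slides, and `fit_pca` and `rms_error` are hypothetical names.

```python
import numpy as np

def fit_pca(shapes, n_components):
    """PCA of an (observations x variables) matrix via SVD of the centred data."""
    shapes = np.asarray(shapes, dtype=float)
    mean_shape = shapes.mean(axis=0)
    centered = shapes - mean_shape
    _, singular_values, Vt = np.linalg.svd(centered, full_matrices=False)
    variance_ratio = singular_values**2 / np.sum(singular_values**2)
    components = Vt[:n_components]        # one row per principal direction
    scores = centered @ components.T      # control parameter values per observation
    return mean_shape, components, scores, variance_ratio[:n_components]

def rms_error(shapes, mean_shape, components, scores):
    """RMS of the per-vertex 3D distance between data and model reconstruction."""
    reconstruction = mean_shape + scores @ components
    residual = (np.asarray(shapes) - reconstruction).reshape(len(shapes), -1, 3)
    return float(np.sqrt(np.mean(np.sum(residual**2, axis=2))))

# Synthetic stand-in: 46 observations of a 50-vertex mesh driven by two latent gestures
rng = np.random.default_rng(0)
n_obs, n_vertices = 46, 50
latent = rng.normal(size=(n_obs, 2)) * np.array([3.0, 1.0])
basis = rng.normal(size=(2, n_vertices * 3))
shapes = 10.0 + latent @ basis

mean1, comp1, scores1, var1 = fit_pca(shapes, 1)
mean2, comp2, scores2, var2 = fit_pca(shapes, 2)
err1 = rms_error(shapes, mean1, comp1, scores1)
err2 = rms_error(shapes, mean2, comp2, scores2)
```

Because the synthetic data have exactly two degrees of freedom, the two-component model reconstructs them almost perfectly, while the one-component model leaves a residual. Real meshes would show a residual for any small number of components.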

1st parameter: VL

PCA on the 3D velum; VL: 1st PCA parameter

                   Var. ex. (%) (on ± 1.5 cm)   Cum. RMS recons. error (cm)
Velum              83 %                         0.08 cm
Pharyngeal wall    47 %                         0.07 cm

⇒ 1 dominant parameter

[Figure: Velum Levator (Kent, 1997)]

VL = Velum Levator and Passavant's Pad

1st parameter: VL

[Figure panels: velum; nasopharyngeal wall; midsagittal plane]

2nd parameter: VS

PCA on the 3D velum; VS: 2nd PCA parameter

                   Var. ex. (%) (on ± 1.5 cm)   Cum. RMS recons. error (cm)
Velum              6 % (cum. 89 %)              0.06 cm
Pharyngeal wall    5 % (cum. 52 %)              0.06 cm

• Small amplitude
• Influences the nasal coupling area

2nd parameter: VS

[Figure panels: velum; nasopharyngeal wall; midsagittal plane]

VL vs. VS

[Figure: the articulations of the corpus plotted in the (VS, VL) plane, with separate markers for nasal vowels, oral vowels, nasal consonants and oral consonants, plus prephonation and rest; a "Nasality" axis runs from - to +]

Coherence 3D articulatory model - EMA

[Figure: positions of the pseudo-EMA vertex driven by VL and VS in the articulatory model, superimposed on the EMA recording]

⇒ Coherence between the two data types

Control of the model from EMA data 



Inversion of the VL and VS parameters from the pseudo-EMA vertex

3D RMS error = 0.07 cm

Control of the model from EMA recording

[pppbpm] ⇒ 3D reconstruction from 2D EMA dynamic recording
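The inversion from a 2D EMA position to the two control parameters can be sketched as a small linear solve, assuming a linear model of the form shape = mean + VL × component 1 + VS × component 2. The toy model, the choice of X and Z as midsagittal coordinates, and all names below are assumptions made for illustration only.

```python
import numpy as np

def invert_vl_vs(ema_xy, mean_xy, basis_xy):
    """Solve for (VL, VS) from one 2D coil position.

    basis_xy has shape (2, 2): row 0 is the coil displacement per unit VL,
    row 1 the displacement per unit VS.
    """
    params, *_ = np.linalg.lstsq(basis_xy.T, ema_xy - mean_xy, rcond=None)
    return params

def reconstruct(mean_shape, components, params):
    """Full 3D shape predicted by the linear model for given (VL, VS)."""
    return mean_shape + params @ components

# Hypothetical toy model with 30 vertices, flattened as (x1, y1, z1, x2, ...)
rng = np.random.default_rng(2)
n_vertices = 30
mean_shape = rng.uniform(8.0, 14.0, size=n_vertices * 3)
components = rng.normal(scale=0.2, size=(2, n_vertices * 3))

coil_vertex = 7                                          # assumed pseudo-EMA vertex index
coil_columns = [3 * coil_vertex, 3 * coil_vertex + 2]    # assumed midsagittal coordinates X, Z
components[:, coil_columns] = np.array([[0.3, 0.5],
                                        [0.2, -0.1]])    # well-conditioned by construction

true_params = np.array([-1.2, 0.4])                      # made-up (VL, VS) values
true_shape = reconstruct(mean_shape, components, true_params)
ema_xy = true_shape[coil_columns]                        # what the coil would record

estimated = invert_vl_vs(ema_xy, mean_shape[coil_columns], components[:, coil_columns])
recovered_shape = reconstruct(mean_shape, components, estimated)
```

Two measured coordinates determine two parameters, which is why a single midsagittal flesh point can drive the whole 3D shape in such a model.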


Three-dimensional model: conclusion

Articulatory model with 2 parameters:
• VL: velum (83 %) + nasopharyngeal wall
• VS: velum (6 %)
• Accuracy < 1 mm

Coherence between the EMA data and the model

2D / 3D correlation: RMS error of the 3D reconstruction from 2D < 1 mm

Acoustic model

Hypothesis: F < 5000 Hz ⇒ plane wave propagation

• Transverse areas along the propagation centre line ⇒ area function
• Electric analogue ⇒ transfer function

[Figure: for [a], transverse areas along the centre line, the area function (cm²), and the transfer function, amplitude (dB) versus frequency (Hz), from 0 to 5000 Hz]
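Under the plane-wave hypothesis, an area function can be turned into resonance frequencies by chaining short tube sections. The sketch below is a lossless simplification with an ideal open end; the electric analogue used in the thesis is not detailed on the slide and presumably includes losses and radiation. The physical constants and function names are assumptions.

```python
import numpy as np

SPEED_OF_SOUND = 35000.0   # cm/s, assumed
AIR_DENSITY = 1.14e-3      # g/cm^3, assumed

def chain_matrix_d(areas, lengths, freq):
    """D element of the glottis-to-lips chain matrix of concatenated lossless tubes."""
    k = 2.0 * np.pi * freq / SPEED_OF_SOUND
    total = np.eye(2, dtype=complex)
    for area, length in zip(areas, lengths):
        z = AIR_DENSITY * SPEED_OF_SOUND / area   # characteristic impedance of the section
        section = np.array([
            [np.cos(k * length), 1j * z * np.sin(k * length)],
            [1j * np.sin(k * length) / z, np.cos(k * length)],
        ])
        total = total @ section
    # For lossless sections the diagonal terms stay real
    return total[1, 1].real

def resonances(areas, lengths, fmax=5000.0, step=5.0):
    """Frequencies where D crosses zero, i.e. poles of U_lips / U_glottis = 1 / D."""
    freqs = np.arange(step / 2.0, fmax, step)
    d = np.array([chain_matrix_d(areas, lengths, f) for f in freqs])
    crossing = np.where(d[:-1] * d[1:] < 0.0)[0]
    f0, f1 = freqs[crossing], freqs[crossing + 1]
    d0, d1 = d[crossing], d[crossing + 1]
    return f0 - d0 * (f1 - f0) / (d1 - d0)       # linear interpolation of each crossing

# Sanity check on a uniform 17.5 cm tube sampled in 0.5 cm sections
areas = np.full(35, 4.0)       # cm^2
lengths = np.full(35, 0.5)     # cm
uniform_formants = resonances(areas, lengths)
```

For the uniform tube the resonances fall at odd multiples of c / 4L, here 500, 1500, 2500, 3500 and 4500 Hz, which gives a quick check before feeding in a measured area function.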

Nasal cavities

[Figure: area function of the right and left nasal passages, area (cm²) versus length (cm), between the cavum and the nostrils, from the MRI and CT transverse areas]

[Figure: transfer function (dB) versus frequency (Hz), from 0 to 5000 Hz]

Resonances: Rn1 = 700 Hz, Rn2 = 2300 Hz, Rn3 = 3900 Hz

• Nostril constriction ≈ 1 cm²
• Approximately quarter-wavelength resonances
• Small asymmetry (R3)

Nostril constriction assumption not verified
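The quarter-wavelength reading of the three reported resonances can be checked with a few lines of arithmetic. The sketch assumes a speed of sound of 35000 cm/s and treats the nasal tract as a uniform tube closed at one end and open at the other, which is only a rough idealisation; the function name is hypothetical.

```python
SPEED_OF_SOUND = 35000.0  # cm/s, assumed

def quarter_wave_resonances(length_cm, n_modes=3, c=SPEED_OF_SOUND):
    """Resonances (2n - 1) * c / (4 L) of a uniform tube closed at one end."""
    return [(2 * n - 1) * c / (4.0 * length_cm) for n in range(1, n_modes + 1)]

reported = [700.0, 2300.0, 3900.0]                        # Rn1, Rn2, Rn3 from the slide (Hz)
effective_length = SPEED_OF_SOUND / (4.0 * reported[0])   # tube length that matches Rn1
predicted = quarter_wave_resonances(effective_length)
relative_gap = [(r - p) / p for r, p in zip(reported, predicted)]
```

With these assumed values the effective length is 12.5 cm and the ideal series is 700, 2100 and 3500 Hz, so the reported Rn2 and Rn3 lie roughly 10 % above the ideal odd multiples, in line with the word "approximately" on the slide.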

Velar region: area function

[Figure: cross-sections in planes 1 to 9 of the velar region, showing the velum (model), the nasopharyngeal wall (model) and the tongue (fixed), with the resulting nasal and oral areas (cm²)]

Inclusion in a fixed oral tract: cardinal vowels [a, i, u] and the nasal vowels

[Figure: oral and nasal area functions, area (cm²) versus length (cm), from the glottis to the lips and nose, including the nasal cavities]

Articulatory limits of nasality

[Figure: oral and nasal area functions, area (cm²) versus length (cm), for three configurations]

• Oral configuration: TRFC = DL / DG
• Nasopharyngeal configuration: TRFC = (DL + DN) / DG
• Nasal configuration: TRFC = DN / DG

Extreme articulations: low frequencies

[Figure: F1 versus F2 (Hz) for the oral vowels [i], [u], [a] and for the nasal vowels, each in the oral configuration and in the nasopharyngeal configuration]

⇒ Nasal gap around 450 Hz – 1100 Hz

Acoustic coupling

[Figure: transfer functions labelled "un", amplitude (dB) versus frequency (Hz), from 0 to 5000 Hz, with arrows marking two kinds of resonance]

• Resonances coming from coupling
• Resonances coming from oral formants

VL: area function

High variations of the velopharyngeal port
• ∆area = 0 cm² to 0.8 cm²
• ∆position > 1 cm

[Figure: area (cm²) versus length (cm), between the uvula and the cavum]

High oral variations

[Figure: oral area (cm²) versus length (cm), with the lips end marked]

VL: acoustic

[Figure: transfer functions between [a oral] and [a nasopharyngeal], amplitude (dB) versus frequency (Hz), from 0 to 5000 Hz; poles marked •, zeros marked o]

⇒ Variations of up to 30 % in the resonances coming from oral formants, depending on the vowel

VS: area function

High variations of the nasal coupling area
• ∆area = 0.1 cm² to 0.6 cm²
• ∆position = 0.9 cm

[Figure: area (cm²) versus length (cm), between the uvula and the cavum]

Smaller oral variations

[Figure: oral area (cm²) versus length (cm), with the glottis end marked]

Low frequency nomograms of VL and VS

VL and VS variations in the data range around [a], [i], [u]

[Figure: four F1 versus F2 panels (Hz): VL, VS, VL total, VS total; F2 marked •, Rn1 marked o]

Panel labels: variations of the nasal area function; variations of the oral area function

Simulations - recording comparison

EMA recording + synchronous acoustic recording

⇒ Control of the articulatory model

[Figure: simulated nomograms and recorded nomograms, frequency (Hz, 0 to 5000) versus time (s, 0 to 1.8)]

Articulatory-acoustic modelling: conclusion

Realistic area functions of the nasal tract
• Nasal cavities: quarter-wavelength resonances
• Velopharyngeal port: VL (± 0.8 cm²) + VS (± 0.5 cm²)
• Velar region of the oral tract: VL

Coupling influence on low frequencies
• Concentration in the nasal gap around 450 Hz – 1100 Hz
• Variations of the poles sensitive to VL and VS

Coherence between the model and the recording at high frequencies

1st nasal formant not obtained in simulation

Conclusion

Articulatory data
• Combined use of data of different natures
• 3D geometry of the nasal cavities and the velopharyngeal port

3D articulatory model of the velopharyngeal port: 2 parameters
• VL: dominant parameter (Velum Levator)
• VS: lower-amplitude parameter
• High correlation between 2D movements and 3D shapes

Acoustic characterization of the articulatory gesture
• Realistic area functions of the nasal tract
• Nasal cavities: nostril constriction assumption not verified
• Influence of VL and VS on low poles

Perspectives

Acoustics
• 3D acoustic propagation
• Origin of nasal F1? Sinuses?
• Comparisons with acoustic recording + EMA
• Possible external coupling consideration

Articulatory synthesis
• Inclusion in a complete articulatory model
• Study of the coordination velum / tongue / lips
• Control of the model from real-time MRI
• Articulatory synthesis and perceptual tests

• Extension to other subjects
• Text-to-Speech articulatory synthesis

Thank you!