Friday 8 December 2006
Three-dimensional modelling of the speech organs from MRI images for the production of nasals
Articulatory-acoustic characterization of velum movements
Antoine Serrurier
Gang Feng (Chair)
Philip Hoole (Reviewer)
Yves Laprie (Reviewer)
Pierre Badin (Thesis supervisor)
Shinji Maeda (Examiner)
Gérard Bailly (Examiner)
Outline
Nasality: definition, objectives, literature
Articulatory data
Three-dimensional articulatory model
Articulatory-acoustic modelling
Conclusion and perspectives
Nasality: definition
"The term nasality refers to sounds produced with the velum lowered enough to allow an audible airflow through the nasal tract." (Crystal, 1997)
Opening/closing of the velopharyngeal port: [ma] vs. [pa]
Complex structure of the nasal cavities and sinuses
Objective: understand nasality
• Characterization and modelling of the articulatory gesture corresponding to velum lowering
  - Global, complex geometry
  - Nature and dimensionality of the movements
• Characterization and modelling of the acoustic consequences
• Articulatory synthesis
Articulatory-acoustic studies: literature
Geometry
• Nasal cavities: House and Stevens, 1956; Bjuggren and Fant, 1964; Dang and Honda, 1994
• Velopharyngeal port: Demolin et al., 2003
Movements
• 2D geometric model: Mermelstein, 1973
Articulatory-acoustic modelling
• Acoustic modelling from the area function: House and Stevens, 1956; Fant, 1960, 1985; Mrayati, 1976; Maeda, 1982, 1993; Feng, 1986; Castelli, 1989
Articulatory model
• 3D model: accurate description of the complex geometry (nasal cavities, velum)
• Data-based model
• Organ shape models: velum, nasopharyngeal walls, nasal cavities
• Single subject
Modelling process
3D images of 46 representative French articulations:
• 10 oral vowels • 4 nasal vowels • m, n, f, s, ʃ, p, t, k, l combined with [a, i, u] • prephonation • rest
⇒ Determination of the 46 3D shapes of each organ
⇒ Linear analysis
⇒ Model for each organ
MRI images
• 46 sets of sagittal 3D images to be aligned
• Resolution 1 mm/px
• Soft tissues visible
• Bony structures not visible ⇒ 3D reconstruction of the rigid structures needed
Computed tomography (CT) images
• 149 axial images, resolution 0.5 mm/px
• 512 sagittal and 349 coronal images
• Bony structures visible
[Figure: CT image volume, axes X, Y, Z]
Rigid structures: jaw, hard palate, sphenoid sinus, maxillary sinuses
[Figure: jaw and hard palate]
• Fixed reference
• Aid to MRI interpretation
Pre-processing of MRI images
1. Perpendicular re-slicing of the images
2. Superimposition of the rigid structures on the MRI:
• alignment in a common reference coordinate system
• more accurate MRI interpretation
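The slides do not state how the rigid structures were superimposed on the MRI. If corresponding landmark points on the CT-derived structures (e.g. jaw, hard palate) and in the MRI volume are available, one standard option is the Kabsch least-squares rigid fit. The sketch below shows that generic technique under this assumption; it is not necessarily the method used in this work, and `rigid_align` is a hypothetical name.

```python
import numpy as np

def rigid_align(source, target):
    """Least-squares rotation R and translation t such that R @ source[i] + t ~ target[i].

    source, target: (N, 3) arrays of corresponding points (correspondence assumed known).
    Kabsch/SVD solution; the sign correction prevents a reflection.
    """
    mu_s, mu_t = source.mean(axis=0), target.mean(axis=0)
    H = (source - mu_s).T @ (target - mu_t)  # 3 x 3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # -1 would indicate a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_t - R @ mu_s
    return R, t
```

Applying R and t to every CT-derived vertex would then express the rigid structures in the MRI reference frame.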
Soft structures (1)
3. Manual outlining of the velum and pharyngeal wall, with the rigid structures superimposed
4. 46 3D shapes of the velum and nasopharyngeal wall
[Figure: superimposed rigid structures and manually edited soft structures, e.g. [ta], [i]]
Soft structures (2)
5. Definition of a generic mesh
6. Fitting of the generic mesh to the 46 articulatory targets
[Figure: fitted meshes for [pa], axes X, Y, Z; velum: 5239 vertices; nasopharyngeal wall: 2110 vertices; × 46 articulations of the corpus]
Is this process consistent with the real deformations?
EMA recording
Electromagnetic articulography: recording of the position of a flesh point in the midsagittal plane
Dynamic speech corpus: VCV (14 vowels V, 16 consonants C)
[Figure: recorded coil trajectories in the midsagittal plane]
Coherence between meshes and EMA
• Determination of a pseudo-EMA vertex on the generic mesh
• Position of this point through the mesh deformation process
[Figure: pseudo-EMA vertex on the generic mesh (X, Y, Z); midsagittal comparison, X (cm) vs. Y (cm): EMA recording with EMA PCA, and the 46 points of the fitted mesh with mesh point PCA]
⇒ Validation of the approach
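The two steps on this slide can be sketched as follows, assuming (hypothetically) that the pseudo-EMA vertex is the generic-mesh vertex nearest to the coil position, and that the EMA cloud and the 46 mesh points are compared through the angle between their first principal axes. Both choices, and the function names, are illustrative assumptions about how "EMA PCA" and "mesh point PCA" are compared.

```python
import numpy as np

def pseudo_ema_vertex(generic_mesh, coil_xyz):
    """Index of the generic-mesh vertex closest to the EMA coil (hypothetical criterion)."""
    return int(np.argmin(np.linalg.norm(generic_mesh - coil_xyz, axis=1)))

def principal_direction(points_2d):
    """Unit vector along the first principal axis of a 2D point cloud (midsagittal plane)."""
    centered = points_2d - points_2d.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]

def axis_angle_deg(u, v):
    """Angle between two principal axes, ignoring their arbitrary sign (0 to 90 degrees)."""
    cos = abs(float(np.dot(u, v))) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(min(cos, 1.0))))
```

A small angle between the EMA axis and the mesh-point axis would indicate that the mesh deformation follows the recorded flesh-point movement.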
Nasal cavities (1)
• Coronal MRI
• Resolution 1 mm/px
• Mucosa visible
Nasal cavities (2)
[Figure: nasal cavity contours on coronal grids from -1.2 cm to 8.95 cm, about 0.4 cm apart; transverse areas (cm²) of the left, right and middle passages along the length (cm) from the nostrils to the cavum, from MRI and from CT; annotations ≈ 1 cm² and ≈ 2 cm²]
• Coherent with the values of Dang et al., 1994
• Lower areas than Bjuggren and Fant, 1964
• Stronger nostril constriction than House and Stevens, 1956
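The slide plots transverse areas per grid without showing how they are computed. Assuming each airway passage in a grid plane is available as an ordered closed contour (in cm), a minimal sketch sums the shoelace areas of the passages in that plane; the function names are hypothetical.

```python
def polygon_area(xs, ys):
    """Area (cm^2) enclosed by one closed contour, vertices given in order (shoelace formula)."""
    n = len(xs)
    twice_area = 0.0
    for i in range(n):
        j = (i + 1) % n  # wrap around to close the contour
        twice_area += xs[i] * ys[j] - xs[j] * ys[i]
    return abs(twice_area) / 2.0

def transverse_area(contours):
    """Total airway area in one grid plane: sum over its passages (e.g. left and right)."""
    return sum(polygon_area(xs, ys) for xs, ys in contours)
```

Evaluating `transverse_area` grid by grid and plotting it against grid position gives an area curve of the kind shown for the left, right and middle passages.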
Articulatory data: conclusion
• Alignment of data of different natures
• 3D meshes of the rigid structures
• 3D meshes of the soft tissues fitted to the 46 articulatory targets
  - velum and nasopharyngeal wall
  - accuracy < 1 mm
  - consistent enough for a statistical analysis
• Coherence between static 3D data and dynamic EMA data
• Nostril constriction ≈ 1 cm²
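The slides report a fitting accuracy below 1 mm without giving the metric. One plausible check, assuming the target shape is available as a point cloud in cm, is the RMS of vertex-to-nearest-target-point distances. The sketch uses brute force, which needs memory proportional to vertices × target points; for full meshes a KD-tree search would be preferable. `fitting_rms` is a hypothetical name.

```python
import numpy as np

def fitting_rms(mesh_vertices, target_points):
    """RMS distance (cm) from each fitted vertex to its nearest target point (brute force)."""
    diff = mesh_vertices[:, None, :] - target_points[None, :, :]
    nearest = np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)
    return float(np.sqrt(np.mean(nearest ** 2)))
```

With coordinates in cm, the "< 1 mm" criterion corresponds to `fitting_rms(...) < 0.1`.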
Linear modelling
46 observations representative of speech tasks:
• Velum: 5239 × 3 variables
• Nasopharyngeal wall: 2110 × 3 variables
[Figure: velum and nasopharyngeal wall meshes for [pa], axes X, Y, Z]
Principal Component Analysis ⇒ principal components = control parameters of the model
Evaluation:
• Data variance explained (%) by each component
• RMS reconstruction error (cm) of the model
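The PCA step and its two evaluation criteria can be sketched as follows. This is plain PCA per organ, assuming each articulation is flattened to one row of x, y, z vertex coordinates (e.g. 46 × 5239·3 for the velum) and assuming the RMS error is measured as RMS 3D vertex distance; the exact error definition and any normalization of the parameters in the thesis are not given on the slides, and the function names are hypothetical.

```python
import numpy as np

def pca_model(shapes, n_components):
    """Plain PCA of organ shapes.

    shapes: (n_obs, 3 * n_vertices) array, one flattened mesh (x, y, z per vertex)
            per articulation.
    Returns the mean shape, the component loadings, the per-observation scores
    (unnormalized control parameters) and the variance explained (%) per component.
    """
    mean = shapes.mean(axis=0)
    U, s, Vt = np.linalg.svd(shapes - mean, full_matrices=False)
    var_pct = 100.0 * s**2 / np.sum(s**2)
    scores = U[:, :n_components] * s[:n_components]
    return mean, Vt[:n_components], scores, var_pct[:n_components]

def rms_reconstruction_error(shapes, mean, loadings, scores):
    """RMS 3D vertex distance (same unit as the data, e.g. cm) between data and model."""
    residual = shapes - (mean + scores @ loadings)
    per_vertex = residual.reshape(shapes.shape[0], -1, 3)
    return float(np.sqrt(np.mean(np.sum(per_vertex**2, axis=2))))
```

`np.cumsum(var_pct)` gives cumulative figures comparable to the "cum." values quoted for the second parameter. Reporting variance explained on the nasopharyngeal wall by a velum-derived parameter would additionally require regressing the wall coordinates on that parameter, which this sketch does not do.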
1st parameter: VL
PCA on the 3D velum; VL = 1st PCA parameter
                   Var. ex. (%) (on ± 1.5 cm)    Cum. RMS recons. error (cm)
Velum              83 %                          0.08
Pharyngeal wall    47 %                          0.07
⇒ 1 dominant parameter
[Figure: velum levator muscle (Kent, 1997); midsagittal contours for VL variations]
VL = Velum Levator and Passavant's pad
1st parameter: VL
[Figure: effect of VL on the velum and nasopharyngeal wall, 3D view and midsagittal plane]
2nd parameter: VS
PCA on the 3D velum; VS = 2nd PCA parameter
                   Var. ex. (%) (on ± 1.5 cm)    Cum. RMS recons. error (cm)
Velum              6 % (cum. 89 %)               0.06
Pharyngeal wall    5 % (cum. 52 %)               0.06
[Figure: midsagittal contours for VS variations]
• Small amplitude
• Influences the nasal coupling area
2nd parameter: VS
[Figure: effect of VS on the velum and nasopharyngeal wall, 3D view and midsagittal plane]
VL vs. VS
[Figure: the 46 articulations in the (VS, VL) plane, VL from about -2.5 to 1.5 and VS from about -2.5 to 2.5; oral vowels, nasal vowels, oral consonants (e.g. [ka], [ʃu], [si], [pa]) and nasal consonants ([ma], [mi], [mu], [na], [ni], [nu]), plus prephonation and rest; axes annotated with a nasality direction]
Coherence between the 3D articulatory model and EMA
[Figure: pseudo-EMA vertex on the mesh (X, Y, Z); midsagittal plane, X (cm) vs. Y (cm): EMA recording vs. pseudo-EMA vertex driven by the articulatory model, with the VL and VS directions]
⇒ Coherence between the two data types
Control of the model from EMA data
• Inversion of the VL and VS parameters from the pseudo-EMA vertex
• 3D RMS error = 0.07 cm
Control of the model from an EMA recording: [pppbpm] ⇒ 3D reconstruction from the 2D dynamic EMA recording
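Because the model is linear, the pseudo-EMA vertex moves linearly with VL and VS, so recovering both parameters from one 2D coil position is a small least-squares problem, after which the whole 3D mesh follows. A sketch under stated assumptions: loadings are flattened as x, y, z per vertex, the vertex x and y coordinates are taken as the midsagittal coordinates matching the EMA plane, and all names are hypothetical.

```python
import numpy as np

def invert_parameters(ema_xy, vertex_mean_xy, vertex_loadings_xy):
    """Least-squares control parameters from one 2D flesh-point trajectory.

    ema_xy: (T, 2) coil positions over time (midsagittal plane)
    vertex_mean_xy: (2,) mean position of the pseudo-EMA vertex in the model
    vertex_loadings_xy: (n_params, 2) displacement of that vertex per unit parameter
    Returns (T, n_params) parameter trajectories, e.g. columns [VL, VS].
    """
    A = vertex_loadings_xy.T            # (2, n_params)
    rhs = (ema_xy - vertex_mean_xy).T   # (2, T)
    params, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return params.T

def reconstruct_mesh(params, mean, loadings):
    """3D mesh(es) from parameters: mean + params @ loadings, reshaped to (..., n_vertices, 3)."""
    flat = mean + params @ loadings
    return flat.reshape(*params.shape[:-1], -1, 3)
```

With two parameters and two measured coordinates the system is square, so the inversion is exact whenever the two loading vectors of the vertex are not collinear; the residual 3D error then comes from the model itself.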
Three-dimensional model: conclusion
• Articulatory model with 2 parameters:
  - VL: velum (83 %) + nasopharyngeal wall
  - VS: velum (6 %)
  - accuracy < 1 mm
• Coherence between EMA data and the model
• 2D/3D correlation: RMS error of the 3D reconstruction from 2D data < 1 mm
Acoustic model
Hypotheses: F < 5000 Hz, plane-wave propagation ⇒ area function
• Transverse areas computed along the centre line of propagation
• Electric analogue ⇒ transfer function
[Figure: [a]: midsagittal view with centre line and transverse planes (X, Y); area function (cm²); transfer function, amplitude (dB) vs. frequency (0 to 5000 Hz)]
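A minimal sketch of the area-function-to-transfer-function step, assuming a lossless tube (no wall, viscous or radiation losses), an ideal open end at the lips and a single oral branch without the nasal side branch. It uses the chain-matrix formulation, which is equivalent to a lossless transmission-line analogue; the constants `RHO` and `C` are assumed values, the function names are hypothetical, and the model used in this work presumably includes losses that this sketch omits.

```python
import numpy as np

RHO = 1.14e-3  # air density in g/cm^3 (assumed value)
C = 35000.0    # speed of sound in cm/s (assumed value)

def chain_matrix(areas, lengths, f):
    """Lossless chain (ABCD) matrix of concatenated tube sections at frequency f (Hz).

    areas: section areas in cm^2, ordered from the glottis to the lips
    lengths: section lengths in cm
    Maps (pressure, volume velocity) at the lips to those at the glottis.
    """
    k = 2.0 * np.pi * f / C
    K = np.eye(2, dtype=complex)
    for a, l in zip(areas, lengths):
        z = RHO * C / a  # characteristic acoustic impedance of the section
        c, s = np.cos(k * l), np.sin(k * l)
        K = K @ np.array([[c, 1j * z * s], [1j * s / z, c]])
    return K

def transfer_magnitude(areas, lengths, freqs):
    """|U_lips / U_glottis|, assuming zero pressure (ideal open end) at the lips."""
    return np.array([1.0 / abs(chain_matrix(areas, lengths, f)[1, 1]) for f in freqs])

def resonances(areas, lengths, fmax=5000.0, step=1.0):
    """Resonances below fmax: zeros of the D element, which is real in the lossless case."""
    freqs = np.arange(step, fmax, step)
    d = np.array([chain_matrix(areas, lengths, f)[1, 1].real for f in freqs])
    idx = np.where(np.sign(d[:-1]) != np.sign(d[1:]))[0]
    # Linear interpolation between the two grid points bracketing each zero
    return [float(freqs[i] - d[i] * step / (d[i + 1] - d[i])) for i in idx]
```

For a uniform 17.5 cm tube, `resonances([4.0] * 35, [0.5] * 35)` returns values close to 500, 1500, 2500, 3500 and 4500 Hz, the classical (2k - 1)c/4L pattern; without losses the peaks of `transfer_magnitude` are unbounded, which is why the resonances are located as zeros of D rather than as maxima.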
Nasal cavities: area function
[Figure: area function of the right and left nasal passages, area (cm²) vs. length (cm) from the cavum to the nostrils, from MRI and CT transverse areas; transfer function, amplitude (dB) vs. frequency (0 to 5000 Hz), with Rn1 = 700 Hz, Rn2 = 2300 Hz, Rn3 = 3900 Hz]
■ Nostril constriction ≈ 1 cm²
■ Approximately quarter-wavelength resonances
■ Small asymmetry (R3)
⇒ Nostril constriction assumption not verified
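The quarter-wavelength statement can be checked with the uniform-tube formula f_k = (2k - 1)c/(4L). Assuming c = 35000 cm/s and idealizing the nasal tract as a uniform tube closed at the cavum end and open at the nostrils, Rn1 = 700 Hz corresponds to an effective length of 12.5 cm, which predicts 2100 Hz and 3500 Hz for the next two resonances, against the 2300 Hz and 3900 Hz reported above. The function names are hypothetical.

```python
C = 35000.0  # speed of sound in cm/s (assumed value)

def quarter_wave_resonances(length_cm, n=3):
    """First n resonances of a uniform tube closed at one end and open at the other."""
    return [(2 * k - 1) * C / (4.0 * length_cm) for k in range(1, n + 1)]

def effective_length(first_resonance_hz):
    """Length (cm) of the quarter-wave tube whose first resonance is the given frequency."""
    return C / (4.0 * first_resonance_hz)
```

The growing gap between predicted and reported values for Rn2 and Rn3 is consistent with the resonances being only approximately quarter-wavelength.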
Velar region: area function
[Figure: cross-sections in planes 1 to 9 of the velar region, showing the velum (model), the nasopharyngeal wall (model) and the tongue (fixed), separating the nasal and oral parts]
Inclusion in a fixed oral tract:
• nasal cavities
• cardinal vowels [a, i, u] and nasal vowels
[Figure: resulting area function, area (cm²) vs. length (cm) from the glottis to the lips/nose, nasal branch plotted above and oral branch below]
Articulatory limits of nasality
(DL, DN, DG: volume flow at the lips, nostrils and glottis)
• Oral configuration: TRFC = DL / DG
• Nasopharyngeal configuration: TRFC = (DL + DN) / DG
• Nasal configuration: TRFC = DN / DG
[Figure: area functions (cm²) vs. length (cm) for the three configurations, nasal branch above and oral branch below]
Extreme articulations: low frequencies
[Figure: F1 (200 to 750 Hz) vs. F2 (500 to 2500 Hz) for oral vowels (e.g. [i], [u], [a]) and nasal vowels, in the oral configuration and in the nasopharyngeal configuration]
⇒ Nasal gap around 450 Hz – 1100 Hz
Acoustic coupling
⇒ Resonances originating from the coupling
⇒ Resonances originating from the oral formants
[Figure: transfer functions for "un", amplitude (dB) vs. frequency (0 to 5000 Hz)]
VL: area function
• Large variations of the velopharyngeal port: ∆area from 0 to 0.8 cm², ∆position > 1 cm
• Large oral variations
[Figure: area functions, area (cm²) vs. length (cm), nasal branch from the uvula to the cavum and oral branch up to the lips]
VL: acoustic consequences
[Figure: transfer functions from [a] oral to [a] nasopharyngeal, amplitude (dB) vs. frequency (0 to 5000 Hz); legend: • pole, o zero]
⇒ Variations of up to 30 % in the resonances originating from the oral formants, depending on the vowel
VS: area function
• Large variations of the nasal coupling area: ∆area from 0.1 to 0.6 cm², ∆position = 0.9 cm
• Smaller oral variations
[Figure: area functions for [a], area (cm²) vs. length (cm), from the glottis through the uvula region to the cavum]
Low-frequency nomograms of VL and VS
VL and VS varied over the data range around [a], [i], [u]
[Figure: F1 (200 to 750 Hz) vs. F2 (400 to 2400 Hz) nomograms for VL, VS, VL total and VS total; left panel: variations of the nasal area function; right panel: variations of the oral area function; markers: • F2, o Rn1]
Simulations vs. recordings
EMA recording + synchronous acoustic recording ⇒ control of the articulatory model
[Figure: simulated vs. recorded nomograms for [a], frequency (0 to 5000 Hz) vs. time (0 to 1.8 s)]
Articulatory-acoustic modelling: conclusion
• Realistic area functions of the nasal tract
  - nasal cavities: quarter-wavelength resonances
  - velopharyngeal port: VL (± 0.8 cm²) + VS (± 0.5 cm²)
  - velar region of the oral tract: VL
• Influence of the coupling on low frequencies
  - concentration in the nasal gap around 450 Hz – 1100 Hz
  - pole variations sensitive to VL and VS
• Coherence between the model and the recording at high frequencies; the 1st nasal formant is not obtained in simulation
Conclusion
Articulatory data
• Combined use of data of different natures
• 3D geometry of the nasal cavities and the velopharyngeal port
3D articulatory model of the velopharyngeal port: 2 parameters
• VL: dominant parameter (velum levator)
• VS: lower-amplitude parameter
• High correlation between 2D movements and 3D shapes
Acoustic characterization of the articulatory gesture
• Realistic area functions of the nasal tract
• Nasal cavities: nostril constriction assumption not verified
• Influence of VL and VS on the low poles
Perspectives
Acoustics
• 3D acoustic propagation
• Origin of the nasal F1? Sinuses?
• Comparisons with acoustic recordings + EMA
• Possible consideration of external coupling
Articulatory synthesis
• Inclusion in a complete articulatory model
• Study of velum/tongue/lips coordination
• Control of the model from real-time MRI
• Articulatory synthesis and perceptual tests
• Extension to other subjects
• Text-to-speech articulatory synthesis
Thank you!