
ARE YOUNG MALE SPEAKERS LOSING TONE 3 BREATHINESS IN SHANGHAI CHINESE? AN ACOUSTIC AND ELECTROGLOTTOGRAPHIC STUDY

Jiayin GAO1, Pierre HALLÉ1,2

1Laboratoire de Phonétique et Phonologie; 2Laboratoire Mémoire et Cognition

ABSTRACT

This study examines breathy voice or “muddy airflow” in Shanghai Chinese. We compared low tone T3 words, where breathy voice tends to occur as a redundant feature, to high tone T1 or T2 words, where modal voice is typically found. We recorded speech and electroglottographic (EGG) signals from elderly and young, male and female speakers of Shanghai Chinese. We measured spectral tilt (H1-H2) from the speech signal and open quotient (OQ) from the EGG signal. We found cross-gender and cross-age differences, as well as individual variation. Elderly male speakers use both OQ and H1-H2 differences to distinguish modal and “muddy” voices. Young male speakers use the H1-H2 difference for this distinction; only those with a small F0 range also use the OQ difference. For female speakers, no OQ or H1-H2 differences were found. The more consistent evidence for breathy tone T3 productions in elderly than in young male speakers may suggest a loss in progress of the Shanghai Chinese “muddy airflow”, perhaps under the influence of Mandarin Chinese, which lacks this phonetic feature.

Keywords: Shanghai, breathiness, open quotient, EGG.

1. INTRODUCTION

About 100 years ago, Karlgren proposed that Shanghai Chinese voiced obstruents are accompanied by “voiced aspiration” [14]. These obstruents were later described as qingyin zhuoliu 清音浊流, literally ‘clear sound with muddy airflow’ ([17,3]). Chinese linguists use ‘clear’ and ‘muddy’ to describe voiceless and voiced sounds, respectively. Thus, Shanghai Chinese voiced obstruents have been described as voiceless but with a “muddy airflow,” somehow perceived as voiced. What, then, is this “muddy airflow”? Is it “voiced aspiration” as in Hindi? Is it the perceptible “breathiness” found in languages with phonation contrasts? Or should it be called “slack voice” as suggested by [16]? What are its exact articulatory and acoustic correlates?

Whether phonation variation is contrastive (as in Gujarati or Mazatec [6]) or not (as in English), acoustic and physiological measures serve to identify phonation types such as breathy, modal, or creaky. Although the difference in amplitude between the 1st and 2nd harmonics in the spectrum (H1-H2) was found to be the most robust measure to distinguish phonation types in 8 out of 10 languages ([6]), several other measures are often needed, depending on such factors as the speaker’s sex, the language, F0, and position within the syllable. These other spectral measures include H1-A1, H1-A2, and H1-A3, where H1 is the 1st harmonic amplitude, and A1 to A3 are the amplitudes of the first three formants: the higher the 1st harmonic amplitude relative to that of the following components, the breathier the voice quality. Cepstral Peak Prominence (CPP) is another efficient acoustic measure to distinguish breathy from modal phonation [12,6].

As for physiological measures, the noninvasive electroglottographic (EGG) signal is widely used [5]. Two surface electrodes around the neck measure the degree of contact of the vocal folds during phonation. The open quotient (OQ) measures the proportion of time the vocal folds are separated during each glottal cycle. Higher OQs imply breathier phonation. Some researchers found that OQ correlates positively with H1-H2 ([12]), whereas others only found a weak correlation ([15]).

“Muddy” airflow is a property of a subset of syllables with low tones. It developed from a sound change in late Middle Chinese (around 1000 A.D.), in which voiced onsets developed breathiness, and the voicing contrast was reinterpreted as a tone contrast between high and low tones. We do not know what the exact phonetic properties of breathiness were at that time, but today we can investigate the historical leftovers in several languages that retained breathiness, such as Northern Wu (including Shanghai Chinese) and the Mon-Khmer languages [21].

In Shanghai Chinese, tone and voicing contrasts co-exist. “Muddy” voice does not contrast with modal voice at the phonological level. It is a redundant feature associated with the phonologically voiced onsets, which are phonetically voiceless in low tone syllables in word-initial position. Experimental investigations of Shanghai “muddy” voice during the last 20 years have provided acoustic data for H1-H2 and H1-A1 [2,4,7], as well as physiological data based on fiberoptic transillumination (PGG: [19]), external electrophotoglottography (ePGG: [8]), or on airflow and intraoral pressure [2].
The findings have been somewhat contradictory: some researchers confirmed that “muddy” syllables in word-initial position are produced with higher H1-H2 or H1-A1 and with a more widely open glottis than for modal voice, hence with some breathiness ([2,7,19]); other researchers did not find evidence for breathiness ([4,8]).

The current study has two goals. Our first goal is to measure phonation in Shanghai Chinese using EGG, which, as far as we know, has never been used to study phonation in this dialect, and to compare the open quotient data from EGG with the H1-H2 data. Our second goal is to provide insight into the evolution of the “muddy airflow” in Shanghai Chinese. Among all the Wu dialects, Shanghai Chinese, as spoken in the urban area, has experienced the most rapid change, starting in the second half of the 19th century, when Shanghai became a treaty port. Since that time, Shanghai Chinese has been influenced by various migrant dialects and, most recently, by Mandarin Chinese, which has had a large impact. Most studies on the evolution of Shanghai phonetics and phonology deal with the tone system or with the inventories of initials and finals, but they rarely investigate changes in acoustic or articulatory properties. In this study, we examine whether or not Shanghai speakers tend to lose the redundant “breathiness” feature of initial low tone syllables, presumably under the influence of Mandarin Chinese, in which this feature is absent. We therefore compare acoustic and EGG data between two age groups: an “elderly” group (mainly influenced by other Wu dialects) and a “young” group (mainly influenced by Mandarin Chinese). We predict some loss of breathiness and greater variability in the young speakers’ productions.

2. METHODS

2.1. Participants

We report the data from 11 native speakers of urban Shanghai Chinese, including 4 elderly speakers (3 males and 1 female, mean age 67.3, range 64-72) and 7 young speakers (4 males and 3 females, mean age 24.9, range 24-28). All were born and raised in Shanghai. None reported any hearing or reading disorder. All were naive about the purpose of the experiment. Their speech and EGG signals were simultaneously recorded using a Vocevista EGG [20].

2.2. Speech Materials

The participants were asked to produce target monosyllabic words in a frame sentence /__ gə ə zɨ ŋo nin tə ə/ (“__” this character, I know it), written in Chinese characters. The target syllable carried one of the two high tones (T1 or T2), or the low tone (T3). Its rime was always /ɛ/. Its onset could be empty (zero onset Ø), a stop (/p, t, b, d/), a fricative (/f, s, v, z/), or the nasal /m/. T1 and T2 can only co-occur with phonologically voiceless obstruents and T3 with phonologically voiced ones (the latter are phonetically voiceless in word-initial position but voiced in medial position). Zero and nasal onsets may co-occur with all three tones. The 18 target syllables are listed in Table 1.

Table 1: List of materials according to tones and onsets.

Onset       T1              T2              T3
zero        ɛ 哀            ɛ 爱            ɛ 咸
stop        pɛ 杯, tɛ 堆    pɛ 板, tɛ 胆    bɛ 办, dɛ 谈
fricative   fɛ 翻, sɛ 三    fɛ 反, sɛ 伞    vɛ 饭, zɛ 才
nasal       mɛ 蛮           mɛ 美           mɛ 梅

2.3. Analyses

2.3.1. H1-H2

For this study, we decided to use the most widely accepted spectral measure, H1-H2, computed on 30 ms windows for each consecutive fifth of the /ɛ/ rime’s duration, using a Praat [1] script. This yielded five H1-H2 values for each syllable. We consistently used the same non-high vowel /ɛ/ in order to avoid the effect of F1 variation on H1 and H2. The correction proposed by [13] for analyzing multiple vowels was thus unnecessary.

2.3.2. EGG analyses

The EGG data were high-pass filtered at 30 Hz to eliminate the effects of gross larynx movements. A semi-automatic Matlab program, “peakdet.m” [10], helped visualize the EGG signal, calculate its derivative (the DEGG signal), and generate the F0 and OQ data from the DEGG signal. (The peaks of the DEGG signal indicate glottal opening and closing time locations.) For more details on the use of DEGG, see [9]. In the case of double closing or opening peaks in the DEGG signal, the larger peak was retained.

We obtained the F0 and OQ data for each glottal period in the /ɛ/ vowel of each syllable, then interpolated F0 and OQ values every 5 ms, and finally computed average F0 and OQ for each consecutive fifth of the /ɛ/’s duration. This yielded five F0 and five OQ values for each syllable, which we could compare to the five H1-H2 values.

3. RESULTS

3.1. F0 from DEGG signals

Figures 1-2 show the averaged F0 curves for the five consecutive 20% intervals for each tone (T1-3), as computed from the DEGG signal. Figure 1 shows the data of the elderly group (G1), with male and female data in separate panels. Average F0 is about 55 Hz higher for the female than for the male speakers. Figure 2 shows the data of the young group (G2), with male and female data in the same panel, showing that male and female data did not overlap: mean F0 was 107 Hz higher for female than for male speakers. (The F0 computed from the speech signal was very close to that computed from the DEGG signal.)

Figure 1: Averaged F0 curves of tones T1-3: the three male (left) and one female (right) speakers of G1. [Axes: F0 (Hz) by Position (intervals 1-5); labels: Tone 1, Tone 2, Tone 3.]

Figure 2: Averaged F0 curves of tones T1-3: the 4 male (dotted lines) and 3 female (solid lines) speakers of G2. [Axes: F0 (Hz) by Position (intervals 1-5); labels: Tone 1, Tone 2, Tone 3.]
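The chain from DEGG peak times to five per-syllable values (per-period F0 and OQ, values interpolated every 5 ms, then one mean per fifth of the vowel) can be illustrated with a minimal Python sketch. This is not the peakdet.m procedure itself. The function names `cycle_f0_oq` and `fifth_means` are hypothetical, and several choices are assumptions: each cycle runs from one closing peak to the next, OQ is taken as the open phase (opening peak to next closing peak) divided by the period, each cycle is time-stamped at its midpoint, and the interpolation is linear (the text does not specify the type).

```python
import numpy as np

def cycle_f0_oq(closings, openings):
    """Per-cycle F0 (Hz) and open quotient from DEGG peak times (seconds).

    closings: times of N+1 consecutive closing peaks (assumed cycle boundaries).
    openings: times of N opening peaks, one inside each cycle.
    """
    closings = np.asarray(closings, dtype=float)
    openings = np.asarray(openings, dtype=float)
    periods = np.diff(closings)               # duration of each glottal cycle
    open_phase = closings[1:] - openings      # opening peak -> next closing peak
    f0 = 1.0 / periods
    oq = open_phase / periods                 # assumed OQ definition
    centers = closings[:-1] + periods / 2.0   # assumed time stamp: cycle midpoint
    return centers, f0, oq

def fifth_means(times, values, t_start, t_end, step=0.005, n_parts=5):
    """Resample every `step` seconds, then average over n_parts equal intervals.

    Linear interpolation is an assumption. The vowel must be long enough
    for every interval to contain at least one resampled point.
    """
    grid = np.arange(t_start, t_end, step)
    resampled = np.interp(grid, times, values)
    edges = np.linspace(t_start, t_end, n_parts + 1)
    # Interval index (0 .. n_parts-1) of each resampled point
    idx = np.clip(np.searchsorted(edges, grid, side="right") - 1, 0, n_parts - 1)
    return np.array([resampled[idx == k].mean() for k in range(n_parts)])

# Synthetic example: a 200 ms vowel with a cycle every 10 ms and the glottis
# opening halfway through each cycle.
closings = np.arange(21) * 0.01
openings = closings[:-1] + 0.005
centers, f0, oq = cycle_f0_oq(closings, openings)
f0_by_fifth = fifth_means(centers, f0, 0.0, 0.2)
oq_by_fifth = fifth_means(centers, oq, 0.0, 0.2)
```

With real data, the two peak-time arrays would come from whatever peak-picking step is applied to the DEGG signal, and `t_start` and `t_end` from the segmented /ɛ/ boundaries.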

3.2. Open quotient

Figures 3 (G1) and 4 (G2) show the averaged OQ curves for the five consecutive 20% intervals for each tone, as computed from the DEGG signal. Higher OQ values for T3 than for T1 or T2 syllables indicate breathier phonation. This is observed only for the elderly male speakers: the averaged OQ value for T3 (0.51) is higher than that for T1 (0.47) or T2 (0.46). For the male data, we conducted by-item and by-subject ANOVAs with OQ as the dependent variable, and Tone and Interval number as within factors, for each age group. For the elderly male speakers, the by-item ANOVA showed a significant effect of Tone, F(2,10)=7.4, p