Fremdsprachen Lehren und Lernen
flul
0932-6936
2941-0797
Narr Verlag Tübingen
This is an open-access article published under the terms of the CC BY 4.0 license. http://creativecommons.org/licenses/by/4.0/
2003
321
Gnutzmann Küster Schramm
Prosody in second language speech production: the role of the native language
121
2003
Ulrike Gut
This paper is concerned with prosodic aspects of the speech production of second language learners. In particular, rhythmical properties of L2 speech will be investigated that include temporal and metrical aspects of utterances. In the study, the speech of 14 learners of German with prosodically typologically different native languages will be analysed and compared to that of native speakers of German. Phonetic differences in their temporal and metrical organization of speech can be traced back to native language influence. However, this effect is more pronounced in less advanced learners. Implications for a model of second language speech production are discussed.
flul3210133
Ulrike Gut*

Prosody in second language speech production: the role of the native language**

Abstract. This paper is concerned with prosodic aspects of the speech production of second language learners. In particular, rhythmical properties of L2 speech will be investigated that include temporal and metrical aspects of utterances. In the study, the speech of 14 learners of German with prosodically typologically different native languages will be analysed and compared to that of native speakers of German. Phonetic differences in their temporal and metrical organization of speech can be traced back to native language influence. However, this effect is more pronounced in less advanced learners. Implications for a model of second language speech production are discussed.

* Correspondence address: Dr. Ulrike GUT, research assistant (wiss. Assistentin), Fakultät für Linguistik und Literaturwissenschaft, Universität Bielefeld, Postfach 100 131, 33501 BIELEFELD. E-mail: gut@spectrum.uni-bielefeld.de
Areas of work: Prosody, Second and first language acquisition.

** This research is funded by the MSWF (Ministry for Education of North-Rhine Westphalia). I am very grateful to Dafydd Gibbon, Sarah Johanning, Jan-Torsten Milde, Annette Nick, Birte Schaller, Alexandra Thies and Thorsten Trippel for their help.

FLuL 32 (2003)

1. Prosody in Second Language Speech

This study is concerned with the production of prosodic properties in the speech of learners of a second language. It focusses on the production of temporal and metrical features of second language speech. In connected speech, sequences of words are grouped into prosodically structured units with a particular temporal and metrical shape. The temporal shape of an utterance refers to the relative duration of units such as syllables; the metrical shape is a result of their relative prominence. In German, for example, utterance (1) will be produced as a sequence of syllables with different durations and prominence.

(1) Willst Du ein Stück Kuchen? (Would you like a piece of cake?)

A possible production of (1) is that the syllables willst and Ku are more prominent (printed in bold) than the syllables Du, ein and Stück. Moreover, the syllable Stück is probably more prominent than the syllables Du and ein. Figure 1 illustrates a production of this sentence with the pitch contour (in Hertz) and the intensity curve (in dB). The sentence is realized as

(1') [vɪlsduaɪnʃtʏkuxən]

with high intensity on [vɪls], [aɪn] and [ku] and high pitch on the syllable [ku]. Thus, in this example, relative prominence is correlated with one or more of the acoustic features high pitch and high intensity.

Fig. 1: Waveform, pitch and intensity of the utterance [vɪlsduaɪnʃtʏkuxən]

In German, prominent syllables are usually longer and have higher pitch and loudness than less prominent ones. In non-prominent syllables, only reduced, i.e. shortened and centralized, vowels occur. In connected speech, prominent and non-prominent syllables occur in more or less regular alternation, and their distribution together with prosodic phrasing is often referred to as speech rhythm.

The rhythmic planning of utterances is part of the phonological encoding in speech production. LEVELT (1989) proposes a Prosody Generator, which forms part of the phonological encoding and which has as one of its major tasks the generation of speech rhythm. It receives various kinds of input: segmental input, i.e.
the string of speech sounds specified by the lexicon; attitudinal and emotional information, which affects the planning of the "intonational meaning"; surface phrase-structural and pitch accent information, which is required for the planning of prosodic phrases; and metrical information about potential accents on syllables. The last two types of input contribute to the rhythmic properties of the resulting phonetic plan. The Prosody Generator produces as output syllable frames with prosodic parametrization, which specifies their duration, loudness and contribution to the pitch contour. Thus, each syllable frame is characterised by its contribution to the rhythmic pattern of the utterance: whether it will be long, loud and high pitched and thus be perceived as prominent or not.

One of the central questions in bilingual speech production is that of the representational status of the various components and modules of the speech production process. Does a speaker of two languages have a shared system which stores linguistic information about the two languages, or does a bilingual speaker have separate systems for each language? In his adaptation of Levelt's model to a bilingual speaker, DE BOT (1992) suggests that phonological encoding is language-specific, but that all phonetic plans for syllables from both languages of a speaker are stored together. Equally, although he assumes the Prosody Generator to receive language-specific input, he proposes that a bilingual speaker draws prosodic plans from a single system. Articulation, finally, is also thought to proceed from a shared system which stores the possible sounds and prosodic patterns of both languages. The "foreign accent" of second language learners' speech is assumed to reflect the interaction between the shared representations of syllable plans.

Many studies have put forward evidence for an interaction of the two phonological systems of a bilingual speaker. CUTLER [et al.]
(1992) found that French/English bilinguals who grew up with both languages partly employ different speech perception strategies than monolinguals. WATSON (1991) studied the production of certain phonemes by French/English bilingual speakers and found that their articulation may differ from that of monolinguals. Prosodic interaction of two language systems in terms of word stress patterns was found for Spanish learners of English (MAIRS 1989), English learners of German (KALTENBACHER 1998) and Hungarian, Polish and Chinese learners of English (ARCHIBALD 1994, 1997). The phonetic production of stress by German learners of English shows influences of both languages (GUT 2002), as do the intonational patterns of German-speaking learners of English (GROSSER 1989, JILKA 2002).

However, the quality of L2 phonology and the extent to which the two phonological systems of a bilingual speaker influence each other depend on factors such as length and quality of exposure to the target language, practice and perceptual abilities. Some studies have shown that mixed systems are more likely in beginning than in advanced language learners. WENK (1985) investigated the reduction of unaccented syllables in the English speech of French learners and found that beginners showed native language influence, whereas this was no longer detectable in advanced learners.

Another factor that has been suggested to determine the degree of interaction between the two phonological systems of a bilingual speaker is the structural difference between the two languages. For bilingual first language acquisition, TRACY (1996) argues that an initial fused system is more likely when the two languages show many structural similarities. A model of speech learning based on linguistic similarity is formulated by FLEGE (1995), who predicts that the degree of closeness between native and target language sounds determines the ease with which they can be learned.
Sounds of an L2 can be categorized as either the same as sounds of the L1, similar to them, or completely new. Sounds which are the same in both languages do not pose any difficulties, and L1 phonetic plans will be employed in L2 speech production. For sounds that are completely new, learners will develop new articulation categories (e.g. BOHN/FLEGE 1997), but newly established categories will not always correspond with native speaker categories. The most difficult sounds are the similar ones which require the creation of a new category. Many learners will use L1 categories in L2 speech production instead of establishing new categories for these sounds.

Applied to prosody, this model thus predicts that L2 speakers with a native language closely related to the target language in temporal and metrical patterns will show more interaction between their phonological systems than speakers with a prosodically more remote native language. Learners with native languages that do not provide them with prosodic patterns required for the prosody of the target language will develop a new category, whereas learners with a native language that uses similar categories will produce those L1 categories. An analysis of the prosodic systems of native language and target language of second language learners may thus help to predict the type of speech processing and the quality of the speech of second language learners. Since this study is concerned with temporal and metrical aspects of speech, typological differences between languages in this area and measurements of their phonetic correlates will be discussed in the following.

2. Temporal and metrical aspects of speech across languages

The languages of the world have traditionally been divided into stress-timed and syllable-timed (PIKE 1945, ABERCROMBIE 1967). In stress-timed languages, prominent syllables ("stress beats") are supposed to occur at fairly regular intervals.
The prosodic unit stretching from one stress beat up to the next, including the intervening unaccented syllables, is called a foot. Utterance (2) consists of four feet, the first foot including the accented syllable ti (printed in bold) and the unaccented syllables ger and and, the second the accented syllable mouse and the unaccented syllable are. The third foot comprises the accented syllable wal and the unaccented syllables king, in and a, whereas the last foot contains only the accented syllable field.

(2) Tiger and mouse are walking in a field.
[taɪgərəndmaʊsəwɔkɪŋɪnəfild]

In early conceptions of stress-timed speech rhythm it was assumed that the time interval between two stress beats is isochronous, i.e. roughly equal in time. Since the number of syllables between two stress beats varies, their length is adjusted to fit into the stress interval. This temporal adjustment is reflected in the reduction of vowels. The three unaccented syllables in the third foot of (2) are assumed to be spoken faster than the single unaccented syllable in the second foot of (2). Hence, syllable length is supposed to be very variable in stress-timed languages. English and German are often cited as typical stress-timed languages (ABERCROMBIE 1967). However, many researchers have tried and failed to find an acoustic basis for these claims. The interstress intervals in stress-timed languages are not of equal length (CLASSE 1939, O'CONNOR 1965, ULDALL 1971, HILL [et al.] 1979, ROACH 1982, DAUER 1983).

In syllable-timed languages, syllables are assumed to be spaced out evenly and have similar length. Prominent syllables occur at irregular intervals. Sentence (3) illustrates this for Italian.

(3) Ecco la casa del suo padre.
[ekolakazadelsuopadre]

All syllables of this utterance are assumed to be roughly identical in length; vowel reduction does not occur.
The Romance languages French, Italian and Romanian are often cited as typical syllable-timed languages (ROSSI 1998, DASCALU-JINGA 1998). Again, acoustic evidence for syllable-timing is difficult to obtain: ROACH (1982), in an overall statistical comparison of syllable length in stress-timed and syllable-timed languages, did not find significant differences between the two groups. GUT [et al.] (2002), however, found significant differences between the temporal organization of English and that of the three West African languages Anyi, Ega and Ibibio, all with putative syllable-timed rhythm. They measured the relationship between subsequent syllables in sentences with the Rhythm Ratio (RR). The Rhythm Ratio (GIBBON/GUT 2001) is based on the following formula:

$$RR = \frac{100}{m-1} \sum_{k=1}^{m-1} \frac{d_i}{d_j}$$

where $d$ stands for the duration of a syllable and $m$ for the total number of syllables. $d_i = d_k$ and $d_j = d_{k+1}$ applies if $d_k$ is smaller than $d_{k+1}$, and $d_i = d_{k+1}$ and $d_j = d_k$ if $d_k$ is not smaller than $d_{k+1}$. In other words, for each pair of adjacent syllables, the shorter is divided by the longer. The average of all these ratios is calculated and multiplied by 100. Thus, if the RR equals 100, subsequent syllables have exactly the same duration. The lower the degree of similarity, the lower the RR value. The RR for the West African languages was significantly higher than that of British English.

In recent approaches speech rhythm is measured on the segmental level, i.e. the level of the individual speech sounds. This is based on observations by DAUER (1983), who suggested that "rhythmic differences [...] between languages [...] are more a result of phonological, phonetic, lexical, and syntactic facts about that language than any attempt on the part of the speaker to equalize interstress or intersyllable intervals" (p. 55). In Dauer's view, speech rhythm reflects variety of syllable structures, phonological vowel length distinctions, absence/presence of vowel reduction and lexical stress.
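The Rhythm Ratio introduced earlier in this section can be illustrated with a short computational sketch. The function below follows the verbal description (shorter of two adjacent syllables divided by the longer, averaged, multiplied by 100); the function name, the input format (a plain list of syllable durations in seconds) and the error handling are assumptions made for illustration and are not taken from GIBBON/GUT (2001).

```python
from typing import Sequence


def rhythm_ratio(durations: Sequence[float]) -> float:
    """Return the Rhythm Ratio for one utterance.

    durations: syllable durations in seconds, in temporal order
    (hypothetical input format chosen for this sketch).
    """
    if len(durations) < 2:
        raise ValueError("at least two syllables are needed")
    if any(d <= 0 for d in durations):
        raise ValueError("syllable durations must be positive")
    # For each pair of adjacent syllables, divide the shorter by the longer.
    ratios = [min(a, b) / max(a, b) for a, b in zip(durations, durations[1:])]
    # Average the ratios and scale to a 0-100 range.
    return 100.0 * sum(ratios) / len(ratios)
```

With equal durations the function returns 100; alternating long and short syllables pull the value down, which matches the interpretation given in the text.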
Whereas languages classified as stress-timed such as English show a variety of different syllable structures consisting of a variable number of consonants (C) before and after the vowel (V) (e.g. for English: CV (30% frequency), CVC (34%), VC (15%), V (8%), CVCC (6%)), languages classified as syllable-timed have a majority of CV syllables (58% for Spanish). Differences in rhythm between languages reflect whether a language has vowel reduction or not; those classified as stress-timed usually do. In addition, syllable-timed languages either do not have lexical stress or accent is realized by variations in pitch contour. Conversely, stress-timed languages realize word level stress by a combination of length, pitch, loudness and quality changes, which result in clearly discernible beats.

These findings are partly reflected in recent measurements of the acoustic correlates of speech rhythm. RAMUS [et al.] (1999) base their measurement on the segmental organization, i.e. the organization of sequences of speech sounds. They divide speech into vocalic and consonantal parts and compute the proportion of the vocalic intervals of a sentence, the standard deviation of these intervals, and the standard deviation of the consonantal intervals. The vocalic proportion (%V) indicates the percentage of vocalic intervals in the total amount of speech. The standard deviation of consonantal intervals (delta C) reflects the variability of duration in consonantal intervals. In languages where both relatively short and relatively long consonantal intervals occur, delta C is higher than in languages where consonantal intervals are of a similar duration. By comparing carefully selected read sentences by four speakers each of eight different languages along the axes of the percentage of vowels and the standard deviation of the consonantal intervals, RAMUS [et al.] succeed in grouping some languages similarly to the originally suggested groups of stress-timing and syllable-timing (Figure 2).
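The three interval-based measures just described can be sketched in a few lines. This is a minimal illustration, not the procedure used by RAMUS [et al.] (1999): the input format (a list of labelled interval durations), the function name and the choice of the population standard deviation are assumptions of this sketch.

```python
from statistics import pstdev
from typing import List, Tuple


def interval_measures(intervals: List[Tuple[str, float]]) -> Tuple[float, float, float]:
    """Return (%V, delta V, delta C) for one sentence.

    intervals: (label, duration in seconds) pairs, where the label is
    "V" for a vocalic and "C" for a consonantal interval
    (hypothetical format; pauses are assumed to be removed already).
    """
    vocalic = [d for label, d in intervals if label == "V"]
    consonantal = [d for label, d in intervals if label == "C"]
    if not vocalic or not consonantal:
        raise ValueError("need at least one vocalic and one consonantal interval")
    total = sum(vocalic) + sum(consonantal)
    percent_v = 100.0 * sum(vocalic) / total
    # Assumption: population standard deviation; a sample standard
    # deviation (statistics.stdev) would be an equally plausible reading.
    delta_v = pstdev(vocalic)
    delta_c = pstdev(consonantal)
    return percent_v, delta_v, delta_c
```

A sentence with consonantal intervals of very different lengths yields a high delta C, and a sentence dominated by vowels yields a high %V, which are the two axes used in Figure 2.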
[Figure 2: scatter plot of delta C (vertical axis) against %V (horizontal axis) for the eight languages]

Fig. 2: Vocalic proportion (%V) and standard deviation of consonantal intervals (delta C) of English, Dutch, Polish, Spanish, Italian, French, Catalan and Japanese classified by RAMUS [et al.] (1999)

English, Polish and Dutch, all presumed stress-timed languages, group together with a relatively low vocalic proportion and a relatively high standard deviation of consonantal intervals. French, Italian, Spanish and Catalan group together at a higher %V and lower delta C. Japanese, finally, differs from those two groups by having an even higher vocalic proportion and even lower consonantal standard deviation. In a similar study, GRABE/LOW (2002) showed that German groups with the "stress-timed" languages English and Dutch, and Romanian with the Romance languages French, Italian, Spanish and Catalan.

2.1 Temporal and metrical features of German

German speech rhythm has always been described as stress-timed. Phonetic and phonological correlates of this are durational differences between prominent and non-prominent syllables, phonemic vowel length contrast and vowel reduction. The difference between prominent and non-prominent syllables in German is primarily one of duration (DOGIL 1995, KOHLER 1995). Vowels have phonemic length contrast, i.e. the words Saat (seed) and satt (full [stomach]) only differ in the length of the vowel /a/. Vowels in unaccented syllables are reduced, which means that the short central vowels /ə/ or /ɐ/ are produced (HAKKARAINEN 1995). Many function words can have weak forms with reduced or deleted vowels, depending on speech rate and register. Example (4) illustrates this for the article dem.

(4) de:m   dem   dəm   dm   bm   m

Increasingly stronger reductions are shown from left to right.
First, the long vowel [e:] is shortened to [e], then reduced to [ə], then deleted altogether. In the shortest version of the article dem, only a single consonant [m] is produced (KOHLER 1990). Quite frequently, vowels in non-prominent syllables are deleted, e.g. Adel is produced /a:dl/. Altogether, this means that, in German, durational differences between prominent and non-prominent syllables are pronounced.

Learners of German with native languages that have the same prosodic characteristics can be expected to exhibit different patterns in their speech production from those of learners with typologically less close native languages. In this study, speakers from three typologically different language backgrounds were chosen: learners of German with English, a Romance language or Mandarin Chinese as their native language. The prosodic systems of these languages will be described in the following.

2.2 Temporal and metrical aspects of English

The temporal and metrical organization of English is very similar to that of German. English has always been cited as a typical stress-timed language with distinct durational differences between prominent and non-prominent syllables. English has phonemic vowel length, as illustrated in the word pair beat and bit. Even more frequently than in German, vowels in unaccented syllables are reduced or deleted (DELATTRE 1969, KALTENBACHER 1998). Whereas in simple German words reduced vowels only occur in final syllables or in inflectional morphemes, in English they can occur in a wide variety of positions. According to FLEGE's model (1995), English temporal organization is therefore similar to, rather than the same as, that of German. As for German native speakers, in the phonetic plans of English native speakers syllables will be marked for duration, and a variety of articulatory plans for syllable types, including syllables with reduced and deleted vowels, will be available.
An English speaker of L2 German, however, might produce these syllable types in inappropriate contexts. KALTENBACHER (1998) showed that English learners of German produced too frequent and too extreme vowel reduction in unaccented syllables of isolated words.

2.3 Temporal and metrical aspects of Italian, French and Romanian

The Romance languages have always been classified as syllable-timed (ROSSI 1998, DASCALU-JINGA 1998). Neither French, Italian nor Romanian has a phonemic vowel length contrast, nor do reduced vowels occur. The durational difference between accented and unaccented syllables should therefore be less pronounced than in the stress-timed languages.

For speakers of a Romance language, some features of German prosody involve the establishment of new categories. Syllables in the phonetic plan of Romance language speakers are probably not marked for duration, and articulation plans for syllables with reduced and deleted vowels do not exist. These articulation plans will have to be established by a second language speaker of German. If the L1 syllable plans are produced in German, the temporal difference between accented and unaccented syllables is predicted to be too small.

2.4 Temporal and metrical aspects of Chinese

The temporal and metrical organization of Chinese (this paper investigates Mandarin Chinese) is very different from that of German. As Chinese is a tone language with lexical tone, each syllable is specified in terms of pitch height or movement. This prosodic feature is stored in the lexicon and is not generated by the Prosody Generator. The role of prominence in Chinese is not well researched. FOX (2001) and ARCHIBALD (1997) consider Chinese a non-accentual language.
KRATOCHVIL (1998) claims that prominence exists in Beijing Chinese but that it is correlated with greater loudness and more pronounced pitch height or movement rather than increased duration. On the other hand, there is a length distinction between vowels in Chinese (ZEE 1999).

For Chinese speakers, German speech involves the creation of some new categories for the metrical and temporal organization of syllables. Syllables are probably not marked for duration and no syllable types with deleted vowels will be available.

3. The study

3.1 Participants

This study forms part of the LeaP project (http://www.spectrum.uni-bielefeld.de/LeaP/), which has collected a corpus of prosodically annotated non-native speech of more than 70 speakers. Of these, 14 learners of German, 5 male and 9 female, and two German native speakers were selected for this study (see Table 1). The non-native speakers' ages range from 19 to 57 (mean = 29.375, sd = 11.25). Their language background was divided into three different language groups: English (four speakers), Romance languages (six speakers: three Italian, two Romanian, one French) and Chinese (four speakers). Their age at the first exposure to German ranges from 4 to 23 years. The length of residence in Germany or another German-speaking country varies from 2 months to 34 years. The length of formal instruction they received in German ranges from none to 8 years.
Speaker  Native language      Sex     Age  Story  Age at first exposure  Length of residence  Instruction (years)
G1       German               female  52   B
G2       German               male    27   B
E1       British English      male    57   A      14                     34 years             8
E2       New Zealand English  male    41   B      13                     19 years             2
E3       New Zealand English  male    29   A      22                     7 years              0
E4       British English      female  19   B      6                      13 years             5
I1       Italian              male    30   B      17                     2 years              6
I2       Italian              female  22   B      15                     2 months             6
I3       Italian              female  21   B      15                     3 months             6
F1       French               female  21   B      12                     6 months             5
R1       Romanian             female  32   B      20                     12 years
R2       Romanian             female  27   B      4                      7 years              3
C1       Chinese              female  22   B      17                     7 months             6
C2       Chinese              female  22   B      17                     7 months             6
C3       Chinese              female  25   B      23                     2 years              2
C4       Chinese              male    23   B      17                     8 months             6

Tab. 1: Participants of this study, their native language, sex, age at the time of recording, the story they re-told, their age at first exposure to German, their length of residence at the time of the recording and the total amount of language instruction they had received in German

3.2 Recordings

Recordings consisted of three parts. First, a short interview (approximately ten minutes) was conducted with the non-native German speakers, in which various questions about their language learning history, such as age at first contact with German and length of formal instruction, were asked. Second, the participants were asked to read out a short story. Third, they re-told the story in their own words and without reference to the written text. All recordings were carried out in Bielefeld in either a sound-treated or a quiet room.

3.3 Data

Only the re-tellings were analysed for this study. Two participants re-told story A (see appendix, Story A and B), but since this story proved to result in very short re-tellings, all subsequent participants re-told story B. They received the story prior to the recording and were allowed to take as much time as they wanted for familiarizing themselves with it.
Participants were encouraged to ask for the meaning or the pronunciation of unknown words.

3.4 Analysis

The acoustic analysis was carried out using ESPS/waves+ (version 5.3.1) and Praat (version 4.0.7). It consisted of four parts: the measurement of the segmental organization, the analysis of the syllable type, the measurement of temporal syllable organization and some measurements of the proficiency of the speaker. All types of analyses were done by one trained phonetician and four students with extensive training and experience in phonetic analyses.

3.4.1 Segmental organization

With the help of a wide-band spectrogram, vocalic and consonantal parts of the speech signal were annotated. In order to ensure comparability, the annotation technique used by RAMUS [et al.] (1999) was adopted. This means that pre-vocalic glides were treated as consonants whereas post-vocalic glides were treated as vowels. Vowels were coded as V and stops, fricatives, liquids, nasals, glides, implosives and approximants were coded as C. The beginning and end of a vocalic interval was determined using standard phonetic criteria (PETERSON/LEHISTE 1960). The onset of a vowel was taken to be the onset of the stable formant structure. The offset of a vowel in a vowel-fricative sequence was determined by the onset of high frequency energy; in a vowel-voiceless stop sequence, the vowel ended with the offset of the first formant. Vocalic intervals can comprise one or more subsequent vowels, sometimes belonging to two syllables. Consonantal intervals stretch from the end of a vocalic interval to the beginning of the next vocalic interval and may contain several consonants, sometimes belonging to two different syllables, i.e. the coda of the previous and the onset of the following syllable. Pauses in the recordings were excluded from analysis. All calculations were carried out in the TASX environment (MILDE/GUT 2002), which provides an XML-based set of corpus analysis tools.
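The step from individually labelled segments to vocalic and consonantal intervals, as described in section 3.4.1, can be sketched as follows. This is not the TASX implementation: the tuple format, the pause label "P" and the decision that a pause ends the current interval are assumptions made for this illustration.

```python
from typing import List, Tuple

Segment = Tuple[str, float, float]  # (label, start in s, end in s); hypothetical format


def merge_into_intervals(segments: List[Segment]) -> List[Segment]:
    """Merge adjacent segments of the same class into one interval.

    Labels: "V" (vowel), "C" (consonant), "P" (pause).
    Pauses are dropped; in this sketch they also close the current
    interval, so that no interval stretches across a pause.
    """
    intervals: List[Segment] = []
    current = None  # [label, start, end] of the interval being built
    for label, start, end in segments:
        if label == "P":
            if current is not None:
                intervals.append((current[0], current[1], current[2]))
                current = None
            continue
        if current is not None and current[0] == label:
            current[2] = end  # extend the running interval
        else:
            if current is not None:
                intervals.append((current[0], current[1], current[2]))
            current = [label, start, end]
    if current is not None:
        intervals.append((current[0], current[1], current[2]))
    return intervals
```

Because class membership alone decides where an interval ends, a coda consonant and the following onset consonant fall into the same consonantal interval, as the text describes.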
3.4.2 Syllable type

All syllables were transcribed phonetically in SAMPA. Transcription was fairly broad but included processes such as nasalization, unreleased stops and aspiration. Syllabification processes were taken into account insofar as ambisyllabicity of certain consonants was allowed (GIEGERICH 1992). Half of each ambisyllabic consonant was considered to belong to the preceding syllable and half to the subsequent one. Resyllabification of final consonants which were produced as initial consonants of the following syllable (as in [Un] - [da] for "und die") was annotated. Syllables were subsequently classified into three classes: (a) as containing a full vowel, (b) as reduced (when they contained only a /ə/ or /ɐ/) or (c) as syllables with a deleted vowel, i.e. without a vowel.

3.4.3 Temporal organization of syllables

The temporal organization of syllables was measured using the Rhythm Ratio (GIBBON/GUT 2001) described above. Ten sentences consisting of at least five syllables were analysed for each speaker (speaker I2 produced only 9 and speaker I3 only 6 sentences of the required length) and the mean Rhythm Ratio was calculated for all sentences.

3.4.4 Ranking of proficiency

For the ranking of the non-native speakers' proficiency, both phonetic measurements of speech components and a native speaker assessment were combined. Three standard measurements of the quality of non-native speech (LENNON 1990, TEMPLE 1997) were carried out:

(1) Articulation rate: the mean number of syllables per second
For this, the total time of pauses was subtracted from the total length of the recording. The number of syllables was divided by the resulting total time of speech. The highest score was ranked highest.

(2) Mean length of runs
The mean number of words produced between pauses was calculated and the highest score was ranked highest.
(3) Pause percentage
The total pause time (filled and unfilled) was calculated as a percentage of the total speaking time. The presence of pauses was determined auditorily and pauses have a minimum length of 100 ms. The lowest value was ranked highest.

17 native speakers of German were asked to rate how fluent the speaker's German was (the instruction was "Bitte geben Sie an, wie flüssig der Sprecher ist." (Please indicate how fluent the speaker is)) on a scale from 1 (excellent) to 5 (very bad). The raters judged the speakers on the basis of a 20 to 40 second passage (approximately the first half of each re-telling). The highest score was ranked highest.

These measurements naturally cannot give a complete assessment of a speaker's proficiency and should only be seen as indicators of a certain stage of fluency.

4. Results

> Segmental organization

Figure 3 illustrates the speech rhythm of the 14 non-native and the two native speakers of German measured in the variables %V and delta C as proposed by RAMUS [et al.] (1999). Three of the English native speakers group together at a lower %V and a higher delta C value than the German speakers. The fourth English native speaker lies very close to the German native speakers. This is also true for the two Romanian speakers and one Italian speaker. The other two Italian speakers and the French speaker show a higher %V and a higher delta C value than the German speakers. The Chinese speakers divide into two groups consisting of two speakers each: one group shows a higher delta C value than the German speakers, the other group is different from the German speakers in showing a higher %V value.

[Figure 3: scatter plot of delta C (vertical axis) against %V (horizontal axis) for the individual speakers]

Fig.
3: Vocalic proportion (% V) and standard deviation of consonantal intervals (delta C) values for all non-native speakers (E, I, R, Fand C) and the native speakers of German (G) > Syllable type Table 2 illustrates the percentages of all three syllable types (deleted vowel, with reduced vowel, with füll vowel) as average values for the German speakers and the three groups of non-native speakers. The group of Romance language speakers produces significantly fewer syllables with reduced or deleted vowel compared to the German speakers. deleted vowel 5.4 7.7 1.5 1.8 Reduced vowel 26.7 20.2 17.1 25.4 Sum ofred+del 32.1 27.9 18.6 27.2 t-test n.s. ** (p< 0.01) n.s. Tab 2: Percentage of deleted and reduced syllables produced by each speaker group Although none of the other group differences from the German speakers are significant, it can be seen from Table 2 that the English speakers tend to delete more vowels whereas IFJLlll]]L 32 (2003) 146 Ulrike Gut syllables with deleted vowels are extremely rare in the speech of the Romance language and Chinese speak: ers. Looking at individual speech productions (Table 3), it can be seen that speak: er E4 deletes more than twice as many vowels as the German speak: ers and that speak: er E2 produces only half of the percentage of reduced vowels than the German speak: ers. Speak: er 12 does not produce any syllables without vowel and only very few with reduced vowels. Whereas the German speak: ers produce more than a third of their syllables temporally shortened, this is true for only 11 % of all syllables produced by speak: er 13, and 16% of all syllables produced by speak: ers 12 and E2. 
[Table 3: the per-speaker layout and the speaker labels did not survive extraction; the legible values are listed in source order]
Deleted vowel: 5.9 4.8 6.4 6.1 5.6 12.6 3 [...] 0 3 9 1.7 5 8 4
Reduced vowel: 24.7 28.7 16.5 10.4 29.9 21.6 17.8 16.1 8.3 2.2 17.9 20 26.5 21.2 21.4
Total: 30.6 33.5 22.9 16.5 35.5 34.2 20.8 16.1 11.3 22.9 19.6 20.5 27.5 22 25.4

Tab. 3: Percentages of deleted and reduced vowels for each speaker

> Temporal organization of syllables
Table 4 gives the average Rhythm Ratio (RR) values for each speaker group. No significant difference in the temporal organization of subsequent syllables was found between any non-native speaker group and the native speakers of German, but the Romance language speakers as a group tend to produce syllables of more similar duration than the German speakers. The individual speakers differing most in their RR value from the German speakers are speakers I2 (RR=76), R1 (RR=70.8), R2 (RR=71.5) and C1 (RR=70.5).

              German   English   Romance   Chinese
Average RR    62.55    65.55     68.34     67.44
t-test                 n.s.      n.s.      n.s.

Tab. 4: Average RR value for each speaker group

> Proficiency
Each non-native speaker was assigned a rank for each of the three proficiency measurements and the native speaker assessment listed above. Individual and average rankings are illustrated in Table 5.

[Table 5: the row layout did not survive extraction; the values are listed in source order]
Speakers: E1 E2 E3 E4 I1 I2 I3 F1 R1 R2 C1 C2 C3 C4
Values: 1 2 2 6 6 6 3 7 5 2 10 4 12 8 6 9 3 9 9 13 13 14 13 12 14 13 14 11 8 11 12 10 7 10 4 2 1 2 5 11 7 5 3 9 5 8 8 4 4 7 14 10 12 11
Values after the page break: 1 3 5 9 8 13 13 12 10 1 7 6 4 11

Tab. 5: Rank for each proficiency measurement, the native speaker assessment and average rank for each non-native speaker

Variation within speakers is very high, except for the top two speakers (E1 and R2) and the bottom four speakers (I2, I3, C4 and R1). Comparison of proficiency with the measurements of prosodic features will therefore mainly be limited to those speakers.
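The ranking procedure described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure actually used for Table 5: the speaker labels A, B and C and all values are invented, tied values are assumed to share a rank, and the best (numerically lowest) mean fluency rating is assumed to receive rank 1. Only two of the measurements, pause percentage and native speaker rating, are included.

```python
def pause_percentage(pauses_s, speaking_time_s, min_pause_s=0.1):
    """Total pause time as a percentage of total speaking time.

    Only pauses of at least min_pause_s (100 ms) are counted.
    """
    counted = [p for p in pauses_s if p >= min_pause_s]
    return 100.0 * sum(counted) / speaking_time_s


def rank(values):
    """Map each speaker to a 1-based rank; the lowest value gets rank 1.

    Tied values share a rank (an assumption of this sketch).
    """
    ordered = sorted(values.values())
    return {speaker: ordered.index(v) + 1 for speaker, v in values.items()}


# Invented example data, not measurements from the study.
pauses = {"A": [0.25, 0.05, 0.40], "B": [0.5, 0.5, 0.3], "C": [0.12, 0.08]}
speaking_time = {"A": 10.0, "B": 10.0, "C": 8.0}
mean_rating = {"A": 1.5, "B": 3.4, "C": 1.8}  # scale: 1 (excellent) to 5 (very bad)

pause_pct = {s: pause_percentage(pauses[s], speaking_time[s]) for s in pauses}
ranks = [rank(pause_pct), rank(mean_rating)]
# Average rank per speaker across the included measurements.
average_rank = {s: sum(r[s] for r in ranks) / len(ranks) for s in pauses}
print(average_rank)
```

Averaging ranks rather than raw values keeps measurements with different units comparable, which is presumably why a rank-based summary was chosen here.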
The proficiency rankings seem closely related to length of residence: the top ranked speakers had been resident in Germany for a number of years (7 and 34) at the time of recording, whereas the bottom ranked speakers had only been resident for a few months. However, this finding says nothing about a causal relationship between the two variables.

No relationship between the segmental organization of second language speech and proficiency ranking can be determined. Both top and bottom ranked speakers are equally far removed from native speaker values. In terms of the production of syllable types, however, low ranked speakers show the greatest difference from the native German speakers. Speakers I2, I3 and R1 produce very few syllables with reduced and deleted vowels. Speakers I2 and R1 furthermore produce a temporal organization of syllables in their German speech that differs most from that of the native speakers.

5. Discussion

The object of this study was to determine the influence of the native language on the prosody of second language speech. In speech production models (e.g. DE BOT 1992) it has been suggested that the typological "closeness" of two languages affects their potential interaction or interference during the process of phonological and specifically prosodic encoding. Two aspects of speech production can be affected: the generation of prosodic properties, in which syllables are marked according to their temporal and accentual (metrical) features, and articulation, where these syllable plans are converted into motor plans. Our first hypothesis was that if the native language does not provide temporal or accentual markers, as is probably the case for Romance languages (syllables are not specified for length) and Chinese (syllables are not specified for length and accent), learners of German with these native languages will produce syllables unspecified for these aspects in their second language speech.
Second language speakers of German with English as their native language, on the other hand, were not expected to produce different temporal patterns, since English marks syllables temporally and metrically in a similar way to German.

A comparison of the temporal organization of subsequent syllables in the speech of the three groups of non-native speakers of German revealed no significant differences from the German native speakers. However, individual speakers such as one Italian speaker, the two Romanian speakers and one Chinese speaker produced strings of syllables of clearly more similar length than the German native speakers.

The second line of investigation was concerned with syllable types produced by learners of German. In German, syllables with reduced and deleted vowels occur at specific non-prominent positions. It was hypothesized that speakers of Romance languages and of Chinese will not have these syllable types available, whereas English native speakers might overproduce them in German. Evidence partly supporting this was found in a group comparison of Romance language learners of German and German native speakers. The Romance language speakers produced significantly fewer syllables with deleted or reduced vowels. The Chinese speakers showed a tendency in the same direction, but no significant differences were found. Individual speakers showed large differences in their production of syllable types: two Italian speakers and one English speaker produced only half the amount of reduced and deleted vowels in syllables compared to the German native speakers. The study yielded unexpected results for the English group: vowel reduction in syllables was less frequent rather than more frequent compared to the German native speakers, although vowel deletion was slightly higher. Differences in speech rhythm as measured in the segmental organization confirm these results.
The English native speakers tend to produce a lower proportion of vocalic intervals in their speech compared to the German speakers. Most Romance language speakers produce higher vocalic percentages and consonantal standard deviations, although some speakers' values are very similar to those of the German native speakers. The Chinese group is heterogeneous, with either higher %V or higher delta C values. Compared to RAMUS [et al.'s] (1999) classification of languages, the English speakers show native language influence. The Romance language speakers do so too in terms of higher %V. Higher delta C values cannot be explained by native language influence.

The results suggest that there is indeed an influence of native language prosodic patterns on second language speech. Speakers whose L1 presumably does not mark syllables as reduced or short produce those syllables significantly less often in their L2. However, the predictions made by FLEGE's (1995) model were not confirmed. The similar categories English speakers have for syllable plans do not seem to pose more problems than the new categories Romance language and Chinese speakers have to establish for syllables with reduced and deleted vowels.

Apart from the phonological structure of the native language, aspects of proficiency seem to play a role in the production of second language prosody. Even the very rough measurements and rankings carried out in this study showed that low proficiency precludes near-native production of prosodic features. There is plenty of evidence that all non-native speakers, even the very proficient ones, have no separate systems of phonological encoding and articulation. As FLEGE (2002) put it, they produce speech from a "shared phonological space". No independent speech production components can be assumed.
However, since complete separation is not possible even for childhood bilinguals who grow up with two languages (PARADIS 2001; GROSJEAN 1982), this is not a state which can be expected (or even wished for) in second language speakers.

References

ABERCROMBIE, David (1967): Elements of General Phonetics. Edinburgh: Edinburgh University Press.
ARCHIBALD, John (1994): "A formal model of learning L2 prosodic phonology". In: Second Language Research 10, 215-240.
ARCHIBALD, John (1997): "The acquisition of English stress by speakers of tone languages: lexical storage versus computation". In: Linguistics 35, 167-181.
BOHN, Ocke-Schwen / FLEGE, James (1997): "Perception and production of a new vowel category by adult second language learners". In: JAMES, Alan / LEATHER, Jonathan (eds.): Second Language Speech. Berlin: Mouton, 53-73.
CLASSE, André (1939): The Rhythm of English Prose. Oxford: Blackwell.
CUTLER, Anne / MEHLER, Jacques / NORRIS, D. / SEGUI, J. (1992): "The Monolingual Nature of Speech Segmentation by Bilinguals". In: Cognitive Psychology 24, 381-410.
DASCALU-JINGA, Laurentia (1998): "Intonation in Romanian". In: HIRST, Daniel / DI CRISTO, Albert (eds.): Intonation Systems. Cambridge: Cambridge University Press, 239-260.
DAUER, R. (1983): "Stress-timing and syllable-timing reanalysed". In: Journal of Phonetics 11, 51-62.
DE BOT, Kees (1992): "A Bilingual Production Model: Levelt's Speaking Model Adapted". In: Applied Linguistics 13, 1-24.
DELATTRE, P. (1969): "An acoustic and articulatory study of vowel reduction in four languages". In: International Review of Applied Linguistics 7, 295-325.
DOGIL, Grzegorz (1995): "The phonetic manifestation of word stress". In: Arbeitspapiere des Instituts für maschinelle Sprachverarbeitung - Phonetik, Universität Stuttgart, Vol. 2, no. 2.
FLEGE, James (1995): "Second language speech learning: theory, findings and problems".
In: STRANGE, Winifred (ed.): Speech perception and linguistic experience: Issues in cross-linguistic research. Timonium: York Press, 233-277.
FLEGE, James (2002): "No Perfect Bilinguals". In: New Sounds 2000, 132-141.
FOX, Anthony (2001): Prosodic Features and Prosodic Structure. Oxford: Blackwell.
GIBBON, Dafydd / GUT, Ulrike (2001): "Measuring Speech Rhythm". In: Proceedings of Eurospeech, Aalborg, 91-94.
GIEGERICH, Heinz (1992): English Phonology. Cambridge: Cambridge University Press.
GRABE, Esther / LOW, Ee-Ling (2002): "Durational Variability in Speech and the Rhythm Class Hypothesis". In: GUSSENHOVEN, Carlos / WARNER, N. (eds.): Papers in Laboratory Phonology 7. Berlin: Mouton, 515-546.
GROSJEAN, François (1982): Life with two languages: An Introduction to Bilingualism. Cambridge, Mass.: Harvard University Press.
GROSSER, Wolfgang (1989): "Akzentuierung und Intonation im englischen Erwerb österreichischer Lerner". In: Salzburger Studien zur Anglistik und Amerikanistik 9. Salzburg: University of Salzburg.
GUT, Ulrike (2002): "On the Acquisition of Rhythmic Structure". In: New Sounds 2000, 148-154.
GUT, Ulrike / URUA, Eno-Abasi / ADOUAKOU, Sandrine / GIBBON, Dafydd (2002): "Rhythm in West African tone languages: a study of Ibibio, Anyi and Ega". In: GUT, Ulrike / GIBBON, Dafydd (eds.): Typology of African Prosodic Systems. Universität Bielefeld, 155-161.
HAKKARAINEN, Heikki (1995): Phonetik des Deutschen. München: Wilhelm Fink.
HILL, David / JASSEM, Wiktor / WITTEN, Ian (1979): "A statistical approach to the problem of isochrony in spoken British English". In: HOLLIEN, Harry / HOLLIEN, Patricia (eds.): Current issues in linguistic theory. Vol. 9: Current issues in the phonetic science. Amsterdam: John Benjamins, 285-294.
JILKA, Matthias (2002): "Testing the contribution of prosody to the perception of foreign accent". In: New Sounds 2000, 199-207.
KALTENBACHER, Erika (1998): "Zum Sprachrhythmus des Deutschen und seinem Erwerb".
In: WEGENER, Heide (ed.): Eine zweite Sprache lernen. Tübingen: Narr, 21-38.
KOHLER, Klaus (1990): "Segmental reduction in connected speech in German: Phonological facts and phonetic explanation". In: HARDCASTLE, William / MARCHAL, Alain (eds.): Speech Production and Speech Modelling. Amsterdam: Kluwer, 69-92.
KOHLER, Klaus (1995): Einführung in die Phonetik des Deutschen. Berlin: Schmidt (2nd edition).
KRATOCHVIL, Paul (1998): "Intonation in Beijing Chinese". In: HIRST, Daniel / DI CRISTO, Albert (eds.): Intonation Systems. Cambridge: Cambridge University Press, 417-431.
LENNON, Paul (1990): "Investigating fluency in EFL: A quantitative approach". In: Language Learning 40.3, 387-417.
LEVELT, Willem (1989): Speaking. From Intention to Articulation. Cambridge, Mass.: MIT Press.
MAIRS, Jane (1989): "Stress assignment in interlanguage phonology: an analysis of the stress system of Spanish speakers learning English". In: GASS, Susan / SCHACHTER, Jacquelyn (eds.): Linguistic Perspectives on Second Language Acquisition. Cambridge, MA: Cambridge University Press, 260-283.
MILDE, Jan-Torsten / GUT, Ulrike (2002): "A prosodic corpus of non-native speech". In: BEL, Bernard / MARLIEN, Isabelle (eds.): Proceedings of the Speech Prosody 2002 conference, 11-13 April 2002. Aix-en-Provence: Laboratoire Parole et Langage, 503-506.
O'CONNOR, John (1965): "The Perception of Time Intervals". In: Progress Report 2, Phonetics Laboratory, UCL, 11-15.
PARADIS, Joanne (2001): "Do bilingual two year olds have separate phonological systems". In: International Journal of Bilingualism 5.1, 19-38.
PETERSON, Gordon / LEHISTE, Ilse (1960): "Duration of Syllable Nuclei in English". In: Journal of the Acoustical Society of America 32.6, 693-703.
PIKE, Kenneth (1945): The Intonation of American English. Ann Arbor: University of Michigan Press.
RAMUS, Franck / NESPOR, Marina / MEHLER, Jacques (1999): "Correlates of linguistic rhythm in the speech signal". In: Cognition 73.3, 265-292.
ROACH, Peter (1982): "On the distinction between 'stress-timed' and 'syllable-timed' languages". In: CRYSTAL, David (ed.): Linguistic controversies. Essays in linguistic theory and practice. London: Edward Arnold, 73-79.
ROSSI, Mario (1998): "Intonation in Italian". In: HIRST, Daniel / DI CRISTO, Albert (eds.): Intonation Systems. Cambridge: Cambridge University Press, 219-238.
TEMPLE, Liz (1997): "Memory and Processing Modes in Language Learner Speech Production". In: Communication and Cognition 30, 75-90.
TRACY, Rosemarie (1996): "Vom Ganzen und seinen Teilen. Überlegungen zum doppelten Erstspracherwerb". In: Sprache und Kognition 15, 70-92.
ULDALL, Elizabeth (1971): "Isochronous Stresses in R.P.". In: HAMMERICH, Louis / JACOBSON, Roman / ZWIRNER, Eberhard (eds.): Form and substance. Copenhagen: Akademisk Forlag, 205-210.
WATSON, Ian (1991): "Phonological processing in two languages". In: BIALYSTOK, Ellen (ed.): Language processing in bilingual children. Cambridge: Cambridge University Press, 25-48.
WENK, Brian (1985): "Speech Rhythms in Second Language Acquisition". In: Language and Speech 28.2, 157-174.
ZEE, Eric (1999): "Chinese (Hong Kong Cantonese)". In: Handbook of the International Phonetic Association. Cambridge: Cambridge University Press, 58-60.

Appendix

Story A

Das Telefon klingelte. "Ich geh dran!" rief Linda. "Hallo?" "Hi Linda, hier ist Nick." Lindas Herz schlug schneller. Nick war der letzte Mensch auf der Welt, mit dem sie im Moment sprechen wollte. Aber sie schaffte es, mit einer freundlichen Stimme zu sagen: "Ach, hallo Nick. Nett von Dir, mich anzurufen." "Hör zu Linda. Ich möchte, daß Du sofort rüberkommst." "Was?" schrie Linda auf. Sie riß sich schnell zusammen. "Ich fürchte, ich kann jetzt gerade nicht" sagte sie bestimmt. "Ich denke, daß Du etwas Zeit finden wirst, wenn ich Dir sage, daß es um Tom geht."
sagte Nick. Und als Linda nicht antwortete, fügte er hinzu: "Nun?" Linda überlegte sich, ob sie sagen sollte "Ich weiß gar nicht, wovon Du redest." Aber sie wußte, daß sie Nick nichts vormachen konnte. "Ich komme in fünf Minuten" sagte sie und legte auf. Sie rannte nach oben, zog ihre Schuhe an, nahm ihren Mantel und die Schlüssel, als es klingelte. Panik befiel sie. "Was soll ich nur tun? Was soll ich nur tun?" sagte sie zu sich selbst. Als sie die Türe öffnete, saß ein Kind auf den Stufen. Linda konnte ihren Augen nicht trauen. "Du?" staunte sie. Es war Tom.

Story B (Der Löwe und die Maus)

Ein Löwe und eine Maus gingen spazieren, als sie am Wegrand ein großes Stück Käse liegen sahen. "Bitte Löwe, laß es mich haben!" sagte die Maus, "Du magst doch gar keinen Käse. Sei lieb und such Dir etwas anderes zu fressen." Aber der Löwe legte seine Pfote auf den Käse und sagte: "Er gehört mir! Und wenn Du nicht sofort verschwindest, fresse ich Dich auch!" Die Maus war sehr traurig und ging fort. Der Löwe versuchte, den ganzen Käse auf einmal zu verschlingen, aber er blieb ihm im Hals stecken, und was er auch versuchte, er konnte ihn nicht herunterschlucken. Nach einer Weile kam ein Hund vorbei und der Löwe bat ihn um Hilfe. "Da kann ich nichts machen." sagte der Hund und ging weiter. Dann kam ein Frosch vorbei und der Löwe bat ihn um Hilfe. "Da kann ich nichts machen." sagte der Frosch und hüpfte davon. Schließlich ging der Löwe zur Wohnung der Maus. Sie lag in ihrem Bett in einem Loch, das sie sich gegraben hatte. "Bitte, liebe Maus, hilf mir!" sagte der Löwe. "Der Käse steckt in meinem Hals und ich kann ihn nicht herunterschlucken." "Du bist ein böser Löwe." sagte die Maus. "Du hast mir den Käse nicht gelassen. Aber ich werde Dir trotzdem helfen. Sperr Dein Maul auf und laß mich hinein springen. Ich knabbere an dem Käsestück, bis es klein genug ist, Dir den Hals hinunter zu fallen." Der Löwe öffnete sein Maul. Die Maus sprang herein und begann am Käse zu knabbern.
Da dachte der Löwe: "Ich habe wirklich großen Hunger."